I needed to download a gzipped tarball (.tgz), verify its checksum and extract one particular file from it, all in one go using Bash. With Bash’s process substitution and tee, the task could be done in a one-liner.
I ran into this while building a Docker image for Retype, a static site generator for Markdown files. Retype was not a pure Node.js package: the npm package was just a CLI wrapper around the actual binary, which was built with .NET.
npm install email@example.com would do the following:
- Download the
- Verify its SHA-1 checksum.
- If it is okay, extract the .tgz file content into
As Node and npm were totally unnecessary, I decided to use only debian as the base image. But I still wanted to do the same thing npm did.
There was a small problem. I knew how to pipe curl into tar to extract the downloaded file:
curl https://example.org/some.tgz | tar -xz -f - path/to/one/specific/file
But I didn’t know how to put sha1sum into the chain, for two reasons: (1) I needed to feed sha1sum the content of the file via stdin and (2) I also had to specify a file containing the checksum in the format sha1sum expects.
$> sha1sum --help
Usage: sha1sum [OPTION]... [FILE]...
# omitted
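To make the second constraint concrete, here is a minimal sketch of how sha1sum -c takes its input, assuming GNU coreutils: the checksum list is passed as a file argument, and a - in the file-name column tells it to hash stdin. The file name expected.sha1 and the sample payload are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical payload standing in for the downloaded bytes.
payload='hello world'

# Write a checksum list of the form "<sha1>  -"; the "-" in the
# name column means "read the data from stdin" when checking.
workdir=$(mktemp -d)
printf '%s' "$payload" | sha1sum > "$workdir/expected.sha1"

# Matching content: sha1sum -c exits with status 0.
printf '%s' "$payload" | sha1sum -c "$workdir/expected.sha1"
match_status=$?

# Tampered content: sha1sum -c exits with a non-zero status.
printf '%s' 'tampered' | sha1sum -c "$workdir/expected.sha1" 2>/dev/null
mismatch_status=$?

echo "match=$match_status mismatch=$mismatch_status"
rm -r "$workdir"
```

The catch in a pipeline is that stdin is already taken by the data, so the checksum list has to come from somewhere that looks like a file.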
Luckily, I found a hidden gem on Stack Overflow. It turned out that someone had the same need and asked “creating a file downloading script with checksum verification”, and the answer from @user239558 suggested a neat solution.
The final command was:
# Each line is intentionally prefixed with a line number.
1| curl https://registry.npmjs.org/\
2| retypeapp-linux-x64/-/retypeapp-linux-x64-1.11.1.tgz \
3| | tee >(tar -xz --strip-components=2 -f- package/bin/retype) \
4| | sha1sum -c <(echo "2a53485d5d74c053be868b4f61a293f80aca39bd -") \
5| || rm retype
The keys to the solution were process substitution and tee. Bash’s process substitution (<() and >()) ran the command between the parentheses as a separate process, and its input or output would appear as a filename. tee (as in a T-shaped pipe) copied its stdin into two outputs.
The one-liner worked as follows:
- Lines 1-2 were just the usual curl for downloading the file.
- At line 3, the content of the downloaded file was piped into tee. From there on the stream was copied to two destinations: piped to the next command using | and written into a file created by the process substitution.
- The process substitution received the stream as its /dev/stdin and passed it to tar. tar then simply extracted the gzipped content from stdin (as specified via -f-) into the current directory.
- At the same time, sha1sum at line 4 also received the content, calculated its checksum and checked it (-c). Since sha1sum required a checksum file to check against, process substitution came to the rescue again: the output of echo "2a53485d5d74c053be868b4f61a293f80aca39bd -" was exposed as a file (which could only be accessed by the current process).
- Line 5 acted as a safety net. Only if sha1sum exited with a code other than 0 (successful), meaning the checksum did not match, was the extracted file deleted.
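The whole flow can be rehearsed offline. Below is a rough sketch under a few assumptions: cat stands in for curl, the tarball is a locally built stand-in with the same package/bin/retype layout, and fetch_verify_extract is a hypothetical helper name. The sleep calls are an addition of this sketch: tar inside >() runs asynchronously, so the demo pauses briefly before inspecting or deleting the extracted file.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a stand-in tarball; the file content is a placeholder,
# not the real Retype binary.
mkdir -p src/package/bin
echo 'placeholder binary' > src/package/bin/retype
tar -czf some.tgz -C src package

# Hypothetical helper mirroring the one-liner; `cat` replaces `curl`.
fetch_verify_extract() {
  local archive=$1 expected_sha1=$2
  cat "$archive" \
    | tee >(tar -xz --strip-components=2 -f- package/bin/retype) \
    | sha1sum -c <(echo "$expected_sha1  -") \
    || { sleep 1; rm -f retype; }  # sleep: let tar finish first
}

good_sha1=$(sha1sum < some.tgz | cut -d' ' -f1)

# Correct checksum: the extracted file is kept.
fetch_verify_extract some.tgz "$good_sha1"
sleep 1  # tar may still be writing
[ -f retype ] && kept=yes || kept=no

# Wrong checksum: the extracted file is removed again.
rm -f retype
bad_sha1=0000000000000000000000000000000000000000
fetch_verify_extract some.tgz "$bad_sha1" 2>/dev/null
[ -f retype ] && removed=no || removed=yes

echo "kept=$kept removed=$removed"
cd / && rm -rf "$workdir"
```

The || branch reacts to the exit status of the last command in the pipeline, which is sha1sum, so a failure of the checksum check is what triggers the cleanup.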
NOTE: the Dockerfile was available on my Gist.