I needed to download a gzipped tarball (.tgz), verify its checksum and extract one particular file from it, all in one go using Bash. With Bash’s process substitution and tee, such a task could be achieved in a one-liner.
I ran into this need when I built a Docker image for Retype, a static site generator for Markdown files. Retype was not a pure Node.js package. The npm package was just a CLI wrapper around the actual binary, which was built with .NET.
Typically, running npm install retypeapp-linux-x64@1.11.1 would do the following:
- Download the .tgz distribution.
- Verify its SHA-1 checksum.
- If it is okay, extract the .tgz file content into node_modules.
As Node and npm were totally unnecessary, I decided to use only debian as the base image. But I still wanted to do the same as what npm did.
There was a small problem. I knew how to pipe curl to tar to extract the downloaded file:
curl https://example.org/some.tgz | tar -xz -f - path/to/one/specific/file
But I didn’t know how to put sha1sum into the chain, for two reasons: (1) I needed to feed sha1sum the content of the file via stdin and (2) I also needed to specify a file containing the expected checksum in sha1sum's checksum-list format.
$> sha1sum --help
Usage: sha1sum [OPTION]... [FILE]...
# omitted
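To make the two requirements concrete, here is a minimal sketch, assuming GNU coreutils' sha1sum. The sample data, the temporary file and the function name check_stdin_demo are made up for illustration. In check mode (-c), sha1sum reads lines made of a checksum, two spaces and a file name, and a file name of - stands for stdin.

```shell
#!/usr/bin/env bash
# Sketch only: the sample data and all names here are hypothetical.
check_stdin_demo() {
  local data='hello retype' list sum
  list=$(mktemp)

  # Compute the SHA-1 of the sample data; sha1sum prints "<hash>  -".
  sum=$(printf '%s\n' "$data" | sha1sum | cut -d' ' -f1)

  # A checksum list puts each hash next to a file name; "-" means stdin.
  printf '%s  -\n' "$sum" > "$list"

  # The data arrives on stdin, the expected checksum comes from the list.
  printf '%s\n' "$data" | sha1sum -c "$list"
  local status=$?

  rm -f "$list"
  return "$status"
}

check_stdin_demo
```

With GNU sha1sum this prints "-: OK" and returns 0. The temporary file is the part that does not fit into a single pipeline.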
Luckily, I found a hidden gem on Stack Overflow. It turned out that someone had the same need and asked about “creating a file downloading script with checksum verification”, and the answer from @user239558 suggested a neat solution.
The final command was:
# Each line is intentionally prefixed with a line number.
1| curl https://registry.npmjs.org/\
2| retypeapp-linux-x64/-/retypeapp-linux-x64-1.11.1.tgz \
3| | tee >(tar -xz --strip-components=2 -f- package/bin/retype) \
4| | sha1sum -c <(echo "2a53485d5d74c053be868b4f61a293f80aca39bd -") \
5| || rm retype
The keys to the solution were process substitution and tee. Bash’s process substitution (>() and <()) runs the command between the parentheses as a separate process, and its input or output appears as a filename (e.g. /dev/fd/1234). tee (named after the T-shaped pipe fitting) copies its stdin to two outputs.
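Both building blocks can be tried without any download. The following is a small sketch, assuming Bash on Linux. The function names show_path and split_stream and the sample input are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names and sample input are hypothetical.

# <(...) expands to a path (such as /dev/fd/63 on Linux) that can be
# read like a file; echo simply prints that path.
show_path() {
  echo <(true)
}

# tee copies its stdin twice: once into the >(...) process (which counts
# lines and reports on stderr) and once to the next command in the
# pipeline (which upper-cases the text).
split_stream() {
  printf 'a\nb\nc\n' | tee >(wc -l >&2) | tr 'a-z' 'A-Z'
}

show_path
split_stream
```

The line count and the upper-cased text come from two separate processes, so their relative order on the terminal is not guaranteed.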
The one-liner worked as follows:
- Lines 1-2 were just the usual curl for downloading the file.
- At line 3, the content of the downloaded file was piped into tee. From here on the stream was copied to two destinations: piped to the next command using | and written into a file created by the process substitution >().
- The process substitution received the stream as its /dev/stdin and passed it to tar. tar would then simply extract the gzipped content from stdin (as specified via -f-) to the current directory.
- At the same time, sha1sum at line 4 also received the content, calculated the checksum and checked it (-c). As sha1sum required a checksum file to check against, process substitution came to the rescue. The output of echo "2a53485d5d74c053be868b4f61a293f80aca39bd -" would be exposed as a file (which could only be accessed by the current process).
- Line 5 acted as a safety net. Only if sha1sum exited with an exit code other than 0 (success), meaning the checksum did not match, would the extracted file be deleted.
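The steps above can be replayed offline. The following is a sketch under stated assumptions, not the exact command from the Dockerfile: a small local tarball stands in for the curl download, GNU tar and sha1sum are assumed, and the function verify_and_extract, the file sample.tgz and the fake binary are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: a local tarball replaces the curl download, and every
# name below (verify_and_extract, sample.tgz, ...) is hypothetical.
verify_and_extract() {
  local tgz=$1 expected=$2
  # cat stands in for curl so the demo needs no network access.
  cat "$tgz" \
    | tee >(tar -xz --strip-components=2 -f - package/bin/retype) \
    | sha1sum -c <(echo "$expected  -") \
    || rm -f retype
}

work=$(mktemp -d)
cd "$work" || exit 1

# Build a tiny tarball with the same layout as the npm package.
mkdir -p package/bin
echo 'fake binary' > package/bin/retype
tar -czf sample.tgz package
rm -r package
good=$(sha1sum sample.tgz | cut -d' ' -f1)

# Matching checksum: retype is extracted and kept.
verify_and_extract sample.tgz "$good"
kept=no
[ -f retype ] && kept=yes

# Wrong checksum: retype is extracted, then removed by the safety net.
bad=0000000000000000000000000000000000000000
verify_and_extract sample.tgz "$bad"
removed=no
[ ! -e retype ] && removed=yes

echo "kept after good checksum: $kept"
echo "removed after bad checksum: $removed"
```

In the second call sha1sum reports the mismatch and exits with a non-zero code, which is what triggers rm.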
NOTE: the Dockerfile was available on my Gist.