Download File and Verify Checksum in Bash

I needed to download a gzipped tarball (.tgz), verify its checksum, and extract one particular file from it, all in one go using Bash. With Bash’s process substitution and tee, the whole task could be done in a one-liner.

I ran into this when building a Docker image for Retype, a static site generator for Markdown files. Retype was not a pure Node.js package: the npm package was just a CLI wrapper around the actual binary, which was built with .NET.

Typically, running npm install retypeapp-linux-x64@1.11.1 would do the following:

  • Download the .tgz distribution.
  • Verify its SHA-1 checksum.
  • If it is okay, extract the .tgz file content into node_modules.

As Node.js and npm were totally unnecessary in the image, I decided to use plain debian as the base image. But I still wanted to do the same things npm did.

There was a small problem. I knew how to pipe curl to tar to extract the downloaded file:

curl https://example.org/some.tgz | tar -xz -f - path/to/one/specific/file

But I didn’t know how to put sha1sum into the chain, for two reasons: (1) I needed to feed sha1sum the content of the file via stdin, and (2) I also had to give it a file containing the expected checksum in sha1sum’s checksum-file format.

$> sha1sum --help
Usage: sha1sum [OPTION]... [FILE]...
# omitted
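
For reference, here is a minimal sketch of how sha1sum -c behaves when the checksum file names - (stdin) instead of a real file. It assumes GNU coreutils; the sample content and the temporary file are made up for illustration only.

```shell
# Hypothetical sample data: compute a digest first so there is something to verify.
expected=$(printf 'hello\n' | sha1sum | cut -d' ' -f1)

# A checksum line has the form "<digest>  <name>"; the name "-" means "read stdin".
checksum_file=$(mktemp)
printf '%s  -\n' "$expected" > "$checksum_file"

# Matching content: sha1sum exits with status 0.
printf 'hello\n' | sha1sum -c "$checksum_file"

# Different content: sha1sum exits with a non-zero status.
printf 'tampered\n' | sha1sum -c "$checksum_file" || echo "checksum mismatch"

rm -f "$checksum_file"
```

The remaining question was how to avoid creating that temporary checksum file by hand.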

Luckily, I found a hidden gem on Stack Overflow. It turned out someone had had the same need and asked “creating a file downloading script with checksum verification”, and the answer from @user239558 suggested a neat solution.

The final command was:

# Each line is intentionally prefixed with a line number; strip the "N| " prefixes before running.
1| curl https://registry.npmjs.org/\
2| retypeapp-linux-x64/-/retypeapp-linux-x64-1.11.1.tgz \
3|  | tee >(tar -xz --strip-components=2 -f- package/bin/retype) \
4|  | sha1sum -c <(echo "2a53485d5d74c053be868b4f61a293f80aca39bd -") \
5|  || rm retype

The keys to the solution were process substitution and tee. Bash’s process substitution (>(...) and <(...)) ran the command between the parentheses as a separate process and exposed its input or output as a file name (e.g. /dev/fd/63). tee (named after a T-shaped pipe fitting) copied its stdin to two outputs.

The one-liner worked as follows:
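
The two building blocks can be tried in isolation. This is only a small sketch, assuming Bash (process substitution is not available in plain sh); the sample strings are placeholders.

```shell
#!/usr/bin/env bash
# <(cmd) expands to a file name; reading that file yields cmd's output.
echo <(true)                     # prints a path such as /dev/fd/63
cat <(echo "from a subprocess")  # reads that path like any other file

# >(cmd) is the mirror image: whatever is written to the file name
# becomes cmd's stdin. tee copies its stdin to stdout and to every file
# argument, so here the stream reaches both tr and wc.
printf 'one\ntwo\n' | tee >(tr a-z A-Z >&2) | wc -l
```

The upper-cased copy is sent to stderr only so that it does not get mixed into the stream that wc counts.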

  • Lines 1-2 were just the usual curl call for downloading the file.
  • At line 3, the downloaded content was piped into tee. From there the stream was copied to two destinations: the next command via |, and the file name created by the process substitution >().
  • The command inside the process substitution received the stream on its stdin, which is how it reached tar. tar then simply extracted the requested file from the gzipped stream on stdin (as specified via -f-) into the current directory.
  • At the same time, sha1sum at line 4 received the same content, calculated its checksum, and checked it (-c). As sha1sum required a checksum file to check against, process substitution came to the rescue again: the output of echo "2a53485d5d74c053be868b4f61a293f80aca39bd -" was exposed as a file name (a /dev/fd path that was only meaningful to the process receiving it), and the trailing - told sha1sum to hash its stdin.
  • Line 5 acted as a safety net: if sha1sum exited with a non-zero exit code, meaning the checksum did not match, the extracted file was deleted.
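
The steps above can also be wrapped in a small function so that the pipeline can be tried without any network access. This is only a sketch under a few assumptions: verify_and_extract is a hypothetical name (not part of any tool), it reads the tarball from stdin, it always extracts a single member into the current directory, and it relies on GNU tar and sha1sum.

```shell
#!/usr/bin/env bash
# verify_and_extract (hypothetical helper): read a .tgz from stdin, extract
# one member into the current directory, and delete it again if the SHA-1
# of the stream does not match.
# Usage: <tarball stream> | verify_and_extract <sha1> <path/inside/tarball>
verify_and_extract() {
  local expected=$1 member=$2
  local target=${member##*/}   # base name of the extracted file
  local strip
  # Count the "/" characters so that every leading directory is stripped.
  strip=$(( $(printf '%s' "$member" | tr -cd '/' | wc -c) ))

  tee >(tar -xz --strip-components="$strip" -f - "$member") \
    | sha1sum -c <(echo "$expected  -") \
    || { rm -f "$target"; return 1; }
}
```

With this in place, the original command would look roughly like curl ... | verify_and_extract 2a53485d5d74c053be868b4f61a293f80aca39bd package/bin/retype.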

NOTE: the Dockerfile was available on my Gist.

