journal

Less well-known uses of curl

When it comes to making HTTP calls, I always reach for curl, as it is a ubiquitous tool for the job.

Today, I discovered that curl can handle some other tasks too.

copying files

curl supports the FILE protocol (file:/), so it is possible to “download” a local file:

$ curl file:/path/to/some/large/file -o /the/destination/file
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2389M  100 2389M    0     0   339M      0  0:00:07  0:00:07 --:--:--  369M

See http://askubuntu.com/questions/17275/progress-and-speed-with-cp

querying LDAP

Normally, when I need to query an LDAP server, ldapsearch is the de facto tool, though it may not be installed in some environments.

Nowadays, I tend to use a Docker image that ships ldapsearch for the job:

$ docker run --rm --name ldap -it --entrypoint bash emeraldsquad/ldapsearch:latest

Inside the container, I use ldapsearch for querying:

$ ldapsearch -x -LLL  \
  -h ldap-test-server.example.com \
  -p 8081 \
  -D 'uid=admin,ou=system' \
  -w "${LDAP_PASSWORD}" \
  -b 'ou=users,dc=example,dc=com'

Today, I learned that I can achieve the same task with curl:

$ curl -v \
    -u "uid=admin,ou=system:${LDAP_PASSWORD}" \
    "ldap://ldap-test-server.example.com:8081/ou=users,dc=example,dc=com??sub?(objectclass=*)" 

This is really great, as curl comes pre-installed in lots of environments whereas ldapsearch and Docker may not.

If you need a more sophisticated query, consider giving LDAP URL Format a read. It will explain the structure of the URL you could use with curl.
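
As a rough illustration of that URL structure, here is a minimal Java sketch that assembles the same kind of URL from its parts. The layout `ldap://host:port/baseDN?attributes?scope?filter` follows the LDAP URL format (RFC 4516). The class and method names (`LdapUrlSketch`, `build`) are made up for this example, and no percent-encoding is done, so treat it as a sketch rather than a complete implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of curl or any LDAP library): it only
// concatenates the parts of an LDAP URL in the order
// ldap://host:port/baseDN?attributes?scope?filter
class LdapUrlSketch {

    static String build(String host, int port, String baseDn,
                        List<String> attributes, String scope, String filter) {
        // An empty attribute list means "return all user attributes",
        // which is why the curl example has two consecutive '?'.
        String attrs = String.join(",", attributes);
        // NOTE: no percent-encoding here; special characters in the DN
        // or the filter would need encoding in a real URL.
        return "ldap://" + host + ":" + port + "/" + baseDn
                + "?" + attrs + "?" + scope + "?" + filter;
    }

    public static void main(String[] args) {
        List<String> allAttributes = Collections.emptyList();
        // Rebuilds the URL passed to curl in this post.
        System.out.println(build("ldap-test-server.example.com", 8081,
                "ou=users,dc=example,dc=com", allAttributes, "sub", "(objectclass=*)"));
        // Same search, but only asking for the cn and mail attributes.
        System.out.println(build("ldap-test-server.example.com", 8081,
                "ou=users,dc=example,dc=com", Arrays.asList("cn", "mail"), "sub", "(objectclass=*)"));
    }
}
```

The second call shows where a list of attributes goes when you do not want every attribute back.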

journal

Convert Git to Git LFS

Some Git repositories in the company contain mostly binary files (Word, Excel, PDF, etc.). As Git is not designed to track binary files efficiently, such a repository eventually ends up pretty large (over 2GB) and becomes a PITA on git clone.

To solve this effectively, we can switch a regular Git repository to Git LFS. This post aims to show you how to do it.

Prerequisites

  • Remote Git Server MUST support Git LFS (GitHub, GitLab and BitBucket all support LFS)
  • git (>=2.27.0) and git lfs (>=2.11.0) MUST be installed.
  • A Bash 5.0 shell. (For Windows, Git Bash may work but it is recommended to use WSL v2).

Steps

In this post, BitBucket is used as the remote Git server.

Clone a bare repository

In order to rewrite the entire history of a Git repository, you need to clone the whole repository in its bare form.

IMPORTANT

Make sure you make a backup of the cloned directory.

$ git clone --mirror git@example.com:awesome/awesomely-heavy.git

Make a quick check of the current size of the repository so that we can verify it again once it's done.

$ git count-objects -vH
# Just an example, your result may vary.
count: 0
size: 0 bytes
in-pack: 1053
packs: 1
size-pack: 615.10 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

Migrate binary files to be tracked by LFS (history rewrite alert)

IMPORTANT
The following process will purposely rewrite your entire Git history. All commit IDs will be changed.

# Include more file extensions which you want to track by LFS.
$ git lfs migrate import --everything --include="*.docx,*.pdf"

migrate: Sorting commits: ..., done.
migrate: Rewriting commits: 100% (xxx/yyy), done.
# ...lots of refs..omitted
migrate: Updating refs: ..., done

You may wonder “How do I know which extensions to include?”. The following script can help you.

# The following script will output the file extensions existing in your Git repository.
# It was tested on macOS with the aforementioned `git` version.

$ git log --all --numstat \
    | grep '^-' \
    | cut -f3 \
    `# On your Linux (or WSL), you may want to use sed instead of gsed` \
    | gsed -r 's|(.*)\{(.*) => (.*)\}(.*)|\1\2\4\n\1\3\4|g' \
    | sort -u `# Up until this point, all full-path of files committed will be printed` \
    | xargs -I{} sh -c 'x="{}";echo ${x##*.}' \
    | sort \
    | uniq \
    | awk '{ print "*."$1 }' \
    | paste -sd, -

# example output
*.docx,*.jar,*.xlsx,*.xltm
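
For readers who find the pipeline hard to follow, its last stages (take the extension of each path, de-duplicate, sort, prefix with `*.`, join with commas) can be sketched in Java as below. This is only an illustration: the class name `LfsIncludePatterns` and the sample paths are made up, and unlike the shell version it skips files that have no extension at all.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not part of git or git-lfs.
class LfsIncludePatterns {

    // Turns file paths into a comma-separated list such as "*.docx,*.pdf",
    // suitable for the --include option of `git lfs migrate import`.
    static String fromPaths(List<String> paths) {
        TreeSet<String> extensions = new TreeSet<>(); // sorted and de-duplicated
        for (String path : paths) {
            String name = path.substring(path.lastIndexOf('/') + 1);
            int dot = name.lastIndexOf('.');
            // Skip names without an extension and dotfiles like ".gitignore".
            if (dot > 0 && dot < name.length() - 1) {
                extensions.add(name.substring(dot + 1));
            }
        }
        return extensions.stream()
                .map(ext -> "*." + ext)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Sample paths are invented for the demonstration.
        List<String> paths = Arrays.asList(
                "docs/report.docx", "docs/old/report.docx",
                "lib/tool.jar", "sheets/budget.xlsx", "README");
        System.out.println(fromPaths(paths)); // *.docx,*.jar,*.xlsx
    }
}
```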

Clean up the Git repository

At this point, all of your history should have been rewritten as new commits. The old commits effectively become orphans. By default, Git still keeps them in the .git directory until an explicit prune is invoked.

# Trigger Git GC to run immediately to remove
# all orphan commits.
# This will effectively remove all binary files stored
# in your .git/objects.
$ git reflog expire --expire-unreachable=now --all
$ git gc --prune=now

Verify Changes

Now verify if your Git repository has become smaller.

# Print out the space usage of the current Git.
# Compared to the previous run, it should be wayyy smaller.
$ git count-objects -vH

count: 0
size: 0 bytes
in-pack: 1054
packs: 1
size-pack: 199.19 KiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

If you wonder where all the files are now, the du command can tell you.

$ du -d 1 .
# in this example output, the files are moved from objects to lfs.
232K    ./objects
8.0K    ./info
 72K    ./hooks
  0B    ./refs
1.0G    ./lfs
1.0G    .

Push to remote repository

It’s time to push the new Git repository into your remote one.

IMPORTANT
It is critical that all refs are pushed into the remote repository.

# The `--mirror` option ensures all `refs` are pushed.
# @see https://git-scm.com/docs/git-push#Documentation/git-push.txt---mirror
$ git push --mirror --force

Contact your Git Hosting provider to run git GC

Successfully pushing the new Git LFS repository to the remote is not the end of the story.

As Git is a distributed version control system, your provider also holds a copy of your entire Git repository. It’s important that your provider runs Git GC on their own infrastructure.

For BitBucket, you have to open a BitBucket Cloud Support ticket and request them to run git GC. GitLab automatically runs gc on each push. I cannot find any relevant information on GitHub (though GitHub Enterprise requires you to contact their Support).

journal, today-i-found

TIF – Powerful SSH #1

Recently, I discovered that SSH has some wonderful features and uses that I didn’t know about before.

Faster copying directories with rsync via SSH

When it comes to copying files back and forth to a remote server, I usually go for scp.

scp hello.txt remote_user@server.example.com:/tmp/

scp even supports copying a whole directory:

scp -r files/ remote_user@server.example.com:/tmp

It was not until recently that a colleague of mine, Alex, taught me that rsync happens to be faster than scp when it comes to syncing directories between a local machine and a remote server.

rsync -a files/ remote_user@server.example.com:/tmp

The result is fascinating! It is much, much faster than scp when hundreds of files need to be synced. Better still, rsync only copies files that have changed.

There are some more advanced use cases with rsync and SSH, such as setting up an rsync daemon on the remote server so that you can sync files and directories over a bastion host. See “Advanced Usage” in the rsync man page.

Check out code

I often need to log in to a server and do a git clone there to test some code.

Cloning a repository via SSH on a remote server requires that server to have an SSH key-pair registered..

…unless we use ssh-agent.

Using SSH Agent Forwarding allows me to SSH into a remote server and do a git clone on it without needing to actually transfer my private key to that server.

# First run the ssh-agent daemon in case you haven't.
# See https://unix.stackexchange.com/questions/351725/why-eval-the-output-of-ssh-agent for why we gotta use `eval`.
$ eval "$(ssh-agent -s)"

# Add your identity key into ssh-agent.
# In case you have a key somewhere else, simply specify the path to it.
# You can attach multiple keys if you want.
# NOTE: -K here is the macOS keychain option; on Linux, plain `ssh-add` should do.
$ ssh-add -K

# SSH into the remote server using Agent Forwarding option.
$ ssh -A remote_user@server.example.com

# On the remote server, perform git clone as usual
remote_user@server $ git clone git@github.com:myuser/myrepo.git

notes

Sending Messages to Yourself on Skype

I have always liked a simple feature in Slack: you can chat with yourself to store notes, links, files or your own reminders. Despite the fact that I use Skype most of the time, this feature doesn’t exist there, and I really hate that.

Turns out there is a simple trick to do it that I happened to discover yesterday:

On Skype, create a new group chat with only you in there.

Done! Name the group my notes or something. Pin it so that you can easily access it. One more thing: enable the Share Link, then save the link. If you accidentally leave the group, you can use it to re-join.

Now Skype has the chat-to-yo-self feature, just like Slack does.

today-i-found

Today I Found: Bill Gates’s message for college grads if they want to change the world.

In his letter, Bill Gates wrote that he was lucky because he started his venture at the right time, when the digital revolution was just getting underway and the young people of that time had a great opportunity to shape it.

Today, college graduates have the same chance in these fields:

If I were starting out today and looking for the same kind of opportunity to make a big impact in the world, I would consider three fields.

One is artificial intelligence. We have only begun to tap into all the ways it will make people’s lives more productive and creative.

The second is energy, because making it clean, affordable and reliable will be essential for fighting poverty and climate change.

The third is biosciences, which are ripe with opportunities to help people live longer, healthier lives.

But his letter contains a very important point which I also agree with:

For one thing, intelligence is not quite as important as I thought it was, and it takes many different forms.

In the early days of Microsoft, I believed that if you could write great code, you could also manage people well or run a marketing team or take on any other task. I was wrong about that. I had to learn to recognize and appreciate people’s different talents. The sooner you can do this, if you don’t already, the richer your life will be.

(emphasis mine)

Bill makes the same point as Jeremy Harbour does in his book “Go Do!”: someone who excels at technical skills may not be ready to start their own company, because they lack other important skills.

today-i-found

Today I Found: Soft-Coding & #1 Deadly Sin of Programmers

A question on StackOverflow, What is Soft-Coding (anti-pattern), introduced me to the term soft-coding (a pun on hard-coding).

To explain the term simply, I quote the code snippet from the accepted answer:

SpecialFileClass file = new SpecialFileClass( 200 ); // hard coded

SpecialFileClass file = new SpecialFileClass(DBConfig
  .Start()
   .GetConnection()
    .LookupValue("MaxBufferSizeOfSpecialFile")
      .GetValue());

Too much of anything is not good, and the above is no exception. Too much flexibility leads to over-engineering.

This is somewhat similar to the #1 deadly sin in the series Seven Deadly Sins of Programming by Eric Gunnerson.

thoughts

Friends are like genes?

When I read chapter 7 of the book “The Magic of Thinking BIG”, “Manage your environment: Go First Class”, a thought popped into my head: friends are like genes.

When a baby is born, it is the product of a combination of both parents’ genes. The differences between the parents’ genes produce a brand new combination. If, for example, the parents are siblings or close relatives (just for the sake of demonstration), and their children also continue to mate and give birth, those children are very likely to have a high risk of genetic disorders, because doing so keeps narrowing the gene pool, or genetic diversity.

I think making friends has the same effect. If you keep making friends with the same type of people, with the same thoughts and the same perspectives, you are narrowing your thought pool. Making friends with different people will make you see things differently. But it’s important that you are the one who chooses which thoughts or perspectives get added to your pool.

journal

Morning 28.9

Tweeted an article written by Digg Engineers about how they migrated one of their modules from Node.js to Golang. Their result was a success.

The article gave a very detailed analysis of why Node.js no longer met their needs. It also mentioned that the performance of the module improved a lot. However, they stated that there were no plans to migrate the rest to Go.

Then I happened to come across a profile named VietCoding, and through it a blogging platform full of Vietnamese developers with good articles, https://kipalog.com.

Big world!

java

How to iterate over a Collection in Java?

Traditional way

List<Integer> numbers = ...

for (int i = 0; i < numbers.size(); i++) {
    if (i % 2 == 0) {
        Integer value = numbers.get(i);
        if (value % 2 == 0) {
            // do something to value
        }
    }
}

Pros:
– an iconic way
– can access both the index and the value at the same time

Cons:
– too much code

Use:
– when you need to know the index of the element in a loop.

The Iterator way

In Java, every Collection implements Iterable, which provides an Iterator. (Arrays do not implement Iterable, although the enhanced for loop works on them too.)

List<Integer> numbers = ...

for (Iterator<Integer> it = numbers.iterator(); it.hasNext();) {
    Integer value = it.next();
    if (value % 2 == 0) {
        it.remove();
    }
}

If you need to know the index, use ListIterator (only works with List):

List<Integer> numbers = ...

for (ListIterator<Integer> it = numbers.listIterator(); it.hasNext();) {
    Integer index = it.nextIndex();
    Integer value = it.next();
    if (index % 2 == 0 && value % 2 == 0) {
        it.remove();
    }
}

Pros:
– supports iteration over all kinds of Collection types (Set, Queue, etc)
– can remove elements during iteration.
– faster than the traditional way for LinkedList or other Collection implementations where random access is time-consuming.

Cons:
– too much code
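
To make the “remove during iteration” point concrete, here is a small self-contained sketch. The class and method names are made up for the example. It removes even numbers once through an explicit Iterator and once through `Collection.removeIf`, which has been available since Java 8 and does the same job with less code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative class; the names are not from any library.
class RemoveWhileIterating {

    // Removes even numbers through the Iterator, which is the safe way
    // to delete elements while walking a Collection.
    static List<Integer> dropEvensWithIterator(List<Integer> input) {
        List<Integer> numbers = new ArrayList<>(input); // work on a mutable copy
        for (Iterator<Integer> it = numbers.iterator(); it.hasNext();) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return numbers;
    }

    // Same result with Collection.removeIf (Java 8+).
    static List<Integer> dropEvensWithRemoveIf(List<Integer> input) {
        List<Integer> numbers = new ArrayList<>(input);
        numbers.removeIf(n -> n % 2 == 0);
        return numbers;
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(dropEvensWithIterator(sample)); // [1, 3, 5]
        System.out.println(dropEvensWithRemoveIf(sample)); // [1, 3, 5]
    }
}
```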

The Enhanced for way

This has been the most recommended way since Java 5 if you just need to iterate over a Collection without needing the index.

List<Integer> numbers = ...

// read as "for each number in numbers"
for (Integer number : numbers) {
    if (number % 2 == 0) {
        // do something with number
    }
}

Tips:

To iterate over an entire java.util.Map, please use this:

Map<String, Object> cache = ...

for (Map.Entry<String, Object> entry : cache.entrySet()) {
    doSomeThingTo(entry.getKey(), entry.getValue());
}

Pros:
– simple, easy to read
– keep variables’ scopes clean (especially in nested loops)

Cons:
– cannot access the index of the element (well, we don’t actually need it every time)

The Java 8 way (or the function way or the lambda way)

Java 8 introduced the Stream API, which makes it very easy to iterate over the elements of a Collection and do something to them.

List<Integer> numbers = ...

numbers.stream()
    .filter(number -> number % 2 == 0)
    .forEach(number -> doSomething(number));

// or simply

numbers.forEach(number -> doSomething(number));

Pros:
– easy to write and read
– method chaining
– lazy evaluation

Cons:
– the API has quite a lot of methods so it takes time to master all of them.
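
The “lazy evaluation” point may be easier to see with a small experiment. The sketch below (class and method names are made up for the illustration) uses `peek` to record which elements are actually pulled through a sequential stream. Nothing is processed until a terminal operation runs, and `findFirst` stops as soon as it has a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative class; the names are not from any library.
class LazyStreamSketch {

    // Returns the values that were pulled through the pipeline before
    // the stream found the first even number.
    static List<Integer> inspectedUntilFirstEven(List<Integer> numbers) {
        List<Integer> inspected = new ArrayList<>();
        numbers.stream()
                .peek(inspected::add)      // records each element that flows through
                .filter(n -> n % 2 == 0)
                .findFirst();              // terminal operation, short-circuits on a match
        return inspected;
    }

    // Without a terminal operation the pipeline never runs at all.
    static int inspectedWithoutTerminalOp(List<Integer> numbers) {
        List<Integer> inspected = new ArrayList<>();
        numbers.stream()
                .peek(inspected::add)
                .filter(n -> n % 2 == 0);  // only intermediate operations here
        return inspected.size();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 3, 4, 5, 6);
        System.out.println(inspectedUntilFirstEven(numbers));    // [1, 3, 4]
        System.out.println(inspectedWithoutTerminalOp(numbers)); // 0
    }
}
```

Elements 5 and 6 are never touched in the first call, which is what separates a stream pipeline from a plain loop over the whole list.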

thoughts

Passion

My girlfriend shared an interesting video with me tonight. It recorded a situation in a hospital in which a nurse and a doctor were trying to save a newborn baby. The little one seemed to have stopped breathing. What moved me was how persistent the nurse and the doctor were in trying to save the baby. The nurse, an old lady, tried to warm up the baby’s body by putting on oil, then slapped him in order to make him cry. The doctor did CPR and also helped the nurse warm him up. But the baby, despite the effort, did not move an inch. They kept going. After around 10 minutes, the baby moved a little bit and opened his eyes. They did it. They had successfully saved the baby, bringing him back to life. What a relief!

I could imagine how happy the mother was when she heard the news about her baby being saved. The nurse and the doctor could have gone home, gone to bed, and been proud that they did something big that night. They saved a baby, they saved a family. If it were me, I would have had a very good sleep with a smile on my face.

There were lots of things I learned from the video, but one thing made me want to write this down. At some point in the video, there was a close-up of the nurse. A kind old lady. She could have been a grandma to some kids. However, look at how agile she was in handling the situation. Every move she made was precise and careful. The newborn baby looked very small and fragile, like a doll. The old nurse moved the baby very quickly, gently carrying him up and down and then slapping him to wake him up. All of her focus at that moment was on how to protect and save the baby. It looked like she had gone through this situation many times and that it was not her first time doing this.

The nurse could have been working in the hospital for decades. She didn’t become a doctor. She didn’t move to another hospital for a raise, I guess. She was there to serve and to save people.

I realized that it would be stupid and ignorant of me to compare the nurse’s job with higher-paid jobs like CEO or Chairman. I could see that the nurse loved what she did. She could have saved lots of lives. What she did gradually changed the world. It was how she did it that mattered the most. Every job is different in its own way.

It made me question myself. How had I been doing? Had I put enough passion into every action I took? Stop comparing your job to others.