Parallelism, Concurrency and Logging in Bash


October 25, 2017 - 5 minute read - personal

A few days ago I was working on an issue related to logging the output of two programs and making sure they would stop when I wanted them to.

Use case

./echo-for-ever.sh # This will output a random message every X seconds
./print-cats # This will ping a lot of sites with cat gifs
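
If you want to follow along, a toy stand-in for ./echo-for-ever.sh could look like this (this is an assumption on my side; any infinite loop that prints will do, and ./print-cats would be analogous):

#!/bin/bash
# a toy stand-in for ./echo-for-ever.sh, not the real script
while true; do
  echo "random message #$RANDOM"
  sleep $(( RANDOM % 5 + 1 ))  # wait a random number of seconds (1-5)
done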

Both programs run until you stop them (for example, using Ctrl + C); imagine they are in an infinite loop. You also need to save the output of each program to a separate file (inside the /tmp folder, for example). So you would end up doing something like:

./echo-for-ever.sh > /tmp/.log1
./print-cats > /tmp/.log2

Everything is fine so far, but now you need both to run at the same time. Easy, right? This would seem to be the final answer:

./echo-for-ever.sh > /tmp/.log1 &
./print-cats > /tmp/.log2

But what happens when you try to stop the execution? When you use something like Ctrl + C to stop the infinite execution of both programs and their logging, you will get something like:

$ ./echo-for-ever.sh > /tmp/.log1 & ./print-cats > /tmp/.log2
[#] PID_NUMBER

$

This sounds good: you check the log files and everything is logged there, but hey, the first log file (.log1) keeps getting updated with new entries. Why? You check ps aux looking for something weird and, yeah, you find that ./echo-for-ever.sh is still running and has a PID assigned.

But why does this happen? We stopped the process the way we usually do, so why is it still alive? The short version: Ctrl + C sends SIGINT only to the foreground process group, and a job started with & lives outside of it, so it never receives the signal. Let me explain what happened and walk you through a few concepts.

Concurrency

How does concurrency apply here? You might end up using the & command separator to run programs/commands concurrently, in this case:

$ ./cmd1 & ./cmd2 & ./cmd3

This will start cmd1 in the background, do the same with cmd2, and finally start cmd3 in the foreground, as expected. Think of this as a chain of commands and programs. It is not the same as something like this:

$ ./cmd1; ./cmd2; ./cmd3;

In this case cmd1 will start executing, but cmd2 and cmd3 will each start only after the previous process has finished/exited.
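
A quick way to see the difference, using sleep as a stand-in for real work:

$ time (sleep 2 & sleep 2)   # with &, the sleeps overlap: about 2 seconds total
$ time (sleep 2; sleep 2)    # with ;, they run one after another: about 4 seconds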

Parallelism

This would be a clear example of how to run a number of programs in parallel.

#!/bin/sh

./cmd1 &
./cmd2 &

wait
echo 'This will be executed once both programs end'

Both processes are forked in parallel and run in the background, and the script waits until both have completed before printing the echo message. Personally, this pattern for parallel execution in Bash feels similar to constructs in other languages; for example, the wait call reminds me of awaiting an async process, but hey, that might be just me.

This might not be your preferred way to do parallel work in Bash; external tools like GNU parallel or xargs will give you a better experience and an easier debugging process. But in a scenario where you can only use Bash (no extra dependencies), this is the right answer. Another thing to consider with this style of parallel execution is that a bare wait always returns zero, so by itself it will not tell you the exit codes of the processes you forked; you have to capture each PID with $! and wait on it individually, as in the sketch below. Consider this especially when writing tests; if that is fine for you, here you go.
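
A minimal sketch of recovering those exit codes, assuming the same hypothetical commands as above:

#!/bin/sh

./cmd1 & pid1=$!   # $! holds the PID of the last background process
./cmd2 & pid2=$!

wait "$pid1"; status1=$?   # waiting on a specific PID returns that process' status
wait "$pid2"; status2=$?

echo "cmd1 exited with $status1, cmd2 exited with $status2"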

Output Logging

This is the most common way to keep/save the output from a command/program.

$ ./cmd1 > .log1 & ./cmd2 > .log2 & ./cmd3 > .log3

Depending on what you decide here, the portability of your script will change. This is the classic and portable way to append stdout to a file (any POSIX shell understands it):

$ ./cmd1 >> .log

Bash also has a nonportable shorthand, &>, which redirects both stdout and stderr at once; it is just shorter syntax for > .log 2>&1 and does not introduce any new functionality (its appending variant, &>>, does require Bash 4 or later):

$ ./cmd1 &> .log
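
To see what the shorthand buys you, a quick sketch with the same hypothetical command:

$ ./cmd1 > .log    # only stdout is captured; errors still print to the terminal
$ ./cmd1 &> .log   # both stdout and stderr end up in .log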

You can also pipe the output of each command into tee, which will mirror the output to both the terminal and a log file:

$ ./cmd1 | tee .log1 & ./cmd2 | tee .log2 & ./cmd3 | tee .log3

The output of the commands will show up interleaved and a little messy, but we can fix that easily using sed to tag every line:

$ echo 'Output -> cmd1' | sed -e 's/^/[cmd1] /'
[cmd1] Output -> cmd1

All together it will look like this (note that the prefixes only appear on the terminal; the files written by tee keep the raw lines):

$ cmd1 | tee .log1 | sed -e 's/^/[cmd1] /' & cmd2 | tee .log2 | sed -e 's/^/[cmd2] /' & cmd3 | tee .log3 | sed -e 's/^/[cmd3] /'
[cmd1] init system
[cmd2] start fetching cats
[cmd1] world is about to explode
[cmd3] not yet, we are safe

You might see some articles and projects using >> and 2>&1, for example:

./cmd1 >> .log 2>&1

This means:

  • >>: Open the file in append mode and redirect stdout to it.
  • 2>&1: Redirect stderr to wherever stdout is currently going.
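
One detail worth remembering: redirections are processed left to right, so the order matters. A sketch with the same hypothetical command:

$ ./cmd1 >> .log 2>&1   # stdout goes to .log first, then stderr follows it there
$ ./cmd1 2>&1 >> .log   # stderr is duplicated to the terminal before stdout moves, so only stdout lands in .log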

Killing it

trap is your friend. If you want to stop them all at once, set this before launching the commands:

$ trap 'kill %1; kill %2' SIGINT

Combined with the chain from the Concurrency section, this runs cmd1 and cmd2 in the background and cmd3 in the foreground, which lets you kill it with Ctrl + C, for example. When you kill that last process, 'kill %1; kill %2' is executed, because the trap ties that command to the reception of an INTerrupt SIGnal, which is what you send when you hit Ctrl + C. The %1 and %2 are job specs: they refer to the first and second jobs in the shell's job table, in this case the two background commands.

You might want to remove the trap after you are done with your commands, by running:

trap - SIGINT
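
Putting the pieces together in an interactive session, a sketch using sleep as a stand-in for the real programs:

$ trap 'kill %1; kill %2' SIGINT
$ sleep 100 & sleep 100 & sleep 100
^C                # the foreground sleep dies and the trap kills jobs %1 and %2
$ jobs            # nothing should be left running
$ trap - SIGINT   # back to the default Ctrl + C behavior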

Conclusion

For the use case above, this would be my proposed answer:

trap 'kill %1' SIGINT
./echo-for-ever.sh >> /tmp/.log1 2>&1 &
./print-cats >> /tmp/.log2 2>&1
trap - SIGINT

Run it, hit Ctrl + C, and this time both programs stop and both log files stop growing. I really like how approachable concurrency can be here, even though writing concurrent programs is usually considered complex. There are modern tools for larger and more complicated systems that require extensive control over input and output, but for small cases like this, I think this is a pretty fine answer.

Thanks

This post is obviously the result of many hours of research and of the work other people have done on this too. These are a few websites and docs I found that helped me craft this post.

Thank you: