Often you want to automate something using shell scripting. In a perfect world your script robot works for you without getting tired and without hiccups, and you can just sit at your desk and sip coffee.
Then we enter the real world: Your network is disconnected. DNS goes down. Your HTTP hooks and downloads stall. Interprocess communication hangs. Effectively this means that even if your script is running correctly from the operating system's point of view, it won’t finish its work before you finish your cup of coffee.
Below is an example of how to create timeouts and notifications in a shell script.
1. Never gonna give you up
Automation must work 100% of the time, and when it doesn’t, you must know about it. If you cannot trust your automated systems, sooner or later you will have the glorious moment of “oh, it has been broken for three months now”. No amount of coffee makes your day after that. Then you spend your nights pondering whether your scripts are running instead of playing Borderlands 2.
Some safety guards around your coffee cup include:
- Sane timeout thresholds for commands to avoid hangs. For example, your automated “make smallapp” should probably not run for 50 hours straight.
- Get a notification if a fully automated process does not end as expected, so that you find out about the problem right away, not three months later. Do not use email as the communication channel: it is unreliable and has an awful signal-to-noise ratio. Better options include instant messengers (Skype) and SMS (check twilio.com).
- Also get a notification when a service that should be running all the time is restarted.
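The notification guard can be sketched in a few lines of bash. This is only a minimal illustration, not the script used later in this post: `notify` and `run_guarded` are hypothetical names, and `notify` just writes to stderr where a real setup would call an instant messenger or SMS hook.

```bash
#!/bin/bash

# Hypothetical notifier: a stand-in for a Skype or SMS hook.
# Here it only writes the alert to stderr.
notify() {
    echo "ALERT: $*" >&2
}

# Run a command; if it exits with a non-zero status, send a notification
# and pass the same status on to the caller.
run_guarded() {
    local status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        notify "'$*' failed with exit status $status"
    fi
    return "$status"
}

run_guarded true             # succeeds, no notification
run_guarded false || true    # fails, notification goes to stderr
```

Wrapping every important step this way means a failure makes noise immediately instead of sitting silently in a log.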
2. Timeout.bash
The shells themselves (sh, zsh, bash) do not offer a built-in way to time out commands (please correct me if I am wrong, but the folks at stackoverflow.com et al. did not know any better). The Bash Cookbook contains an example timeout wrapper script (the original SO.com answer).
The trick is that when the timeout wrapper terminates the command, the parent script receives exit code 143 (SIGTERM) or 137 (SIGKILL), that is, 128 plus the signal number. This might not be entirely clear from reading the script.
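To make the exit code behaviour concrete, here is a stripped-down sketch of the same idea. It is not the Bash Cookbook script: `run_with_timeout` is a hypothetical helper that starts the command in the background, starts a watchdog that sends SIGTERM after the limit, and returns whatever status `wait` reports. Also note that systems with GNU coreutils ship an external `timeout` command, which reports a timeout as exit code 124 instead, so the check in the parent script would differ there.

```bash
#!/bin/bash

# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper, a simplified sketch of a timeout wrapper.
run_with_timeout() {
    local limit="$1"
    shift

    # Start the real command in the background
    "$@" &
    local cmd_pid=$!

    # Watchdog: sleep for the limit, then send SIGTERM to the command.
    # Output is discarded so the watchdog never holds our stdout open.
    ( sleep "$limit"; kill -TERM "$cmd_pid" ) >/dev/null 2>&1 &
    local watchdog_pid=$!

    # Wait for the command. If it was killed by SIGTERM,
    # wait reports 128 + 15 = 143.
    local status=0
    wait "$cmd_pid" || status=$?

    # The command is done, so the watchdog is no longer needed.
    # (Its sleep may linger harmlessly until the limit passes.)
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true

    return "$status"
}

status=0
run_with_timeout 1 sleep 10 || status=$?
echo "exit status after timeout: $status"
```

On my understanding of bash this should report 143 for the timed out `sleep`. A wrapper that escalates to SIGKILL would produce 137 (128 + 9) in the same way.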
3. Using timeout wrapper and sending failure notifications to Skype
Below is an example of how you can hook the timeout into your own script. Here we have a simple continuous integration script (ghetto-ci) which polls version control repositories and executes the test suite on any change. A test failure is reported back to Skype by ghetto-ci using the Sevabot Skype bot. The loop script uses a server-specific (Ubuntu) way to set up a headless X server, so that the tests can run Firefox on the server (can one call a headless Firefox server “mittens”?). The continuous integration loop is deployed simply by leaving it running in a screen'ed terminal on a server.
Some protection we do: The script has extra checks to protect against hung processes by killing them using pkill before starting each test run. Selenium’s WebDriver seems to often cause situations where the Python tests don’t quit cleanly, leaving all kinds of zombie processes around. In this case, the test command times out via the timeout wrapper and we get a notification in Skype. Based on this we can refine the test running logic, the timeout delay and so on to make the script more robust, allowing us to focus more on drinking the coffee and less on watching a looping process in a UNIX screen for potential failures.
Pardon me for the possibly ugly shell scripting code. SH is not my primary language. The example code is also on Github. Please see the original blog post for a syntax-colored example.
#!/bin/bash
#
# Run CI check every 5 minutes and
# execute tests if svn has been updated.
#
# The script sets up xvfb (X virtual framebuffer)
# which is used to run the headless Firefox.
#
# We will post a Skype message if the Selenium WebDriver
# has some issues (it often hangs)
#
# We also signal the tests to use a special static Firefox build
# and do not rely on the (auto-updated) system Firefox
#
# NOTE: This script is NOT sh compliant (echo -n),
# bash needed

# Timeouting commands shell script helper
# http://stackoverflow.com/a/687994/315168
TIMEOUT=timeout.sh

# Which FF binary we use to run the tests
FIXED_FIREFOX=$HOME/ff16/firefox/firefox

# It's me, Mariooo!
SELF=$(readlink -f "$0")

# Skype endpoint information
SKYPE_CHAT_ID="1234567890"
SKYPE_SHARED_SECRET="toholampi"
SEVABOT_SERVER="http://yourserver.com:5000/msg/"

#
# Helper function to send Skype messages from help scripts.
# The messages are signed with a shared secret.
#
# Parameter 1 is the message
#
function send_skype_message() {
    msg="$1"
    md5=`echo -n "$SKYPE_CHAT_ID$msg$SKYPE_SHARED_SECRET" | md5sum`

    # md5sum appends a '-' after the hash. We need to get rid of that,
    # so pick only the first word of the output.
    for m in $md5; do
        break
    done

    result=`curl --silent --data-urlencode chat="$SKYPE_CHAT_ID" --data-urlencode msg="$msg" --data-urlencode md5="$m" "$SEVABOT_SERVER"`

    if [ "$result" != "OK" ] ; then
        echo "Error in HTTP communicating to Sevabot: $result"
    fi
}

# Tell the tests to use downgraded FF16
# which actually works with Selenium
if [ -e "$FIXED_FIREFOX" ] ; then
    FIREFOX_PATH=$FIXED_FIREFOX
    export FIREFOX_PATH
    echo "Using static Firefox 16 build to run the tests"
fi

send_skype_message "♫ ci-loop.sh restarted at $SELF"

while true
do
    # Kill hung testing processes (it might happen)
    pkill -f "bin/test"

    # Purge existing xvfb just in case
    pkill Xvfb

    sleep 5

    echo "Opening virtual X11"

    # Start headless X
    Xvfb :15 -ac -screen 0 1024x768x24 &

    # Tell FF to use this X server
    export DISPLAY=localhost:15.0

    # Run one cycle of continuous integration,
    # give it 15 minutes to finish
    echo "Starting test run"
    $TIMEOUT -t 900 continous-integration.sh

    result=$?

    # 143 = killed by SIGTERM, 137 = killed by SIGKILL
    if [ "$result" == "143" ] || [ "$result" == "137" ] ; then
        echo "------- CI TIMEOUT OOPS --------"
        send_skype_message "⚡ Continuous integration tests timed out - check the ci-loop.sh screen for problems"
    fi

    sleep 300
done
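As a side note, the message signing step can be isolated into a small function. This is only a sketch under the assumption that the signature is the MD5 hex digest of chat id, message and shared secret concatenated, as in the script above. `sevabot_signature` is a hypothetical name, and `cut` replaces the for-loop trick for dropping the trailing '-' that md5sum prints.

```bash
#!/bin/bash

# Hypothetical helper: compute the MD5 signature for a message.
# Parameters: chat id, message, shared secret
sevabot_signature() {
    local chat_id="$1" msg="$2" secret="$3"
    # printf avoids the portability issues of "echo -n";
    # cut keeps only the hash, dropping the " -" suffix from md5sum
    printf '%s' "$chat_id$msg$secret" | md5sum | cut -d ' ' -f 1
}

# Example values taken from the script above
sevabot_signature "1234567890" "hello" "toholampi"
```

The result is a 32 character hex string that can be passed as the md5 parameter in the curl call.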
Voila. Back to the coffee.