Power searching using UNIX grep

Posted on 2012-05-29 by Mikko Ohtamaa

UNIX grep is a command-line tool for searching for text strings inside files. (Do not confuse it with find, which matches file names and properties.) This blog post gives some hints on how to use grep to search files quickly and efficiently.

Example: searching the Plone source tree for “content-core” in page template files

Some notes about grep

Grep can search multiple files and directory trees
Grep can be tuned to be faster
Grep output can be friendly and colorized

As with many UNIX tools, due to legacy and backwards compatibility, grep doesn’t do these things out of the box and simply provides a plain, bare-bones interface.

1. Install GNU grep

GNU grep supports more options, such as better coloring, than the BSD grep shipped with BSD-based operating systems like OSX. You can install GNU grep from the grep package of MacPorts. See the ztanesh README for an example sudo port install command.

2. Searching multiple files

Below is an example of a case-insensitive (-i), recursive (-R) search in a folder, including only (--include) .py files. That is, it searches all Python files in the source tree for the word “foobar”:

grep -Ri --include="*.py" foobar ~/code/mixnap/krusovice-src
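To make the flags concrete, here is a rough Python sketch of what the command above does: walk a folder tree, keep only the files matching a mask, and report case-insensitive matches. The function name search_tree and its return value are my own illustration, not anything grep provides, and unlike grep the sketch treats the pattern as a literal string.

```python
import fnmatch
import os
import re


def search_tree(root, pattern, mask="*.py"):
    """Roughly mimic `grep -Ri --include=MASK PATTERN ROOT`.

    Returns a list of (path, line_number, line) tuples. The pattern is
    matched as a literal string, not as a regular expression.
    """
    needle = re.compile(re.escape(pattern), re.IGNORECASE)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Keep only the file names matching the mask (like --include)
        for name in fnmatch.filter(filenames, mask):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on undecodable bytes
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle.search(line):
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

Calling search_tree("some/source/tree", "foobar") would list every line in a .py file under that (hypothetical) folder which contains foobar, FooBar, FOOBAR and so on.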

3. Using colors

You can colorize parts of the grep output, such as the file name, line number, highlighted match and the lines around the result.

Below is my example of setting the GREP_COLORS environment variable:

GREP_OPTIONS="--color=always"
GREP_COLORS="ms=01;37:mc=01;37:sl=:cx=01;30:fn=35:ln=32:bn=32:se=36"

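Note: Use GREP_COLORS, not the deprecated GREP_COLOR environment variable, as the former provides many more options. Also note that newer GNU grep releases treat GREP_OPTIONS as obsolete; a shell alias which passes --color is the usual replacement.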

4. Search as ASCII

By default, grep decodes incoming text files using the encoding set in the locale environment variables. This takes CPU cycles. If you are searching for a plain ASCII match, as with programming language source code files, you can gain a lot of speed by disabling the decoding. Override the LC_CTYPE environment variable when running grep:

LC_CTYPE=POSIX grep....

This slowness is a GNU grep bug which was fixed in version 2.7.
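If you launch grep from a script instead of the shell, the same override can be passed through the child process environment. Below is a minimal Python sketch; the helper names ascii_grep_env and run_grep are made up for this example, and it assumes a grep binary is on your PATH. Since LC_ALL takes precedence over LC_CTYPE, the sketch drops LC_ALL from the copied environment.

```python
import os
import subprocess


def ascii_grep_env(base=None):
    """Return a copy of the environment with LC_CTYPE forced to POSIX."""
    env = dict(os.environ if base is None else base)
    env["LC_CTYPE"] = "POSIX"
    # LC_ALL overrides LC_CTYPE, so remove it if it is set
    env.pop("LC_ALL", None)
    return env


def run_grep(pattern, path):
    """Run a recursive grep with the ASCII locale (assumes grep on PATH)."""
    return subprocess.run(
        ["grep", "-R", "--", pattern, path],
        env=ascii_grep_env(),
        capture_output=True,
        text=True,
    )
```

The returned object carries the matches in its stdout attribute. Whether this is any faster depends on your grep version, as noted above.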

5. Show lines around the match

You can specify the --before-context and --after-context options, which show the text snippet around the matching line. Also, --line-number is a very useful switch when dealing with source code files.
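For those who wonder what the context options actually select, here is a small Python sketch of the idea. The function grep_with_context is hypothetical and matches plain substrings only; it simply marks the matching lines and pulls in their neighbours, merging overlapping windows.

```python
def grep_with_context(lines, needle, before=3, after=3):
    """Return (line_number, text, is_match) for matches plus context.

    Mirrors the idea behind --before-context, --after-context and
    --line-number. Overlapping context windows are merged.
    """
    wanted = set()
    matches = set()
    for index, line in enumerate(lines):
        if needle in line:
            matches.add(index)
            start = max(0, index - before)
            stop = min(len(lines), index + after + 1)
            wanted.update(range(start, stop))
    # Line numbers are 1-based, as in grep output
    return [(i + 1, lines[i], i in matches) for i in sorted(wanted)]
```

With one line of context on each side, a match on line 3 yields lines 2, 3 and 4, with only line 3 flagged as the match.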

6. ZSH shell search alias

This wraps it all together. We define a ZSH function, search, which gives us a shortcut for searching multiple files in a folder tree:

# Search for an ASCII string in multiple files in the current working directory
# E.g.
# search "foobar" "*.html"
# search "foobar" "*.html" myfolder
# By default we exclude dotted files and directories (.git, .svn)
function search() {

        if [[ ! -n "$1" ]] ; then
                echo "Usage: search \"pattern\" \"*.filemask\" \"path\""
                return
        fi

        # Did we get path arg
        if [[ ! -n "$3" ]] ;
        then
                search_path="."
        else
                search_path="$3"
        fi

        # LC_CTYPE="posix" 20x increases performance for ASCII search
        # https://twitter.com/jlaurila/status/86750682094374912

        # We use specially tuned GREP colors - make sure you have GNU grep on OSX
        # https://github.com/miohtama/ztanesh/blob/master/README.rst

        GREP_COLORS="ms=01;37:mc=01;37:sl=:cx=01;30:fn=35:ln=32:bn=32:se=36" LC_CTYPE=POSIX \
        grep -Ri "$1" --line-number --before-context=3 --after-context=3 --color=always --include="$2" --exclude=".*" "$search_path"/*
}

This, and other ZSH goodies, are available in the ztanesh package on GitHub.

7. Turn off OS native file indexing

If you use grep as your primary search tool, I suggest you turn off your operating system's search indexing, such as OSX Spotlight. It just takes disk space and CPU cycles.


Sync and back-up Sublime Text settings and plug-ins using Dropbox on Linux and OSX

Posted on 2012-05-24 by Mikko Ohtamaa

Dropbox is a wonderful service for syncing and backing up files across multiple computers. Sublime Text 2 is what a developer’s text editor should look like circa 2012.

Power users often use many computers (work, home, university, etc.). They have workflows highly optimized for their specific needs. The thing with Sublime Text is that you can customize it to be more than just a text editor: the editor becomes a natural extension of your fingers.

Since the customization work (settings, installing plug-ins) takes some effort, you don’t want to

Redo customizations on every computer
Lose your customizations

Below is a shell script which makes Sublime Text sync its configuration files and installed plug-ins across different computers.

Back-up your Sublime settings first – just in case
Run the script on the first computer to copy Sublime stuff to Dropbox
Run the script on other computers and it will pull in the settings from Dropbox and set-up the automatic sync

Tested on OSX and Ubuntu Linux. Please note that Sublime Text settings folder locations on Linux may depend on the Linux distribution and the installation method. For Windows users see this manual method.

The code lives on GitHub if you wish to contribute patches.

#!/bin/sh
#
# Set-up Sublime settings + packages sync over Dropbox
#
# Will sync settings + Installed plug-ins
#
# Tested on OSX - should support Linux too as long as
# you set-up correct SOURCE folder
#
# Copyright 2012 Mikko Ohtamaa http://opensourcehacker.com
# Licensed under WTFPL
#

# Note: If there is an existing installation in Dropbox,
# it will replace settings on a local computer

# No warranty! Use at your own risk. Take a backup of the Library/Application Support/Sublime Text 2 folder first.

DROPBOX="$HOME/Dropbox"

# Where do we put Sublime settings in our Dropbox
SYNC_FOLDER="$DROPBOX/Sublime"

# Where Sublime settings have been installed
if [ `uname` = "Darwin" ];then
        SOURCE="$HOME/Library/Application Support/Sublime Text 2"
elif [ `uname` = "Linux" ];then
        SOURCE="$HOME/.config/sublime-text-2"
else
        echo "Unknown operating system"
        exit 1
fi

# Check that settings really exist on this computer
if [ ! -e "$SOURCE/Packages/" ]; then
        echo "Could not find $SOURCE/Packages/"
        exit 1
fi

# Detect that we don't try to install twice and screw up
if [ -L "$SOURCE/Packages" ] ; then
        echo "Dropbox settings already symlinked"
        exit 1
fi

# XXX: Disabled Settings/ folder syncing, as it looks like
# Sublime keeps only the license and .sublime_session files there.
# The latter are autosaved and would cause unnecessary conflicts
# and traffic.

# Dropbox has not been set-up on any computer before?
if [ ! -e "$SYNC_FOLDER" ] ; then
        echo "Setting up Dropbox sync folder"
        mkdir "$SYNC_FOLDER"
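        # No trailing slash on the source: BSD cp (OSX) would otherwise
        # copy the folder contents instead of the folder itself
        cp -r "$SOURCE/Installed Packages" "$SYNC_FOLDER"
        cp -r "$SOURCE/Packages" "$SYNC_FOLDER"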
#        cp -r "$SOURCE/Settings/" "$SYNC_FOLDER"
fi

# Now that the settings are in Dropbox, delete the existing local files
rm -rf "$SOURCE/Installed Packages"
rm -rf "$SOURCE/Packages"
#rm -rf "$SOURCE/Settings"

# Symlink settings folders from Dropbox
ln -s "$SYNC_FOLDER/Installed Packages" "$SOURCE"
ln -s "$SYNC_FOLDER/Packages" "$SOURCE"
#ln -s "$SYNC_FOLDER/Settings" "$SOURCE"
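If you want to try the copy, delete and symlink dance on throwaway folders before touching your real settings, here is a minimal Python sketch of the same idea. The function link_into_sync is my own illustration and not part of the script above. It also differs slightly from the script: it checks each folder separately and silently skips folders which are already symlinked.

```python
import os
import shutil


def link_into_sync(source, sync_folder, names=("Installed Packages", "Packages")):
    """Move the named folders under sync_folder and symlink them back.

    If sync_folder already holds a folder, the local copy is discarded
    and replaced by a symlink, so the synced version wins.
    """
    os.makedirs(sync_folder, exist_ok=True)
    for name in names:
        local = os.path.join(source, name)
        shared = os.path.join(sync_folder, name)
        if os.path.islink(local):
            continue  # already set up on this computer
        if not os.path.exists(shared):
            # First computer: seed the sync folder from local settings
            shutil.copytree(local, shared)
        if os.path.isdir(local):
            shutil.rmtree(local)
        os.symlink(shared, local)
```

Point source and sync_folder at scratch directories first. Like the shell script, this deletes the local folders, so the same backup advice applies.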


Automatically colorize terminal tabs based on the server you are logged into

Posted on 2012-05-22 by Mikko Ohtamaa

Behold:

OSX’s iTerm 2, and maybe some other terminal applications, support ANSI control sequence extensions which allow the shell to set the color of the terminal tab.

Below is a Python script which

Randomizes a color based on the server host name. The same hostname always results in the same color.
The color is randomized in HSL color space, so that only the hue component varies and saturation and lightness are locked. This prevents the creation of ugly color combinations like black text on black tab background.

Note: The effect can be also applied on terminal windows – for those who don’t use tabs.

The effective result is that

You learn to identify terminal tabs by the color
You can switch between tabs much faster, because you can visually pick out the terminal without needing to read the text on it or remember its location in the list

Note: If your puny terminal does not support setting the color of window decorations, you can always set the terminal background color. This is useful e.g. if you want a red background for the danger zone ™ when you are logged in as root on the production server at 23:00 on a Friday night.

Note: Naturally you also need to have the script installed on the servers you are ssh’ing into.

1. precmd() hook

You can run the script once and the tab color is set. However, if you SSH from one computer to another and then exit back, the tab would keep the color of the last server.

This can be avoided by

Calculating the OSC control code sequence needed to set the terminal tab color when the shell starts
Having a precmd() hook (zsh terminology, not sure what other shells use) reset the tab color every time the shell prompt is displayed

A friend and I maintain (yet another) zsh toolkit called ztanesh (GitHub). There you can find precmd() example code in 1) 98-server-color and 2) 80-statusbar.

2. rainbow-parade.py

The script code lives on GitHub. Currently it supports iTerm 2 only; we plan to add support for Konsole. Patches for other terminals are welcome.

(This probably could be done in pure shell code too, but Python is just so much more fun…)

#!/usr/bin/env python
"""

       Set terminal tab / decoration color by the server name.

       Get a random colour which matches the server name and use it for the tab colour:
       the benefit is that each server gets a distinct color which you do not need
       to configure beforehand.

"""

import socket
import random
import colorsys
import sys

# http://stackoverflow.com/questions/1523427/python-what-is-the-common-header-format
__copyright__ = "Copyright 2012 Mikko Ohtamaa - http://opensourcehacker.com"
__author__ = "Mikko Ohtamaa <mikko@opensourcehacker.com>"
__licence__ = "WTFPL"
__credits__ = ["Antti Haapala"]

USAGE = """
Colorize terminal tab based on the current host name.

Usage: rainbow-parade.py [0-1.0] [0-1.0] # Lightness and saturation values

An iTerm 2 example (recolorize dark grey background and black text):

    rainbow-parade.py 0.7 0.4
"""

def get_random_by_string(s):
    """
    Get always the same 0...1 random number based on an arbitrary string
    """

    # Initialize random gen by server name hash
    random.seed(s)
    return random.random()

def decorate_terminal(color):
    """
    Set terminal tab / decoration color.

    Please note that iTerm 2 and Konsole use different control codes for this.
    Not sure what other terminals support this behavior.

    :param color: tuple of (r, g, b)
    """

    r, g, b = color

    # iTerm 2
    # http://www.iterm2.com/#/section/documentation/escape_codes
    sys.stdout.write("\033]6;1;bg;red;brightness;%d\a" % int(r * 255))
    sys.stdout.write("\033]6;1;bg;green;brightness;%d\a" % int(g * 255))
    sys.stdout.write("\033]6;1;bg;blue;brightness;%d\a" % int(b * 255))
    sys.stdout.flush()

    # Konsole
    # TODO
    # http://meta.ath0.com/2006/05/24/unix-shell-games-with-kde/

def rainbow_unicorn(lightness, saturation):
    """
    Colorize terminal tab by your server name.

    Create a color in HSL space where lightness and saturation are locked; only the hue varies by server.

    http://games.adultswim.com/robot-unicorn-attack-twitchy-online-game.html
    """

    name = socket.gethostname()

    hue = get_random_by_string(name)

    color = colorsys.hls_to_rgb(hue, lightness, saturation)

    decorate_terminal(color)

def main():
    """
    From Toholampi with love http://www.toholampi.fi/tiedostot/119_yleisesite_englanti_naytto.pdf
    """
    if len(sys.argv) < 3:
        sys.exit(USAGE)

    lightness = float(sys.argv[1])
    saturation = float(sys.argv[2])

    rainbow_unicorn(lightness, saturation)

if __name__ == "__main__":
    main()
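One caveat with random.seed(): as far as I know, seeding with a string does not produce the same numbers on Python 2 and Python 3, so the same host could get a different color depending on which interpreter a server has. Below is a sketch of an alternative which derives the hue from a hash of the host name and returns the escape sequence as a string, so that the shell can compute it once and reuse it. The function names are made up for this example, the sketch assumes Python 3, and the escape codes are the iTerm 2 ones used above.

```python
import colorsys
import hashlib


def hue_for_host(hostname):
    """Map a host name to a hue in [0, 1) using a stable hash."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 4 bytes as an integer and scale it into [0, 1)
    return int.from_bytes(digest[:4], "big") / 2 ** 32


def tab_color_sequence(hostname, lightness, saturation):
    """Build the iTerm 2 tab color escape sequence as a string."""
    r, g, b = colorsys.hls_to_rgb(hue_for_host(hostname), lightness, saturation)
    parts = []
    for channel, value in (("red", r), ("green", g), ("blue", b)):
        parts.append("\033]6;1;bg;%s;brightness;%d\a" % (channel, int(value * 255)))
    return "".join(parts)
```

A precmd() hook could then simply print the cached string, for example the result of tab_color_sequence(socket.gethostname(), 0.7, 0.4), instead of starting Python on every prompt.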


Never use hard tabs

Posted on 2012-05-13 by Mikko Ohtamaa

Update: This post has been updated to address some of the claims in the comments. Please note that most of these claims are wrong 🙂

As there seems to be some confusion about when hard tab characters (ASCII code 9) are appropriate in source code files, here is a rule:

1) Never use hard tabs

1.1) Unless your source code format is hard tab sensitive (the only such format I know is Makefile)

1. Reasons not to use hard tabs

Due to legacy, different text editors treat hard tabs differently. UNIX text editors assume a hard tab is 8 spaces, while Windows text editors and IDEs (Eclipse) prefer 4 spaces.
No agreement on hard tab length can be reached between different text editors
No agreement on hard tab length can be reached between people
Thus, hard tabs may break source code readability and editability if more than one person edits the file. Someone opens the file in an editor with different tab settings and edits it, and the next time you open the file it is ruined and all the indentation is wrong.
This is even worse in whitespace-sensitive languages (Python, CoffeeScript), as it might actually cause syntax errors or programming logic errors
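The last point is easy to reproduce. In the Python 3 sketch below, two lines look identically indented in an editor with 8-column tabs, but one uses a hard tab and the other eight spaces, and Python refuses to compile the result. The helper compiles_cleanly and the two sample strings are made up for this demonstration.

```python
def compiles_cleanly(source):
    """Return True if Python accepts the indentation in `source`."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return False
    return True


# The second line is indented with one tab, the third with eight spaces.
# An editor with 8-column tabs shows both at the same depth.
MIXED = "if True:\n\tx = 1\n        y = 2\n"

# The same code indented with spaces only
SPACES_ONLY = "if True:\n        x = 1\n        y = 2\n"
```

Here compiles_cleanly(MIXED) is False because Python 3 raises TabError (inconsistent use of tabs and spaces in indentation), while compiles_cleanly(SPACES_ONLY) is True.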

However, you can avoid this problem in the first place if you indent using soft tabs (spaces) instead.

Even if you were the only person in the world editing the text file, you might switch text editors at some point and accidentally shoot yourself in the foot.

2. Using soft tabs for indentation and having no hard tabs should not be a problem because

All text editors can convert tabs to spaces on the fly when editing the file. Please let me know if there are commonly used editors, besides Windows Notepad, which don’t do it yet.
Text editors usually have separate settings for the tab key width and for indentation. The latter is what you really want to adjust.
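For files which already contain hard tabs, the conversion an editor does can also be scripted. Here is a minimal Python sketch which replaces tabs in the leading whitespace of each line and leaves tabs elsewhere (for example inside string literals) alone. The function name expand_leading_tabs and the default width of 4 are my own choices.

```python
def expand_leading_tabs(text, width=4):
    """Replace tabs in each line's leading whitespace with spaces."""
    result = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        # Only the indentation is expanded; tabs inside the body stay
        result.append(indent.expandtabs(width) + body)
    return "".join(result)
```

Do not run something like this on Makefiles, where the hard tabs are part of the syntax.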

3. Pseudo-arguments for using hard tabs

It makes the file size smaller: do you really care about those twenty bytes on your gigabyte hard disks?
This one I made up: spaces count toward the file size in web stuff, because visitors download the files. However, if those bytes really matter that much to you, you should be using a minifier in the first place.
“I like them” arguments: no rationale involved
The change resistance in human nature

You might have a legacy software project with its legacy style guides. If the project is big, e.g. the Linux kernel, the switching cost may be very high and not affordable. However, even with this kind of codebase, you can gradually phase hard tabs out.

4. Style guides (updated)

Hard tabs are required only in Makefile syntax and preferred only in the Go style guide. This is because people have learned by experience that hard tabs cause a mess when working with other people.

5. Tab character is not semantically the same as indent level (updated)

There is no style guide or coding convention saying that the tab character should be the indent. This assumption is easy to make because it allows you to stick your head in the sand and ignore the surrounding world while singing the “let the users pick their own tab width” mantra. However, though a cunning idea, this perceived simplicity causes the compatibility and co-operation issues which this post tries to highlight.

6. The user should choose their own tab width (updated)

Choosing your own preference is the rationale many people give for indenting with tabs: “Let the users pick their own preference for indentation”. However, if you intend to work with other people, the recommendation is to stick to the programming language's recommendation. This way people can pick up the codebase more easily.

There is nothing gained by having “user chooseable indentation width by adjusting tab character width”. The most used indentation width is 4 spaces anyway, so maintaining the freedom to have a user-pickable indent width is just extra effort.

Try telling some old UNIX admin that they must adjust their editor tab width (see below).

7. Tab width is 8 spaces and don’t mess with it (updated)

People who use tabs as indent assume that one tab = one indent and that they can freely adjust the tab width so that the indentation looks nice. However

By legacy, the tab character width is 8 spaces and most of the software out there makes this assumption. If you use any other value for tab width, you are breaking this legacy social contract and making it more difficult for other people to work with your project.

This really limits the ability to use tabs as indent, because 8-space tabs often don’t make sense as an indent (the exception is system-style C programming, as in the Linux kernel).

It’s ok if you work with the code alone in only one text editor, where you have this one setting for tab width and that’s the only setting in the world controlling the tab width in your source code project. But when someone else must read or edit the code:

Others must adjust their tab width to make your code readable
Adjusting the tab width from 8 spaces might be very difficult when working with other toolchains (e.g. you are viewing the code in a terminal using cat)

Also, in languages where the recommended indentation level is only two spaces, it would be a little funny to use one tab character just to get two spaces.

8. Tabs cannot be used to format multi-line function calls (updated)

Some people prefer to align long argument lists as shown below. Here is a Python example whose alignment would break if tabs were used for indentation, because it would need mixed tabs and spaces

call_a_function(argument,
                very_long_argument="something",
                even_longer_argument="something_else")

Or in C

   if (first_condition &&
       second_condition)
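You can check the alignment problem with a few lines of Python. The sketch below computes the column where a tab-indented continuation line starts for a given tab width; the helper indent_column and the constants are made up for this demonstration.

```python
def indent_column(line, tabsize):
    """Column where the first non-blank character lands for a tab size."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip())


FIRST_LINE = "call_a_function(argument,"

# Continuation line indented with two hard tabs
TABBED = '\t\tvery_long_argument="something",'

# Column of the first argument on the opening line
TARGET = FIRST_LINE.index("(") + 1
```

With 8-column tabs, indent_column(TABBED, 8) equals TARGET (16), so the arguments line up. With 4-column tabs the same line starts at column 8 and the alignment is gone.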

9. Text editors

If you are a text editor author, make sure your text editor ships with hard tabs turned off by default, especially for whitespace-sensitive languages if you vary the tab policy by file type.

Note that this blog post, and the situation, could have been avoided if

All text editors had stuck to soft tabs by default
All text editors had stuck to a hard tab being 8 spaces by default

But at some point (when?) someone (who?) decided to make our life a little more complex.

10. Tools for managing hard tab policy in your software project


Open Source Hacker

Pushing the boundaries of free technology

Monthly Archives: 2012-5
