BitReader – Python module for reading bits from bytes

I worked on a project that involved the MPEG Transport Stream and Digi-TV (DVB): EPGReader, to be exact.

The MPEG TS is a binary format in which multiple fields can be packed into a single byte. I could not use Python’s struct module because it only works with whole bytes or larger units, and the fields I had were just a couple of bits each. At first I used regular bitshifts and bitmasks, but I soon realised that this was a very error-prone task for me. It was very easy to make mistakes, and the code was not very readable either.

For example, the first 3 bytes of the MPEG TS header begin with a SYNC_BYTE, which is 1 byte in size and always has the value 0x47. This byte is used to detect packets in the stream. Each packet is 188 bytes long. The next bit is the “transport error indicator”, which is set by the receiver hardware to flag errors in demodulation (analog signal to bits). The bit after that is the “payload unit start indicator”, which indicates that the current packet starts a new payload of data. Then comes the “transport priority” bit and finally the 13-bit “packet id”. Normally I’d write code something like the following:

data      = read(3)            # 3 bytes of data, as one big-endian integer
sync_byte = data >> 16         # Bits 23-16 (the first byte)
tei       = (data >> 15) & 0x1 # Bit 15
payl_start= (data >> 14) & 0x1 # Bit 14
tp        = (data >> 13) & 0x1 # Bit 13
pid       = data & 0x1FFF      # Last 13 bits (12-0)

On top of that, I needed to store the values in a dictionary. As you can see, this is neither very readable nor convenient. So I figured there must be something easier.
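For reference, the shift-and-mask approach above can be written as a self-contained function. This is only a sketch: it assumes Python 3 (for `int.from_bytes`), the function name `parse_header_prefix` is made up for illustration, and the 3 sample bytes are invented rather than taken from a real stream.

```python
def parse_header_prefix(raw):
    """Parse the first 3 bytes of an MPEG TS header with shifts and masks."""
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes")
    data = int.from_bytes(raw, "big")       # one 24-bit big-endian integer
    return {
        "sync_byte":  data >> 16,           # bits 23..16
        "tei":        (data >> 15) & 0x1,   # bit 15
        "payl_start": (data >> 14) & 0x1,   # bit 14
        "tp":         (data >> 13) & 0x1,   # bit 13
        "pid":        data & 0x1FFF,        # bits 12..0
    }

# Invented sample: sync byte 0x47, payload start bit set, PID 0x0011
header = parse_header_prefix(bytes([0x47, 0x40, 0x11]))
```

Every shift amount here depends on the total size of the data, which is exactly what makes this style fragile when the layout changes.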

1. Meet BitReader

spec = (
    # Name of the data to read
    'sync_byte',
    # How many bits to read( 8 bits = 1 byte )
    8,
    'tei',
    1,
    'payl_start',
    1,
    'tp',
    1,
    'pid',
    13
)

reader = BitReader(spec)
data   = reader.read(read(3))
assert data.sync_byte == 0x47

If and when you need to add one more byte and a couple of variables, that’s when the bitshift code starts to break. Any change to the original data size requires you to change the bitshifts accordingly. Adding new values in the middle, for example because you missed them in the specification the first time, also requires changes to the bitshifts.

data      = read(4)            # 4 bytes of data, as one big-endian integer
sync_byte = data >> 24         # Bits 31-24
tei       = (data >> 23) & 0x1 # Bit 23
# etc... I'm not very interested in getting this right, but you get the idea

When using BitReader, I just give the variable name and how many bits it takes. Simple as that. No need to touch the other variables.

spec = (
    # Name of the data to read
    'sync_byte',
    # How many bits to read
    8,
    'tei',
    1,
    'payl_start',
    1,
    'tp',
    1,
    'pid',
    13,
    'scrambling',
    2,
    'has_adapt',
    1,
    'has_payload',
    1,
    'continuity',
    4
)

reader = BitReader( spec )
data   = reader.read(read(4))

And it doesn’t matter if the new values are added to the beginning, middle or at the end.
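To illustrate why the position of a field in the spec does not matter, here is a minimal sketch of how a spec-driven reader could work. It is not BitReader's actual implementation: `read_fields` is a hypothetical helper, it assumes Python 3, it returns a plain dictionary instead of an object with attributes, and the 4 sample bytes are invented.

```python
def read_fields(spec, raw):
    """Read consecutive bit fields described by a flat (name, bits, ...) spec."""
    names, widths = spec[0::2], spec[1::2]
    total_bits = sum(widths)
    if total_bits != len(raw) * 8:
        raise ValueError("spec covers %d bits, data has %d"
                         % (total_bits, len(raw) * 8))
    value = int.from_bytes(raw, "big")
    fields = {}
    remaining = total_bits              # bits not yet consumed, counted from the left
    for name, width in zip(names, widths):
        remaining -= width              # shift amount is derived, never hand-written
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields

spec = (
    'sync_byte', 8, 'tei', 1, 'payl_start', 1, 'tp', 1, 'pid', 13,
    'scrambling', 2, 'has_adapt', 1, 'has_payload', 1, 'continuity', 4,
)
# Invented sample header bytes
fields = read_fields(spec, bytes([0x47, 0x40, 0x11, 0x1A]))
```

Because each shift is computed from the widths that follow a field, inserting or removing an entry anywhere in the spec adjusts all the others automatically.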

2. About performance & syntax

BitReader is a bit slower than using bitshifts, but it was still easily fast enough for the task I worked on. And if compiled using Cython the performance nearly doubles without any code change.

Is it faster? In terms of performance, no, but you will be faster. Have a cup of C if you want speed. Is it more readable? Yes. Does it make life easier? You bet!

Somebody might look at the specification syntax and quickly note that I could have used a dictionary instead. Unfortunately that is not possible, because the field order is needed and a plain Python 2 dictionary does not preserve it.

And what about using 2-tuples ( variable, bits )? Is that more readable and less error prone? Not sure, maybe, but I thought I’d save myself from typing parentheses 🙂
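For comparison, a flat spec can be turned into the 2-tuple form with plain slicing. This is only an illustration of the two layouts; the names `flat_spec` and `pair_spec` are made up here, and nothing in it says that BitReader accepts the tuple form.

```python
flat_spec = ('sync_byte', 8, 'tei', 1, 'payl_start', 1, 'tp', 1, 'pid', 13)

# Pair every name (even index) with the bit count that follows it (odd index)
pair_spec = list(zip(flat_spec[0::2], flat_spec[1::2]))

# Sanity check: the fields should add up to a whole number of bytes
total_bits = sum(bits for _, bits in pair_spec)
```

Both forms keep the field order, which is the property that matters for reading the bits.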

The specification syntax was inspired by domgen… or the other way around. Can’t remember which came first.

You can also convert the data back into binary format. The read() method returns a BitData object, which implements a ‘dump()’ method that returns an array.array(‘B’) containing the bytes. You can change the attributes of the BitData object, dump the data back into an array, and then easily write it to a file using array.tofile(f) or send it over the network.
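The reverse direction can be sketched in the same spirit. The helper below is hypothetical and is not BitData's dump() method: it assumes Python 3, takes the field values from a plain dictionary, and packs them into an array.array('B') following the flat spec.

```python
import array

def pack_fields(spec, values):
    """Pack named bit fields back into bytes, following a flat (name, bits, ...) spec."""
    names, widths = spec[0::2], spec[1::2]
    total_bits = sum(widths)
    if total_bits % 8 != 0:
        raise ValueError("spec must cover a whole number of bytes")
    packed = 0
    for name, width in zip(names, widths):
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError("%s does not fit in %d bits" % (name, width))
        packed = (packed << width) | value   # append the field on the right
    return array.array('B', packed.to_bytes(total_bits // 8, "big"))

spec = ('sync_byte', 8, 'tei', 1, 'payl_start', 1, 'tp', 1, 'pid', 13)
values = {'sync_byte': 0x47, 'tei': 0, 'payl_start': 1, 'tp': 0, 'pid': 0x11}
values['pid'] = 0x100                        # change one field, then pack again
out = pack_fields(spec, values)
```

The resulting array can be written with out.tofile(f) to a file opened in binary mode.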

A JavaScript port might be interesting for web apps… Especially mobile apps, which often have slow connections. And a hand-written C module would probably be at least as fast as the bitshifts in Python.

3. Project location

Get the code from bitbucket.

8 thoughts on “BitReader – Python module for reading bits from bytes”

Yaniv Aknin on 2010-09-09 at 02:44 said:

Perhaps you’d also be interested in Construct, an incredible general purpose binary parsing library: http://construct.wikispaces.com/

Gordon McGregor on 2010-09-10 at 10:04 said:

Is this along the same lines as what hachoir does? http://bitbucket.org/haypo/hachoir/wiki/Home

MPEG_TS is already supported by the parser (at least to some level)

Jussi Toivola on 2010-09-10 at 18:58 said:

BitReader is a very low-level tool for extracting bit data, and it tries to keep things simple (a JSON-like data specification). Construct seems to have a similar goal but with more features and a more complex API (data classes). Hachoir seems to do that and a lot more, but they should have a tutorial on how to get started (I couldn’t find one).

Interesting projects though. I wish I had known about them sooner 🙂

Mike on 2010-11-18 at 21:22 said:

If I have 10 bytes (in a long?) parsed using BitReader, how do I convert that into a string? Thanks

If you know the location of the string and its length, just read the data from that position and convert the bytes into a string with plain Python. BitReader can help you read the size of the string if it is, for example, located just before the string itself in the data. Like so:

from bitreader import BitReader
import array
data = array.array("B")

# Writing data
t = "Hello"
data.append(len(t))      # Set the size
data.extend(map(ord, t)) # Convert string to bytes

# Reading data
reader = BitReader(["size", 8])   # The size is stored in an 8-bit value
d = reader.read(data)
chars = data[1:1 + d.size]        # The string starts right after the size
result = "".join(map(chr, chars)) # The slice is a byte array, convert to Python string
print("Result is %s" % result)
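Since the size field in this example is a whole byte, the same length-prefixed layout can also be read with plain Python. The sketch below assumes Python 3 and ASCII text, and does not use BitReader at all.

```python
# Writing: one length byte followed by the ASCII text
text = "Hello"
data = bytes([len(text)]) + text.encode("ascii")

# Reading: the first byte is the size, the string starts right after it
size = data[0]
result = data[1:1 + size].decode("ascii")
print("Result is %s" % result)
```

BitReader becomes useful once the size is stored in fewer than 8 bits or shares a byte with other fields.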

And here is how to convert a long to a string:

import array

longval = 0x48656c6c6f20776f726c6421
data = array.array("B")

while longval != 0:
  b = longval & 0xFF
  data.append(b)
  longval = longval >> 8
data.reverse()

print("".join(map(chr, data)))
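In Python 3 the loop above can be replaced with int.to_bytes. This is a sketch that assumes the integer holds ASCII text in big-endian order, as in the example value.

```python
longval = 0x48656c6c6f20776f726c6421

# Round the bit length up to whole bytes, then emit the bytes in big-endian order
num_bytes = (longval.bit_length() + 7) // 8
text = longval.to_bytes(num_bytes, "big").decode("ascii")
print(text)
```

As with the loop version, leading zero bytes are not recoverable from the integer alone, so the expected length has to be known separately if they matter.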

Amanjot Singh on 2011-06-06 at 12:34 said:

Hello kind people,
Can anybody help me? I am trying to read any file (data), whether it is video, audio, text, .exe, html or anything else. The problem is that I want bit-level reading, meaning I just want the bits as 1s and 0s, in any language.
Can I do it? If yes, then please tell me.
I will be very thankful to you for this kindness.

my email id is

amanbheley.pcte@gmail.com and
jotbox@rediffmail.com

thanx

Mikko Ohtamaa on 2011-06-06 at 13:57 said:

Thank you for your feedback. We are not offering consulting services or programming training for free.

Open Source Hacker

Pushing the boundaries of free technology
