MPEG TS is a binary format in which several fields can be packed into a single byte. I could not use Python’s struct module because it only works with whole bytes or larger units, and the fields I had were just a couple of bits wide. I started with regular bitshifts and bitmasks, but soon realised this was very error-prone for me: it was easy to make mistakes, and the code was not very readable either.
For example, the first 3 bytes of the MPEG TS header start with the sync byte (SYNC_BYTE), which is 1 byte in size and always has the value 0x47. This byte is used to detect packet boundaries in the stream; each packet is 188 bytes long. The next bit is the “transport error indicator”, which is set by the receiver hardware to flag errors in demodulation (analog signal to bits). After it comes the “payload unit start indicator”, which signals that the current packet starts a new payload of data, then the “transport priority” bit and finally the 13-bit “packet id”. Normally I’d write something like the following:
    data = int.from_bytes(read(3), 'big')  # Get 3 bytes as one 24-bit integer
    sync_byte  = data >> 16                # Bits 23-16
    tei        = (data >> 15) & 0x1        # Bit 15
    payl_start = (data >> 14) & 0x1        # Bit 14
    tp         = (data >> 13) & 0x1        # Bit 13
    pid        = data & 0x1FFF             # Last 13 bits (12-0)
On top of that I needed to store the values in a dictionary. As you can see, this is neither very readable nor convenient, so I figured there must be something easier.
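To make the manual approach concrete, here is a small self-contained sketch that runs the same shifts and masks on a hard-coded 3-byte header and collects the results in a dictionary. The sample bytes and the helper name `parse_header_manually` are made up for illustration; they are not from the original project.

```python
def parse_header_manually(raw):
    """Parse the first 3 bytes of an MPEG TS packet with shifts and masks."""
    # Combine the 3 bytes into one 24-bit big-endian integer.
    data = int.from_bytes(raw[:3], 'big')
    return {
        'sync_byte':  (data >> 16) & 0xFF,  # bits 23-16
        'tei':        (data >> 15) & 0x1,   # bit 15
        'payl_start': (data >> 14) & 0x1,   # bit 14
        'tp':         (data >> 13) & 0x1,   # bit 13
        'pid':        data & 0x1FFF,        # bits 12-0
    }


# Hypothetical sample header: sync byte 0x47, payload start set, PID 0x0011.
header = parse_header_manually(bytes([0x47, 0x40, 0x11]))
print(header)
```

Every shift amount here depends on the total width and on the position of each field, which is exactly what makes this style fragile.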
1. Meet BitReader
    spec = (
        # Name of the data to read
        'sync_byte',
        # How many bits to read (8 bits = 1 byte)
        8,
        'tei', 1,
        'payl_start', 1,
        'tp', 1,
        'pid', 13
    )
    reader = BitReader(spec)
    data = reader.read(read(3))
    assert data.sync_byte == 0x47
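For readers curious how such a spec can drive the parsing, the following is a minimal sketch of the idea, not BitReader's actual implementation. The class name `SketchBitReader`, its use of `types.SimpleNamespace` for the result, and the sample bytes are all assumptions made for this example.

```python
from types import SimpleNamespace


class SketchBitReader:
    """Illustrative stand-in: reads named bit fields from bytes."""

    def __init__(self, spec):
        # The flat spec alternates name, width, name, width, ...
        self.fields = list(zip(spec[::2], spec[1::2]))
        self.total_bits = sum(bits for _, bits in self.fields)

    def read(self, raw):
        if len(raw) * 8 < self.total_bits:
            raise ValueError('not enough data for this spec')
        # Treat the bytes as one big-endian integer, then peel off
        # fields starting from the most significant end.
        n_bytes = (self.total_bits + 7) // 8
        value = int.from_bytes(raw[:n_bytes], 'big')
        remaining = n_bytes * 8
        result = SimpleNamespace()
        for name, bits in self.fields:
            remaining -= bits
            mask = (1 << bits) - 1
            setattr(result, name, (value >> remaining) & mask)
        return result


spec = ('sync_byte', 8, 'tei', 1, 'payl_start', 1, 'tp', 1, 'pid', 13)
# Hypothetical sample header bytes.
data = SketchBitReader(spec).read(bytes([0x47, 0x40, 0x11]))
assert data.sync_byte == 0x47
```

The shift amounts are derived from the widths in the spec, so nobody has to calculate them by hand.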
And if and when you need to add one more byte and a couple of variables, that’s when the bitshift code starts to break. Any change to the size of the original data requires you to change the bitshifts accordingly. Adding new values to the middle, for example because you missed something in the spec the first time, requires changing the bitshifts as well.
    data = int.from_bytes(read(4), 'big')  # Get 4 bytes as one 32-bit integer
    sync_byte = data >> 24                 # Bits 31-24
    tei       = (data >> 23) & 0x1         # Bit 23
    # etc... not very interested in getting this right, but you'll get the idea
When using BitReader, I just give the variable name and how many bits it takes. Simple as that. No need to touch the other variables.
    spec = (
        # Name of the data to read
        'sync_byte',
        # How many bits to read
        8,
        'tei', 1,
        'payl_start', 1,
        'tp', 1,
        'pid', 13,
        'scrambling', 2,
        'has_adapt', 1,
        'has_payload', 1,
        'continuity', 4
    )
    reader = BitReader(spec)
    data = reader.read(read(4))
And it doesn’t matter if the new values are added to the beginning, middle or at the end.
2. About performance & syntax
BitReader is a bit slower than using bitshifts, but it was still easily fast enough for the task I worked on. And if compiled using Cython the performance nearly doubles without any code change.
Is it faster? – Performance-wise, no, but you are faster. Have a cup of C if you want speed.
Is it more readable? – Yes.
Does it make life easier? – You bet!
Somebody might look at the specification syntax and quickly note that I could have used a dictionary instead. Unfortunately that was not possible: the order of the fields matters, and at the time a plain dict did not preserve insertion order (that only became guaranteed in Python 3.7).
And what about using 2-tuples (variable, bits)? Would that be more readable and less error-prone? Not sure, maybe, but I thought I’d save myself from typing the parentheses 🙂
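The two notations carry the same information, and one can be turned into the other in a line of Python. The sketch below only illustrates that equivalence; the helper names `to_pairs` and `to_flat` are made up, and BitReader itself may handle its spec differently.

```python
def to_pairs(flat_spec):
    """('a', 8, 'b', 1) -> [('a', 8), ('b', 1)]"""
    if len(flat_spec) % 2:
        raise ValueError('spec must alternate name, bits')
    return list(zip(flat_spec[::2], flat_spec[1::2]))


def to_flat(pairs):
    """[('a', 8), ('b', 1)] -> ('a', 8, 'b', 1)"""
    return tuple(item for pair in pairs for item in pair)


flat = ('sync_byte', 8, 'tei', 1, 'payl_start', 1, 'tp', 1, 'pid', 13)
pairs = to_pairs(flat)
print(pairs)
```

One small point in favour of the pair form: a forgotten name or width shows up as a malformed pair, whereas in the flat form it can only be detected when the total length happens to be odd.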
The specification syntax was inspired by domgen… or the other way around. Can’t remember which came first.
You can also convert the data back into binary format. read() returns a BitData object, which implements a dump() method that returns an array.array('B') containing the bytes. You can change the attributes of the BitData object, dump the data back into an array, and then easily write it to a file using array.tofile(f) or send it over the network.
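The reverse direction can be sketched in the same spirit. The function below is an assumption about how such a dump could work, not the code behind BitData's dump(): it packs a dict of field values back into an array.array('B') following the spec. The name `pack_fields` and the sample values are hypothetical.

```python
import array


def pack_fields(spec, values):
    """Pack named bit fields into an array of bytes, most significant first."""
    packed = 0
    total_bits = 0
    for name, bits in zip(spec[::2], spec[1::2]):
        value = values[name]
        if value >> bits:
            raise ValueError(f'{name} does not fit in {bits} bits')
        packed = (packed << bits) | value
        total_bits += bits
    # Pad on the right to a whole number of bytes.
    n_bytes = (total_bits + 7) // 8
    packed <<= n_bytes * 8 - total_bits
    return array.array('B', packed.to_bytes(n_bytes, 'big'))


spec = ('sync_byte', 8, 'tei', 1, 'payl_start', 1, 'tp', 1, 'pid', 13)
fields = {'sync_byte': 0x47, 'tei': 0, 'payl_start': 1, 'tp': 0, 'pid': 0x11}
fields['pid'] = 0x100  # change one field, then dump everything back
out = pack_fields(spec, fields)
print(out.tobytes().hex())
```

The resulting array can then be written with `out.tofile(f)` to a file opened in binary mode.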
3. Project location
Get the code from Bitbucket.