Bit twiddling has always mesmerized me, or rather, watching others manipulate raw data (at the lowest level possible) is what was so mesmerizing. This is mostly because it seems magical to me. Not coming from a CS degree background I’ve always been attracted to abstractions and libraries that shield me from dealing with system-level data and components.
A recent project required me to take a text file full of csv records and compress each one into a six byte structure.
Legend:
| = byte separator
. = one bit
<----> = space consumption of numeric component
Binary structure of one record:
<--------------------><--------------------> <------>
|........|........|........|........|........|........|
As you can see, there are two numeric components that take up to 2.5 bytes (20 bits) of space and one 8-bit component. One example csv record looked something like this:
129,-42,54038
Each number (component) expressed in binary using the built-in python
function bin()
looks like this:
>>> bin(129)
'0b10000001'
>>> bin(-42)
'-0b101010'
>>> bin(54038)
'0b1101001100010110'
Now, try running this line of code in your interpreter:
>>> bin(42)
I’ll wait while you do that…
I’m still waiting.
…
Ok, did you notice how the result of bin(-42)
from earlier looks just
like the result for bin(42)
just with a leading negative sign (-
)?
That’s actually really unhelpful in visualizing the end result. What
would be really nice to have seen in the beginning is the number
represented in “two’s compliment binary”. For
reasons
I don’t fully
understand python differs in its visual representation from its actual
implementation for this function’s output. An optional parameter to the
bin function that specified “two’s compliment” would be awesome, but it
doesn’t exist. So, I wrote some that would show the “two’s compliment”
representation of negative numbers:
>>> bits(3, 8)
'00000011'
>>> bits(-3, 8)
'11111101'
Compare the result above with the built-in:
>>> bin(-3)
-0b11
Here’s the implementaion of my bits()
method, complete with
documentation:
def bits(number, size_in_bits):
"""
The bin() function is *REALLY* unhelpful when working with negative numbers.
It outputs the binary representation of the positive version of that number
with a '-' at the beginning. Woop-di-do. Here's how to derive the two's-
complement binary of a negative number:
complement(bin(+n - 1))
`complement` is a function that flips each bit. `+n` is the negative number
made positive.
"""
if number < 0:
return compliment(bin(abs(number) - 1)[2:]).rjust(size_in_bits, '1')
else:
return bin(number)[2:].rjust(size_in_bits, '0')
def compliment(value):
return ''.join(COMPLEMENT[x] for x in value)
COMPLEMENT = {'1': '0', '0': '1'}
So, now that I could see the bits as they really were I came up with the following description of the scenario. Take each number visualized using the available space:
>>> bits(129, 8)
'10000001'
>>> bits(-42, 20)
'11111111111111010110'
>>> bits(54038, 20)
'11111111111110010001'
No concatenate them in a 48-bit, or 6-byte structure, with
bits(54038, 20)
ocupying the left-most position, followed by
bits(-42, 20)
, then finally bits(129, 8)
:
<--------------------><--------------------> <------>
|........|........|........|........|........|........|
Expected: 11111111 11111001 00011111 11111111 11010110 10000001
So now, what we need is a sequence of bitwise operations that will produce the 6-byte structure from the 8-bit component and the 2 20-bit components.
Let’s start simple.
We are going to need to do some bit shifting in order to get the 20-bit components to the left of the 8-bit component, which needs to be on the far right.
So, let’s just work with 129
(8-bits) and -42
(20-bits):
>>> bits(-42 << 8 | 129, 48)
'111111111111111111111111111111111101011010000001'
As you can hopefully see, the 129
takes up the right-most 8 bits. The
129
takes up the rest.
Now, the tricky part is getting the 20 bits from 54038
in there
without messing up what we’ve already done. Without any more messing
around, here’s the solution diagrammed:
Start with bits(54038)
:
>>> bits(54038) # showing prettier result for the purpose of illustration
00000000 00000000 00000000 00001111 11111111 10010001
Now shift left 28 bits (<<
):
>>> bits(54038 << 28, 48)
00001101 00110001 01100000 00000000 00000000 00000000
Now the tricky part, which makes the next step possible:
>>> bits(54038 << 28 ^ 268435455, 48)
00001101 00110001 01101111 11111111 11111111 11111111
What we just did was a “bitwise exclusive or (^
)” using a mask. In
this case the mask was 268435455
. What’s so special about that number?
bits(268435455, 21)
shows that in binary, that number is 21 ones in a
row. Try removing the bitwise exclusive or using the mask and you’ll see
things go bonkers in a hurry.
Now, lets prepare the -42
component:
>>> bits(-42, 48)
11111111 11111111 11111111 11111111 11111111 11010110
Shift it over 8 to make room for the 129
later:
>>> bits(-42 << 8, 48)
11111111 11111111 11111111 11111111 11010110 00000000
Now, add that result into what we had previously using a “bitwise and
(&
)”:
>>> bits(54038 << 28 ^ 268435455 & -42 << 8, 48)
00001101 00110001 01101111 11111111 11010110 00000000
Can you see both numbers in that structure? Now, add in the 129
on the
end (or the beginning if you’re thinking in binary) using a “bitwise or
(|
)”:
>>> bits(54038 << 28 ^ 268435455 & -42 << 8 | 129, 48)
00001101 00110001 01101111 11111111 11010110 10000001
Phew! Now, we really don’t need the bits function anymore:
>>> result = 54038 << 28 ^ 268435455 & -42 << 8 | 129
Now it was just a matter of creating a function that accepted the 3
numeric inputs and calculated the result using the above operators and
mask. The last step was to output the raw bytes of the result
:
>>> import struct
>>> b = bytearray(struct.pack('q', result))[:6]
And now, one last sanity check:
# working left to right:
# 00001101 00110001 01101111 11111111 11010110 10000001
>>> bits(b[0], 8)
'10000001'
>>> bits(b[1], 8)
'11010110'
>>> bits(b[2], 8)
'11111111'
>>> bits(b[3], 8)
'01101111'
>>> bits(b[4], 8)
'00110001'
>>> bits(b[5], 8)
'00001101'
As expected, we have 6 bytes that match up with our expected bit array.
I must admit that it took quite the reading of the python bitwise
operator docs as well as
a lot of expirementation once I had created the bits
function to find
the right order of operations. But my understanding of bitwise
operations is much more solid as a result. I’ve posted my bits
function as a gist. Enjoy!
def bits(number, size_in_bits):
"""
The bin() function is *REALLY* unhelpful when working with negative numbers.
It outputs the binary representation of the positive version of that number
with a '-' at the beginning. Woop-di-do. Here's how to derive the two's-
complement binary of a negative number:
complement(bin(+n - 1))
`complement` is a function that flips each bit. `+n` is the negative number
made positive.
>>> bits(3, 4)
'0011'
>>> bits(129, 16)
'0000000011111111'
>>> bits(-42, 8)
'11010110'
"""
if number < 0:
return compliment(bin(abs(number) - 1)[2:]).rjust(size_in_bits, '1')
else:
return bin(number)[2:].rjust(size_in_bits, '0')
def compliment(value):
return ''.join(COMPLEMENT[x] for x in value)
COMPLEMENT = {'1': '0', '0': '1'}