Bit twiddling has always mesmerized me, or rather, watching *others*
manipulate raw data (at the lowest level possible) is what was so
mesmerizing. This is mostly because it seems magical to me. Not coming
from a CS degree background I’ve always been attracted to abstractions
and libraries that shield me from dealing with system-level data and
components.

A recent project required me to take a text file full of csv records and compress each one into a six byte structure.

```
Legend:
| = byte separator
. = one bit
<----> = space consumption of numeric component
Binary structure of one record:
<--------------------><--------------------> <------>
|........|........|........|........|........|........|
```

As you can see, there are two numeric components that take up to 2.5 bytes (20 bits) of space and one 8-bit component. One example csv record looked something like this:

```
129,-42,54038
```

Each number (component) expressed in binary using the built-in python
function `bin()`

looks like this:

```
>>> bin(129)
'0b10000001'
>>> bin(-42)
'-0b101010'
>>> bin(54038)
'0b1101001100010110'
```

Now, try running this line of code in your interpreter:

```
>>> bin(42)
```

I’ll wait while you do that…

I’m still waiting.

…

Ok, did you notice how the result of `bin(-42)`

from earlier looks just
like the result for `bin(42)`

just with a leading negative sign (`-`

)?
That’s actually really unhelpful in visualizing the end result. What
would be really nice to have seen in the beginning is the number
represented in “two’s compliment binary”. For
reasons
I don’t fully
understand python differs in its visual representation from its actual
implementation for this function’s output. An optional parameter to the
bin function that specified “two’s compliment” would be awesome, but it
doesn’t exist. So, I wrote some that would show the “two’s compliment”
representation of negative numbers:

```
>>> bits(3, 8)
'00000011'
>>> bits(-3, 8)
'11111101'
```

Compare the result above with the built-in:

```
>>> bin(-3)
-0b11
```

Here’s the implementaion of my `bits()`

method, complete with
documentation:

```
def bits(number, size_in_bits):
"""
The bin() function is *REALLY* unhelpful when working with negative numbers.
It outputs the binary representation of the positive version of that number
with a '-' at the beginning. Woop-di-do. Here's how to derive the two's-
complement binary of a negative number:
complement(bin(+n - 1))
`complement` is a function that flips each bit. `+n` is the negative number
made positive.
"""
if number < 0:
return compliment(bin(abs(number) - 1)[2:]).rjust(size_in_bits, '1')
else:
return bin(number)[2:].rjust(size_in_bits, '0')
def compliment(value):
return ''.join(COMPLEMENT[x] for x in value)
COMPLEMENT = {'1': '0', '0': '1'}
```

So, now that I could see the bits as they really were I came up with the following description of the scenario. Take each number visualized using the available space:

```
>>> bits(129, 8)
'10000001'
>>> bits(-42, 20)
'11111111111111010110'
>>> bits(54038, 20)
'11111111111110010001'
```

No concatenate them in a 48-bit, or 6-byte structure, with
`bits(54038, 20)`

ocupying the left-most position, followed by
`bits(-42, 20)`

, then finally `bits(129, 8)`

:

```
<--------------------><--------------------> <------>
|........|........|........|........|........|........|
Expected: 11111111 11111001 00011111 11111111 11010110 10000001
```

So now, what we need is a sequence of bitwise operations that will produce the 6-byte structure from the 8-bit component and the 2 20-bit components.

Let’s start simple.

We are going to need to do some bit shifting in order to get the 20-bit components to the left of the 8-bit component, which needs to be on the far right.

So, let’s just work with `129`

(8-bits) and `-42`

(20-bits):

```
>>> bits(-42 << 8 | 129, 48)
'111111111111111111111111111111111101011010000001'
```

As you can hopefully see, the `129`

takes up the right-most 8 bits. The
`129`

takes up the rest.

Now, the tricky part is getting the 20 bits from `54038`

in there
without messing up what we’ve already done. Without any more messing
around, here’s the solution diagrammed:

Start with `bits(54038)`

:

```
>>> bits(54038) # showing prettier result for the purpose of illustration
00000000 00000000 00000000 00001111 11111111 10010001
```

Now shift left 28 bits (`<<`

):

```
>>> bits(54038 << 28, 48)
00001101 00110001 01100000 00000000 00000000 00000000
```

Now the tricky part, which makes the next step possible:

```
>>> bits(54038 << 28 ^ 268435455, 48)
00001101 00110001 01101111 11111111 11111111 11111111
```

What we just did was a “bitwise exclusive or (`^`

)” using a mask. In
this case the mask was `268435455`

. What’s so special about that number?
`bits(268435455, 21)`

shows that in binary, that number is 21 ones in a
row. Try removing the bitwise exclusive or using the mask and you’ll see
things go bonkers in a hurry.

Now, lets prepare the `-42`

component:

```
>>> bits(-42, 48)
11111111 11111111 11111111 11111111 11111111 11010110
```

Shift it over 8 to make room for the `129`

later:

```
>>> bits(-42 << 8, 48)
11111111 11111111 11111111 11111111 11010110 00000000
```

Now, add that result into what we had previously using a “bitwise and
(`&`

)”:

```
>>> bits(54038 << 28 ^ 268435455 & -42 << 8, 48)
00001101 00110001 01101111 11111111 11010110 00000000
```

Can you see both numbers in that structure? Now, add in the `129`

on the
end (or the beginning if you’re thinking in binary) using a “bitwise or
(`|`

)”:

```
>>> bits(54038 << 28 ^ 268435455 & -42 << 8 | 129, 48)
00001101 00110001 01101111 11111111 11010110 10000001
```

Phew! Now, we really don’t need the bits function anymore:

```
>>> result = 54038 << 28 ^ 268435455 & -42 << 8 | 129
```

Now it was just a matter of creating a function that accepted the 3
numeric inputs and calculated the result using the above operators and
mask. The last step was to output the raw bytes of the `result`

:

```
>>> import struct
>>> b = bytearray(struct.pack('q', result))[:6]
```

And now, one last sanity check:

```
# working left to right:
# 00001101 00110001 01101111 11111111 11010110 10000001
>>> bits(b[0], 8)
'10000001'
>>> bits(b[1], 8)
'11010110'
>>> bits(b[2], 8)
'11111111'
>>> bits(b[3], 8)
'01101111'
>>> bits(b[4], 8)
'00110001'
>>> bits(b[5], 8)
'00001101'
```

As expected, we have 6 bytes that match up with our expected bit array.
I must admit that it took quite the reading of the python bitwise
operator docs as well as
a lot of expirementation once I had created the `bits`

function to find
the right order of operations. But my understanding of bitwise
operations is much more solid as a result. I’ve posted my `bits`

function as a gist. Enjoy!

```
def bits(number, size_in_bits):
"""
The bin() function is *REALLY* unhelpful when working with negative numbers.
It outputs the binary representation of the positive version of that number
with a '-' at the beginning. Woop-di-do. Here's how to derive the two's-
complement binary of a negative number:
complement(bin(+n - 1))
`complement` is a function that flips each bit. `+n` is the negative number
made positive.
>>> bits(3, 4)
'0011'
>>> bits(129, 16)
'0000000011111111'
>>> bits(-42, 8)
'11010110'
"""
if number < 0:
return compliment(bin(abs(number) - 1)[2:]).rjust(size_in_bits, '1')
else:
return bin(number)[2:].rjust(size_in_bits, '0')
def compliment(value):
return ''.join(COMPLEMENT[x] for x in value)
COMPLEMENT = {'1': '0', '0': '1'}
```

*-Michael Whatcott*