8 bools in 1 byte, the 1 bit boolean.

James William Fletcher
4 min readApr 10, 2022

--

image credit to kafasi

An age-old question, one I would imagine many new aspiring low-level to the metal programmers may ask and one I too once asked myself.

It’s a question many may ask when they discover that in C or C++ a Boolean variable or ‘bool’ for short, is simply an alias for a 4-byte unsigned integer (actually 1-byte oops). Shock horror! Any self-respecting novice in the field might ask “but we can fit eight booleans in just one byte! why are we wasting FOUR bytes just for ONE boolean?!” and of course, the tutor would respond with some general and well-rehearsed spiel about cache alignment in memory, etc.

It is possible in C or C++ to create a set of eight 1 bit booleans in a single byte of memory, but it’s not advised so, really these days bitwise operators are somewhat a dinosaur of modern programming, that doesn’t mean you should never use them, it’s just not likely you will ever need to (with the exception of cryptography, maybe branch evasion, etc), and well, like the goto, they still have relevant purposes, it’s just not recommended in most instances. Why though? Well because generally if there is something you can do faster using bitwise operators then the compiler will detect that and do it for you, and given the appropriate optimisation flags this becomes even more likely.

But.. it still begs the question, how would one best implement such a method, and, which implementation would be the best, and why? I also wonder, how much slower would this be than actually using a 4-byte integer per boolean?

Well, finding implementations is the easy part, and like with many other questions in this day of age stack overflow already has the answer. Let’s refer to this particular thread; https://stackoverflow.com/a/8461163

We can see that a user named rodrigo has provided two very good examples of an 8 boolean byte implementation, and, for a psyduck he doesn’t seem too confused by the matter either.

May I introduce solution 1;

And solution 2;

Very beautiful indeed.

So a little bit of keyboard magic and we have this benchmarking program for both implementations;

Oh boy, that’s big, and that’s because we’re also comparing the two solutions provided by psyduck against 1-byte, 2-byte, and the traditional 4-byte boolean types (signed and unsigned!!) which yes basically just boils down to benchmarking store and load operations for that specific variable type against the bitwise operator speed of setting and getting bits from a byte. Fun stuff.

As you can see in the header of the source file I included the benchmark as run on my personal computer;

:: Solution 1  :: 5,271,206 μs , 19,466,401,076 Cycles
:: Solution 2 :: 3,465,682 μs , 12,798,653,905 Cycles
:: 1 byte bool :: 119,495 μs , 441,289,454 Cycles
:: 2 byte bool :: 113,160 μs , 417,893,614 Cycles
:: 4 byte bool :: 102,384 μs , 378,097,709 Cycles
:: 1 byte sign :: 116,160 μs , 428,975,779 Cycles
:: 2 byte sign :: 107,785 μs , 398,044,927 Cycles
:: 4 byte sign :: 369,225 μs , 1,363,532,288 Cycles

As usual I used two timers, the classic rdtsc and a microsecond timer provided by Linux, anyone who tells you that rdtsc is inaccurate or needs to be executed using rdtscp is talking guff, you can reliably time large iterations of a code segment with it and be just fine. As you can well see. If not, I dare say you would notice otherwise.

So what do we see here, well, solution 2 is faster than solution 1. Why is this? Well, I will show you the breakdown of the compiled assembly opcodes in just a moment but for nothing more than a morbid interest because really it just boils down to that the compiler can better optimise the union, and I really did try to force the compiler into inlining the toByte() and fromByte() functions or to unroll the loops but it was a futile fight.

As for the other cache aligned and cache unaligned but most probably forcibly aligned byte solutions, well they speak for themselves, they’re all about the same but the 4-byte unsigned integer is always just that little bit faster.

Here’s the ASM for solution 1 and solution 2 for anyone with that particular morbid interest.

I really liked this use of bitwise operators by Jm15itch for trophy states in the game TuxPusher.

I hope that puts this age-old mystery to rest for all new aspiring programmers alike out there, and god bless Psyduck for making our job that whole lot easier.

--

--