Rounding up to the next power of two in C
I was writing a memory allocator in C.
My approach was to pad each allocation to a power of two number of bytes
so that the allocations could be cleanly packed together.
For example, if the program does malloc(49)
,
requesting a 49-byte memory allocation,
we round this up to 64 bytes.
For this,
I wanted a function next_pow2
so that next_pow2(49) = 64
,
next_pow2(64) = 64
,
next_pow2(65) = 128
,
et cetera.
How do we implement next_pow2
?
A natural implementation is to check all the powers of two,
stopping at the first one which is big enough:
uint64_t next_pow2(uint64_t x) {
uint64_t p = 1;
while (p < x) p *= 2;
return p;
}
The above algorithm uses ordinary arithmetic.
We can do better if we work with the binary representation of numbers in C.
In binary representation, adding a 0
doubles the number,
just as adding a 0
in decimal multiplies by ten.
So p * 2
can be done as p << 1
,
using the bitwise left shift operator <<
to shifts the number up by one and add a rightmost 0
:
uint64_t next_pow2(uint64_t x) {
uint64_t p = 1;
while (p < x) p <<= 1;
return p;
}
For a 64-bit number,
the while
loop can still take up to 64 iterations.
The number of operations is linear in the number of bits.
We can do better and make this a logarithmic number of operations.
One logarithmic-time algorithm is
to use binary search instead of linear search:
uint64_t pow2(uint8_t e) { return ((uint64_t)1) << e; }
uint64_t next_pow2(uint64_t x) {
uint8_t lo = 0, hi = 63;
while (lo < hi) {
uint8_t test = (lo+hi)/2;
if (x < pow2(test)) { hi = test; }
else if (pow2(test) < x) { lo = test+1; }
else { return pow2(test); }
}
return pow2(lo);
}
The above logarithmic-time approach has a lot of branching.
There is another logarithmic-time algorithm which uses no branching,
which I originally found here.
It relies on a function next_pow2m1
,
which finds the next number of the form 2^n - 1
:
uint64_t next_pow2(uint64_t x) {
return next_pow2m1(x-1)+1;
}
To implement next_pow2m1
, let’s see an example:
next_pow2m1(0b00010100) = 0b00011111
.
Notice that in binary representation,
this is equivalent to “fill with 1
s everything after the first 1
”.
We can do that with iterated bitwise or:
uint64_t next_pow2m1(uint64_t x) {
for (int i = 0; i < 63; i++) x |= x>>1;
return x;
}
We can make this branchless by expanding the loop:
uint64_t next_pow2m1(uint64_t x) {
x |= x>>1; // Iteration 1
x |= x>>1; // Iteration 2
...
x |= x>>1; // Iteration 63
return x;
}
Here’s what happens in each iteration:
x = 0010000000000000
x = 0011000000000000
x = 0011100000000000
x = 0011110000000000
x = 0011111000000000
x = 0011111100000000
x = 0011111110000000
x = 0011111111000000
x = 0011111111100000
x = 0011111111110000
x = 0011111111111000
x = 0011111111111100
x = 0011111111111110
x = 0011111111111111
This is still linear-time.
But we can do this faster by reusing the 1
bits from previous iterations!:
uint64_t next_pow2m1(uint64_t x) {
x |= x>>1;
x |= x>>2;
x |= x>>4;
x |= x>>8;
x |= x>>16;
x |= x>>32;
return x;
}
Now see how the rightmost bit doubles in size each time:
x = 0010000000000000
x = 0011000000000000
x = 0011110000000000
x = 0011111111000000
x = 0011111111111111
This is the most efficient “portable C” implementation of next_pow2
I’ve seen.
However, there’s a still more efficient constant-time implementation.
The above algorithm effectively finds the rightmost 1
bit,
but there’s a constant-time function for that:
__builtin_clzl
, “count leading zeros”.
This is provided by GCC (not portable C).
It does have a gotcha:
__builtin_clzl(0)
is undefined,
so we need to special-case for it.
Here’s our final constant-time implementation of next_pow2
:
uint64_t next_pow2(uint64_t x) {
return x == 1 ? 1 : 1<<(64-__builtin_clzl(x-1));
}
But how is __builtin_clzl
implemented?
We can see by compiling to assembly:
$ cc -O0 -std=c99 -S next_pow2.c
We get these instructions:
movq $42, -8(%rbp)
bsrq -8(%rbp), %rax
The bsrq
instruction is “Bit Scan Reverse”.
As usual, there is literally no official documentation on this instruction.
Tagged #programming, #c. All content copyright James Fisher 2018. This post is not associated with my employer.