Tutorial :Find first unset bit in buffer (optimization)



Question:

What's the fastest/cleanest way to find the bit offset of the first unset bit in an array of arbitrary length?

Assume the prototype for your function is something like this size_t first_unset_bit(char unsigned const *buf, size_t bit_count, size_t start_bit); and that it may be called several times in quick succession on the same buffer. If you can give a better prototype, please justify.

If you use assembly any assembly, please provide a x86 sample that will run on a core2 or later. I will award the answer to the solution that provides the best combination of speed and beauty.

Update0

Here's my naive implementation. I have no idea if it's actually right yet, it's not yet used in a live system.

static size_t first_unset_bit(char unsigned const *buf, size_t bit_count, size_t start_bit)  {      for (; start_bit < bit_count; ++start_bit)      {          size_t buf_index = start_bit / CHAR_BIT;          int bit_index = start_bit % CHAR_BIT;          if (!((buf[buf_index] >> bit_index) & 1))              return start_bit;      }      return -1;  }  


Solution:1

Linux has what I imagine to be a highly tuned implementation under the name "find_first_zero_bit".


Solution:2

Often overlooked, strings.h (yes, that standard header) contains a bunch of functions: ffs, ffsl and the likes, see here for more information. At least with gcc and on x86, this compiles to the corresponding one-cycle instruction, for instance BSFL.

I would thus suggest:

  1. add a sentinel 0xFFFF at the end of your array
  2. divide bit_count by 4 (so your iterating over 4-byte blocks instead of bytes)
  3. use a while loop to find the block with the first set bit

For example:

cursor = start_pos;  while(position = ffsl(buf))    cursor++;  return (cursor - startpos) * 32 + pos;  

(Except you have to test whether youve reached the sentinel, in which case the buffer is blank.)

Although you should take this with a grain of salt because I don't purport to be an assembly expert... you'd basically use little more than 3 cycles for every 32 bits (one incrementation, one comparison, one BSFL instruction), and imagine you can do better using the long version of the function.


Solution:3

In x-86 assembly language,

REPE SCAS 0xFFFFFFFF  

...is likely to form an important part of the answer!

You have no choice but to examine every bit preceding the first unset bit, so it's just down to how quickly you can do that. Comparing 32 bits at a time is a good start, and once you get down to knowing which WORD contains the first unset bit you can use a combination of shifts/lookuptables to locate the first unset bit in that word.


Solution:4

Optimization hint: create lookup table that maps byte value to first unset bit than loop bytes but not bits.


Solution:5

Without using any assembly language, but with GCC builtins, and assuming that bit_count is a multiple of the number of bits in a long, something like this should work. I changed your function to take a void* buffer argument to avoid aliasing problems. Totally untested, I might have messed up the math especially in the leading "if (start_bit % LONG_BIT) block.

#include <stddef.h>  #include <limits.h>  #define LONG_BIT (CHAR_BIT * sizeof(unsigned long))    size_t  first_unset_bit(const void *buf, size_t bit_count, size_t start_bit)  {      size_t long_count = bit_count / LONG_BIT;      size_t start_long = start_bit / LONG_BIT;        const unsigned long *lbuf = (const unsigned long *)buf;        if (start_bit % LONG_BIT)      {          size_t offset = start_bit % LONG_BIT;          unsigned long firstword = lbuf[start_long];          firstword = ~(firstword | ~((1UL << offset) - 1));          if (firstword)              return start_bit - offset + __builtin_clzl(firstword);            start_long += 1;      }        for (size_t i = start_long; i < long_count; i++)      {          unsigned long word = lbuf[i];          if (~word)              return i*LONG_BIT + __builtin_clzl(~word);      }      return bit_count + 1; // not found  }  


Solution:6

The obvious solution is to just loop through from start_bit until you either get to the end of the array or find an unset bit.

Because it can be of arbitrary length you can't just turn it into a number and find the value that way, as it may likely be largr than the size of a double.


Solution:7

I'm assuming your buffer is aligned, e.g. a buffer returned by malloc. If not you'll need to scan the unaligned part at the beginning first.

uint32_t *p = (void *)buf;  while (!(*p+1)) p++;  size_t cnt = (unsigned char *)p - buf << CHAR_BIT;  if (*p>=0xFFFF0000)    if (*p>=0xFFFFFF00)      if (*p>=0xFFFFFFF0)        if (*p>=0xFFFFFFFC)          if (*p>=0xFFFFFFFE) cnt+=31;          else cnt+=30;        else          if (*p>=0xFFFFFFF9) cnt+=29;          else cnt+=28;      else        if (*p>=0xFFFFFFC0)          if (*p>=0xFFFFFFE0) cnt+=27;          else cnt+=26;        else          if (*p>=0xFFFFFF90) cnt+=25;          else cnt+=24;    else      ...  

I'll leave filling in the rest of the binary search to you.


Solution:8

As others have mentionned, assembly language may give the best performance. If that is not an option, you may wish to consider the following (untested) routine. It is not quite that for which you asked, but it should be close enough that you can adapt it to your needs.

size_t findFirstNonFFbyte (      unsigned char const *buf,       /* ptr to buffer in which to search */      size_t               bufSize,   /* length of the buffer */      size_t               startHint  /* hint for the starting byte (<= bufSize) */      ) {      unsigned char * pBuf = buf + startHint;      size_t          bytesLeft;        for (bytesLeft = bufSize - startHint;           bytesLeft > 0;           bytesLeft = startHint, pBuf = buf) {          while ((bytesLeft > 0) && (*pBuf == 0xff)) {              *pBuf++;              bytesLeft--;          }            if (bytesLeft > 0) {              return ((int) (pBuf - buf));          }      }      return (-1);  }  

To use ...

index = findFirstNonFFbyte (...);  bit_index = index + bitTable[buffer[index]];  

Additional Notes:

The above code will check 8 bits at a time. If you know your buffer will be 4-byte aligned and its length an even multiple of 4 bytes, then you can test 32-bits at a time with a little tweaking (don't forget the return value calculation).

If your starting bit is not a hint but an absolute, then you can skip the for-loop.

You will need to provide your own bit lookup table. It should be an array 256 bytes long. Each entry identifies the first clear bit of the byte that indexes that entry. Personal experience tells me that different people will number those bits differently. Some call bit 0 the most signficant bit of the byte; others will call bit 0 the least significant bit of the byte. Whatever style you pick, be sure to be consistent.

Hope this helps.


Solution:9

use the gcc builtin equivalent of Microsoft's _BitScanReverse, I use something like this to find the first free bit(representing block usage) for my memory system:

        __forceinline DWORD __fastcall GetNextFreeBlockIndex(PoolBlock* pPoolBlock)          {              DWORD dwIndex;              DWORD dwOffset = 0;              DWORD* pUsage = &pPoolBlock->fUsage[0];              while(dwOffset < MMANAGER_BLOCKS_PER_POOL)              {                  DWORD dwUsage = *pUsage;                  if(dwUsage != 0xFFFFFFFF && _BitScanForward(&dwIndex,~dwUsage))                  {                      #if !( MMANAGER_ATOMIC_OPS )                          pPoolBlock->pSync.Enter();                      





        
Previous
Next Post »