share
Stack OverflowFind the largest binary gap in a number?
[+20] [10] TK Kocheran
[2011-06-15 04:00:17]
[ python algorithm math binary ]
[ http://stackoverflow.com/questions/6352851/find-the-largest-binary-gap-in-a-number ] [DELETED]

I had to solve a problem earlier on today which was very interesting: finding the largest binary gap in a number. (The longest gap between two 1s in a binary stream) For instance, 9 has a max binary gap of 2, since 9 in binary is 1001, 529 has a max binary gap of 4, since 529 in binary is 1000010001.

The way I solved it was probably not the best nor a mathematical solution, I simply converted the int in Python to a binary string via bin(n)[2:], then found the index of all matches of 1 in the string, then looped through each and counted the difference between each index and returned the greatest result.

There has to be a better, more mathological (yes, I just made that up) solution for this. I'm terrible with math, which is why I reverted to using strings... does anyone have a purely mathematical/extremely performant solution to the task at hand? I'd like to learn from the pros :)

(5) a bitwise & operation might have been a little more efficient, but yours sounds clean and readable, and if performance was sufficient, it might have been "best". - jcomeau_ictx
What is the expected output for a number with no 1's, or a single 1? - Wallacoloo
As always, take a look at OEIS: oeis.org/A087117 - belisarius
I think that the summary is that a number-based solution is possible, but a string-based solution is both faster and smaller. - Chris Morgan
(2) Why do you think there should be a "more mathological" solution? This is a question that depends on the representation of a number in a particular base, not a base-independent property of the number. There's nothing particularly mathematical in the question. - ShreevatsaR
@ShreevatsaR - esp given that the base involved is two - that's exactly the kind of situation when a surprising discrete math theorem sneaks up on you. None of those add-up-the-digits divisibility things are base-independent properties of the numbers, for instance. - Steve314
[+4] [2011-06-15 06:11:11] Josh Caswell

Looks like another job for groupby [1].

>>> from itertools import groupby
>>> n = 529
>>> max(sum(1 for i in g) for k,g in groupby(bin(n)[2:]) if k=='0')
4

Perhaps not the fastest, but reasonably so:*

% python -m timeit -s "from itertools import groupby; n = 10" "max(sum(1 for i in g) for k,g in groupby(bin(n)[2:]) if k=='0')"
100000 loops, best of 3: 5.46 usec per loop
% python -m timeit -s "from itertools import groupby; n = 10**100" "max(sum(1 for i in g) for k,g in groupby(bin(n)[2:]) if k=='0')"
10000 loops, best of 3: 128 usec per loop

On a MacBook Pro, 2.5 GHz Core i5, Python 2.6.5 running in 32-bit mode. (Doubling the bits seems to knock about 25% off the time.)

*Actually JBernardo's solution seems to roll right over this one for large numbers.

[1] http://docs.python.org/library/itertools.html#itertools.groupby

This is slower, but +1 for testing it. - ShreevatsaR
1
[+3] [2011-06-15 04:20:56] Petar Ivanov

No need to convert to binary string - you can test each bit using bit operators like | and &. I have written down the solution where you iterate once starting from the lowest bit keeping the max gap in the variable m and the current running gap in the variable cur:

def maxGap(n):
    cur = 0
    m = 0
    p = 1
    while (p <= n and (n & p) == 0): p <<= 1
    while (p <= n):
        if (n & p != 0):
            m = max(cur, m)
            cur = 0
        else:
            cur+=1
        p <<= 1
    return m

(2) but that is probably much slower than converting to string. Also it is not even readable - JBernardo
(1) Are you kidding me!? How is that not readable? And I don't think it's faster to convert and then split and then take max... - Petar Ivanov
(3) This is faster than JBernado's answer for n in (0, 1, 2), but for large input this is very definitely slower. For 529, yours is over twice as slow, and for 0xe7f6 it's 2.8 times slower, and for a really large number like 0xe7f683c0246322 it's a shade under 4 times as slow. (These done with timeit.timeit on a 64-bit Python 2.7 on Ubuntu.) - Chris Morgan
Yeah, you are right - I checked it too. - Petar Ivanov
You can make this a few percent faster by optimising the (n & p) == 0 to n & p and (n & p != 0) to n & p; make it over 1.5 times as fast (for large inputs) by changing m = max(cur, m) to if cur > m: m = cur. That still doesn't make it as fast as JBernado's answer. - Chris Morgan
One notion that I have of why this could be so much slower than the string version is that integers are immutable and so many new objects are being created/loaded/worked on. - Chris Morgan
(3) This solution is only slower than building lists of strings because it's interpreted, while Python's string and list functions are optimised C code. In any compiled language, this approach will be faster by a small constant factor (at least -- that's assuming optimistically that allocating strings and lists is O(1)). - j_random_hacker
Although I gave +1, I was thinking from the start "this feels more like C than Python". For me, that's the core of any performance and readability issues here. - Steve314
(1) @j_random_hacker - Python is interpreted, in a virtual-machine kind of way. The virtual machine has an overhead, of course, but a bigger issue may be that integers aren't just values in a CPU register (dynamic typing, large integers). - Steve314
@j_random_hacker - oops - meant to say "Python is compiled ...". - Steve314
@Steve314: You may be right about big integers being the main reason, I don't know Python well enough to say. Regarding "compilation" of Python, things like JITting blur the line, but I think Python's level of processing is closer to storing the lexed source code in a more compact representation to avoid having to re-lex it every time a function is called. Perl and even GW-BASIC do this. Don't have Python handy, but I'm 100% certain that fiver's maxGap will run at least 3x slower (maybe 10x) than an equivalent C function (which would look almost identical, incidentally). - j_random_hacker
2
[+3] [2011-06-15 04:44:50] JBernardo [ACCEPTED]

Another string based solution

def max_gap(x):
    return len(max(bin(x)[2:].rstrip('0').split('1')))

for python2.6+ you can use format(x, 'b') instead of bin(x)[2:] for readability


... but bin(x)[2:] is unsurprisingly about twice as fast as format(x, 'b'). - Chris Morgan
3
[+2] [2011-06-15 04:50:33] jkraybill

On the face of it, this problem seems similar to normal bit counting, which has been very heavily analyzed. All of the solutions presented so far would be terribly slow by cyrptographers' standards (heavy bit count users). I'd guess that if you wanted the fastest easily implemented software solution, you'd want to do something with pre-computed tables. Look at http://gurmeet.net/puzzles/fast-bit-counting-routines/ for good code examples of approaches to bit counting (http://graphics.stanford.edu/~seander/bithacks.html is an interesting page too but may not help you with this particular problem). For consecutive bit counting, I think you'd want something like three tables - a "left zero count", "middle zero count" and "right zero count" table. Then precompute that for the biggest number size you can stomach (e.g. 16 bits) and chomp your inputs that number of bits at a time, tracking the largest consecutive block of zeroes you've found.


Counting the set bits efficiently (without a hardware instruction) is interesting. In principle it's divide-and-conquer, dividing the problem into two equal parts at each stage. That looks O(n log n) in the number of bits, but you can eliminate the n factor by exploiting the parallelism in register operations - two 16-bit operations can be handled as one 32-bit operation, four 8-bit operations can be handled as one 32-bit operation, and so on. The trouble with this gap problem is that gaps can span the divide - you don't have two independent subproblems. Not sure if this method applies. - Steve314
I don't think you're referring to this approach specifically anyway - it's just a vaguely relevant thought to what you're describing. - Steve314
4
[+1] [2011-06-15 04:20:05] Steve314

Yours is a perfectly good solution IMO. There's little you can do to improve the performance, in asymptotic terms at least. Probably not worth trying to optimise unless you're working in C or C++, and even then only if you need to optimise.

However, as a slightly different (no doubt slower, but perhaps simpler) solution...

>>> "10010001101".split ("1")
['', '00', '000', '', '0', '']

Use a list comprehension to map from those strings of zeros to the lengths, then extract the maximum.

EDIT As pointed out by fiver, this doesn't work if your number is even (the binary ends with one or more zeros). To fix the bug, take a look at the string.strip method.

I should probably update this answer to include more ideas from the comments - for the moment, just be aware that the comments are worth reading.


(3) this won't work for "1001000" - it will produce 3, while the right answer is 2 - Petar Ivanov
@fiver - let's call that an exercise for the reader ;-) Clue - take a look at the strip method. - Steve314
(1) My Python's a bit rusty, but I believe something like max(map(len, "100100001101".split("1")[1:-2])) would work - BlueRaja - Danny Pflughoeft
(1) @fiver, @Steve314 Maybe I'm missing something, but why isn't 3 the right answer (in regards to '1001000'). 3 zeroes in a row there? Also slightly simpler (no need for map) would be len(max('1001000'.split('1'))). - zeekay
@zeekay - because those three zeros aren't sandwiched between ones, so it's not the "longest binary gap between two ones" as specified in the question. It's not impossible that my answer is what was really intended - with an issue like this in "the real world" I'd probably query the requirement once I spotted it. - Steve314
I guess it depends on how you define a gap - I thought gap should be surrounded by 1s. No really important. Sorry for that comment. - Petar Ivanov
@zeekay - Nice idea exploiting comparison of strings, by the way. - Steve314
@BlueRaja your code don't work for e.g. '10010001' - JBernardo
(1) @Jbernardo: Sorry, apparently I meant [1:-1]. That works, because the split ends the list with "" if the list ends with a 1 (and begins the list with a "" if the list begins with a 1) - BlueRaja - Danny Pflughoeft
5
[+1] [2011-06-15 09:17:27] starblue

Another divide and conquer approach:

Shift the number right successively by 1, 2, 4, ... and apply OR, until it is all ones. Then you know the smallest power of two that is greater than the largest gap of zeros. Back up one step and repeat for the remaining zeros.

Whether this is an improvement would be interesting to see. I would guess that for 64 bit numbers it would be better in the average case but not in the worst case, compared to a loop that looks at the bits one by one.

Edit: In the average case it is substantially faster, by a factor of more than 9 for random 64 bit numbers. Here is the Java code of the implementations I used for benchmarking:

private static final int consecutiveZeros0(long n)
{
    int result = 0;
    long rest = n;
    int count = 0;
    while (rest != 0)
    {
        if ((rest & 1) == 0)
        {
            count++;
        }
        else
        {
            result = Math.max(count, result);
            count = 0;
        }

        rest >>>= 1;
    }
    return result;
}


private static final int consecutiveZeros1(long n)
{
    int result = 0;

    long a = n;
    while (!isAllOnes(a))
    {
        int count = 1;
        a |= (a >> count);
        while (!isAllOnes(a | (a >> count)))
        {
            a |= (a >> count);
            count *= 2;
        }
        result += count;
    }
    return result;
}


private static boolean isAllOnes(long a)
{
    return (a & (a + 1)) == 0;
}

Interesting, can you provide a code example and a performance test? - TK Kocheran
6
[+1] [2011-06-15 10:38:42] Abhimanyu Srivastava

I think I can suggest a mathematical way.....before reading remember that i am not a good ex-plainer so bear up with it..:) I have only thought of the algorithm and not implemented so bear up with some loopholes.

What i suggest is we can detect a pattern in the way the binary numbers change as the number increases.lets take an example if we want to take out the max binary gap for 27 starting from 16


10000 max=0


10001 max=3 [1 number]


10010 10011 max=2 [2 numbers]


10100 10101 10110
10111 max=1 [4 numbers]


11000 max=0


11001 max=2 [1 number]


11010 11011 max=1 [2 numbers]


11100 max=0


11101 max=1 [1 number]


11110 11111 max=0 [2 numbers]


this number(27) lies between 2^4 and 2^5 and from the above pattern we can see that any number lying in this range cannot have binary gap more than 3. above we can see for a range 0f 2^4-2^5 the values change in the interval of 2^3 then 2^2 and so on.... so by finding the interval in which the number we want to find teh binary gap of lies we can calculate its maximum binary gap. taking an example: the max binary gap follows the above pattern so when we get a number say 27 we first find the range within which it lies.we get 2^4-2^5. therefore value of max=3[4-1], now 27mod[lower range] we get 11 which greater that 8 so now what we know from this is that value of max=3-1 nor 11mod4 we get 3.before proceeding one more thing noteworthy in the pattern is that the value of max within the range also follows a pattern within first 8 number of the range 4[max=3]-->2[max=2]-->1[max=1]-->1[max=0](common to all) then for next 4 2[max=2]-->1[max=1]-->1[max=0] and so on.... coming back to the example.. after mod with 8 we get three so we know that the value of max=2 now(following the pattern 8-->4-->2-->1) note:before moding check if the remainder of the previous i larger than 2^n(whatever may be the case then) 3mod2 we get 1 making the value of max=1....which will be the answer for 27.....

I am really bad descriptor and ex-planer so please bear up with the explanation and comment on whatever you find confusing..... and do tell me the loopholes if you find any....:) hope this is the answer....


7
[0] [2011-06-15 04:18:37] John Stephanos

Well I can think of one way to do it mathematically, but it would still involve iteration.

Assume the indices of the binary number start from 0 at the least significant digit, and up one for each digit.

for the number n, the index of the first 1 is given by floor(log(n)). Then set n = n-2^floor(log(n)), repeat until n = 0. Then you will have the indices of all 1's and it is trivial to find the largest gap.

Logs are all base 2 of course. I'm sure there is a better way than this though.


8
[0] [2011-06-15 04:34:54] Ravi Vyas

I believe you can do it without converting the number to a string.

You can rotate the number to the right and check if the number is odd , if yes you have a 1 in the last bit else its a 0,now keep rotating and storing the distance(s) between the 1s and you will have the largest distance.


9
[0] [2011-06-15 15:48:05] Eric Bainville

This solution uses the fact that x&(x-1) is x with its least significant 1 bit set to 0. It returns 1<<max_binary_gap.

# Return Y,BIT where BIT is the least significant 1 bit of X, and
# Y is X shifted right to remove this bit.
def lsbit(x):
  u = x & (x-1)         # U = X with least significant 1 removed
  bit = u ^ x           # BIT = least significant 1
  return ((x/bit)>>1), bit

# Return 0 if X is negative or does not have 2 bits set
# Otherwise, return 1<<max_binary_gap
def binary_gap(x):
  if x <= 0: return 0  # Invalid input
  gap = 0
  x,bit = lsbit(x)
  while x>0:
    x,bit = lsbit(x)
    gap = max(gap,bit)  # Keep largest BIT in GAP
  return gap

10