When Jon Bentley assigned binary search as a problem in a course for professional programmers, he found that ninety percent failed to provide a correct solution after several hours of working on it, mainly because the incorrect implementations failed to run or returned a wrong answer in rare edge cases. A study published in 1988 found accurate code for it in only five of twenty textbooks surveyed. Furthermore, Bentley's own implementation of binary search, published in his 1986 book Programming Pearls, contained an overflow error that remained undetected for over twenty years. The Java standard library's implementation of binary search had the same overflow bug for more than nine years.
I agree that binary search is prone to errors, but the overflow error is just calculating the midpoint as
(low + high) / 2
I mean, that's technically true because it might crash for huge arrays, but if this is a bug then the following implementation of a function that adds two integers is also buggy, since it will overflow when x + y doesn't fit in an int:
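Presumably a definition along these lines (reconstructed here for illustration; the exact overflow behaviour depends on the language):

int add(int x, int y) { return x + y; } /* an int cannot represent sums beyond INT_MAX, so large inputs overflow */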
An important difference between your add method and binary search is that your add method's signature already implies the contract that the sum of the two integers must fit into an int: there is simply no correct value of the specified return type that could be returned otherwise.
There's nothing about the signature of an ordinary binary search method that would imply that it only works for arrays that have less than MAX_INT/2 elements.
True, but you can make the same case for the addition operator, which might have a type annotation somewhere; it's certainly not visible to most programmers.
Simply put, in many languages there is no addition operator which does mathematically correct addition and that is a sad state of affairs.
In the first few decades of electronic computers and programming languages, an addition (or any other arithmetic operation) that did not correctly signal overflow would have been considered completely unacceptable.
Computers without proper hardware overflow detection first appeared among the so-called minicomputers and microcomputers, i.e. the cheapest computers, which were initially intended for things like industrial control or serving as peripherals for larger computers, where it was assumed that competent programmers would use appropriate workarounds for the hardware limitations.
Unfortunately, this kind of arithmetic implementation, without appropriate means of detecting overflow, initially intended only for the cheapest products, has spread over the years to all CPUs.
Even though news of some horror story caused by a combination of weak hardware and missing software checks appears from time to time, there seems to be no hope that this unfortunate hardware design fashion will ever be reversed.
The processors I've used all have an overflow flag that will tell you if an addition result exceeded the size of the register. But I'm not aware of any compilers that will use the flag, because it adds overhead that isn't wanted or needed 99.99% of the time.
> Simply put, in many languages there is no addition operator which does mathematically correct addition and that is a sad state of affairs.
I read a book on Clojure, back when the language was fairly new, containing a spirited defense of the fact that arithmetic operators like + and - always returned the correct result. This was slower, because they needed to do bounds checking, but the result was always correct. If you wanted faster arithmetic with bugs, you'd use the explicit operators +. or -. (or *. or, presumably, /. -- I'm not sure how division was handled).
Shortly after that, Clojure reversed its policy and + will give you fast addition with bugs.
I don't think this is an unfair gotcha. If you have u32 for low and high, and someone gives you an array with 4 billion entries, you can return the correct value if you compute the midpoint correctly.
If ints are 32 bits you would expect to be able to add 1 to 2^31 - 1, but that causes an overflow with this implementation. (low + high) / 2 is arguably worse because you turn the algorithm from one that works on any array into one that only works on arrays small enough not to cause the overflow.
I think that may actually be considered buggy, and the actual definition should be
long add(int x, int y) { return (long) x + y; }
with the caveat, of course, that `long` must be larger than an `int`, which I can't recall whether it's guaranteed on all platforms. So mathematical u32 + u32 should be u64, but you can define an `add` function that has strict overflow behavior.
Yes, this is essentially the case I'm making: if this implementation of binary search is buggy, then so is the addition operator in C (and many related languages).
They're not comparable. Trying to compare them suggests that you don't understand what the issue is with the `(low + high) / 2` case, or you don't want to acknowledge it.
Simple addition will fail if the result of the expression falls outside the range of integers expressible at a given integer width, because it simply can't fit. That's not insightful. It's facile. The criticism of the naive midpoint computation is not facile.
The problem with using `(low + high) / 2` to compute the midpoint (instead of `(high - low) / 2 + low`) is that it will give invalid results even when the data type from the signature is wide enough that the expected result can fit.
Microsoft are the kings of backward compatibility. Since long has been 32 bits for the last 20 years, it shall remain so until the end of time. Linux compilers are free to be a little more pragmatic. The standards are flexible enough to allow either behavior.
Tim Bray wrote a post, "On the Goodness of Binary Search," in which he stated, "Simplicity is good, and binary search, coded properly, (see below) is awfully damn simple" (emphasis in original).
In an update to his post, Tim linked to Joshua Bloch's "Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken" post about the problems you describe.
> he found that ninety percent failed to provide a correct solution after several hours of working on it
I wonder how many engineers would run into a similar situation with university algorithms. I'm not very far out of university, so I have an unfair advantage, but take some engineers with at least 10 years of experience and have them implement Dijkstra's from memory. Not saying this is necessarily a bad thing. If you're in industry for years, you have gained a lot more knowledge applicable to your work, kind of like how runners have a different kind of strength and muscle than weight lifters - they both meet their needs.
That said, I'm surprised at 90% overflowing code. Is it all in that midpoint calculation?
Yes, it's always the midpoint calculation. Everybody forgets that (low+high) could overflow. It's trivial to work around once you know - just replace mid=(low+high)/2 with mid=low+(high-low)/2.
I agree - the best binary search algorithm is the one someone else wrote, preferably in the standard library of the language you are using.
I've written binary search algorithms maybe half a dozen times, and every time I get tripped up by some boundary condition or other, and what is only a few lines of code takes a really long time to get exactly right for all the corner cases.
My own solution was in Python, so it didn't fall prey to the very common integer overflow error. Python has the amazing and handy property that integers don't overflow.
I am a huge proponent of "bitwise" binary search. Consider r to be the result index we wish to find in range [0, n), that is, the first index where a predicate returns false, or n if there is no such index (the predicate is assumed to be true for some prefix of the range and false afterwards). We can construct this index bit by bit, starting from the most significant bit:
# Extracts the highest set bit for n > 0.
# This can be 1-5 instructions on most machines using count leading zeros.
def highest_bit_set(n):
    return 1 << (n.bit_length() - 1)

def binary_search(n, predicate):
    # Construct largest possible r such that predicate(r) is true, or 0.
    if n == 0: return 0
    r = 0
    bit = highest_bit_set(n)
    while bit:
        if (r | bit) < n and predicate(r | bit): r |= bit
        bit >>= 1
    # Return smallest possible r such that predicate(r) is false, or n.
    if r > 0 or predicate(0): r += 1
    return r
I find this to be significantly easier to reason about, although I do have a lot
of experience with bitwise manipulation. And it is trivial to extend to search
a range [lo, hi):
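One way to do that extension (a sketch, not necessarily how the original code looked) is to run the [0, n) version on the shifted range:

# Search [lo, hi) for the first index where predicate is false, or hi if none,
# by offsetting the predicate and the result by lo.
def binary_search_range(lo, hi, predicate):
    return lo + binary_search(hi - lo, lambda i: predicate(lo + i))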
Note that with this version, the expected number of calls to `predicate` can be 1 more than strictly necessary (depending on n). This happens when n is one more than a power of 2, e.g. if n is 65 then `bit` is 64 and the first call will be `predicate(64)`, which is basically useless in reducing the search space.
Because I'm used to half-open intervals, I would have used the invariant
left <= target < right
Or in other words
target in range(left,right).
The article would phrase this as making the target green rather than red. The code would end up very similar, except at the start you can initialize right to len(array) rather than len(array)-1, which is a bit cleaner. Also I would write the while loop condition as
while right - left > 1:
i.e. 'while we have more than one option remaining'.
- When the target appears multiple times, the search will find the last occurrence rather than the first.
- If you return `right`, then you get the behaviour of C++'s std::upper_bound and Python's bisect.bisect_right: The return value is the index of the first element strictly greater than the target, or `len(array)` if all the elements are less than or equal to the target.
- If you return `left`, you get a somewhat unusual beast: The return value is the index of the last element less than or equal to the target. If all the elements are greater than the target, the return value should be -1 (because returning 0 would incorrectly suggest that `array[0] <= target`).
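Concretely, that variant can be written as the following sketch (it borrows the -1 sentinel initialisation discussed in the replies below, which removes the need for extra special cases):

def search(array, target):
    # Invariant: if target is present, the index of its last occurrence is in
    # range(left, right). The fictional sentinels left = -1 and right = len(array)
    # make this hold from the start, and are never actually read.
    left, right = -1, len(array)
    while right - left > 1:
        middle = (left + right) // 2
        if array[middle] <= target:
            left = middle
        else:
            right = middle
    # right: first index with array[right] > target (bisect_right / upper_bound).
    # left: last index with array[left] <= target, or -1 if every element is greater.
    return left, right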
Initialising `right` to `len(array)` removes one of the special cases (which is nice and can be done regardless of the invariant), but if `left` remains initialised to 0 then a special case would still be necessary to distinguish between "the first element is equal to the target" and "all the elements are greater than the target" (this special case can be eliminated by initialising `left` to -1; see my top-level comment).
Inclusive-exclusive intervals are great in many cases, but a little pragmatism can make life easier.
Using half-open intervals for std::lower_bound and std::upper_bound has an unexpected benefit. It allows the functions to return an insertion point that keeps the sequence sorted, even if that insertion point is at the end of the sequence.
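Python's bisect functions (mentioned elsewhere in the thread) have the same property; a small illustration:

import bisect

array = [1, 3, 5]
i = bisect.bisect_right(array, 7)  # i == 3 == len(array): a valid insertion point
array.insert(i, 7)                 # [1, 3, 5, 7] stays sorted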
I love the colouring approach, that's same approach I use when explaining/analysing binary search. One advantage of this approach that was not explicitly mentioned is that the behaviour is clear and well-defined when the target element appears multiple times, unlike some implementations where it is unspecified which occurrence will be found. Minor point: I prefer to negate the predicate, i.e. "is_red" instead of "is_green". Then the definition of the binary_search is "return the first index for which the predicate returns true", which I find more natural.
But the three if-statements for the base cases are horrible and completely unnecessary!
We can't initialise `left` to 0 because that might violate the invariant (it is possible that `is_green(arr[0])` will return False). However, we can initialise `left` to -1 (!) if we define `is_green(arr[-1])` to be True, and make sure to never actually compute `is_green(arr[-1])`. (Note when I say `arr[-1]` I am referring to the fictional element before the first element, not the element specified by Python's negative index rules.)
Similarly, we can initialise `right` to `len(arr)` if we define `is_green(arr[len(arr)])` to be False.
So the first line becomes
left, right = -1, len(array)
and the three if-statements can be deleted.
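Putting those pieces together, a sketch of the resulting function in the article's is_green style (the names here are illustrative, not the article's exact code):

def binary_search(array, is_green):
    # Exclusive-exclusive: is_green(array[-1]) is defined to be True and
    # is_green(array[len(array)]) to be False, but neither is ever evaluated.
    left, right = -1, len(array)
    while left + 1 < right:
        middle = (left + right) // 2
        if is_green(array[middle]):
            left = middle
        else:
            right = middle
    return right  # index of the first non-green element, or len(array) if all green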
This uses an exclusive-exclusive range, which can be a little unsettling for people used to inclusive-exclusive ranges. But I think the elegance of the invariants totally justifies it.
Another reason I prefer exclusive-exclusive over inclusive-exclusive for this is that it makes the rounding in `(left + right) // 2` consistent with the "uniqueness" of the middle. An array with 3 items has a unique middle (index 1), while an array with 4 items has two "middles" (index 1 and index 2). With an inclusive-exclusive range, the rounding does not reflect this: when the size is 3, `(0 + 3) / 2` gives 1.5 (which must be truncated to give the correct index); when the size is 4, `(0 + 4) / 2` gives 2.0 (which arbitrarily picks index 2). It bothers me that the case with a unique middle involves rounding (which must be done in the correct direction!) and the case with no unique middle returns a misleadingly exact result. With an exclusive-exclusive range on the other hand, it all works nicely: when the size is 3, `(-1 + 3) / 2` gives 1.0 (which is the unique middle); and when the size is 4, `(-1 + 4) / 2` gives 1.5 (which can be rounded up or down to get the two "middles").
The best way I've seen to handle such problems full of possible mistakes is to use Test-Driven Development.
Write a test for a trivial case, like a zero-length array. Think of the design. What should it return? Should it throw an exception? Do that, and only that, in the implementation.
Each iteration after you add a test, you should first focus on getting every test to pass, then commit to version control, then refactor to get the clearest implementation.
A next iteration would be to test with a single element - a matching one, in which case it should return its index (zero in zero-indexed languages).
Then with a non-matching one - again, you have to make a design decision: throw an exception, return -1, or something else.
Then a case with two elements, when the first is the match. Then two with the second matching. Then two with none matching.
For as long as you can think of tests breaking your implementation, keep thinking of edge-cases. Eventually you end up with a formidable collection of test cases, and you will not be afraid of refactoring or changing things, because you will find out quickly and easily which tests break.
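For illustration, the first few steps of that progression might produce something like this (a sketch; returning -1 for a missing target is just one of the possible design decisions mentioned above, and the implementation is only a placeholder so the tests are runnable):

def binary_search(array, target):
    # Placeholder implementation for the tests below.
    left, right = 0, len(array)
    while left < right:
        middle = (left + right) // 2
        if array[middle] < target:
            left = middle + 1
        else:
            right = middle
    return left if left < len(array) and array[left] == target else -1

def test_empty_array():
    assert binary_search([], 3) == -1

def test_single_matching_element():
    assert binary_search([3], 3) == 0

def test_single_non_matching_element():
    assert binary_search([5], 3) == -1

def test_two_elements_first_matches():
    assert binary_search([3, 5], 3) == 0

def test_two_elements_none_match():
    assert binary_search([3, 5], 4) == -1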
Edit: boxed mentioned Property-Based Testing and Mutation Testing. These are amazing ways to come up with more corner cases to cover.
The reason there are so many buggy binary search implementations, however, is also the reliance on TDD and similar approaches. Very few tests attempt a binary search over a container where overflows become an issue. Yet sooner or later someone will actually try to use it for that purpose and discover that it fails.
This process would be satisfied with a linear search. To even just see that there is a log n algorithm in the case where the input is sorted, you need to think analytically about the problem and its solution.
Suppose the problem is presented with runtime constraints for some sufficiently large test cases, such that this difference in performance matters. Even in that case, what are the chances that a TDDer who is unaware of the binary search algorithm will stumble on the correct solution, while never looking beyond the next test to be satisfied? In the small cases, the difference is not discernible. I suggest that to be successful, one must think analytically about the problem, and code to the insights revealed by that analysis, in addition to the specifications themselves.
Agreed. The idea that a developer would start coding with TDD and subsequently arrive at an O(log n) solution that works on all edge cases is... fantasy, to put it mildly.
PBT is awesome but I would still like to have tests I can easily reason about when implementing a tricky algorithm. The way I see it, TDD is the specification, PBT is for finding gaps in my specification.
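For example, assuming a binary_search(array, target) that returns an index or -1 (as in the sketch above), a single hypothesis property already probes a lot of corner cases:

from hypothesis import given, strategies as st

@given(st.lists(st.integers()), st.integers())
def test_found_index_is_correct(values, target):
    array = sorted(values)
    index = binary_search(array, target)
    if target in array:
        assert array[index] == target
    else:
        assert index == -1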
A previous version of the post used true/false, but I think color is easier to understand after revision (e.g. "left always points to an element corresponding to true" -> "left always points to a green element").
A related question: When I use the Print command in Safari (macOS) to convert the blog post into a PDF document, the background colors are not rendered. Does anyone know why?
The visual explanation in this article is also very nice (color the items less than the value to search for with green, items larger with red), and it shows how people can have different intuitions over how and why an algorithm works.
Sometimes, I find it is easier to make your base case broader and brute-force it one by one when you can't get it right. For example, when I tried to exploit an RSA parity oracle, I couldn't find the right answer once it was narrowed to the smallest range, for some unknown reason. Changing the loop condition to `while right-left>1000` works pretty well if you don't want to debug it more.
>Then, admire how the loop body writes itself. There is no equality check, no +1/-1 arithmetic: just move the correct-color pointer.
The +1/-1 is an optimization because you know what you are searching for isn't at that index. You no longer need to keep that mid point in your new search space.
I think you're right that I don't do the "normal" implementation quite enough justice; I've been meaning to write a third followup binary search post to "redeem" it. As others have mentioned, you can prove it correct with a simple invariant: the value that you are looking for is in the narrowed range (if it is in the array to begin with). But I think that invariant is harder to write code for, which is the point of this article (maybe I should state that explicitly).
That said, you can only use the +1/-1 optimization when you are searching for a particular element, so you can no longer use a function that returns true/false; you need a third state to distinguish equality. So it's also not as theoretically "pure" as I would like. But not only is it less pure, it's also not applicable to other real problems (see the followup post: https://blog.tylerhou.io/posts/binary-search-revisited/).
That's also why I don't really like it as it makes binary search too restrictive. I think it's a major failure that we teach people that binary search is an algorithm for finding a value in a sorted array—it's really a much more powerful and general algorithm for finding a transition point in a monotonic function from {ints, reals, any totally-ordered set} to booleans. That's another reason I prefer this approach, not just simplicity in implementation.
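For example, the same loop can search a monotonic predicate over plain integers, with no array anywhere; here is a sketch (integer square root is just an illustrative choice of predicate):

def first_false(lo, hi, is_green):
    # Smallest x in [lo, hi) with is_green(x) false, or hi if there is none.
    # Assumes is_green is monotonic: True for a prefix of the range, then False.
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if is_green(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

def isqrt(n):
    # Largest x with x*x <= n, found as the transition point of the predicate.
    return first_false(0, n + 1, lambda x: x * x <= n) - 1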
> That said, you can only use the +1/-1 optimization when you are searching for a particular element, so you can no longer use a function that returns true/false; you need a third state to distinguish equality. So it's also not as theoretically "pure" as I would like. But not only is it less pure, it's also not applicable to other real problems (see the followup post: https://blog.tylerhou.io/posts/binary-search-revisited/).
Why do you need a third state? The spec for your binary search says that it returns the first non-green element. A common spec for binary searches on arrays is that it should return the first element that is not < needle, and your blog includes an example of using your generalized binary search this way. I think a working binary search for arbitrary functions on integers that uses this "optimization" follows:
# search [lo, hi) for the least integer x with !is_green(x)
# or hi if they are all green
# or lo if you pass in a malformed interval
def binary_search(lo, hi, is_green):
    while hi > lo:
        mid = lo + (hi - lo) // 2
        if is_green(mid):
            # we will definitely return a value greater than
            # mid, since mid is green.
            # let's only look at elements past the known-green ones.
            lo = mid+1
        else:
            # hi is the greatest value we can return,
            # even though we'll never look at it,
            # so mid is a good value for hi here
            # (in particular, if hi==lo+1, then mid==lo, and
            # the above branch can set lo=hi before we exit
            # the loop and return lo)
            hi = mid
    return lo

# Usage.
binary_search(0, len(array), lambda i: array[i] < 6)
It seems like the difficulty you encountered with this approach is to do with the fact that you'd like to use an invariant in which you have at least one red and at least one green element. If you relax that self-imposed requirement then you can apparently delete half of the code.
Edit: I had a hard time articulating the invariant for the above binary search, which I commonly write for arrays only. I think it's "Given an interval [lo, hi) in which elements monotonically 0-or-more green then 0-or-more red, and given that (-inf, lo) is all green and that [hi, inf) is all red, let's find the least red element." This is a surprising amount of complexity, but I guess I'd rather have the complexity in the English description of the invariant than in the code. If someone has a more concise rationale for shoving both red and green mids out of the interval under consideration, that would be swell.
In your followup article, you have:
if left >= right:
    return 0
if not is_green(left):
    return 0
if is_green(right):
    return right + 1
This is a bit weird. It seems like if I give you an all-green interval of length 0, 1, or 2, I should get 1-past-the-end in all of those cases, but here we get 0, 0, and 1-past-the-end. It's weird to get 0 if our interval is all-green [999,999] then get 1001 if we ask about all-green [999,1000]. It also seems like if I give you an all-red interval like [-1000,1000], you shouldn't return 0, since that element has a lot of red elements before it.
Yeah, zero is probably not the best thing to return if left == right; same with an all red interval. Wasn't thinking carefully enough when generalizing it, but I will fix.
Psst, if you use an inclusive-exclusive interval for the parameters and an exclusive-exclusive interval for the implementation (see my top-level comment), it becomes nicer and you can keep your original invariants. You just need `left -= 1`:
def binary_search(left, right, is_green):
    left -= 1
    while left + 1 < right:
        middle = (left + right) // 2
        if is_green(middle):
            left = middle
        else:
            right = middle
    return right

# Usage.
binary_search(0, len(array), lambda i: array[i] < 6)
This is equivalent* to @anonymoushn's version if you define `left = lo - 1` and `right = hi`.
*Equivalent except for the rounding in the division.
Very good. I also find the division rounding and symmetry in your implementation aesthetically appealing. I think I just instinctively reach for half-open intervals for reasons unrelated to this particular problem.
For signed search ranges, if you modify my implementation above to compute the midpoint as (lo + hi) // 2 and (lo + hi) < 0 and the language rounds integer division toward zero, you gain a rounding-related bug. In this sort of circumstance, your approach does not gain that bug because it doesn't care which way division is rounded. Python doesn't have overflow and rounds down, so ~all of the errors we can make on that line won't happen anyway :)
If the range is unsigned, lo-1 is likely to be MAX_VALUE, but it seems like it still does the right thing.
I saw your comments above as well. I'm not 100% convinced that this is better; I actually like the three "edge cases" because I find that code/proof of correctness easier to understand. In your code, I have to break the analysis into two cases:
1. The entire range isn't red, so at some point [proof needed] we will move left to a green element. From that point on we can continue analysis like normal. (I can't think of an easy way to prove such a claim without messing with floor division and induction, etc... but it seems intuitively true.)
2. The entire range is red, so the green pointer (left) will never be moved; right will end up pointing to the first element and left to the one before the first. This returns the correct value.
I'll have to come up with a way to easily prove statement 1 above to be convinced that it is better. But I can include this version in the follow up.
Defining `left - 1` to be green and `right` to be red can be thought of as "padding" the input to ensure that there is at least one green and one red, but I'll concede that it is sneaky.
(A less sneaky version is to explicitly define new predicates: `is_green2(i) := (i == left - 1) || is_green(i)` and `is_red2(i) := (i == right) || is_red(i)`, where `left` and `right` refer to the function's original parameters (inclusive-exclusive). These predicates can be used as the invariants, and are sufficient for showing that at the end of the loop the result is either the first red, or one-past-the-end if everything is green.)
An alternative is to tweak the invariants:
Let `left` and `right` be the function's original parameters (inclusive-exclusive), and let `l` (that's a lowercase L) and `r` be the variables used in the loop (exclusive-exclusive).
The original invariants are `is_green(l)` and `is_red(r)`, which require sneaky definitions when `l = left - 1` and `r = right`.
The tweaked invariants are `is_green(i) for all i where left <= i <= l` and `is_red(i) for all i where r <= i < right`. In other words, everything up to and including `l` is green, and everything from `r` onward is red.
These invariants are vacuously true when `l = left - 1` and `r = right`, so they hold before the loop (without having to define `left - 1` to be green and `right` to be red).
The "+1" is not a superficial optimisation, it can be critical for correctness. Consider the following (correct) implementation, which uses +1:
def binary_search(array, target):
    left, right = 0, len(array)
    while left < right:
        middle = (left + right) // 2
        if array[middle] < target:
            left = middle + 1
        else:
            right = middle
    return right
If you change `left = middle + 1` to `left = middle`, it hangs in an infinite loop!
(Explanation: when `left` and `right` differ by 1, `(left + right) // 2` will return `left` and the search will not make any progress.)
It could potentially be a pretty good interview question to weed out those who have spent the last 6 months memorizing algorithms.
Realistically, anyone with programming experience would immediately question the premise of being asked to implement a binary search on a whiteboard, because the only way anyone will succeed is by having memorized an implementation.
I don't think this is an especially great question, but I worked at one company that asked this question and most people who got it right did not seem to have an implementation memorized. I say this because they got it right only after writing a buggy version and fixing some bugs.
Another company was in the habit of asking a different question that gets to the heart of the matter of "Can you think with invariants?" but is not nearly as tricky as binary search. That seemed better to me.
It is a very useful question if you do not simply check whether the result is correct, but instead use it as a way to evaluate the candidate's problem-solving skills.
Very nice! I always struggled with binary search programming problems until I started framing things in a similar way: check a certain condition, return the appropriate answer when it's false, write a guaranteed-to-terminate loop that maintains that same condition as an invariant, then interpret the final state to return the appropriate answer.
function binarySearch(arr, target) {
  let left = 0;
  let right = arr.length - 1;
  // empty arr, so we won't find anything
  if (!arr.length) {
    return -1;
  }
  // the target is less than the smallest number in arr
  if (target < arr[left]) {
    return -1;
  }
  // the target is greater than the largest number in arr
  if (arr[right] < target) {
    return -1;
  }
  // These two `if` are needed for the case when there are only 2 elements
  // in the array.
  if (arr[left] === target) {
    return left;
  }
  if (arr[right] === target) {
    return right;
  }
  while (left + 1 !== right) {
    const middle = left + Math.floor((right - left) / 2);
    if (arr[middle] < target) {
      left = middle;
    } else {
      right = middle;
    }
  }
  if (arr[right] === target) {
    return right;
  }
  return -1;
}
I think the reframing of the problem does make it look easier at first, but then you end up needing those three special cases which in my opinion negates the elegance.
Also, since the reframed problem never terminates early, the described algorithm always has the same number of lookups (i.e. you always get the worst-case).
The three special cases are unnecessary (see my top-level comment).
Implementations that terminate early might seem faster at first glance, but they can actually be slower: they only save 1 iteration on average*, and they require up to 2 branches in each iteration instead of 1. The cost of doubling the number of branches can outweigh the savings (1 iteration out of log2(n)). A possible exception is if the comparison function is expensive and returns a rich comparison (less/equal/greater), in which case it might make sense to perform the 2 branches.
But a much more significant disadvantage of terminating early is that when the target element appears multiple times, there is no guarantee which occurrence will be found.
*Incomplete proof that the saving is only 1 iteration on average: n/2 cases will not terminate early, n/4 will terminate 1 iteration early, n/8 will terminate 2 iterations early, n/16 will terminate 3 iterations early, etc. So the total saving is n/4*1 + n/8*2 + n/16*3 + ... which is at most n (the proof is the same as for the time complexity of the "heapify" operation).
Wikipedia has an entire section dedicated to this here: https://en.wikipedia.org/wiki/Binary_search_algorithm#Implem...