[swift-evolution] [Discussion] Binary search, lower bound, upper bound, equal range

Mon Feb 15 15:34:54 CST 2016

on Mon Feb 15 2016, Jeff Hajewski <swift-evolution at swift.org> wrote:

> Hello,
>
> I am working on SR-368 <https://bugs.swift.org/browse/SR-368> and am hoping to get some feedback from the community.
>
> Description
> Implement analogs to C++’s binary_search, lower_bound, upper_bound, and equal_range algorithms.
>
> Initial Proposal
> I propose adding four methods for generic Array and adding four
> methods for Arrays that conform to Comparable:
>
> Array
> @warn_unused_result
> func lowerBound(value: Array.Element, @noescape isOrderedBefore: (lhs:
> Array.Element, rhs: Array.Element) -> Bool) -> Int?
>
> @warn_unused_result
> func upperBound(value: Array.Element, @noescape isOrderedBefore: (lhs:
> Array.Element, rhs: Array.Element) -> Bool) -> Int?
>
> @warn_unused_result
> func binarySearch(value: Array.Element, @noescape
> isOrderedBefore:(lhs: Array.Element, rhs: Array.Element) -> Bool) ->
> Bool
>
> @warn_unused_result
> func equalRange(value: Array.Element, @noescape isOrderedBefore: (lhs:
> Array.Element, rhs: Array.Element) -> Bool) -> Range<Int>?
>
> Array: Comparable
> @warn_unused_result
> func lowerBound(value: Array.Element) -> Int?
>
> @warn_unused_result
> func upperBound(value: Array.Element) -> Int?
>
> @warn_unused_result
> func binarySearch(value: Array.Element) -> Bool
>
> @warn_unused_result
> func equalRange(value: Array.Element) -> Range<Int>?
>
> Discussion
> The main issue I’d like to discuss is how do we want to add these to Swift? My thoughts are as follows:
>
> These should be as generic as possible. I initially started at the
> CollectionType level but it seemed that the problem was ultimately
> reduced to an Array (through the use of various protocols, etc.). 

These should be defined on CollectionType Note that you don't need
random access to get an advantage from binary search if the comparison
is sufficiently expensive.

> For example, suppose we implemented these methods as an extension to
> CollectionType, if we were to try and call binarySearch on a set, we
> would first need to sort the set, which would return an array, and
> then we would call binarySearch on that array. 

It should work on any arbitrary collection.  The collections you know
are not the only ones that matter.  For examples, consider binary
searching the result of applying a lazy map or filter to some sorted
array.

> Similarly, it seems at the SequenceType level the problem ultimately
> reduces to using Array.

Sequences don't have indices; you can't binary search them.

> Does it make sense to handle these as public functions? I tend to
> think not as it seems less idiomatic.  I suggest eight implementations
> similar to how sort() is handled. If the calling array’s elements
> conform to Comparable, then no isOrderedBefore closure is required
> since we can use “<“, otherwise the user should supply a closure to
> allow the algorithm determine how the array elements are ordered.

The general form takes a closure that partitions the collection; see
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2001/n1313.html and
http://wg21.cmeerw.net/lwg/issue270

I would suggest avoiding the API choice that requires both a comparison
closure *and* an element against which to compare, as a unary
partitioning predicate is much simpler.

> Similar to the C++ implementations, I avoid recursion and favor
> while-loops.

Have an implementation is important as proof-of-concept and for
performance testing, but I don't consider the details important at this
stage.

> These methods should be preconditioned on the calling
> array be partitioned with respect to the passed value. 
> As far as I’m aware, there is no “isPartitioned(value:)” method
> currently available. 
>
> Should we add this as well? We could use this functionality and not
> add a isPartitioned method but if we are adding this functionality is
> there a good reason not to give the public access?  

Seems like a separable proposal.

> The alternative is to not set a precondition and document that the
> results may not be valid if the calling array is not partitioned with
> respect to value. 

That *is* a precondition.  Not all preconditions can or should be
verfied at runtime.  We would never make the caller pay to actually
verify at runtime that the collection is partitioned w.r.t. the closure;
that would make the whole operation O(N) and effectively discard any
benefit of binary-searching.

> I favor the preconditioning approach.
>
> Any and all feedback is greatly appreciated!

HTH,
Dave

-- 
-Dave