[swift-evolution] [Proposal]: Drastically improve searching API (indexOf(…)) of CollectionType

Nikolaj Schumacher me at nschum.de
Thu Jan 7 15:53:49 CST 2016


Hi Vincent,

I’m all for more search in the standard library. Most of these functions are very likely to be re-implemented in slightly different ways across projects and therefore would benefit from standardization.

Here are my notes on things that I think need some work:

- I dislike the possible assumption of a Collection to be “sorted" without validation. That’s an optimization that I consider too risky for a standard library. I think validation should be mandatory and also a “precondition" rather than a thrown error. (What are you supposed to do when you catch the error? The only thing I can think of is: Sort and try again.)

[Note: I only realized before sending that you had just changed this *to* an error. Why?]

- To me the name “binarySearchView” actually implies that it’s a sorted view into the collection (i.e. sorted once for O(log n) lookups afterwards). I would look for a different name, perhaps this one, inspired by your comment and modelled after “lazy":

let index = assumeSorted(persons).indexOf("John Doe”)

- I find it especially problematic that each method in BinarySearchView requires that the collection be sorted according to that method’s arguments beforehand. That is, the interface will let me use methods with any comparison function even though just one will work.

let persons = [Person(name: "Alice", age: 50), Person(name: "Bob", age: 30)]
let searchView = persons.binarySearchView // assuming sorted!
searchView.indexOf(Person(name: "Alice", age: 50), isOrderedBefore: <) // works
searchView.indexOf(Person(name: "Alice", age: 50), isOrderedBefore: >) // nonsense

The same applies to "lensViews": The following code is wrong, but looks fine at first glance:

let searchView = persons.binarySearchView // assuming sorted!
searchView.indexOf("John Doe") { $0.name } // works
searchView.indexOf(30) { $0.ago } // nonsense

It would be safer to make both properties of the searchView to avoid that possibility in the first place.

let searchView = persons.binarySearchView(<, {$0.name}) // assuming sorted!
searchView.indexOf("John Doe”)

- Have you considered just returning dictionaries, like Python’s “Counter"? That’s O(n), the same as the cost of validation, but allows O(1) lookups afterwards, even if the collection isn’t sorted.

- Have you considered including subsequence search?

Conclusion:
- Search functions, yes please!
- Search views need work, but might not even be worth it if validation is mandatory
- I would probably just remove lensView and maybe put that into separate proposal and consider its use across the entire stdlib. It seems like an orthogonal issue and the proposal would be more focused without. But I don’t see much benefit over using “map”, yet.


Nik



> On 4 Jan 2016, at 16:13, Vincent Esche via swift-evolution <swift-evolution at swift.org> wrote:
> 
> After having had this code laying around on my Mac since 6/26/15 (waiting for Swift to go open source)
> I figured it’s about damn time I submit the actual RFC for it. So without further ado…
> 
> One of the areas where Swift’s stdlib is still quite lacking is searching.
> Which is a shame as searching (along with indexing) might be among
> the most important tasks of computer science/software development.
> One might not need fast collection searches for writing a banal fart or flashlight app,
> but almost every other app is likely to benefit from having a proper searching API.
> 
> So I’d like to fix that.
> 
> Attached you find a full RFC along with the full and functional source code to make it happen.
> 
> I’d love to hear your opinion on this and will be more than happy to answer questions.
> 
> Rendered Version + Full Implementation: https://gist.github.com/regexident/2b7531bd748f57679671 <https://gist.github.com/regexident/2b7531bd748f57679671>
> (The code is tested using Quick/Nimble. Tests would thus have to be ported to Swift’s XCTest.)
> 
> - Vincent
> 
>> Markdown-dump for
>> 
>> # Improved Collection Search
>> 
>> * Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md <https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>)
>> * Author(s): [Vincent Esche](https://github.com/regexident <https://github.com/regexident>)
>> * Status: **Review**
>> * Review manager: TBD
>> 
>> ## Introduction
>> 
>> This RFC proposes an extension to the currently rather limited and linear searching API of `CollectionType`, that is `indexOf(element:)` and `indexOf(predicate:)`.
>> It proposes the backwards-compatible refactoring of those existing methods as well as the introduction of a view-based ensemble of methods for efficient binary-search.
>> 
>> Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution <https://lists.swift.org/pipermail/swift-evolution>)
>> 
>> ## Motivation
>> 
>> Indexing of and searching in data is a big deal in software development. As such it is crucial to have capable means of doing so in a language that aspires to be a highly performant programming language.
>> 
>> Swift's current API for searching is basically limited to these two methods on `CollectionType`:
>> 
>> > ```swift
>> > func indexOf(element: Self.Generator.Element) -> Self.Index?
>> > ```
>> > Returns the first index where value appears in self or nil if value is not found.
>> >
>> > **Complexity**: `O(self.count)`.
>> 
>> and:
>> 
>> > ```swift
>> > func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) rethrows -> Self.Index?
>> > ```
>> > Returns the first index where predicate returns true for the corresponding value, or nil if such value is not found.
>> > 
>> > **Complexity**: `O(self.count)`.
>> 
>> The author sees a couple of issue with these two methods:
>> 
>> 1. They do not provide an option for restricting the "haystack" to a certain `range` of `self`.
>> 2. They do not provide a fast (`O(log2(self.count))`) path for sorted collections.
>> 3. They should not be limited to `CollectionType` but instead be moved to `Indexable`.
>> 
>> In many situations it is desirable to perform fast searches on sorted collections instead of having to fall back to naïve linear searches.
>> 
>> Further more while a single `index` might be the most common subject of interest, there are many scenarios where a `range` or `count` of matches are of more use.
>> 
>> And anyone who is writing a custom ordered collection type will eventually want to find the insertion index for a given element and will thus end up writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which all the other methods can be implemented, actually).
>> 
>> ## Proposed solution
>> 
>> The author proposes:
>> 
>> 1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.
>> 
>> 2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).
>> 
>> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.
>> 
>> 4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.
>> 
>> ## Detailed design
>> 
>> ### `CollectionType.indices`:
>> 
>> The author proposes the relocation of `.indices` from `CollectionType` to `Indexable`:
>> 
>> ```swift
>> extension Indexable {
>>     /// Return the range of valid index values.
>>     ///
>>     /// The result's `endIndex` is the same as that of `self`.  Because
>>     /// `Range` is half-open, iterating the values of the result produces
>>     /// all valid subscript arguments for `self`, omitting its `endIndex`.
>>     public var indices: Range<Self.Index> { get }
>> }
>> ```
>> 
>> After all all it does is provide a convenience method for generating a Range from its `startIndex` and `endIndex`. There is no need for `self` to be a `CollectionType` for this.
>> 
>> This change should not break any existing code as it generalizes the property.
>> 
>> ### `Indexable` of `Comparable` elements:
>> 
>> The author proposes the addition of the following method on `Indexable where Element : Comparable`:
>> 
>> ```swift
>> extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
>> 	/// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
>> 	///
>> 	/// - Complexity: O(`self.count`)
>> 	public func isSorted(range: Range<Index>? = nil) -> Bool
>> }
>> ```
>> 
>> This method would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.
>> 
>> An actual implementation of these methods can be found in `Indexable.swift`.
>> 
>> ### `Indexable` of `Equatable ` elements:
>> 
>> The author proposes the addition of the following method on `Indexable where Element : Equatable`:
>> 
>> ```swift
>> extension Indexable where Index : RandomAccessIndexType, _Element : Equatable {
>> 
>>     /// Returns the first index where `element` appears in `self`
>>     /// within range `range` or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`self.count`)
>>     public func indexOf(element: _Element, range: Range<Index>? = default) -> Self.Index?
>> 
>>     /// Returns the first range where `element` appears in `self`
>>     /// within range `range` or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`self.count`)
>>     public func rangeOf(element: _Element, range: Range<Index>? = default) -> Range<Self.Index>?
>> 
>>     /// Returns the number of `element`s that appear in `self`
>>     /// within range `range` or `nil` if such an element is not found.
>>     ///
>>     /// - Complexity: O(`self.count`)
>>     public func countOf(element: _Element, range: Range<Index>? = default) -> Int
>> }
>> ```
>> 
>> These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.
>> 
>> An actual implementation of these methods can be found in `Indexable.swift`.
>> 
>> ### `Indexable` of any elements:
>> 
>> The author proposes the addition of the following methods on `Indexable`:
>> 
>> ```swift
>> extension Indexable where Index : RandomAccessIndexType {
>> 
>>     /// Returns the first index where an element appears in `self`
>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>     ///
>>     /// - Complexity: O(`self.count`)
>>     public func indexOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Self.Index?
>> 
>>     /// Returns the first range where an element appears in `self`
>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>     ///
>>     /// - Complexity: O(`self.count`)
>>     public func rangeOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?
>> 
>>     /// Returns the number of `element`s that appear in `self`
>>     /// within range `range` or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`self.count`)
>>     public func countOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Int
>> 
>>     /// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
>>     /// 
>>     /// - Complexity: O(`self.count`)
>>     public func isSorted(range: Range<Index>? = default, @noescape isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
>> }
>> ```
>> 
>> These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.
>> 
>> An actual implementation of these methods can be found in `Indexable.swift`.
>> 
>> **Note**: For explanation of and reasoning behind the `lensView:` argument as well as the distinction between `T` and `Base._Element`, see below.
>> 
>> ### `Indexable.binarySearchView(validateAgainst:?)`:
>> 
>> The author proposes the addition of a `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a specialized search view for performing searches with O(`log2(self.count)`) complexity:
>> 
>> ```swift
>> extension Indexable where Index: RandomAccessIndexType {
>> 	/// Create a view into a sorted indexable that allows access within `bounds`.
>> 	///
>> 	/// - Complexity: O(`self.count`) if a validation closure is provided, otherwise O(`1`).
>> 	public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
>> }
>> ```
>> 
>> ### `BinarySearchView<T: Indexable>`:
>> 
>> The author proposes the introduction of a `BinarySearchView` struct with the following interface:
>> 
>> ```swift
>> public struct BinarySearchView<Base : Indexable where Base.Index : RandomAccessIndexType> : Indexable {
>>     public typealias Index = Base.Index
>> 
>>     public var startIndex: Base.Index { get }
>> 
>>     public var endIndex: Base.Index { get }
>> 
>>     public subscript (position: Index) -> Base._Element { get }
>> 
>>     /// Create a view into a sorted collection `base` that allows access within `bounds`.
>>     ///
>>     /// - Complexity: O(1).
>>     public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, Base._Element) -> Bool)? = default)
>> 
>>     /// Returns first index of the first `element` in `self` within range `range` for which
>>     /// `lensView(element) < value` evaluates to false or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index
>> 
>>     /// Returns first index of the first `element` in `self` within range `range` for which
>>     /// `value < lensView(element)` evaluates to true or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape lensView: Base._Element -> T) -> Base.Index
>> 
>>     /// Returns the first index where an element appears in `self`
>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func indexOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index?
>> 
>>     /// Returns the range where an element appears in `self`
>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Range<Base.Index>?
>> 
>>     /// Returns the number of `element`s that appear in `self`
>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func countOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Int
>> 
>>     internal let _base: Base
>> }
>> ```
>> 
>> As well as the following extension for `Indexable where Element: Comparable`:
>> 
>> ```swift
>> extension BinarySearchView where Base._Element : Comparable {
>>     /// Returns first index of the first `element` in `self` within range `range` for which
>>     /// `… < element` evaluates to false or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func lowerBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index
>> 
>>     /// Returns first index of the first `element` in `self` within range `range` for which
>>     /// `element < …` evaluates to true or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func upperBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index
>> 
>>     /// Returns the first index where `element` appears in `self`
>>     /// within range `range` or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func indexOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index?
>> 
>>     /// Returns the range where `element` appears in `self`
>>     /// within range `range` or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func rangeOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Range<Base.Index>?
>> 
>>     /// Returns the number of `element`s that appear in `self`
>>     /// within range `range` or `nil` if `element` is not found.
>>     ///
>>     /// - Complexity: O(`log2(self.count)`).
>>     public func countOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Int
>> }
>> ```
>> 
>> These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(log2(self.count))`.
>> 
>> An actual implementation of these methods can be found in `BinarySearchView.swift`.
>> 
>> ### Why `lensView`?
>> 
>> Let's assume one had a collection of Persons like this:
>> 
>> ```swift
>> struct Person {
>>     let name: String
>>     let age: Int
>> }
>> 
>> let persons: [Person] = [ … ]
>> ```
>> 
>> Now let's assume one wanted to find the first person that's called `John Doe`. One could either change `Person` to conform to `Equatable`, allowing `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?
>> 
>> Fine, but what if one wanted to search `Person`s by both their `name` or their `age` depending on the given use case? You're out of luck now.
>> 
>> Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that is available for any type of `Element` regardless of its conformance to `Equatable`:
>> 
>> ```swift
>> let searchView = persons.binarySearchView()
>> let index = persons.indexOf { $0.name == "John Doe" }
>> ```
>> 
>> Unfortunately however such a `predicate` cannot be used when searching for an element's property in a sorted(!) collection or a collection of non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` as binary search uses `<` instead of `==` internally and as the "needle" gets passed as both the first (in `indexOf(…)` itself) and second argument (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).
>> 
>> The use of `lensView` however allows one to do just that without making a mess.
>> 
>> ```swift
>> let searchView = persons.binarySearchView() // sorted!
>> let index = searchView.indexOf("John Doe") { $0.name }
>> ```
>> 
>> The `lensView` works similarly to how keypaths work in Objective-C, just type-safe.
>> 
>> ## Impact on existing code
>> 
>> The author proposes:
>> 
>> 1. Moving `CollectionType.indices` to `Indexable` should not cause any issues with existing code.
>> 
>> 2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not cause any issues with existing code as the `range` argument is optional and the default behaves like it used to.
>> 
>> 2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not cause any issues with existing code either, as the `range` argument is optional and the default behaves like it used to. Swift's trailing-closure magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: nil, predicate: …)`.
>> 
>> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` do not conflict with any existing API.
>> 
>> 4. The introduction of a `BinarySearchView` on `Indexable` does not conflict with any existing API.
>> 
>> ## Alternatives considered
>> 
>> The author's initial approach was to implement all methods on Indexable. This however required all methods to either be provided in two variants (one for unsorted, one for sorted `Indexable`s):
>> 
>> ```swift
>> final public func indexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
>> 	return …
>> }
>> 
>> final public func sortedIndexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
>> 	return …
>> }
>> ```
>> 
>> Or require the introduction of an additional argument:
>> 
>> ```swift
>> final public func indexOf(element: _Element, range: Range<Index>? = nil, isSorted: Bool = true) -> Index? {
>>     if isSorted {
>>         return …
>>     } else {
>>         return …
>>     }
>> }
>> ```
>> 
>> It also would have cluttered `Indexable` with methods such as `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when `self` is sorted accordingly. The introduction of a dedicated BinarySearchView fixes these issues.
>> 
>> One also would have had to switch from `predicate: (T) -> Bool` to `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API alternative.
> 
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20160107/63e715f4/attachment.html>


More information about the swift-evolution mailing list