[swift-evolution] [Proposal]: Drastically improve searching API (indexOf(…)) of CollectionType

Donnacha Oisín Kidney oisin.kidney at gmail.com
Mon Jan 4 20:07:22 CST 2016


Unless I’m confused, all of the “ranged” overloads can already be achieved by composing ranged subscripts with the indexOf, etc., methods:

let ar = [0, 1, 2, 3, 4, 5]
let ind = ar[3...5].indexOf { n in n % 2 == 0 }
ar[ind!] // 4

> On 5 Jan 2016, at 01:00, Vincent Esche via swift-evolution <swift-evolution at swift.org> wrote:
> 
> There are a couple of things I’m not 100% happy with/sure about:
> 
> 1.
> I don’t like how it’s
> “<index/range/count>Of(element:range:)” but “<index/range/count>Of(range:predicate:)”.
> The reason I went for “<index/range/count>Of(range:predicate:)” was to go with the standard pattern of putting closures last and thus allowing for trailing closure syntax.
> 
> The current order allows for this:
> “<index/range/count>Of(0..10)  { $0.name == “Foobar" }”
> which I like syntax-wise.
> It however makes it look like one was looking for a range ("indexOf(0..10, …)”), not the predicate.
> 
> While the alternative requires this:
> “<index/range/count>Of(predicate: { $0.name == “Foobar" }, range: 0..10)”
> 
> I’m actually leaning towards the latter now. Dang!
> 
> 2.
> I’m unsure about the name of “lensView”. I first went with “transform”, then with “mappingFunction”, but found that neither of those made it clear that the closure should provide a specific view into the element.  Both “transform" and “mappingFunction” imply the transformation from one data type to another. It’s not about transforming. It’s about accessing.
> Thus looked for names of similar patterns and found “keypaths” (kind of) in ObjC and lenses in Haskell. To people familiar with functional programming the name should be clear. The closure passed to “lensView” is basically the getter part of a functional lens.
> Anyway, I’m not too attached to the name and more than open to suggestions.
> 
> 3.
> “BinarySearchView.init(base:validateAgainst:)” currently asserts. This should probably be split into two init functions. One assuming the base to be sorted (“BinarySearchView.init(base:)”, O(1)). And one allowing for a check, returning nil on failure (“BinarySearchView.init?(base:validateAgainst:)”, O(self.count)). Or should at least throw an error, instead of panicking.
> 
> Note: I made some minor documentation changes/fixes.
> See https://gist.github.com/regexident/2b7531bd748f57679671 <https://gist.github.com/regexident/2b7531bd748f57679671> for up-to-date RFC/source code (vs. Markdown-dump at bottom of OP).
> 
> - Vincent
> 
>> On 04 Jan 2016, at 16:13, Vincent Esche <regexident.mailinglists at gmail.com <mailto:regexident.mailinglists at gmail.com>> wrote:
>> 
>> After having had this code laying around on my Mac since 6/26/15 (waiting for Swift to go open source)
>> I figured it’s about damn time I submit the actual RFC for it. So without further ado…
>> 
>> One of the areas where Swift’s stdlib is still quite lacking is searching.
>> Which is a shame as searching (along with indexing) might be among
>> the most important tasks of computer science/software development.
>> One might not need fast collection searches for writing a banal fart or flashlight app,
>> but almost every other app is likely to benefit from having a proper searching API.
>> 
>> So I’d like to fix that.
>> 
>> Attached you find a full RFC along with the full and functional source code to make it happen.
>> 
>> I’d love to hear your opinion on this and will be more than happy to answer questions.
>> 
>> Rendered Version + Full Implementation: https://gist.github.com/regexident/2b7531bd748f57679671 <https://gist.github.com/regexident/2b7531bd748f57679671>
>> (The code is tested using Quick/Nimble. Tests would thus have to be ported to Swift’s XCTest.)
>> 
>> - Vincent
>> 
>>> Markdown-dump for
>>> 
>>> # Improved Collection Search
>>> 
>>> * Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md <https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>)
>>> * Author(s): [Vincent Esche](https://github.com/regexident <https://github.com/regexident>)
>>> * Status: **Review**
>>> * Review manager: TBD
>>> 
>>> ## Introduction
>>> 
>>> This RFC proposes an extension to the currently rather limited and linear searching API of `CollectionType`, that is `indexOf(element:)` and `indexOf(predicate:)`.
>>> It proposes the backwards-compatible refactoring of those existing methods as well as the introduction of a view-based ensemble of methods for efficient binary-search.
>>> 
>>> Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution <https://lists.swift.org/pipermail/swift-evolution>)
>>> 
>>> ## Motivation
>>> 
>>> Indexing of and searching in data is a big deal in software development. As such it is crucial to have capable means of doing so in a language that aspires to be a highly performant programming language.
>>> 
>>> Swift's current API for searching is basically limited to these two methods on `CollectionType`:
>>> 
>>> > ```swift
>>> > func indexOf(element: Self.Generator.Element) -> Self.Index?
>>> > ```
>>> > Returns the first index where value appears in self or nil if value is not found.
>>> >
>>> > **Complexity**: `O(self.count)`.
>>> 
>>> and:
>>> 
>>> > ```swift
>>> > func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) rethrows -> Self.Index?
>>> > ```
>>> > Returns the first index where predicate returns true for the corresponding value, or nil if such value is not found.
>>> > 
>>> > **Complexity**: `O(self.count)`.
>>> 
>>> The author sees a couple of issue with these two methods:
>>> 
>>> 1. They do not provide an option for restricting the "haystack" to a certain `range` of `self`.
>>> 2. They do not provide a fast (`O(log2(self.count))`) path for sorted collections.
>>> 3. They should not be limited to `CollectionType` but instead be moved to `Indexable`.
>>> 
>>> In many situations it is desirable to perform fast searches on sorted collections instead of having to fall back to naïve linear searches.
>>> 
>>> Further more while a single `index` might be the most common subject of interest, there are many scenarios where a `range` or `count` of matches are of more use.
>>> 
>>> And anyone who is writing a custom ordered collection type will eventually want to find the insertion index for a given element and will thus end up writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which all the other methods can be implemented, actually).
>>> 
>>> ## Proposed solution
>>> 
>>> The author proposes:
>>> 
>>> 1. A backwards-compatible refactoring of `CollectionType.indices`, moving it to `Indexable`.
>>> 
>>> 2. A backwards-compatible refactoring of `indexOf(…)` (adding optional `range:` and moving it to `Indexable`).
>>> 
>>> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` with a TIME complexity of `O(self.count)`.
>>> 
>>> 4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without cluttering `Indexable`'s interface.
>>> 
>>> ## Detailed design
>>> 
>>> ### `CollectionType.indices`:
>>> 
>>> The author proposes the relocation of `.indices` from `CollectionType` to `Indexable`:
>>> 
>>> ```swift
>>> extension Indexable {
>>>     /// Return the range of valid index values.
>>>     ///
>>>     /// The result's `endIndex` is the same as that of `self`.  Because
>>>     /// `Range` is half-open, iterating the values of the result produces
>>>     /// all valid subscript arguments for `self`, omitting its `endIndex`.
>>>     public var indices: Range<Self.Index> { get }
>>> }
>>> ```
>>> 
>>> After all all it does is provide a convenience method for generating a Range from its `startIndex` and `endIndex`. There is no need for `self` to be a `CollectionType` for this.
>>> 
>>> This change should not break any existing code as it generalizes the property.
>>> 
>>> ### `Indexable` of `Comparable` elements:
>>> 
>>> The author proposes the addition of the following method on `Indexable where Element : Comparable`:
>>> 
>>> ```swift
>>> extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
>>> 	/// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
>>> 	///
>>> 	/// - Complexity: O(`self.count`)
>>> 	public func isSorted(range: Range<Index>? = nil) -> Bool
>>> }
>>> ```
>>> 
>>> This method would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.
>>> 
>>> An actual implementation of these methods can be found in `Indexable.swift`.
>>> 
>>> ### `Indexable` of `Equatable ` elements:
>>> 
>>> The author proposes the addition of the following method on `Indexable where Element : Equatable`:
>>> 
>>> ```swift
>>> extension Indexable where Index : RandomAccessIndexType, _Element : Equatable {
>>> 
>>>     /// Returns the first index where `element` appears in `self`
>>>     /// within range `range` or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`self.count`)
>>>     public func indexOf(element: _Element, range: Range<Index>? = default) -> Self.Index?
>>> 
>>>     /// Returns the first range where `element` appears in `self`
>>>     /// within range `range` or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`self.count`)
>>>     public func rangeOf(element: _Element, range: Range<Index>? = default) -> Range<Self.Index>?
>>> 
>>>     /// Returns the number of `element`s that appear in `self`
>>>     /// within range `range` or `nil` if such an element is not found.
>>>     ///
>>>     /// - Complexity: O(`self.count`)
>>>     public func countOf(element: _Element, range: Range<Index>? = default) -> Int
>>> }
>>> ```
>>> 
>>> These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.
>>> 
>>> An actual implementation of these methods can be found in `Indexable.swift`.
>>> 
>>> ### `Indexable` of any elements:
>>> 
>>> The author proposes the addition of the following methods on `Indexable`:
>>> 
>>> ```swift
>>> extension Indexable where Index : RandomAccessIndexType {
>>> 
>>>     /// Returns the first index where an element appears in `self`
>>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>>     ///
>>>     /// - Complexity: O(`self.count`)
>>>     public func indexOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Self.Index?
>>> 
>>>     /// Returns the first range where an element appears in `self`
>>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>>     ///
>>>     /// - Complexity: O(`self.count`)
>>>     public func rangeOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?
>>> 
>>>     /// Returns the number of `element`s that appear in `self`
>>>     /// within range `range` or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`self.count`)
>>>     public func countOf(range: Range<Index>? = default, @noescape predicate: (_Element) throws -> Bool) rethrows -> Int
>>> 
>>>     /// Return true iff all elements in `self` within `range` are sorted according to `isOrderedBefore`.
>>>     /// 
>>>     /// - Complexity: O(`self.count`)
>>>     public func isSorted(range: Range<Index>? = default, @noescape isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
>>> }
>>> ```
>>> 
>>> These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(self.count)`.
>>> 
>>> An actual implementation of these methods can be found in `Indexable.swift`.
>>> 
>>> **Note**: For explanation of and reasoning behind the `lensView:` argument as well as the distinction between `T` and `Base._Element`, see below.
>>> 
>>> ### `Indexable.binarySearchView(validateAgainst:?)`:
>>> 
>>> The author proposes the addition of a `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a specialized search view for performing searches with O(`log2(self.count)`) complexity:
>>> 
>>> ```swift
>>> extension Indexable where Index: RandomAccessIndexType {
>>> 	/// Create a view into a sorted indexable that allows access within `bounds`.
>>> 	///
>>> 	/// - Complexity: O(`self.count`) if a validation closure is provided, otherwise O(`1`).
>>> 	public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
>>> }
>>> ```
>>> 
>>> ### `BinarySearchView<T: Indexable>`:
>>> 
>>> The author proposes the introduction of a `BinarySearchView` struct with the following interface:
>>> 
>>> ```swift
>>> public struct BinarySearchView<Base : Indexable where Base.Index : RandomAccessIndexType> : Indexable {
>>>     public typealias Index = Base.Index
>>> 
>>>     public var startIndex: Base.Index { get }
>>> 
>>>     public var endIndex: Base.Index { get }
>>> 
>>>     public subscript (position: Index) -> Base._Element { get }
>>> 
>>>     /// Create a view into a sorted collection `base` that allows access within `bounds`.
>>>     ///
>>>     /// - Complexity: O(1).
>>>     public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, Base._Element) -> Bool)? = default)
>>> 
>>>     /// Returns first index of the first `element` in `self` within range `range` for which
>>>     /// `lensView(element) < value` evaluates to false or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index
>>> 
>>>     /// Returns first index of the first `element` in `self` within range `range` for which
>>>     /// `value < lensView(element)` evaluates to true or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape lensView: Base._Element -> T) -> Base.Index
>>> 
>>>     /// Returns the first index where an element appears in `self`
>>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func indexOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Base.Index?
>>> 
>>>     /// Returns the range where an element appears in `self`
>>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Range<Base.Index>?
>>> 
>>>     /// Returns the number of `element`s that appear in `self`
>>>     /// within range `range` for which `lensView(element) == value` or `nil` if such an element is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func countOf<T : Comparable>(value: T, range: Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape lensView: Base._Element -> T) -> Int
>>> 
>>>     internal let _base: Base
>>> }
>>> ```
>>> 
>>> As well as the following extension for `Indexable where Element: Comparable`:
>>> 
>>> ```swift
>>> extension BinarySearchView where Base._Element : Comparable {
>>>     /// Returns first index of the first `element` in `self` within range `range` for which
>>>     /// `… < element` evaluates to false or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func lowerBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index
>>> 
>>>     /// Returns first index of the first `element` in `self` within range `range` for which
>>>     /// `element < …` evaluates to true or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func upperBoundOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index
>>> 
>>>     /// Returns the first index where `element` appears in `self`
>>>     /// within range `range` or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func indexOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Base.Index?
>>> 
>>>     /// Returns the range where `element` appears in `self`
>>>     /// within range `range` or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func rangeOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Range<Base.Index>?
>>> 
>>>     /// Returns the number of `element`s that appear in `self`
>>>     /// within range `range` or `nil` if `element` is not found.
>>>     ///
>>>     /// - Complexity: O(`log2(self.count)`).
>>>     public func countOf(element: Base._Element, range: Range<Index>? = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = default) -> Int
>>> }
>>> ```
>>> 
>>> These methods would perform a linear scan of the `Indexable` with a TIME complexity of `O(log2(self.count))`.
>>> 
>>> An actual implementation of these methods can be found in `BinarySearchView.swift`.
>>> 
>>> ### Why `lensView`?
>>> 
>>> Let's assume one had a collection of Persons like this:
>>> 
>>> ```swift
>>> struct Person {
>>>     let name: String
>>>     let age: Int
>>> }
>>> 
>>> let persons: [Person] = [ … ]
>>> ```
>>> 
>>> Now let's assume one wanted to find the first person that's called `John Doe`. One could either change `Person` to conform to `Equatable`, allowing `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?
>>> 
>>> Fine, but what if one wanted to search `Person`s by both their `name` or their `age` depending on the given use case? You're out of luck now.
>>> 
>>> Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that is available for any type of `Element` regardless of its conformance to `Equatable`:
>>> 
>>> ```swift
>>> let searchView = persons.binarySearchView()
>>> let index = persons.indexOf { $0.name == "John Doe" }
>>> ```
>>> 
>>> Unfortunately however such a `predicate` cannot be used when searching for an element's property in a sorted(!) collection or a collection of non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` as binary search uses `<` instead of `==` internally and as the "needle" gets passed as both the first (in `indexOf(…)` itself) and second argument (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).
>>> 
>>> The use of `lensView` however allows one to do just that without making a mess.
>>> 
>>> ```swift
>>> let searchView = persons.binarySearchView() // sorted!
>>> let index = searchView.indexOf("John Doe") { $0.name }
>>> ```
>>> 
>>> The `lensView` works similarly to how keypaths work in Objective-C, just type-safe.
>>> 
>>> ## Impact on existing code
>>> 
>>> The author proposes:
>>> 
>>> 1. Moving `CollectionType.indices` to `Indexable` should not cause any issues with existing code.
>>> 
>>> 2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not cause any issues with existing code as the `range` argument is optional and the default behaves like it used to.
>>> 
>>> 2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not cause any issues with existing code either, as the `range` argument is optional and the default behaves like it used to. Swift's trailing-closure magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: nil, predicate: …)`.
>>> 
>>> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to `Indexable` do not conflict with any existing API.
>>> 
>>> 4. The introduction of a `BinarySearchView` on `Indexable` does not conflict with any existing API.
>>> 
>>> ## Alternatives considered
>>> 
>>> The author's initial approach was to implement all methods on Indexable. This however required all methods to either be provided in two variants (one for unsorted, one for sorted `Indexable`s):
>>> 
>>> ```swift
>>> final public func indexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
>>> 	return …
>>> }
>>> 
>>> final public func sortedIndexOf(element: _Element, range: Range<Index>? = nil) -> Index? {
>>> 	return …
>>> }
>>> ```
>>> 
>>> Or require the introduction of an additional argument:
>>> 
>>> ```swift
>>> final public func indexOf(element: _Element, range: Range<Index>? = nil, isSorted: Bool = true) -> Index? {
>>>     if isSorted {
>>>         return …
>>>     } else {
>>>         return …
>>>     }
>>> }
>>> ```
>>> 
>>> It also would have cluttered `Indexable` with methods such as `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when `self` is sorted accordingly. The introduction of a dedicated BinarySearchView fixes these issues.
>>> 
>>> One also would have had to switch from `predicate: (T) -> Bool` to `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API alternative.
>> 
> 
>  _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org <mailto:swift-evolution at swift.org>
> https://lists.swift.org/mailman/listinfo/swift-evolution <https://lists.swift.org/mailman/listinfo/swift-evolution>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20160105/4f5f71f8/attachment.html>


More information about the swift-evolution mailing list