[swift-evolution] [Proposal]: Drastically improve searching API (indexOf(…)) of CollectionType
Florian Bachmann
bachmann.florian at gmail.com
Tue Jan 5 10:30:03 CST 2016
Hey Vincent,
good proposal, love to see this in production. Better collection search
would become quite handy :-)
regards,
Flori
On Mon, Jan 4, 2016 at 4:13 PM, Vincent Esche via swift-evolution <
swift-evolution at swift.org> wrote:
> After having had this code laying around on my Mac since 6/26/15 (waiting
> for Swift to go open source)
> I figured it’s about damn time I submit the actual RFC for it. So without
> further ado…
>
> One of the areas where Swift’s stdlib is still quite lacking is searching.
> Which is a shame as searching (along with indexing) might be among
> the most important tasks of computer science/software development.
> One might not need fast collection searches for writing a banal fart or
> flashlight app,
> but almost every other app is likely to benefit from having a proper
> searching API.
>
> So I’d like to fix that.
>
> Attached you find a full RFC along with the full and functional source
> code to make it happen.
>
> I’d love to hear your opinion on this and will be more than happy to
> answer questions.
>
> Rendered Version + Full Implementation:
> https://gist.github.com/regexident/2b7531bd748f57679671
> (The code is tested using Quick/Nimble. Tests would thus have to be ported
> to Swift’s XCTest.)
>
> - Vincent
>
> Markdown-dump for
>
> # Improved Collection Search
>
> * Proposal: [SE-NNNN](
> https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md
> )
> * Author(s): [Vincent Esche](https://github.com/regexident)
> * Status: **Review**
> * Review manager: TBD
>
> ## Introduction
>
> This RFC proposes an extension to the currently rather limited and linear
> searching API of `CollectionType`, that is `indexOf(element:)` and
> `indexOf(predicate:)`.
> It proposes the backwards-compatible refactoring of those existing methods
> as well as the introduction of a view-based ensemble of methods for
> efficient binary-search.
>
> Swift-evolution thread: [link to the discussion thread for that proposal](
> https://lists.swift.org/pipermail/swift-evolution)
>
> ## Motivation
>
> Indexing of and searching in data is a big deal in software development.
> As such it is crucial to have capable means of doing so in a language that
> aspires to be a highly performant programming language.
>
> Swift's current API for searching is basically limited to these two
> methods on `CollectionType`:
>
> > ```swift
> > func indexOf(element: Self.Generator.Element) -> Self.Index?
> > ```
> > Returns the first index where value appears in self or nil if value is
> not found.
> >
> > **Complexity**: `O(self.count)`.
>
> and:
>
> > ```swift
> > func indexOf(@noescape predicate: (Self.Generator.Element) throws ->
> Bool) rethrows -> Self.Index?
> > ```
> > Returns the first index where predicate returns true for the
> corresponding value, or nil if such value is not found.
> >
> > **Complexity**: `O(self.count)`.
>
> The author sees a couple of issue with these two methods:
>
> 1. They do not provide an option for restricting the "haystack" to a
> certain `range` of `self`.
> 2. They do not provide a fast (`O(log2(self.count))`) path for sorted
> collections.
> 3. They should not be limited to `CollectionType` but instead be moved to
> `Indexable`.
>
> In many situations it is desirable to perform fast searches on sorted
> collections instead of having to fall back to naïve linear searches.
>
> Further more while a single `index` might be the most common subject of
> interest, there are many scenarios where a `range` or `count` of matches
> are of more use.
>
> And anyone who is writing a custom ordered collection type will eventually
> want to find the insertion index for a given element and will thus end up
> writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which
> all the other methods can be implemented, actually).
>
> ## Proposed solution
>
> The author proposes:
>
> 1. A backwards-compatible refactoring of `CollectionType.indices`, moving
> it to `Indexable`.
>
> 2. A backwards-compatible refactoring of `indexOf(…)` (adding optional
> `range:` and moving it to `Indexable`).
>
> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to
> `Indexable` with a TIME complexity of `O(self.count)`.
>
> 4. The introduction of a `BinarySearchView` on `Indexable`, allowing for
> fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`,
> `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without
> cluttering `Indexable`'s interface.
>
> ## Detailed design
>
> ### `CollectionType.indices`:
>
> The author proposes the relocation of `.indices` from `CollectionType` to
> `Indexable`:
>
> ```swift
> extension Indexable {
> /// Return the range of valid index values.
> ///
> /// The result's `endIndex` is the same as that of `self`. Because
> /// `Range` is half-open, iterating the values of the result produces
> /// all valid subscript arguments for `self`, omitting its `endIndex`.
> public var indices: Range<Self.Index> { get }
> }
> ```
>
> After all all it does is provide a convenience method for generating a
> Range from its `startIndex` and `endIndex`. There is no need for `self` to
> be a `CollectionType` for this.
>
> This change should not break any existing code as it generalizes the
> property.
>
> ### `Indexable` of `Comparable` elements:
>
> The author proposes the addition of the following method on `Indexable
> where Element : Comparable`:
>
> ```swift
> extension Indexable where Index: RandomAccessIndexType, _Element:
> Comparable {
> /// Return true iff all elements in `self` within `range` are sorted
> according to `isOrderedBefore`.
> ///
> /// - Complexity: O(`self.count`)
> public func isSorted(range: Range<Index>? = nil) -> Bool
> }
> ```
>
> This method would perform a linear scan of the `Indexable` with a TIME
> complexity of `O(self.count)`.
>
> An actual implementation of these methods can be found in
> `Indexable.swift`.
>
> ### `Indexable` of `Equatable ` elements:
>
> The author proposes the addition of the following method on `Indexable
> where Element : Equatable`:
>
> ```swift
> extension Indexable where Index : RandomAccessIndexType, _Element :
> Equatable {
>
> /// Returns the first index where `element` appears in `self`
> /// within range `range` or `nil` if `element` is not found.
> ///
> /// - Complexity: O(`self.count`)
> public func indexOf(element: _Element, range: Range<Index>? = default)
> -> Self.Index?
>
> /// Returns the first range where `element` appears in `self`
> /// within range `range` or `nil` if `element` is not found.
> ///
> /// - Complexity: O(`self.count`)
> public func rangeOf(element: _Element, range: Range<Index>? = default)
> -> Range<Self.Index>?
>
> /// Returns the number of `element`s that appear in `self`
> /// within range `range` or `nil` if such an element is not found.
> ///
> /// - Complexity: O(`self.count`)
> public func countOf(element: _Element, range: Range<Index>? = default)
> -> Int
> }
> ```
>
> These methods would perform a linear scan of the `Indexable` with a TIME
> complexity of `O(self.count)`.
>
> An actual implementation of these methods can be found in
> `Indexable.swift`.
>
> ### `Indexable` of any elements:
>
> The author proposes the addition of the following methods on `Indexable`:
>
> ```swift
> extension Indexable where Index : RandomAccessIndexType {
>
> /// Returns the first index where an element appears in `self`
> /// within range `range` for which `lensView(element) == value` or
> `nil` if such an element is not found.
> ///
> /// - Complexity: O(`self.count`)
> public func indexOf(range: Range<Index>? = default, @noescape
> predicate: (_Element) throws -> Bool) rethrows -> Self.Index?
>
> /// Returns the first range where an element appears in `self`
> /// within range `range` for which `lensView(element) == value` or
> `nil` if such an element is not found.
> ///
> /// - Complexity: O(`self.count`)
> public func rangeOf(range: Range<Index>? = default, @noescape
> predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>?
>
> /// Returns the number of `element`s that appear in `self`
> /// within range `range` or `nil` if `element` is not found.
> ///
> /// - Complexity: O(`self.count`)
> public func countOf(range: Range<Index>? = default, @noescape
> predicate: (_Element) throws -> Bool) rethrows -> Int
>
> /// Return true iff all elements in `self` within `range` are sorted
> according to `isOrderedBefore`.
> ///
> /// - Complexity: O(`self.count`)
> public func isSorted(range: Range<Index>? = default, @noescape
> isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
> }
> ```
>
> These methods would perform a linear scan of the `Indexable` with a TIME
> complexity of `O(self.count)`.
>
> An actual implementation of these methods can be found in
> `Indexable.swift`.
>
> **Note**: For explanation of and reasoning behind the `lensView:` argument
> as well as the distinction between `T` and `Base._Element`, see below.
>
> ### `Indexable.binarySearchView(validateAgainst:?)`:
>
> The author proposes the addition of a
> `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a
> specialized search view for performing searches with O(`log2(self.count)`)
> complexity:
>
> ```swift
> extension Indexable where Index: RandomAccessIndexType {
> /// Create a view into a sorted indexable that allows access within
> `bounds`.
> ///
> /// - Complexity: O(`self.count`) if a validation closure is provided,
> otherwise O(`1`).
> public func binarySearchView(validateAgainst isOrderedBefore: ((_Element,
> _Element) -> Bool)? = nil) -> BinarySearchView<Self>
> }
> ```
>
> ### `BinarySearchView<T: Indexable>`:
>
> The author proposes the introduction of a `BinarySearchView` struct with
> the following interface:
>
> ```swift
> public struct BinarySearchView<Base : Indexable where Base.Index :
> RandomAccessIndexType> : Indexable {
> public typealias Index = Base.Index
>
> public var startIndex: Base.Index { get }
>
> public var endIndex: Base.Index { get }
>
> public subscript (position: Index) -> Base._Element { get }
>
> /// Create a view into a sorted collection `base` that allows access
> within `bounds`.
> ///
> /// - Complexity: O(1).
> public init(base: Base, validateAgainst isOrderedBefore:
> ((Base._Element, Base._Element) -> Bool)? = default)
>
> /// Returns first index of the first `element` in `self` within range
> `range` for which
> /// `lensView(element) < value` evaluates to false or `nil` if
> `element` is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func lowerBoundOf<T : Comparable>(value: T, range:
> Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool =
> default, @noescape lensView: Base._Element -> T) -> Base.Index
>
> /// Returns first index of the first `element` in `self` within range
> `range` for which
> /// `value < lensView(element)` evaluates to true or `nil` if
> `element` is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func upperBoundOf<T : Comparable>(value: T, range:
> Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) =
> default, @noescape lensView: Base._Element -> T) -> Base.Index
>
> /// Returns the first index where an element appears in `self`
> /// within range `range` for which `lensView(element) == value` or
> `nil` if such an element is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func indexOf<T : Comparable>(value: T, range: Range<Index>? =
> default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
> lensView: Base._Element -> T) -> Base.Index?
>
> /// Returns the range where an element appears in `self`
> /// within range `range` for which `lensView(element) == value` or
> `nil` if such an element is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func rangeOf<T : Comparable>(value: T, range: Range<Index>? =
> default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
> lensView: Base._Element -> T) -> Range<Base.Index>?
>
> /// Returns the number of `element`s that appear in `self`
> /// within range `range` for which `lensView(element) == value` or
> `nil` if such an element is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func countOf<T : Comparable>(value: T, range: Range<Index>? =
> default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape
> lensView: Base._Element -> T) -> Int
>
> internal let _base: Base
> }
> ```
>
> As well as the following extension for `Indexable where Element:
> Comparable`:
>
> ```swift
> extension BinarySearchView where Base._Element : Comparable {
> /// Returns first index of the first `element` in `self` within range
> `range` for which
> /// `… < element` evaluates to false or `nil` if `element` is not
> found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func lowerBoundOf(element: Base._Element, range: Range<Index>?
> = default, @noescape isOrderedBefore: (Base._Element, Base._Element) ->
> Bool = default) -> Base.Index
>
> /// Returns first index of the first `element` in `self` within range
> `range` for which
> /// `element < …` evaluates to true or `nil` if `element` is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func upperBoundOf(element: Base._Element, range: Range<Index>?
> = default, @noescape isOrderedBefore: (Base._Element, Base._Element) ->
> Bool = default) -> Base.Index
>
> /// Returns the first index where `element` appears in `self`
> /// within range `range` or `nil` if `element` is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func indexOf(element: Base._Element, range: Range<Index>? =
> default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
> = default) -> Base.Index?
>
> /// Returns the range where `element` appears in `self`
> /// within range `range` or `nil` if `element` is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func rangeOf(element: Base._Element, range: Range<Index>? =
> default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
> = default) -> Range<Base.Index>?
>
> /// Returns the number of `element`s that appear in `self`
> /// within range `range` or `nil` if `element` is not found.
> ///
> /// - Complexity: O(`log2(self.count)`).
> public func countOf(element: Base._Element, range: Range<Index>? =
> default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool
> = default) -> Int
> }
> ```
>
> These methods would perform a linear scan of the `Indexable` with a TIME
> complexity of `O(log2(self.count))`.
>
> An actual implementation of these methods can be found in
> `BinarySearchView.swift`.
>
> ### Why `lensView`?
>
> Let's assume one had a collection of Persons like this:
>
> ```swift
> struct Person {
> let name: String
> let age: Int
> }
>
> let persons: [Person] = [ … ]
> ```
>
> Now let's assume one wanted to find the first person that's called `John
> Doe`. One could either change `Person` to conform to `Equatable`, allowing
> `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?
>
> Fine, but what if one wanted to search `Person`s by both their `name` or
> their `age` depending on the given use case? You're out of luck now.
>
> Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)`
> that is available for any type of `Element` regardless of its conformance
> to `Equatable`:
>
> ```swift
> let searchView = persons.binarySearchView()
> let index = persons.indexOf { $0.name == "John Doe" }
> ```
>
> Unfortunately however such a `predicate` cannot be used when searching for
> an element's property in a sorted(!) collection or a collection of
> non-`Comparable` elements within a TIME complexity of `O(log2(self.count))`
> as binary search uses `<` instead of `==` internally and as the "needle"
> gets passed as both the first (in `indexOf(…)` itself) and second argument
> (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator,
> one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V,
> E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).
>
> The use of `lensView` however allows one to do just that without making a
> mess.
>
> ```swift
> let searchView = persons.binarySearchView() // sorted!
> let index = searchView.indexOf("John Doe") { $0.name }
> ```
>
> The `lensView` works similarly to how keypaths work in Objective-C, just
> type-safe.
>
> ## Impact on existing code
>
> The author proposes:
>
> 1. Moving `CollectionType.indices` to `Indexable` should not cause any
> issues with existing code.
>
> 2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not
> cause any issues with existing code as the `range` argument is optional and
> the default behaves like it used to.
>
> 2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should
> not cause any issues with existing code either, as the `range` argument is
> optional and the default behaves like it used to. Swift's trailing-closure
> magic allows `indexOf(predicate: …)` to still properly map to
> `indexOf(range: nil, predicate: …)`.
>
> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to
> `Indexable` do not conflict with any existing API.
>
> 4. The introduction of a `BinarySearchView` on `Indexable` does not
> conflict with any existing API.
>
> ## Alternatives considered
>
> The author's initial approach was to implement all methods on Indexable.
> This however required all methods to either be provided in two variants
> (one for unsorted, one for sorted `Indexable`s):
>
> ```swift
> final public func indexOf(element: _Element, range: Range<Index>? = nil)
> -> Index? {
> return …
> }
>
> final public func sortedIndexOf(element: _Element, range: Range<Index>? =
> nil) -> Index? {
> return …
> }
> ```
>
> Or require the introduction of an additional argument:
>
> ```swift
> final public func indexOf(element: _Element, range: Range<Index>? = nil,
> isSorted: Bool = true) -> Index? {
> if isSorted {
> return …
> } else {
> return …
> }
> }
> ```
>
> It also would have cluttered `Indexable` with methods such as
> `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when
> `self` is sorted accordingly. The introduction of a dedicated
> BinarySearchView fixes these issues.
>
> One also would have had to switch from `predicate: (T) -> Bool` to
> `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified
> API alternative.
>
>
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20160105/fcc17425/attachment.html>
More information about the swift-evolution
mailing list