[swift-evolution] [Review] SE-0065 A New Model for Collections and Indices

Mon Apr 11 23:56:55 CDT 2016

>> If these types are for implementation sharing, should they be
>> underscored to discourage their use? Or is the position they occupy in
>> the type hierarchy important because Range and ClosedRange will
>> eventually occupy them?
> 
> Underscoring hides names from users completely, and IMO that would not
> be appropriate here.  If we underscored them, it would be mysterious
> where an implementation of various methods came from.

If the concrete types show these members, does it matter that they actually came from a protocol?

>> On the other hand, it's not like the SubSequence subscript now takes a
>> RangeProtocol. Should it?
> 
> No; subscript can't be generic (language limitation).

Right, and RangeProtocol isn't existential. Ouch.

>>> func successor(of i: Index) -> Index
>> 
>> Two things:
>> 
>> 1. I would really like a version of this which returns Optional and is
>> guaranteed to go `nil` once it hits `endIndex`. 
> 
> The primary question to answer when exploring this idea is, IMO, “what
> does that do to the code in algorithms?”  I can't imagine writing binary
> search, partition, or rotate if I had to check for nil every time I
> moved an index.

If you're confident you're remaining in bounds, you should force-unwrap it.

>> There can be a non-optional version too, or it might even be a feature
>> of the `index` family of methods instead of `successor` itself, but I
>> think it would be valuable to let the collection worry about the
>> bounds check.
> 
> Why would that be valuable?
> 
>> It seems silly to check the index before calling `successor(of:)` when
>> `successor(of:)` is going to immediately perform the same check again
>> as a precondition.
> 
> Preconditions are not necessarily checked.  We don't promise to trap
> every precondition violation.  We promise memory and type safety in the
> absence of data races, and some precondition failures will trap in order
> to ensure that.  Others will trap, where we think it is affordable, just
> to provide a better programmer experience.

I understand that, but many—perhaps most—clients of APIs like `successor(of:)` will need to perform a bounds check. I think we would be better off if the check were implicit in the call. That would force all clients, or at least all clients which used the bounds-checking variants (which would be encouraged), to explicitly handle out-of-bounds conditions, in much the same way that `index(of:)` forces its clients to explicitly handle the possibility that a matching element might not exist, rather than returning `endIndex` (which would be easier).

>> (Actually, in general, I'm a little bit dismayed that the collection
>> API does so little to help you include bounds checks in your
>> code. Especially when iterating through a collection, bounds checks
>> are absolutely mandatory, and the collection API's solution to the
>> problem is "eh, just use `<`, it's not like you might mess something
>> like that up".)
> 
> What do you mean?  Could you show an example of what you think should be
> better supported?

The simplest thing would simply be to have calls like these in Collection:

	func validate(i: Index) -> Bool
	func validateIncreasing(i: Index) -> Bool
	func validateDecreasing(i: Index) -> Bool

Usage:

	while collection.validateIncreasing(i) {
		collection[i] = repeatedValue
		i = collection.successor(of: i)
	}

These methods would be one-liners, sure, but they would make your code say what was *meant*, not what happened to be true.

However, we could go further than this and perhaps reduce the amount of bounds-checking we do, even across optimization barriers.

To explain what I mean, let's take a look at `IndexingIterator`. With various less-interesting bits stripped out, the pull request defines it like this:

	public struct IndexingIterator<Elements: IndexableBase>: IteratorProtocol, Sequence {
	  init(_elements: Elements) {
	    self._elements = _elements
	    self._position = _elements.startIndex
	  }

	  mutating func next() -> Elements._Element? {
	    if _position == _elements.endIndex { return nil }
	    let element = _elements[_position]
	    _elements.formSuccessor(&_position)
	    return element
	  }

	  internal let _elements: Elements
	  internal var _position: Elements.Index
	}

Let's annotate the `next()` method with its bounds-checking behavior.

	  mutating func next() -> Elements._Element? {
	    if _position == _elements.endIndex { return nil }		// Explicit bounds check performed here
	    let element = _elements[_position]				// Usually precondition()s a bounds check to preserve memory safety
	    _elements.formSuccessor(&_position)			// May perform a bounds check on the incremented value
	    return element
	  }

Depending on the implementation, at least two, and possibly all three, of these lines will involve a bounds check. Now, perhaps the optimizer is good at eliding redundant checks involving an IndexingIterator because stdlib is transparent, but similar user code wouldn't get that benefit.

So, imagine that we have a type like this in the standard library:

	/// Represents a pre-validated index. A pre-validated index received from a given collection is 
	/// guaranteed to refer to a valid element in that collection, as long as the collection is not mutated.
	/// 
	/// -Warning:	Operations which accept a Valid<Index> assume it is in bounds and do not perform 
	///			bounds checks. Using a Valid<Index> on a collection other than the one which created 
	/// 			it, or on a collection which has been mutated since it was created, may be unsafe.
	struct Valid<Index: Comparable> {
		init(unsafeIndex index: Index) { self.index = index }
		private(set) var index: Index
	}

And a few members like these in the Collection protocols (there would be others; I'm just defining the ones we're using here:

	/// Validates an index, returning `nil` if it is invalid, or a Valid<Index> for that index otherwise.
	func validate(i: Index) -> Valid<Index>?

	/// Transforms `i` to represent the next valid index after itself, or `nil` if that index would be out of bounds.
	func formValidSuccessor(i: inout Valid<Index>?)

	/// Accesses the element corresponding to the prevalidated index
	subscript (i: Valid<Index>) -> Element { get }

With those in hand, you can rewrite IndexingIterator like so:

	public struct IndexingIterator<Elements: IndexableBase>: IteratorProtocol, Sequence {
	  init(_elements: Elements) {
	    self._elements = _elements
	    self._position = _elements.validate(_elements.startIndex)
	  }

	  mutating func next() -> Elements._Element? {
	    if _position == nil { return nil }				// This merely checks the result of the previous bounds check
	    let element = _elements[_position!]			// No bounds check needed
	    _elements.formValidSuccessor(&_position!)	// This is the only bounds check in the loop
	    return element
	  }

	  internal let _elements: Elements
	  internal var _position: Valid<Elements.Index>?
	}

This version only tests the bounds once per element, inside `formValidSuccessor(_:)` (or in `validate` for the start index), rather than several times.

(Incidentally, `Valid<Int>?` could theoretically be expressed in the size of a normal `Int`, since `endIndex <= Int.max`, and `endIndex` is not valid. I don't see a good way to tell Swift about this, though.)

>> 	collection.index(5, from: i)
> 
> I don't think this one reads clearly enough.

"The index 5 from i" reads fine to me, but I guess that's a matter of opinion.

>> 	collection.traveling(5, from: i)
>> 	collection.striding(5, from: i)
>> 	collection.advancing(i, by: 5)
> 
> None of the “ing” names work, IMO because that suffix suggests you're
> returning a modified version of the receiver.

Huh, that clarifies something. How about the non-`ing` variants?

	collection.travel(5, from: i)
	collection.stride(5, from: i)
	collection.advance(i, by: 5)

I'd say `stride` might be an attractive nuisance in this form (although if Strideable's public face becomes `Self.striding(by:)`, only to Swift 2 users) but the others look fine to me.

>>> func index(n: IndexDistance, stepsFrom i: Index, limitedBy limit: Index) -> Index
>> 
>> 2. This method can move the index either forwards or backwards, but
>> only one limit is specified. Would we be better off having the `limit`
>> be a range?
> 
> That would add a cost for checking that one doesn't want to pay in
> algorithms that need this method.

I'm a little uncomfortable with the way `limit`'s interpretation changes based on the sign of `n` here; a stray arithmetic error could turn a maximum into a minimum, with potentially bizarre results. In the code, it looks like we end up branching to accommodate that interpretation, too, which isn't ideal.

I might feel a little better with a variant for each limit semantic:

	func index(n: IndexDistance, stepsFrom: Index, belowLimit: Index) -> Index				// for when you expect to move up
	func index(n: IndexDistance, stepsFrom: Index, aboveLimit: Index) -> Index			// for when you expect to move down
	func index(n: IndexDistance, stepsFrom: Index, withinLimits: Range<Index>) -> Index	// for when you don't know which direction you're going

>> 3. What is the use case for returning the `limit` instead of returning
>> the fact that we crossed it? I have a hard time thinking of a case
>> where I would want to just bump up against the limit and use it rather
>> than *detect* that I've hit the limit (which would probably call for a
>> return type of `Index?`). Are there common needs that I'm just not
>> thinking of? 
> 
> Sure, for example
> 
>  x[i..<x.index(n, stepsFrom: i, limitedBy: x.endIndex)].sort()

Let me restate that. Given that Swift doesn't usually seem to truncate ranges that are too long (though there are exceptions), are we more frequently going to want to stop at the limit, or to detect that the limit has been hit?

>> Should we offer both?
> 
> Definitely not, IMO!  They are utterly redundant, are they not?

Well, an Optional-returning `index(_:stepsFrom:limitedBy:)` can be treated as a truncating version pretty easily. For instance, `prefix(_:)` can do this:

	let end = index(numericCast(maxLength), stepsFrom: startIndex, limitedBy: endIndex) ?? endIndex
	return self[startIndex..<end]

This is a bit of a hassle, and might be less efficient (an optional return, an extra test), but I don't think you can reliably go in the other direction.

(On the other hand, it might be that I'm conceiving of the purpose of `limitedBy` differently from you—I think of it as a safety measure, but you may be thinking of it specifically as an automatic truncation mechanism.)

-- 
Brent Royal-Gordon
Architechies