[swift-evolution] [Review] SE-0065 A New Model for Collections and Indices

Tue Apr 12 06:27:37 CDT 2016

>> (On the other hand, it might be that I'm conceiving of the purpose of `limitedBy` differently from you—I think of it as a safety measure, but you may be thinking of it specifically as an automatic truncation mechanism.)
> 
> Hi Brent,
> 
> Could you explain what kind of safety you have in mind?  Swift will
> guarantee memory safety even if you attempt to advance an index past
> endIndex using the non-limiting overload.

By "safety" here, I mean what I will call "index safety": not accidentally using an index which would violate the preconditions of the methods or properties you are planning to use it with. I think it's too easy to accidentally overrun the permitted range of indices, and the API should help you avoid doing that.

For instance, suppose I'm porting XCTest to Swift, and I decide to rewrite its `demangleSimpleClass` function, which extracts the identifiers from a mangled Swift symbol name. Specifically, I'm implementing `scanIdentifier`, which reads one particular identifier out of the middle of a string. (For those unfamiliar: an identifier in a mangled symbol name consists of one or more digits to represent a length, followed by that many characters.) I will assume that the mangled symbol name is in a Swift.String.

Here's a direct port:

	func scanIdentifier(partialMangled: String) -> (identifier: String, remainder: String) {
		let chars = partialMangled.characters
		var lengthRange = chars.startIndex ..< chars.startIndex

		while chars[lengthRange.endIndex].isDigit {
			lengthRange.endIndex = chars.successor(of: lengthRange.endIndex)
		}

		let lengthString = String(chars[lengthRange])
		let length = Int(lengthString)!

		let identifierRange = lengthRange.endIndex ..< chars.index(length, stepsFrom: lengthRange.endIndex)
		let remainder = chars.suffix(from: identifierRange.endIndex)

		return (String(chars[identifierRange]), String(remainder))
	}

This works (probably; I haven't actually tested it), but it fails a precondition if the mangled symbol is invalid. Suppose we want to detect this condition so that our parent function can throw a nice error instead:

	func scanIdentifier(partialMangled: String) -> (identifier: String, remainder: String)? {
		let chars = partialMangled.characters
		var lengthRange = chars.startIndex ..< chars.startIndex

		while chars[lengthRange.endIndex].isDigit {
			lengthRange.endIndex = chars.successor(of: lengthRange.endIndex)
			if lengthRange.endIndex == chars.endIndex {
				return nil
			}
		}

		let lengthString = String(chars[lengthRange])
		guard let length = Int(lengthString) else {
			return nil
		}

		let identifierRange = lengthRange.endIndex ..< chars.index(length, stepsFrom: lengthRange.endIndex)
		if identifierRange.endIndex > chars.endIndex {
			return nil
		}

		let remainder = chars.suffix(from: identifierRange.endIndex)

		return (String(chars[identifierRange]), String(remainder))
	}

That's really not the greatest. To tell the truth, I've actually guessed what bounds-checking is needed here; I'm not 100% sure I caught all the cases. And, um, I'm not really sure that `index(length, stepsFrom: lengthRange.endIndex)` is guaranteed to return anything valid if `length` is too large. Even `limitedBy:` wouldn't help me here—I would end up silently accepting and truncating an invalid string instead of detecting the error.
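
To make the truncation worry concrete, here is a minimal sketch written against current Swift rather than the API under review. `clampedIndex(_:offsetBy:limitedBy:)` is a hypothetical helper standing in for a clamping `limitedBy:`; it is built on the standard `index(_:offsetBy:limitedBy:)`, which in current Swift returns an optional. The input string is made up for illustration.

```swift
extension Collection {
    /// Hypothetical stand-in for a clamping `limitedBy:` advance:
    /// stops at `limit` instead of reporting that it was reached early.
    /// Assumes n >= 0 and that `limit` is not before `i`.
    func clampedIndex(_ i: Index, offsetBy n: Int, limitedBy limit: Index) -> Index {
        // The standard method returns nil when the limit would be passed;
        // falling back to the limit throws that information away.
        return index(i, offsetBy: n, limitedBy: limit) ?? limit
    }
}

// Invalid input: the length prefix claims four characters, only three follow.
let mangled = "4foo"
let afterLength = mangled.index(after: mangled.startIndex)
let end = mangled.clampedIndex(afterLength, offsetBy: 4, limitedBy: mangled.endIndex)
let identifier = String(mangled[afterLength..<end])
```

Here `identifier` comes out as "foo" and nothing signals that the symbol was malformed, which is exactly the case a scanner like this needs to reject.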

Now, imagine if `successor(of:)` and `index(_:stepsFrom:)` instead had variants which performed range checks on their results and returned `nil` if they failed:

	func scanIdentifier(partialMangled: String) -> (identifier: String, remainder: String)? {
		let chars = partialMangled.characters
		var lengthRange = chars.startIndex ..< chars.startIndex

		while chars[lengthRange.endIndex].isDigit {
			guard let nextIndex = chars.successor(of: lengthRange.endIndex, permittingEnd: false) else {
				return nil
			}
			lengthRange.endIndex = nextIndex
		}

		let lengthString = String(chars[lengthRange])
		guard let length = Int(lengthString) else {
			return nil
		}

		guard let identifierEndIndex = chars.index(length, stepsFrom: lengthRange.endIndex, permittingEnd: true) else {
			return nil
		}

		let identifierRange = lengthRange.endIndex ..< identifierEndIndex
		let remainder = chars.suffix(from: identifierRange.endIndex)

		return (String(chars[identifierRange]), String(remainder))
	}

By using these variants of the index-manipulation operations, the Collection API itself tells me where I need to handle bounds-check violations. Just like the failable `Int(_: String)` initializer, if I forget to check bounds after manipulating an index, the code will not type-check. That's a nice victory for correct semantics.
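
For anyone who wants to experiment, these variants can be approximated as an extension in current Swift. This is only a sketch under assumptions: the method names and the `permittingEnd:` label are the hypothetical ones from this message, and the implementation leans on the standard `index(_:offsetBy:limitedBy:)`, which returns nil when the limit would be passed. It only handles forward steps.

```swift
extension Collection {
    /// Hypothetical checked advance: the index `n` steps after `i`, or nil
    /// if that would leave the permitted range. `permittingEnd` decides
    /// whether `endIndex` itself is an acceptable result.
    func index(_ n: Int, stepsFrom i: Index, permittingEnd: Bool) -> Index? {
        // Only forward steps are supported in this sketch.
        guard n >= 0,
              let result = index(i, offsetBy: n, limitedBy: endIndex) else {
            return nil
        }
        if result == endIndex && !permittingEnd {
            return nil
        }
        return result
    }

    /// Hypothetical checked successor, expressed as a one-step advance.
    func successor(of i: Index, permittingEnd: Bool) -> Index? {
        return index(1, stepsFrom: i, permittingEnd: permittingEnd)
    }
}

let symbol = "3foo"
let first = symbol.startIndex
let second = symbol.successor(of: first, permittingEnd: false)         // index of "f"
let atEnd = symbol.index(4, stepsFrom: first, permittingEnd: true)     // endIndex
let rejected = symbol.index(4, stepsFrom: first, permittingEnd: false) // nil
let overrun = symbol.index(5, stepsFrom: first, permittingEnd: true)   // nil
```

Because every result is optional, a caller has to unwrap it before subscripting, which is the type-checking benefit described above.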

* * *

Incidentally, rather than having Valid<Index>, an alternative would be to have Unchecked<Index>. This would mark an index which had *not* been checked. You could use its `uncheckedIndex` property to access the index directly, or you could pass it to `Collection.check(_: Unchecked<Index>) -> Index?` to perform the check.

This would not serve to eliminate redundant checks; it would merely get the type system to help you catch index-checking mistakes. You could, of course, perform the check and then invalidate the index with a mutation, but that's just as true today. I believe that, with aggressive enough optimization, this could be costless at runtime. *And* it would offer a way to provide the so-called "safe indexing" many people ask for: you could offer a subscript which took an Unchecked<Index> and returned an Optional<Element>.
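
A rough sketch of how that could look in current Swift follows. Everything here is hypothetical and only illustrates the shape of the idea: the wrapper, its `uncheckedIndex` property, `check(_:)`, and the optional-returning subscript. The check is a plain range comparison, which is a simplification; it cannot detect an index that lies within bounds but was invalidated by a mutation or came from a different collection.

```swift
/// Hypothetical wrapper marking an index that has not been range-checked.
struct Unchecked<Index: Comparable> {
    /// Direct access to the raw, unvalidated index.
    let uncheckedIndex: Index

    init(_ index: Index) {
        uncheckedIndex = index
    }
}

extension Collection {
    /// Returns the wrapped index if it falls in startIndex..<endIndex, else nil.
    /// This is only a range test, not a full validity check.
    func check(_ unchecked: Unchecked<Index>) -> Index? {
        let i = unchecked.uncheckedIndex
        return (startIndex <= i && i < endIndex) ? i : nil
    }

    /// "Safe indexing": nil instead of a precondition failure when out of range.
    subscript(unchecked: Unchecked<Index>) -> Element? {
        guard let i = check(unchecked) else { return nil }
        return self[i]
    }
}

let numbers = [10, 20, 30]
let inRange = Unchecked(1)
let outOfRange = Unchecked(7)
let present = numbers[inRange]    // Optional(20)
let absent = numbers[outOfRange]  // nil
```

Note that `numbers.check(Unchecked(3))` is also nil in this sketch, since `endIndex` is not a valid subscript position; a variant that permits the end would be needed for range construction.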

-- 
Brent Royal-Gordon
Architechies