[swift-evolution] [SHORT Review] SE-0132: Rationalizing Sequence end-operation names

Brent Royal-Gordon brent at architechies.com
Tue Jul 26 07:56:57 CDT 2016


> On Jul 25, 2016, at 6:35 PM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
> 
> First, though, I have to apologize for those wide tables, since I'm
> listed as a co-author (because of a small design contribution).  The
> only way I've been able to read them is by looking at the markdown
> source, so that's how I'm going to quote it here.

Sorry about that. I felt that the systematic way the names were arranged couldn't be conveyed in any other way, but the resulting formatting is regrettably difficult to read.

> Unfortunately there's a semantic difference that I hadn't noticed
> before: the mutating “remove” operations have a precondition that there
> be at least as many elements as are being removed.  “Drop,” like “pop,”
> is forgiving of such overruns.  I think this is solvable; my suggestion
> is below

Wow, there's a lot going on in this next section.

To preface this discussion, I'll merely point out that, in writing this proposal, I tried to narrowly focus on renaming. My inability to rename `prefix(upTo/through:)` and `suffix(from:)` in a way that made sense led me to a more aggressive redesign, but for the other calls I considered larger redesigns out of scope. I say this not as an argument against redesigning the `remove` calls, but merely to explain why I didn't fiddle with their semantics already.

To address things one at a time:

> My suggestion would be to make the remove()
> operations more forgiving:
> 
>   rename popFirst() to removeFirst()
>   rename popLast() to removeLast()

I actually quite agree that `removeFirst/Last()` and `popFirst/Last()` are redundant. I think I briefly touched on that in "Future Directions".

Going out of order for a moment:

> We could of course just make
> removePrefix(n) and removeSuffix(n) forgiving,

If we're going to go beyond renaming, this is the approach I favor, because it is the most straightforward and readable. When you see a piece of code that says `removePrefix(n)`, it doesn't take much effort to figure out that it's removing a series of elements from the beginning of the sequence.

Specifically, if we go this route, I would:

* Remove the `popFirst/Last` methods.
* Remove the preconditions on removeFirst/Last/Prefix/Suffix requiring the collection to not be smaller than the number of elements to be removed.
* Change the signatures to something like:

	@discardableResult
	mutating func removeFirst() -> Iterator.Element?
	@discardableResult
	mutating func removeLast() -> Iterator.Element?
	@discardableResult
	mutating func removePrefix(_ n: Int) -> [Iterator.Element]	// or SubSequence
	@discardableResult
	mutating func removeSuffix(_ n: Int) -> [Iterator.Element]	// or SubSequence

I've added return types to `removePrefix/Suffix` so that they follow our general rule about not throwing away information that isn't necessarily easy to compute, but that part of the change isn't strictly necessary.
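
As a rough illustration of what the forgiving semantics could look like, here is a minimal sketch written against a later Swift toolchain. The names `forgivingRemoveFirst()` and `forgivingRemovePrefix(_:)` are hypothetical, chosen only so the sketch does not collide with the existing standard library methods; they are not part of the proposal.

```swift
extension RangeReplaceableCollection {
    /// Hypothetical forgiving variant: returns nil on an empty collection
    /// instead of trapping.
    @discardableResult
    mutating func forgivingRemoveFirst() -> Element? {
        guard !isEmpty else { return nil }
        return remove(at: startIndex)
    }

    /// Hypothetical forgiving variant: removes at most `count` leading
    /// elements and returns whatever was actually removed.
    @discardableResult
    mutating func forgivingRemovePrefix(_ count: Int) -> [Element] {
        precondition(count >= 0, "count must be non-negative")
        // Clamp to endIndex when `count` overruns the collection.
        let end = index(startIndex, offsetBy: count, limitedBy: endIndex) ?? endIndex
        let removed = Array(self[startIndex..<end])
        removeSubrange(startIndex..<end)
        return removed
    }
}

var queue = [1, 2, 3]
let removed = queue.forgivingRemovePrefix(5)   // asks for more than exists
let next = queue.forgivingRemoveFirst()        // collection is already empty
```

The overrun is absorbed by the `limitedBy:` clamp, so the caller never has to check `count` first.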

> but I have long believed
> that the “prefix/suffix” methods should go one of two ways:
> 
> a. they get a label that clarifies the meaning of the argument, e.g.
> 
>   x.removePrefix(ofMaxLength: n)
>   x.removeSuffix(ofMaxLength: n)

I struggled for a while to find a good label for the count parameters, but couldn't find a good convention.

The fundamental problem is that English counting grammar doesn't line up with Swift syntax. In English, you would like to say "prefix of 3 elements", but you can't put a label after a parameter in Swift. Thus, there may not be an idiomatic way to label these parameters. (This is a problem throughout the standard library, actually; I ran into the same issues when I was looking at the UnsafeRawPointer proposal.)

Labels like `ofMaxLength` are verbose, and the use of "length" in particular isn't great when the standard library otherwise uses "count". If you put a gun to my head and demanded I add a label, I would make it `x.removePrefix(ofUpTo: n)` or just `x.removePrefix(of: n)`. But I'm not convinced any of these bring enough to the table. When you see an unlabeled Int parameter to a call with a name like `removePrefix`, there are really only a couple of things it could mean, and the count is an unsurprising one.

On the other hand, the internal parameter name should probably not be `n`; it should be something like `count` or even `countToRemove`.

(Probably obvious, but worth mentioning explicitly: if we add a label, we should apply it consistently to all `prefix` and `suffix` calls which take a count.)

Okay, back to the right order:

>   kill removeFirst(n)
>   kill removeLast(n)
> 
> The “forgiving” forms of x.removeFirst(n) and x.removeLast(n) can be
> expressed as:
> 
>   let i = x.index(x.startIndex, offsetBy: n, limitedBy: x.endIndex)
>   x.removeSubrange(..<i)
> 
>   let i = x.index(x.endIndex, offsetBy: -n, limitedBy: x.startIndex)
>   x.removeSubrange(i..<)
> 
> I realize that's quite verbose.


No kidding. It requires:

* Two statements, four references to the collection, and 88 characters (for an example with one-letter variable names)
* The use of a complex index-manipulation function
* The simultaneous use of the startIndex, endIndex, and a sign on the count, all of which must be coordinated

I think that, if `removePrefix/Suffix(_:)` were not in the standard library, people would be forced to invent them.
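
For reference, the quoted snippet can be made to compile roughly as follows. This is a sketch that assumes a later Swift toolchain with one-sided ranges (`..<i`, `j...`), and it adds the nil-coalescing that `index(_:offsetBy:limitedBy:)` requires because it returns an optional. The values of `x` and `n` are made up for the example.

```swift
var x = Array(1...5)
let n = 2

// Forgiving "remove the first n": clamp to endIndex when n overruns.
let i = x.index(x.startIndex, offsetBy: n, limitedBy: x.endIndex) ?? x.endIndex
x.removeSubrange(..<i)          // x is now [3, 4, 5]

// Forgiving "remove the last n": negate the count and clamp to startIndex.
let j = x.index(x.endIndex, offsetBy: -n, limitedBy: x.startIndex) ?? x.startIndex
x.removeSubrange(j...)          // x is now [3]
```

Note that the two halves differ in three places at once (the starting index, the sign of the offset, and the limit), which is exactly the coordination burden listed above.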

> b. they are given a recognizable domain-specific notation such as:
> 
>   x.removeSubrange($+n..<)
>   x.removeSubrange(..<$-n)

Does $ represent the start, the end, or either one depending on which side of the range we're on? Because if it's the third option, I think these two operations are actually inverted: the first is removing everything *except* the `prefix(n)`, and the second is removing everything except the `suffix(n)`.

(Or maybe it's based on the sign, with a negative offset going from the endIndex, and a positive offset going from the startIndex? Don't we usually try to avoid that kind of runtime branching?)

>  That would admittedly leave single-pass Sequences without an API for
>  dropping the first N elements. I am inclined to think that interface
>  should be moved to Iterator.

Why would it be a good thing to use two different syntaxes for the same operation?

>  The introduction of such notation raises the question of whether we
>  need unary range operators, and could instead go with
> 
>     x[i..<$] and x[$..<i]
> 
>  which is after all only one character longer than
> 
>     x[i..<] and x[..<i]
> 
>  and stays out of the territory of the prefix/suffix “pack/unpack”
>  operators that are likely to be used for generic variadics.
…
> I believe the `$+n..<i` idea is still implementable with these basic
> types, just with an enum instead of optionals.  I'll take a shot at it
> tonight if I can get a few minutes.

Xcode 8 beta 3-compatible hack based on my assumptions about how this is supposed to work, suitable for at least testing the ergonomics: https://gist.github.com/brentdax/3c5c64d3b7ca3ff6b68f1c86163c39c4

At a technical level, I think the biggest problem is the `$` symbol. If it's a constant or variable, then it can't be generic, and we need to fix a specific SignedInteger type for the offset. That means we need to convert to whatever IndexDistance ends up being. That could be fixed if either `$` became a magic syntax or we get generic constants, but either of those will be a significant effort. Or it can be a function, but then it's less convenient.

Stepping back from implementation, though, I think the `$` syntax is just too opaque. It gets better when it's spaced out:

	array[$ + 2 ..< $ - 1]

But it's still not something whose meaning you will even make a wild guess at without looking it up. Even once you learn it, the $ symbol will almost always be squeezed between two other punctuation characters, making visual parsing more difficult. The effect is not unlike stereotypes of Perl.

(Note: I'm an old Perl hand, so others might better read that sentence as "The effect is not unlike Perl.")

I mean, look at this example you gave earlier:

>   It also implies we can replace
> 
>     x.removingPrefix(n)
>     x.removingSuffix(n)
> 
>   with
> 
>     x[$+n..<]
>     x[..<$-n]
> 
>  for Collections.  

The first version is clear as crystal; the second is clear as mud. The first version gives the parameter a position of prominence; the second buries it in the middle of a complex expression.

Now, abandoning the `$` could help here. I've worked up an alternate design which uses descriptive names at <https://gist.github.com/brentdax/0946a99528f6e6500d93bbf67684c0b3>. The example I gave above is instead written:

	array[.startIndex + 2 ..< .endIndex - 1]

Or, for the removingPrefix equivalent:

	x[.startIndex + n ..< .endIndex]

But this is longer than a removingPrefix call and *still* buries the intent. And adding prefix/suffix operators back in doesn't really help; it merely allows you to omit the straightforward bits, without doing anything to help make the non-straightforward bits more clear. (And in fact, it makes those worse by forcing you to either compress the whitespace in them or parenthesize.)
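
To give a feel for what any relative-range design has to do underneath, here is a minimal sketch of resolving start-relative and end-relative bounds against a concrete collection. It is not the code from either gist: it deliberately avoids operator overloading, uses a fixed `Int` offset, and every name in it (`RelativeBound`, `index(for:)`, the `from:upTo:` subscript) is hypothetical.

```swift
/// Hypothetical bound that is only meaningful once paired with a collection.
enum RelativeBound {
    case fromStart(Int)   // offset forward from startIndex
    case fromEnd(Int)     // offset backward from endIndex
}

extension BidirectionalCollection {
    /// Resolves a relative bound to a concrete index, clamping overruns.
    func index(for bound: RelativeBound) -> Index {
        switch bound {
        case .fromStart(let n):
            precondition(n >= 0, "offset must be non-negative")
            return index(startIndex, offsetBy: n, limitedBy: endIndex) ?? endIndex
        case .fromEnd(let n):
            precondition(n >= 0, "offset must be non-negative")
            return index(endIndex, offsetBy: -n, limitedBy: startIndex) ?? startIndex
        }
    }

    /// Slices between two relative bounds; yields an empty slice if they cross.
    subscript(from lower: RelativeBound, upTo upper: RelativeBound) -> SubSequence {
        let lo = index(for: lower)
        let hi = index(for: upper)
        return self[lo..<(hi < lo ? lo : hi)]
    }
}

let numbers = [10, 20, 30, 40, 50]
// Roughly the same request as `array[.startIndex + 2 ..< .endIndex - 1]`.
let middle = Array(numbers[from: .fromStart(2), upTo: .fromEnd(1)])
```

Even in this spelled-out form, the call site still reads as index arithmetic rather than as "drop the first two and the last one".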

And even if we get past the syntax issues, there's an attractive nuisance problem. This feature, whatever it is, will undoubtedly circulate as "the way to make String indexes work". Is that something we want to bring to the language?
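
To make the String concern concrete, this small sketch shows the work a relative offset implies. `String` is not a random-access collection, so each `index(_:offsetBy:)` call walks the string character by character; the sample text and offsets are invented for the example.

```swift
let text = "relative ranges"

// Each of these calls is linear in the size of the offset, not constant time.
let start = text.index(text.startIndex, offsetBy: 9)
let end = text.index(text.endIndex, offsetBy: -1)

let slice = String(text[start..<end])
```

A compact relative-range syntax would perform the same walks while making them much harder to see at the call site.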

In short, I think the idea of relative ranges is pretty neat. It's brief, it's parsimonious, it's flexible, it's clever. But it makes code less clear and it hides performance problems. I think either of those two factors by itself ought to make us rethink a Swift proposal.

-- 
Brent Royal-Gordon
Architechies


