[swift-evolution] Proposal: Python's indexing and slicing

Mon Dec 21 15:51:09 CST 2015

On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:
>
>> On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution
>> <swift-evolution at swift.org> wrote:
>>
>> On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via
>> swift-evolution wrote:
>>>
>>> Yes, we already have facilities to do most of what Python can do
>>> here, but one major problem IMO is that the “language” of slicing is
>>> so non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and
>>> suffix.  Introducing “$” for this purpose could make it all hang
>>> together and also eliminate the “why does it have to be so hard to
>>> look at the 2nd character of a string?!” problem.  That is, use the
>>> identifier “$” (yes, that’s an identifier in Swift) to denote the
>>> beginning-or-end of a collection.  Thus,
>>>
>>> c[c.startIndex.advancedBy(3)] => c[$+3]     // Python: c[3]
>>> c[c.endIndex.advancedBy(-3)]  => c[$-3]     // Python: c[-3]
>>>
>>> c.dropFirst(3) => c[$+3...]     // Python: c[3:]
>>> c.dropLast(3)  => c[..<$-3]     // Python: c[:-3]
>>> c.prefix(3)    => c[..<$+3]     // Python: c[:3]
>>> c.suffix(3)    => c[$-3...]     // Python: c[-3:]
>>>
>>> It even has the nice connotation that, “this might be a little more
>>> expen$ive than plain indexing” (which it might, for
>>> non-random-access collections).  I think the syntax is still a bit
>>> heavy, not least because of “..<” and “...”, but the direction has
>>> potential.
>>>
>>> I haven’t had the time to really experiment with a design like this;
>>> the community might be able to help by prototyping and using some
>>> alternatives.  You can do all of this outside the standard library
>>> with extensions.
>>
>> Interesting idea.
>>
>> One downside is it masks potentially O(N) operations
>> (ForwardIndex.advancedBy()) behind the + operator, which is typically
>> assumed to be an O(1) operation.
>
> Yeah, but the “$” is sufficiently unusual that it doesn’t bother me
> too much.
>
>> Also, the $+3 syntax suggests that it requires there to be at least 3
>> elements in the sequence, but prefix()/suffix()/dropFirst/etc. all
>> take maximum counts, so they also work on sequences with fewer
>> elements.
>
> For indexing, $+3 would make that requirement.  For slicing, it
> wouldn’t.  I’m not sure why you say something about the _syntax_
> suggests exceeding bounds would be an error.

Because there's no precedent for + behaving like a saturating addition,
not in Swift and not, to my knowledge, anywhere else either. The closest
example that comes to mind is floating-point numbers eventually ending
up at Infinity, but that's not really saturating addition; it's just a
consequence of Infinity + any finite value == Infinity. Nor do I think
we should establish a precedent of using + for saturating addition,
because that would surprise people. Additionally, I don't think adding a
$ to an array slice expression should result in a behavioral difference:
array[3..<array.endIndex] and array[$+3..<$] should behave the same.
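
To make the concern concrete, here is a minimal sketch. It is written in
current Swift spelling, where advancedBy() has become
index(_:offsetBy:), and the array `short` is just an illustrative value.
It contrasts the clamping behavior of the named methods with the trapping
behavior of explicit index arithmetic:

```swift
let short = [10, 20]

// The existing slicing methods take *maximum* counts and clamp silently.
let dropped = short.dropFirst(3)   // empty slice, no trap
let head = short.prefix(3)         // both elements, no trap

// Explicit index arithmetic does not clamp: short[3..<short.endIndex]
// traps at run time, because 3 is past the end of a two-element array.
// Clamping has to be requested by name, via the limitedBy: variant,
// which returns nil when the offset would pass the limit.
let clamped = short.index(short.startIndex, offsetBy: 3,
                          limitedBy: short.endIndex) ?? short.endIndex
let tail = short[clamped..<short.endIndex]   // empty slice
```

If $+3 clamped inside a slice, it would behave like the first group while
looking like the second.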

>> There's also some confusion with using $ for both start and end. What
>> if I say c[$..<$]? We'd have to infer from position that the first $
>> is the start and the second $ is the end, but then what about
>> c[$+n..<$+m]? We can't treat the usage of + as meaning "from start"
>> because the argument might be negative. And if we use the overall
>> sign of the operation/argument together, then the expression `$+n`
>> could mean from start or from end, which comes right back to the
>> problem with Python syntax.
>
> There’s a problem with Python syntax?  I’m guessing you mean that
> c[a:b] can have very different interpretations depending on whether a
> and b are positive or negative?

Exactly.

> First of all, I should say: that doesn’t really bother me.  The 99.9%
> use case for this operation uses literal constants for the offsets,
> and I haven’t heard of it causing confusion for Python programmers.
> That said, if we wanted to address it, we could easily require n and m
> above to be literals, rather than Ints (which incidentally guarantees
> it’s an O(1) operation).  That has upsides and downsides of course.

I don't think we should add this feature in any form if it only
supports literals.

>>
>> I think Jacob's idea has some promise though:
>>
>> c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
>> c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]
>>
>> But naming the slice operations is a little trickier. We could
>> actually just go ahead and re-use the existing method names
>> for those:
>>
>> c.dropFirst(3) => c[dropFirst: 3]
>> c.dropLast(3) => c[dropLast: 3]
>> c.prefix(3) => c[prefix: 3]
>> c.suffix(3) => c[suffix: 3]
>>
>> That's not so compelling, since we already have the methods, but I
>> suppose it makes sense if you want to try and make all
>> slice-producing methods use subscript syntax (which I have mixed
>> feelings about).
>
> Once we get efficient in-place slice mutation (via slice addressors),
> it becomes a lot more compelling, IMO.  But I still don’t find the
> naming terribly clear, and I don’t love that one needs to combine two
> subscript operations in order to drop the first and last element or
> take just elements 3..<5.

You can always add more overloads, such as

c[dropFirst: 3, dropLast: 5]

but I admit that there's a bunch of combinations here that would need
to be added.
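
As Dave noted, this can be prototyped outside the standard library with
extensions. A rough sketch of what that might look like, in current Swift
spelling; the subscript labels are the ones proposed in this thread, and
none of them exist in the standard library:

```swift
extension Collection {
    // Element n positions after startIndex. O(n) unless random-access.
    subscript(fromStart n: Int) -> Element {
        return self[index(startIndex, offsetBy: n)]
    }

    // Same clamping behavior as the dropFirst(_:) method.
    subscript(dropFirst n: Int) -> SubSequence {
        return dropFirst(n)
    }

    // One of the many combinations that would each need an overload.
    subscript(dropFirst n: Int, dropLast m: Int) -> SubSequence {
        let front: SubSequence = dropFirst(n)
        return front.dropLast(m)
    }
}

extension BidirectionalCollection {
    // Element n positions before endIndex; n must be at least 1.
    subscript(fromEnd n: Int) -> Element {
        return self[index(endIndex, offsetBy: -n)]
    }
}

let c = Array(0..<10)
let fourth = c[fromStart: 3]
let thirdFromEnd = c[fromEnd: 3]
let middle = c[dropFirst: 3, dropLast: 5]
print(fourth, thirdFromEnd, Array(middle))
```

Every additional pairing (prefix with dropLast, suffix with dropFirst,
and so on) needs its own overload, which is the combinatorial cost
mentioned above.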

My concern over trying to make it easier to take elements 3..<5 is that
incrementing indexes is verbose for a reason, and adding a feature that
makes it really easy to index into any collection by using integers is a
bad idea as it will hide O(N) operations behind code that looks like
O(1). And hiding these operations makes it really easy to accidentally
turn an O(N) algorithm into an O(N^2) algorithm.
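
Here is a small sketch of how that can happen. The `offset:` subscript is
a hypothetical convenience, not something in the standard library, and
the string is only an example; the code uses current Swift spelling:

```swift
extension Collection {
    // Hypothetical integer-offset subscript. It looks like plain indexing
    // but walks n steps from startIndex on non-random-access collections.
    subscript(offset n: Int) -> Element {
        return self[index(startIndex, offsetBy: n)]
    }
}

let text = "hello, swift-evolution"

// Reads like a linear scan, but each text[offset: i] walks i characters
// from the start, so the loop as a whole does O(N^2) work on a String.
var slowCount = 0
for i in 0..<text.count {
    if text[offset: i] == "l" { slowCount += 1 }
}

// The same count in O(N), advancing a real index one step at a time.
var fastCount = 0
var idx = text.startIndex
while idx != text.endIndex {
    if text[idx] == "l" { fastCount += 1 }
    idx = text.index(after: idx)
}
```

Both loops produce the same count; only the second makes its cost
visible at the call site.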

> Even if we need separate symbols for “start” and “end” (e.g. using “$”
> for both might just be too confusing for people in the end, even if it
> works otherwise), I still think a generalized form that allows ranges
> to be used everywhere for slicing is going to be much easier to
> understand than this hodgepodge of words we use today.

I'm tempted to say that if we do this, we should use two different
sigils, and more importantly we should not use + and - but instead use
methods on the sigils like advancedBy(), as if the sigils were literally
placeholders for the start/end index. That way we won't write code that
looks O(1) when it's not. For example:

col[^.advancedBy(3)..<$]

Although we'd need to revisit the names a little, because
$.advancedBy(-3) is a bit odd when we know that $ can only ever take a
non-positive offset.

Or maybe we should just use $ instead as a token that means "the
collection being indexed", so you'd actually say something like

col[$.startIndex.advancedBy(3)..<$.startIndex.advancedBy(5)]

This solves the problem of subscripting a collection without having to
store it in a local variable, without discarding any of the intentional
index overhead. Of course, if the goal is to make index operations more
concise this doesn't really help much, but my argument here is that it's
hard to cut down on the verbosity without hiding O(N) operations.
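
Something close to this can be approximated today without a new token,
by passing the collection to a closure. This is only a sketch under that
assumption: the `slicing:` label and the makeValues() helper are made up
for illustration, and the code uses current Swift spelling.

```swift
extension Collection {
    // The closure receives the collection being subscripted and returns
    // the range to slice, so $0 plays the role of the proposed "$".
    subscript(slicing makeRange: (Self) -> Range<Index>) -> SubSequence {
        return self[makeRange(self)]
    }
}

// Hypothetical helper standing in for any expression that produces a
// collection we do not want to bind to a local variable first.
func makeValues() -> [Int] {
    return (1...9).map { $0 * 10 }
}

let slice = makeValues()[slicing: {
    $0.index($0.startIndex, offsetBy: 3)..<$0.index($0.startIndex, offsetBy: 5)
}]
print(Array(slice))
```

The index arithmetic stays spelled out, so nothing about the cost is
hidden; the only thing saved is the local variable.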

-Kevin Ballard

>> But the [fromStart:] and [fromEnd:] subscripts seem useful.
> Yeah… I really want a unified solution that covers slicing as well as
> offset indexing.
>
> -Dave
>