[swift-evolution] [Proposal draft] Introducing `indexed()` collections

Mon Oct 3 16:51:28 CDT 2016

on Mon Oct 03 2016, Kevin Ballard <swift-evolution at swift.org> wrote:

> On Fri, Sep 30, 2016, at 08:53 PM, Dave Abrahams via swift-evolution wrote:
>> 
>> on Wed Sep 28 2016, Erica Sadun <swift-evolution at swift.org> wrote:
>> 
>> > Indices have a specific, fixed meaning in Swift, which are used to create valid collection
>> > subscripts. This proposal introduces indexed() to produce a more semantically relevant sequence
> by
>
>> > pairing a collection's indices with its members. While it is trivial to create a solution in Swift,
>> > the most common developer approach shown here calculates indexes twice:
>> >
>> > extension Collection {
>> >     /// Returns a sequence of pairs (*idx*, *x*), where *idx* represents a
>> >     /// consecutive collection index, and *x* represents an element of
>> >     /// the sequence.
>> >     func indexed() -> Zip2Sequence<Self.Indices, Self> {
>> >         return zip(indices, self)
>> >     }
>> > }
>> 
>> How does this calculate indices twice?
>
> It calculates indices twice for any collection that uses
> IndexingIterator as its iterator. 

Yes.  Not in general; just in that particular case.

> And for collections that doesn't, it still does the moral equivalent,
> because it's calculating an index offset along with whatever work the
> Iterator does to calculate the next element.

Indexing is supposed to be cheap; almost free.  Lazy filtered
collections are an anomaly.  They're arguably not even legal
Collections, because advancing an index may not be O(1).  They exist
because they're useful, but you shouldn't pass them out without
understanding the consequences.

> As an example, if my collection is `someArray.lazy.filter(…)` then
> zip(col.indices, col) will run the filter twice over the collection.

Okay.

>> > Incrementing an index in some collections can be unnecessarily
>> > costly. 
>> 
>> Seems like it's only *unnecessarily* costly in badly implemented
>> collections?
>
> A collection doesn't have to be badly-implemented to have a
> non-trivial cost for calculating the next element. 
> As above, someArray.lazy.filter(…) is a good example of such a
> collection.

Its conformance to Collection is quite sketchy.

>> > In a lazy filtered collection, an index increment is potentially
>> > O(N). We feel this is better addressed introducing a new function into
>> > the Standard Library to provide a more efficient design that avoids
>> > the attractive nuisance of the "obvious" solution.
>> 
>> I am generally opposed to adding this.  The usual solution developers
>> will reach for here is:
>> 
>>     for i in x.indices {
>>         somethingWith(x[i])
>>     }
>> 
>> zip(indices, self) is only suboptimal for lazy filtered sequences, which
>> should be used with care anyhow (see the note here:
>> http://swiftdoc.org/v3.0/type/LazyFilterCollection/).
>
> It's suboptimal for any collection with a non-trivial index.

Which should be an exceedingly rare thing.

>> If you really need a lazy sequence of pairs that's optimal with lazy
>> filtered sequences, 
>> 
>>     x.indices.lazy.map { ($0, x[$0]) }
>> 
>> is a good solution and pretty easy to write.
>
> And yet people will write zip(x.indices, x) instead because it's
> shorter and not immediately obvious that it may be suboptimal
> depending on the collection.
>
> Why are you opposed to adding this? 

Mostly because it's additional API complexity, the usefulness appears to
be marginal, and it's optimizing for what should be a rare corner case
(collections that don't conform to efficiency expectations).

I'm not dead-set against adding it, but ATM it doesn't seem like there
are important use-cases that will benefit substantially from having it.
Convince me this is addressing a real need, and we'll talk.  In phase 2
:-).

> The ability to work around its lack doesn't mean it doesn't have
> value, and the fact that the simplest workaround is not the best one
> is I think a good reason to make the easiest solution into the best
> one (by providing .indexed()). Add to that the fact that a lot of
> people probably use .enumerated() to produce indexes when working with
> arrays, and this is a pitfall when working with other types, such as
> ArraySlice which still has Int indexes but is no longer zero-based.

I am also concerned about introducing confusion around the difference
from enumerated().  These methods will have identical semantics for
Array.

-- 
-Dave