[swift-evolution] [Proposal draft] Introducing `indexed()` collections

Mon Oct 3 19:59:32 CDT 2016

On Mon, Oct 3, 2016, at 02:51 PM, Dave Abrahams via swift-evolution wrote:
> 
> on Mon Oct 03 2016, Kevin Ballard <swift-evolution at swift.org> wrote:
> 
> > On Fri, Sep 30, 2016, at 08:53 PM, Dave Abrahams via swift-evolution wrote:
> >> 
> >> on Wed Sep 28 2016, Erica Sadun <swift-evolution at swift.org> wrote:
> >> 
> >> > Indices have a specific, fixed meaning in Swift, which are used to create valid collection
> >> > subscripts. This proposal introduces indexed() to produce a more semantically relevant sequence
> > by
> >
> >> > pairing a collection's indices with its members. While it is trivial to create a solution in Swift,
> >> > the most common developer approach shown here calculates indexes twice:
> >> >
> >> > extension Collection {
> >> >     /// Returns a sequence of pairs (*idx*, *x*), where *idx* represents a
> >> >     /// consecutive collection index, and *x* represents an element of
> >> >     /// the sequence.
> >> >     func indexed() -> Zip2Sequence<Self.Indices, Self> {
> >> >         return zip(indices, self)
> >> >     }
> >> > }
> >> 
> >> How does this calculate indices twice?
> >
> > It calculates indices twice for any collection that uses
> > IndexingIterator as its iterator. 
> 
> Yes.  Not in general; just in that particular case.
> 
> > And for collections that doesn't, it still does the moral equivalent,
> > because it's calculating an index offset along with whatever work the
> > Iterator does to calculate the next element.
> 
> Indexing is supposed to be cheap; almost free.  Lazy filtered
> collections are an anomaly.  They're arguably not even legal
> Collections, because advancing an index may not be O(1).  They exist
> because they're useful, but you shouldn't pass them out without
> understanding the consequences.

Using an index is supposed to be cheap/free. Calculating the next index is not guaranteed to be so. If you want another example of something that's not lazy, try String.CharacterView. Calculating the next index may be arbitrarily complex since I can string as many combining marks together as I want, though in practice it will be pretty cheap. But even this "pretty cheap" is still work, and depending on what I'm doing in the loop, calculating character indexes may be a significant fraction of the work performed.

> > As an example, if my collection is `someArray.lazy.filter(…)` then
> > zip(col.indices, col) will run the filter twice over the collection.
> 
> Okay.
> 
> >> > Incrementing an index in some collections can be unnecessarily
> >> > costly. 
> >> 
> >> Seems like it's only *unnecessarily* costly in badly implemented
> >> collections?
> >
> > A collection doesn't have to be badly-implemented to have a
> > non-trivial cost for calculating the next element. 
> > As above, someArray.lazy.filter(…) is a good example of such a
> > collection.
> 
> Its conformance to Collection is quite sketchy.

But it's not the only collection where calculating indexes is non-trivial.

-Kevin Ballard

> >> > In a lazy filtered collection, an index increment is potentially
> >> > O(N). We feel this is better addressed introducing a new function into
> >> > the Standard Library to provide a more efficient design that avoids
> >> > the attractive nuisance of the "obvious" solution.
> >> 
> >> I am generally opposed to adding this.  The usual solution developers
> >> will reach for here is:
> >> 
> >>     for i in x.indices {
> >>         somethingWith(x[i])
> >>     }
> >> 
> >> zip(indices, self) is only suboptimal for lazy filtered sequences, which
> >> should be used with care anyhow (see the note here:
> >> http://swiftdoc.org/v3.0/type/LazyFilterCollection/).
> >
> > It's suboptimal for any collection with a non-trivial index.
> 
> Which should be an exceedingly rare thing.
> 
> >> If you really need a lazy sequence of pairs that's optimal with lazy
> >> filtered sequences, 
> >> 
> >>     x.indices.lazy.map { ($0, x[$0]) }
> >> 
> >> is a good solution and pretty easy to write.
> >
> > And yet people will write zip(x.indices, x) instead because it's
> > shorter and not immediately obvious that it may be suboptimal
> > depending on the collection.
> >
> > Why are you opposed to adding this? 
> 
> Mostly because it's additional API complexity, the usefulness appears to
> be marginal, and it's optimizing for what should be a rare corner case
> (collections that don't conform to efficiency expectations).
> 
> I'm not dead-set against adding it, but ATM it doesn't seem like there
> are important use-cases that will benefit substantially from having it.
> Convince me this is addressing a real need, and we'll talk.  In phase 2
> :-).
> 
> > The ability to work around its lack doesn't mean it doesn't have
> > value, and the fact that the simplest workaround is not the best one
> > is I think a good reason to make the easiest solution into the best
> > one (by providing .indexed()). Add to that the fact that a lot of
> > people probably use .enumerated() to produce indexes when working with
> > arrays, and this is a pitfall when working with other types, such as
> > ArraySlice which still has Int indexes but is no longer zero-based.
> 
> I am also concerned about introducing confusion around the difference
> from enumerated().  These methods will have identical semantics for
> Array.