[swift-evolution] [Pitch] Remove destructive consumption from Sequence

Sat Jul 2 07:09:59 CDT 2016

Sent from my iPad

> On Jul 1, 2016, at 9:55 PM, Dave Abrahams <dabrahams at apple.com> wrote:
> 
> 
> on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:
> 
>>> On Jul 1, 2016, at 6:32 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>>> 
>>> 
>>> on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com <http://matthew-at-anandabits.com/>> wrote:
>>> 
>> 
>>>>> On Jul 1, 2016, at 11:51 AM, Dave Abrahams <dabrahams at apple.com <mailto:dabrahams at apple.com>> wrote:
>>>>> 
>>>>> 
>>>>> on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com
>>>>> <http://matthew-at-anandabits.com/>
>>>>> <http://matthew-at-anandabits.com/
>>>>> <http://matthew-at-anandabits.com/>>> wrote:
>>>>> 
>>>> 
>>>>>>> On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams at apple.com <mailto:dabrahams at apple.com>>
>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me <http://swift-evolution-at-haravikk.me/>> wrote:
>>>>>>> 
>>>>>> 
>>>>>>>>> On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
>>>>>>>>> 
>>>>>>>>> Swift is a language that embraces value semantics.  Many common
>>>>>>>>> iterators *can* be implemented with value semantics.  Just because we
>>>>>>>>> can’t implement *all* iterators with value semantics doesn’t mean we
>>>>>>>>> should require them to have reference semantics.  It just means you
>>>>>>>>> can’t *assume* value semantics when working with iterators in generic
>>>>>>>>> code unless / until we have a way to specify a value semantics
>>>>>>>>> constraint.  That’s not necessarily a bad thing especially when it
>>>>>>>>> leaves the door open to interesting future possibilities.
>>>>>>>>> 
>>>>>>>>> -Matthew
>>>>>>>> 
>>>>>>>> I'm kind of undecided about this personally. I think one of the
>>>>>>>> problems with Swift is that the only indication that you have a
>>>>>>>> reference type is that you can declare it as a constant, yet still
>>>>>>>> call mutating methods upon it, this isn't a very positive way of
>>>>>>>> identifying it however. This may be more of a GUI/IDE issue though, in
>>>>>>>> that something being a class isn't always that obvious at a glance.
>>>>>>>> 
>>>>>>>> I wonder, could we somehow force iterators stored in variables to be
>>>>>>>> passed via inout? This would make it pretty clear that you're using
>>>>>>>> the same iterator and not a copy in all cases, encouraging you to
>>>>>>>> obtain another if you really do need to perform multiple passes.
>>>>>>> 
>>>>>>> I'm going to push single-pass iteration on the stack briefly and talk
>>>>>>> about the topic that's been under discussion here: infinite multipass
>>>>>>> sequences.
>>>>>>> 
>>>>>>> ## Fitting “Infinite Multipass” Into the Model
>>>>>>> 
>>>>>>> It remains to be decided whether it's worth doing, but if it's to
>>>>>>> happen
>>>>>> 
>>>>>> I definitely think it’s worth doing.  
>>>>> 
>>>>> Opinions are nice, but rationales are better.  How will we understand
>>>>> *why* it's worth doing?
>>>> 
>>>> I agree.  
>>>> 
>>>> The rationale has been discussed quite a bit already in this thread.
>>>> The current protocols do not provide the semantics many people are
>>>> assuming in their code, leading to a lot of code that is incorrect
>>>> despite the fact that it usually works in practice.
>>>> 
>>>> This is especially frequent in the case of the finite assumption.
>>>> This assumption is so common it seems very wise to me to encode it as
>>>> a semantic requirement in a protocol.
>>>> 
>>>> IMO these are problem worth addressing, especially now that we have a
>>>> good handle on what a solution would look like.
>>>> 
>>>>> 
>>>>>> I really appreciate the attention that the library team has given to
>>>>>> this.
>>>>>> 
>>>>>>> , the standard library team thinks the right design is roughly
>>>>>>> this:
>>>>>>> 
>>>>>>> /// A multipass sequence that may be infinite
>>>>>>> protocol Collection {
>>>>>>> 
>>>>>>> // Only eager algorithms that can terminate available here
>>>>>>> func index(where predicate: (Element)->Bool) -> Index
>>>>>>> 
>>>>>>> // all lazy algorithms available here
>>>>>>> var lazy: ...
>>>>>>> 
>>>>>>> var startIndex: Index
>>>>>>> var endIndex: Index // possibly not reachable from startIndex
>>>>>>> 
>>>>>>> associatedtype SubSequence : Collection
>>>>>>> // do we need an associated FiniteSubsequence, e.g. for prefixes?
>>>>>>> }
>>>>>>> 
>>>>>>> protocol FiniteCollection : Collection {
>>>>>>> 
>>>>>>> // All eager algorithms available here
>>>>>>> func map(...) ->
>>>>>>> var count: ...
>>>>>>> }
>>>>>>> 
>>>>>>> protocol BidirectionalCollection : Collection { ... }
>>>>>>> 
>>>>>>> protocol RandomAccessCollection : BidirectionalCollection { … }
>>>>>> 
>>>>>> Does this design entirely break with the relationship between
>>>>>> collections and iterators (dropping `makeIterator` as a protocol
>>>>>> requirement)?  If so, would for..in (over collections) be built on top
>>>>>> of indices and use `formIndex(after:)`?  Would it require a finite
>>>>>> collection (unless we add `until` to the language and then allow
>>>>>> `for..in..until` to work with infinite collections)?
>>>>> 
>>>>> All of these points are up for discussion.  
>>>> 
>>>> Cool.  I think the collection for..in has some nice advantages, but
>>>> since it sounds like we’ll probably keep Iterator around it might be
>>>> best to take the approach of making them both work.
>>>> 
>>>> You already know that I would prefer to see the current for..in built
>>>> on finite sequences and allow for..in..unitl to be used with infinite
>>>> sequences if we add that in the future.  :)
>>>> 
>>>>> John McCall pointed out to
>>>>> me that an index-based for..in would make it possible to implement
>>>>> 
>>>>> for inout x in y { mutate(&x) }
>>>> 
>>>> That would be very nice!
>>>> 
>>>> I think it might also increase performance.  I don’t know exactly how
>>>> for..in is implemented today, but the implementation of
>>>> IndexingIterator compares position to endIndex.  If for..in is also
>>>> comparing checking the optional for nil that’s an extra comparison.
>>>> We shouldn't need to actually construct the optional in the first
>>>> place using an index-based for..in.  Maybe optimizations like this
>>>> already exist?  But even if they do, it seems like they wouldn’t be
>>>> possible in some cases where the type of the sequence isn’t statically
>>>> known.
>>>> 
>>>>> 
>>>>>> Would we still retain `IndexingIterator`even if we break the
>>>>>> relationship in the protocol requirements?
>>>>> 
>>>>> Yes: it should be possible to implement Collection algorithms in terms
>>>>> of Iterator algorithms, and IndexingIterator provides the means.  That
>>>>> said, I think the makeIterator requirement does little harm, especially
>>>>> when it can be defaulted for Collections.
>>>> 
>>>> I like this answer.
>>>> 
>>>>> 
>>>>>> Would it still be possible to do things like zip a multi-pass sequence
>>>>>> with a single-pass sequence (assuming we keep single-pass sequences or
>>>>>> add them back eventually)?  This seems like a use case worth
>>>>>> supporting in some way.
>>>>> 
>>>>> Yes.  If you can create an Iterator from a Collection, and you can zip
>>>>> Iterators, you can do this.
>>>> 
>>>> Yes, of course.  I’m glad we would keep this relationship in tact.
>>>> 
>>>>> 
>>>>>> One subtle change I think this implies is that things like
>>>>>> `LazyFilterSequence` can implement `makeIterator` with constant
>>>>>> complexity, deferring the O(N) complexity to the first call to `next`.
>>>>> 
>>>>> I don't believe that's a difference, though I could be wrong.
>>>> 
>>>> You’re right, I was wrong.  `LazyFilterSequence` just constructs an
>>>> iterator and returns it.  `LazyFilterCollection` has to loop until it
>>>> finds the first item matching the predicate in its `startIndex`
>>>> implementation.  The part I was missing is that `IndexingIterator`
>>>> gets the `startIndex` in its initializer.
>>>> 
>>>>> 
>>>>>> `startIndex` for `LazyFilterCollection` currently has O(N) complexity.
>>>>>> The complexity of a complete iteration doesn’t change and probably
>>>>>> isn’t a big deal, but it’s worth noting.
>>>>> 
>>>>> Filtered collection views always require a bit of hand-waving around
>>>>> performance guarantees; I don't think that changes.
>>>>> 
>>>>>> I’ve been looking at some code that wraps a sequence and considering
>>>>>> how it would be impacted.  With iterators it looks like this:
>>>>>> 
>>>>>> guard let element = base.next()
>>>>>> else { return nil }
>>>>>> 
>>>>>> With collections and indices it would look something like this:
>>>>>> 
>>>>>> base.formIndex(after: &index)
>>>>>> guard index != baseEndIndex
>>>>>> else { return endIndex }
>>>>>> 
>>>>>> let element = base[index]
>>>>>> 
>>>>>> That’s not too bad but it is more verbose.  
>>>>> 
>>>>> Sequence today is a single-pass thing.  If you are wrapping Sequence
>>>>> today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
>>>>> to deal with indices.
>>>>> 
>>>>>> If we’re going to push people towards collections and indices we
>>>>>> should try to make common patterns like “update the iteration state
>>>>>> and return the next element if there is one" simpler.  
>>>>> 
>>>>> That's IndexingIterator.
>>>> 
>>>> Cool, I wrote this thinking that was going away.
>>>> 
>>>>> 
>>>>>> This could be accomplished with an extension method along these lines:
>>>>>> 
>>>>>> guard let element = base.formIndex(after: &index,
>>>>>> .returningOptionalElement)
>>>>>>  else { return endIndex }
>>>>>> 
>>>>>> With an implementation something like:
>>>>>> 
>>>>>> enum FormIndexResult {
>>>>>>  .returningOptionalElement
>>>>>> }
>>>>>> extension Collection {
>>>>>>  func formIndex(after i: inout Self.Index, _ result:
>>>>>> FormIndexResult) -> Self.Element?
>>>>>> }
>>>>>> 
>>>>>> This would provide similar functionality to `IndexingIterator` without
>>>>>> coupling the storage of `elements` and `position` (which is important
>>>>>> if you’re wrapping a collection and need to wrap the collection and
>>>>>> its indices independently).
>>>>> 
>>>>> I'm afraid I don't understand.  Could you be more explicit about what
>>>>> you have in mind?
>>>> 
>>>> The idea was to provide functionality similar to `IndexingIterator` in
>>>> the sense the following code would provide equivalent functionality to
>>>> `iterator.next()` but expressed in terms of a collection and an index:
>>>> 
>>>> let optionalElement = myCollection.formIndex(after: &myIndex, . returningOptionalElement)
>>>> 
>>>> vs
>>>> 
>>>> let optionalElement = myIterator.next()
>>>> 
>>>> The single case enum is just there to provide a label that
>>>> differentiates the overload.
>>>> 
>>>> If we’re keeping IndexingIterator this probably isn’t necessary.  I
>>>> still have a use case for it but it is rather obscure.
>>> 
>>> I can imagine wanting a design like the above for cases where
>>> implementing the endIndex requires adding an extra bit of state, e.g. in
>>> 
>>> struct LazyPrefix<Base: Collection> : Collection {
>>>   init(_ base: Base, where: (C.Element)->Bool)
>>>   ...
>>> }
>>> 
>>> you don't want to traverse the base collection eagerly just to come up
>>> with an endIndex, so you store an optional Base.Index in
>>> LazyPrefix.Index, which is nil for the endIndex.  In these cases, index
>>> comparison is less efficient than it might otherwise be.
>>> 
>>> But my answer for these cases is simple: simply use a specialized
>>> Iterator that will be more efficient than IndexingIterator.  Is there a
>>> reason that doesn't work for your case?
>> 
>> I would index a custom iterator in my case,
> 
> I don't know what it means to index an iterator.

Wow, I have no idea how I feel ended up writing that.  I must have gotten distracted somehow.  I meant a custom iterator wouldn't be relevant in my case.

> 
>> but I am talking about code that would live inside the implementation
>> of `index(after:)` in the `Collection` conformance of a wrapper
>> `Collection` that bears some resemblance to `LazyFlattenCollection`.
>> In this case you are receiving your `Index` which wraps `Base.Index`.
> 
> I don't understand, but maybe it doesn't matter.  

Sorry, maybe I would need to provide more context.  But it's not important as like I said, it is a relatively obscure case.

> FWIW, I'm pretty
> confident we could write a specialized Collection protocol that, given
> some equatable State and a next() method, supplied all the basic
> Collection requirements, for the cases where the Iterator is the most
> efficient means of traversal, e.g.
> 
>      protocol IteratingCollection : Collection {
>        associatedtype IterationState : Equatable
>        func next(inout state: IterationState) -> Element?
>        func startState() -> IterationState
>      }
> 
>      extension IteratingCollection {
>        // all the collection requirements implemented in terms
>        // of startState() and next(...)
>      }

This would be cool.  It could be the basis of a method taking a start state and a closure (I think you mentioned a method like this earlier).

> 
>> Like I said, this is a pretty obscure case and I would never suggest
>> including it on the grounds of a use case that is only relevant to
>> `index(after:)` implementations in wrapper collections. :) I brought
>> it because I thought you may have been suggesting a more drastic
>> change that removed the collection iterators.
> 
>>> 
>>>>> On second thought, I believe it is important to have a way to support
>>>>> existing “partially formed” multipass sequences that don't expose
>>>>> copying or equality for their iteration states.  
>>>> 
>>>> Can you provide examples of these?  I’m having difficulty thinking of
>>>> one.
>>> 
>>> NSSet is an example.
>>> 
>>>>> Iterator is the right way to do that.  So I think we need to keep
>>>>> Iterator around.
>>>> 
>>>> I don’t have any objection to keeping it either. :) Hopefully we’d
>>>> still be able to improve the design in the future if / when new
>>>> enabling language features come along.
>>>> 
>>>>> 
>>>>>> In the meantime, people would be able to implement their own protocol
>>>>>> for single pass sequences.  What they would lose is for..in as well as
>>>>>> the standard library algorithms.  I’m not sure how many people this
>>>>>> would impact or how big the impact would be for them.  We have seen a
>>>>>> couple of examples in this discussion, but probably not enough to
>>>>>> asses the overall impact.
>>>>>> 
>>>>>> One thing you don’t mention here is a distinction between finite and
>>>>>> infinite single-pass sequences (iterators).  I don’t know if the
>>>>>> finite / infinite distinction is as important here, but wanted to
>>>>>> point it out.  Obviously if we remove support single-pass sequences
>>>>>> now we could defer that discussion until we’re ready to bring back
>>>>>> support for them.
>>>>> 
>>>>> There are a few possible answers I can think of:
>>>>> 
>>>>> 1. Do the “obvious” thing and create a separate protocol for finite
>>>>> single-pass sequences
>>>>> 
>>>>> 2. Decide that the combination of infinite and single-pass is rare
>>>>> enough (/dev/urandom, temperature sensor) that it's better to just
>>>>> ask people handling them to be careful and not, e.g., try to “count”
>>>>> them.
>>>>> 
>>>>> 3. Decide that everything on a single-pass sequence is lazy. Since you
>>>>> can only take a single pass anyway, people won't observe their
>>>>> closures being called more often than necessary, which was the main
>>>>> motivator for making map, filter, et. al eager on collections without
>>>>> an explicit .lazy.
>>>>> 
>>>>> Implications of #3:
>>>>> 
>>>>> * Any “partially-formed” multipass sequences (modeling only Iterator)
>>>>> would be free to expose an accurate underestimatedCount, thereby
>>>>> optimizing the process of copying into an array. The lazy filter
>>>>> Iterator adaptor would have an underestimatedCount of 0.
>>>>> 
>>>>> * All algorithms that require multiple passes, such as sorted(), would
>>>>> be unavailable on Iterator.  You'd have to construct an Array (or
>>>>> other MutableCollection) and sort that in-place.  Of course,
>>>>> constructing an Array from an Iterator could still go on forever if
>>>>> the Iterator turned out to be infinite, which means, at some level #3
>>>>> is just a refinement of #2 that makes it less error-prone.
>>>> 
>>>> Do you lean towards any of these?
>>> 
>>> Yes, #3.  
>>> 
>>> We can always make the few operations that have to be eager—such as
>>> Array construction from an Iterator—explicit with a label or something:
>>> 
>>>     Array(claimingFiniteness: someIterator)
>> 
>> This makes sense.  Finite single-pass iterators can always be added in
>> the future if compelling use cases emerge. We’re not taking anything
>> away.
>> 
>> All of the code I have looked at that makes a finite assumption would
>> be converted to require `Collection` in the new model.
> 
> I think you mean FiniteCollection, yes?

Yes, sorry.

> 
> -- 
> Dave