[swift-evolution] [Pitch] Remove destructive consumption from Sequence

Matthew Johnson matthew at anandabits.com
Fri Jul 1 17:22:19 CDT 2016

> On Jul 1, 2016, at 11:51 AM, Dave Abrahams <dabrahams at apple.com> wrote:
> on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com <http://matthew-at-anandabits.com/>> wrote:
>>> On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams at apple.com>
>> wrote:
>>> on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:
>>>>> On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution at swift.org> wrote:
>>>>> Swift is a language that embraces value semantics.  Many common
>>>>> iterators *can* be implemented with value semantics.  Just because we
>>>>> can’t implement *all* iterators with value semantics doesn’t mean we
>>>>> should require them to have reference semantics.  It just means you
>>>>> can’t *assume* value semantics when working with iterators in generic
>>>>> code unless / until we have a way to specify a value semantics
>>>>> constraint.  That’s not necessarily a bad thing especially when it
>>>>> leaves the door open to interesting future possibilities.
>>>>> -Matthew
>>>> I'm kind of undecided about this personally. I think one of the
>>>> problems with Swift is that the only indication that you have a
>>>> reference type is that you can declare it as a constant, yet still
>>>> call mutating methods upon it, this isn't a very positive way of
>>>> identifying it however. This may be more of a GUI/IDE issue though, in
>>>> that something being a class isn't always that obvious at a glance.
>>>> I wonder, could we somehow force iterators stored in variables to be
>>>> passed via inout? This would make it pretty clear that you're using
>>>> the same iterator and not a copy in all cases, encouraging you to
>>>> obtain another if you really do need to perform multiple passes.
>>> I'm going to push single-pass iteration on the stack briefly and talk
>>> about the topic that's been under discussion here: infinite multipass
>>> sequences.
>>> ## Fitting “Infinite Multipass” Into the Model
>>> It remains to be decided whether it's worth doing, but if it's to
>>> happen
>> I definitely think it’s worth doing.  
> Opinions are nice, but rationales are better.  How will we understand
> *why* it's worth doing?

I agree.  

The rationale has been discussed quite a bit already in this thread.  The current protocols do not provide the semantics many people are assuming in their code, leading to a lot of code that is incorrect despite the fact that it usually works in practice.  

This is especially frequent in the case of the finite assumption.  This assumption is so common it seems very wise to me to encode it as a semantic requirement in a protocol.

IMO these are problem worth addressing, especially now that we have a good handle on what a solution would look like.  

>> I really appreciate the attention that the library team has given to
>> this.
>>> , the standard library team thinks the right design is roughly
>>> this:
>>> /// A multipass sequence that may be infinite
>>> protocol Collection {
>>>   // Only eager algorithms that can terminate available here
>>>   func index(where predicate: (Element)->Bool) -> Index
>>>   // all lazy algorithms available here
>>>   var lazy: ...
>>>   var startIndex: Index
>>>   var endIndex: Index // possibly not reachable from startIndex
>>>   associatedtype SubSequence : Collection
>>>   // do we need an associated FiniteSubsequence, e.g. for prefixes?
>>> }
>>> protocol FiniteCollection : Collection {
>>>   // All eager algorithms available here
>>>   func map(...) ->
>>>   var count: ...
>>> }
>>> protocol BidirectionalCollection : Collection { ... }
>>> protocol RandomAccessCollection : BidirectionalCollection { … }
>> Does this design entirely break with the relationship between
>> collections and iterators (dropping `makeIterator` as a protocol
>> requirement)?  If so, would for..in (over collections) be built on top
>> of indices and use `formIndex(after:)`?  Would it require a finite
>> collection (unless we add `until` to the language and then allow
>> `for..in..until` to work with infinite collections)?
> All of these points are up for discussion.  

Cool.  I think the collection for..in has some nice advantages, but since it sounds like we’ll probably keep Iterator around it might be best to take the approach of making them both work.

You already know that I would prefer to see the current for..in built on finite sequences and allow for..in..unitl to be used with infinite sequences if we add that in the future.  :)

> John McCall pointed out to
> me that an index-based for..in would make it possible to implement
>  for inout x in y { mutate(&x) }

That would be very nice!

I think it might also increase performance.  I don’t know exactly how for..in is implemented today, but the implementation of IndexingIterator compares position to endIndex.  If for..in is also comparing checking the optional for nil that’s an extra comparison.  We shouldn't need to actually construct the optional in the first place using an index-based for..in.  Maybe optimizations like this already exist?  But even if they do, it seems like they wouldn’t be possible in some cases where the type of the sequence isn’t statically known.

>> Would we still retain `IndexingIterator`even if we break the
>> relationship in the protocol requirements?
> Yes: it should be possible to implement Collection algorithms in terms
> of Iterator algorithms, and IndexingIterator provides the means.  That
> said, I think the makeIterator requirement does little harm, especially
> when it can be defaulted for Collections.

I like this answer.

>> Would it still be possible to do things like zip a multi-pass sequence
>> with a single-pass sequence (assuming we keep single-pass sequences or
>> add them back eventually)?  This seems like a use case worth
>> supporting in some way.
> Yes.  If you can create an Iterator from a Collection, and you can zip
> Iterators, you can do this.

Yes, of course.  I’m glad we would keep this relationship in tact.

>> One subtle change I think this implies is that things like
>> `LazyFilterSequence` can implement `makeIterator` with constant
>> complexity, deferring the O(N) complexity to the first call to `next`.
> I don't believe that's a difference, though I could be wrong.

You’re right, I was wrong.  `LazyFilterSequence` just constructs an iterator and returns it.  `LazyFilterCollection` has to loop until it finds the first item matching the predicate in its `startIndex` implementation.  The part I was missing is that `IndexingIterator` gets the `startIndex` in its initializer.

>> `startIndex` for `LazyFilterCollection` currently has O(N) complexity.
>> The complexity of a complete iteration doesn’t change and probably
>> isn’t a big deal, but it’s worth noting.
> Filtered collection views always require a bit of hand-waving around
> performance guarantees; I don't think that changes.
>> I’ve been looking at some code that wraps a sequence and considering
>> how it would be impacted.  With iterators it looks like this:
>> guard let element = base.next()
>>  else { return nil }
>> With collections and indices it would look something like this:
>> base.formIndex(after: &index)
>> guard index != baseEndIndex
>>   else { return endIndex }
>> let element = base[index]
>> That’s not too bad but it is more verbose.  
> Sequence today is a single-pass thing.  If you are wrapping Sequence
> today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
> to deal with indices.
>> If we’re going to push people towards collections and indices we
>> should try to make common patterns like “update the iteration state
>> and return the next element if there is one" simpler.  
> That's IndexingIterator.

Cool, I wrote this thinking that was going away.

>> This could be accomplished with an extension method along these lines:
>> guard let element = base.formIndex(after: &index,
>> .returningOptionalElement)
>>    else { return endIndex }
>> With an implementation something like:
>> enum FormIndexResult {
>>    .returningOptionalElement
>> }
>> extension Collection {
>>    func formIndex(after i: inout Self.Index, _ result:
>> FormIndexResult) -> Self.Element?
>> }
>> This would provide similar functionality to `IndexingIterator` without
>> coupling the storage of `elements` and `position` (which is important
>> if you’re wrapping a collection and need to wrap the collection and
>> its indices independently).
> I'm afraid I don't understand.  Could you be more explicit about what
> you have in mind?

The idea was to provide functionality similar to `IndexingIterator` in the sense the following code would provide equivalent functionality to `iterator.next()` but expressed in terms of a collection and an index:

let optionalElement = myCollection.formIndex(after: &myIndex, . returningOptionalElement)


let optionalElement = myIterator.next()

The single case enum is just there to provide a label that differentiates the overload.

If we’re keeping IndexingIterator this probably isn’t necessary.  I still have a use case for it but it is rather obscure.  IndexingIterator probably covers the vast majority of use cases.

>>> Q: Why should there be indices on an infinite multipass sequence?  
>>> A: Because the operations on indices apply equally well whether the
>>>  sequence is finite or not.  Find the index of a value in the
>>>  sequence, slice the sequence, find again, etc.
>>> Q: Why is there an endIndex on an infinite seque?
>>> A: So you can write algorithms such as index(where:) once.
>>> Q: Why not allow endIndex to have a different type from startIndex?
>>> A: It appears to offer insufficient benefit for the associated
>>>   complexity in typical usage.  A classic use case that argues for a
>>>   different endIndex type is the null-terminated C string.  But you
>>>   can't index one of those safely without actually counting the
>>>   length,
>>>   and once you've done that you can make the endIndex an Int.
>> It’s also worth nothing that we can use `Optional` with `nil` as the
>> `endIndex` sentinel if necessary.
> True, that's a useful technique when there's no underlying storage in
> the collection (e.g. a fibonacci sequence)
>>> ## Single Pass Iteration
>>> The refinement relationship between Sequence and Collection is
>>> problematic, because it means either:
>>> a) algorithms such as map on single-pass sequences claim to be
>>>  nonmutating even though it's a lie (status quo)
>>> b) those algorithms can't be used on immutable (“let bound”)
>>>  multipass sequences. IMO that would be totally unacceptable.
>>> If we drop the refinement, we can have a saner world.  We also don't
>>> need to separate Sequence and Iterator anymore.  We can simply drop
>>> Sequence altogether, and the protocol for single-pass iteration
>>> becomes Iterator.
>> Makes sense to me.
>>> ### Mutation and Reference Semantics
>>> Everything in Swift is copiable via `let copy = thing` (let's please
>>> not argue over the definition of copy for classes; this is the one
>>> built into the lowest level of the language—I refer to the other one,
>>> that requires allocation, as “clone”).
>>> Anything you do with a sequence that's truly single-pass mutates the
>>> sequence *and of its copies*.  Therefore, such a type *fundamentally*
>>> has reference semantics. One day we may be able to model single-pass
>>> sequences with “move-only” value types, which cannot be copied. You
>>> can find move-only types in languages like Rust and C++, but they are
>>> not supported by Swift today.  So it seems reasonable that all
>>> Iterators in Swift today should be modeled as classes.
>> I think this makes a lot of sense in the model you are proposing.  All
>> multipass structures are collections.  Any sequence that can only
>> support a single pass is modeled as an iterator which inherently has
>> identity.  Making this distinction strong will prevent any confusion.
>>> The fact that Swift doesn't have a mutation model for classes,
>>> though, means that mutating methods on a class constrained protocol
>>> can't be labeled as such.  So consuming operations on a
>>> class-constrained Iterator protocol would not be labeled as mutating.
>>> The standard library team is currently trying to evaluate the
>>> tradeoffs in this area.  One possibility under consideration is
>>> simply dropping support for single-pass sequences until Swift can
>>> support move-only value types and/or gets a mutation model for class
>>> instances.  It would be very interesting to know about any real-world
>>> models of single-pass sequences that people are using in Swift, since
>>> we don't supply any in the standard library.
>> I’m happy to see you mention a mutation model for class instances!  (I
>> don’t mean to sidetrack the discussion, but would love to see that
>> someday)
>> I don’t have any objection to dropping support for single-pass
>> sequences temporarily.  It’s possible that I would feel differently if
>> I was making use of them in my own code but I’m not.
> On second thought, I believe it is important to have a way to support
> existing “partially formed” multipass sequences that don't expose
> copying or equality for their iteration states.  

Can you provide examples of these?  I’m having difficulty thinking of one.

> Iterator is the right way to do that.  So I think we need to keep Iterator around.

I don’t have any objection to keeping it either. :)  Hopefully we’d still be able to improve the design in the future if / when new enabling language features come along.

>> In the meantime, people would be able to implement their own protocol
>> for single pass sequences.  What they would lose is for..in as well as
>> the standard library algorithms.  I’m not sure how many people this
>> would impact or how big the impact would be for them.  We have seen a
>> couple of examples in this discussion, but probably not enough to
>> asses the overall impact.
>> One thing you don’t mention here is a distinction between finite and
>> infinite single-pass sequences (iterators).  I don’t know if the
>> finite / infinite distinction is as important here, but wanted to
>> point it out.  Obviously if we remove support single-pass sequences
>> now we could defer that discussion until we’re ready to bring back
>> support for them.
> There are a few possible answers I can think of:
> 1. Do the “obvious” thing and create a separate protocol for finite
>   single-pass sequences
> 2. Decide that the combination of infinite and single-pass is rare
>   enough (/dev/urandom, temperature sensor) that it's better to just
>   ask people handling them to be careful and not, e.g., try to “count”
>   them.
> 3. Decide that everything on a single-pass sequence is lazy. Since you
>   can only take a single pass anyway, people won't observe their
>   closures being called more often than necessary, which was the main
>   motivator for making map, filter, et. al eager on collections without
>   an explicit .lazy.
> Implications of #3:
> * Any “partially-formed” multipass sequences (modeling only Iterator)
>  would be free to expose an accurate underestimatedCount, thereby
>  optimizing the process of copying into an array. The lazy filter
>  Iterator adaptor would have an underestimatedCount of 0.
> * All algorithms that require multiple passes, such as sorted(), would
>  be unavailable on Iterator.  You'd have to construct an Array (or
>  other MutableCollection) and sort that in-place.  Of course,
>  constructing an Array from an Iterator could still go on forever if
>  the Iterator turned out to be infinite, which means, at some level #3
>  is just a refinement of #2 that makes it less error-prone.

Do you lean towards any of these?

> -- 
> Dave

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20160701/d18da374/attachment.html>

More information about the swift-evolution mailing list