[swift-evolution] [Pitch] Remove destructive consumption from Sequence

Fri Jul 1 15:43:57 CDT 2016

on Fri Jul 01 2016, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:

> On Fri, Jul 1, 2016 at 11:51 AM, Dave Abrahams via swift-evolution <
> swift-evolution at swift.org> wrote:
>
>>
>> on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:
>>
>> >> On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams at apple.com>
>> > wrote:
>> >>
>> >>
>> >> on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:
>> >>
>> >
>> >>>> On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <
>> swift-evolution at swift.org> wrote:
>> >>>>
>> >>>> Swift is a language that embraces value semantics.  Many common
>> >>>> iterators *can* be implemented with value semantics.  Just because we
>> >>>> can’t implement *all* iterators with value semantics doesn’t mean we
>> >>>> should require them to have reference semantics.  It just means you
>> >>>> can’t *assume* value semantics when working with iterators in generic
>> >>>> code unless / until we have a way to specify a value semantics
>> >>>> constraint.  That’s not necessarily a bad thing especially when it
>> >>>> leaves the door open to interesting future possibilities.
>> >>>>
>> >>>> -Matthew
>> >>>
>> >>> I'm kind of undecided about this personally. I think one of the
>> >>> problems with Swift is that the only indication that you have a
>> >>> reference type is that you can declare it as a constant, yet still
>> >>> call mutating methods upon it, this isn't a very positive way of
>> >>> identifying it however. This may be more of a GUI/IDE issue though, in
>> >>> that something being a class isn't always that obvious at a glance.
>> >>>
>> >>> I wonder, could we somehow force iterators stored in variables to be
>> >>> passed via inout? This would make it pretty clear that you're using
>> >>> the same iterator and not a copy in all cases, encouraging you to
>> >>> obtain another if you really do need to perform multiple passes.
>> >>
>> >> I'm going to push single-pass iteration on the stack briefly and talk
>> >> about the topic that's been under discussion here: infinite multipass
>> >> sequences.
>> >>
>> >> ## Fitting “Infinite Multipass” Into the Model
>> >>
>> >> It remains to be decided whether it's worth doing, but if it's to
>> >> happen
>> >
>> > I definitely think it’s worth doing.
>>
>> Opinions are nice, but rationales are better.  How will we understand
>> *why* it's worth doing?
>>
>> > I really appreciate the attention that the library team has given to
>> > this.
>> >
>> >> , the standard library team thinks the right design is roughly
>> >> this:
>> >>
>> >>  /// A multipass sequence that may be infinite
>> >>  protocol Collection {
>> >>
>> >>    // Only eager algorithms that can terminate available here
>> >>    func index(where predicate: (Element)->Bool) -> Index
>> >>
>> >>    // all lazy algorithms available here
>> >>    var lazy: ...
>> >>
>> >>    var startIndex: Index
>> >>    var endIndex: Index // possibly not reachable from startIndex
>> >>
>> >>    associatedtype SubSequence : Collection
>> >>    // do we need an associated FiniteSubsequence, e.g. for prefixes?
>> >>  }
>> >>
>> >>  protocol FiniteCollection : Collection {
>> >>
>> >>    // All eager algorithms available here
>> >>    func map(...) ->
>> >>    var count: ...
>> >>  }
>> >>
>> >>  protocol BidirectionalCollection : Collection { ... }
>> >>
>> >>  protocol RandomAccessCollection : BidirectionalCollection { … }
>> >
>> > Does this design entirely break with the relationship between
>> > collections and iterators (dropping `makeIterator` as a protocol
>> > requirement)?  If so, would for..in (over collections) be built on top
>> > of indices and use `formIndex(after:)`?  Would it require a finite
>> > collection (unless we add `until` to the language and then allow
>> > `for..in..until` to work with infinite collections)?
>>
>> All of these points are up for discussion.  John McCall pointed out to
>> me that an index-based for..in would make it possible to implement
>>
>>   for inout x in y { mutate(&x) }
>>
>> > Would we still retain `IndexingIterator`even if we break the
>> > relationship in the protocol requirements?
>>
>> Yes: it should be possible to implement Collection algorithms in terms
>> of Iterator algorithms, and IndexingIterator provides the means.  That
>> said, I think the makeIterator requirement does little harm, especially
>> when it can be defaulted for Collections.
>>
>> > Would it still be possible to do things like zip a multi-pass sequence
>> > with a single-pass sequence (assuming we keep single-pass sequences or
>> > add them back eventually)?  This seems like a use case worth
>> > supporting in some way.
>>
>> Yes.  If you can create an Iterator from a Collection, and you can zip
>> Iterators, you can do this.
>>
>> > One subtle change I think this implies is that things like
>> > `LazyFilterSequence` can implement `makeIterator` with constant
>> > complexity, deferring the O(N) complexity to the first call to `next`.
>>
>> I don't believe that's a difference, though I could be wrong.
>>
>> > `startIndex` for `LazyFilterCollection` currently has O(N) complexity.
>> > The complexity of a complete iteration doesn’t change and probably
>> > isn’t a big deal, but it’s worth noting.
>>
>> Filtered collection views always require a bit of hand-waving around
>> performance guarantees; I don't think that changes.
>>
>> > I’ve been looking at some code that wraps a sequence and considering
>> > how it would be impacted.  With iterators it looks like this:
>> >
>> > guard let element = base.next()
>> >   else { return nil }
>> >
>> > With collections and indices it would look something like this:
>> >
>> > base.formIndex(after: &index)
>> > guard index != baseEndIndex
>> >    else { return endIndex }
>> >
>> > let element = base[index]
>> >
>> > That’s not too bad but it is more verbose.
>>
>> Sequence today is a single-pass thing.  If you are wrapping Sequence
>> today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
>> to deal with indices.
>>
>> > If we’re going to push people towards collections and indices we
>> > should try to make common patterns like “update the iteration state
>> > and return the next element if there is one" simpler.
>>
>> That's IndexingIterator.
>>
>> > This could be accomplished with an extension method along these lines:
>> >
>> > guard let element = base.formIndex(after: &index,
>> > .returningOptionalElement)
>> >     else { return endIndex }
>> >
>> > With an implementation something like:
>> >
>> > enum FormIndexResult {
>> >     .returningOptionalElement
>> > }
>> > extension Collection {
>> >     func formIndex(after i: inout Self.Index, _ result:
>> > FormIndexResult) -> Self.Element?
>> > }
>> >
>> > This would provide similar functionality to `IndexingIterator` without
>> > coupling the storage of `elements` and `position` (which is important
>> > if you’re wrapping a collection and need to wrap the collection and
>> > its indices independently).
>>
>> I'm afraid I don't understand.  Could you be more explicit about what
>> you have in mind?
>>
>> >> Q: Why should there be indices on an infinite multipass sequence?
>> >> A: Because the operations on indices apply equally well whether the
>> >>   sequence is finite or not.  Find the index of a value in the
>> >>   sequence, slice the sequence, find again, etc.
>> >>
>> >> Q: Why is there an endIndex on an infinite seque?
>> >> A: So you can write algorithms such as index(where:) once.
>> >>
>> >> Q: Why not allow endIndex to have a different type from startIndex?
>> >>
>> >> A: It appears to offer insufficient benefit for the associated
>> >>    complexity in typical usage.  A classic use case that argues for a
>> >>    different endIndex type is the null-terminated C string.  But you
>> >>    can't index one of those safely without actually counting the
>> >>    length,
>> >>    and once you've done that you can make the endIndex an Int.
>> >
>> > It’s also worth nothing that we can use `Optional` with `nil` as the
>> > `endIndex` sentinel if necessary.
>>
>> True, that's a useful technique when there's no underlying storage in
>> the collection (e.g. a fibonacci sequence)
>>
>> >>
>> >> ## Single Pass Iteration
>> >>
>> >> The refinement relationship between Sequence and Collection is
>> >> problematic, because it means either:
>> >>
>> >> a) algorithms such as map on single-pass sequences claim to be
>> >>   nonmutating even though it's a lie (status quo)
>> >>
>> >> b) those algorithms can't be used on immutable (“let bound”)
>> >>   multipass sequences. IMO that would be totally unacceptable.
>> >>
>> >> If we drop the refinement, we can have a saner world.  We also don't
>> >> need to separate Sequence and Iterator anymore.  We can simply drop
>> >> Sequence altogether, and the protocol for single-pass iteration
>> >> becomes Iterator.
>> >
>> > Makes sense to me.
>> >
>> >>
>> >> ### Mutation and Reference Semantics
>> >>
>> >> Everything in Swift is copiable via `let copy = thing` (let's please
>> >> not argue over the definition of copy for classes; this is the one
>> >> built into the lowest level of the language—I refer to the other one,
>> >> that requires allocation, as “clone”).
>> >>
>> >> Anything you do with a sequence that's truly single-pass mutates the
>> >> sequence *and of its copies*.  Therefore, such a type *fundamentally*
>> >> has reference semantics. One day we may be able to model single-pass
>> >> sequences with “move-only” value types, which cannot be copied. You
>> >> can find move-only types in languages like Rust and C++, but they are
>> >> not supported by Swift today.  So it seems reasonable that all
>> >> Iterators in Swift today should be modeled as classes.
>> >
>> > I think this makes a lot of sense in the model you are proposing.  All
>> > multipass structures are collections.  Any sequence that can only
>> > support a single pass is modeled as an iterator which inherently has
>> > identity.  Making this distinction strong will prevent any confusion.
>> >
>> >> The fact that Swift doesn't have a mutation model for classes,
>> >> though, means that mutating methods on a class constrained protocol
>> >> can't be labeled as such.  So consuming operations on a
>> >> class-constrained Iterator protocol would not be labeled as mutating.
>> >>
>> >> The standard library team is currently trying to evaluate the
>> >> tradeoffs in this area.  One possibility under consideration is
>> >> simply dropping support for single-pass sequences until Swift can
>> >> support move-only value types and/or gets a mutation model for class
>> >> instances.  It would be very interesting to know about any real-world
>> >> models of single-pass sequences that people are using in Swift, since
>> >> we don't supply any in the standard library.
>> >
>> > I’m happy to see you mention a mutation model for class instances!  (I
>> > don’t mean to sidetrack the discussion, but would love to see that
>> > someday)
>> >
>> > I don’t have any objection to dropping support for single-pass
>> > sequences temporarily.  It’s possible that I would feel differently if
>> > I was making use of them in my own code but I’m not.
>>
>> On second thought, I believe it is important to have a way to support
>> existing “partially formed” multipass sequences that don't expose
>> copying or equality for their iteration states.  Iterator is the right
>> way to do that.  So I think we need to keep Iterator around.
>>
>> > In the meantime, people would be able to implement their own protocol
>> > for single pass sequences.  What they would lose is for..in as well as
>> > the standard library algorithms.  I’m not sure how many people this
>> > would impact or how big the impact would be for them.  We have seen a
>> > couple of examples in this discussion, but probably not enough to
>> > asses the overall impact.
>> >
>> > One thing you don’t mention here is a distinction between finite and
>> > infinite single-pass sequences (iterators).  I don’t know if the
>> > finite / infinite distinction is as important here, but wanted to
>> > point it out.  Obviously if we remove support single-pass sequences
>> > now we could defer that discussion until we’re ready to bring back
>> > support for them.
>>
>> There are a few possible answers I can think of:
>>
>> 1. Do the “obvious” thing and create a separate protocol for finite
>>    single-pass sequences
>>
>> 2. Decide that the combination of infinite and single-pass is rare
>>    enough (/dev/urandom, temperature sensor) that it's better to just
>>    ask people handling them to be careful and not, e.g., try to “count”
>>    them.
>>
>
> Not really feeling sufficiently in my element (excuse the pun) to comment
> on most of this discussion, but here I thought I'd chime in. What's
> interesting about your two examples (/dev/urandom, temperature sensor) is
> that, though single-pass, they should be insensitive to destructive
> consumption, no? By which I mean, if some function returns 5 elements from
> the "sequence", in both scenarios it would be undetectable whether it
> consumes 5, 10 or 15 elements in the process, IIUC. Are there other
> examples of infinite, single-pass sequences where destructive consumption
> would make a difference?

You can construct an example where it's detectable if you know enough
about the sequence.  For example, you could zip together elements from
0..<∞ with /dev/urandom, and you would have a single-pass sequence where
the number of consumed elements was detectable.

I don't understand how the detectability of how many elements were
consumed is relevant to the design, though.
-- 
Dave