[swift-evolution] [Pitch] Remove destructive consumption from Sequence

Matthew Johnson matthew at anandabits.com
Mon Jun 27 18:32:36 CDT 2016


> On Jun 27, 2016, at 2:41 PM, Dave Abrahams <dabrahams at apple.com> wrote:
> 
> 
> on Mon Jun 27 2016, Matthew Johnson <matthew-AT-anandabits.com <http://matthew-at-anandabits.com/>> wrote:
> 
>>> On Jun 27, 2016, at 11:46 AM, Dave Abrahams <dabrahams at apple.com <mailto:dabrahams at apple.com>> wrote:
>>> 
>>> 
>>> on Mon Jun 27 2016, Matthew Johnson <matthew-AT-anandabits.com <http://matthew-at-anandabits.com/> <http://matthew-at-anandabits.com/ <http://matthew-at-anandabits.com/>>> wrote:
>>> 
>> 
>>>>> On Jun 26, 2016, at 10:56 PM, Jonathan Hull via swift-evolution
>>>>> <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
>>>>> 
>>>>> Can’t a Sequence be potentially infinite, whereas a collection has a
>>>>> defined count/endIndex?  Other than that, I agree with your
>>>> 
>>>>> statement.
>>>>> 
>>>>> Here is what I see as the appropriate structure:
>>>>> 
>>>>> Iterator: Single destructive pass, potentially infinite, (should be for-in able)
>>>>> Sequence: Guaranteed non-destructive multi-pass (vends Iterators),
>>>>> potentially infinite, (should be subscript-able, gain most of
>>>>> collection, but lose anything that relies on it ending)
>>>>> Collection: Multi-pass, guaranteed finite, (no changes from current
>>>>> form, except extra inits from Iterator/Sequence with end conditions)
>>>>> 
>>>>> Right now we are allowed to have an infinite sequence, but calling
>>>>> dropLast or non-lazy map will cause an infinite loop.  These cases
>>>>> could be made much safer by considering the potentially infinite and
>>>>> finite cases separately…
>>>> 
>>>> I think this is pointing in the right general direction.  It would
>>>> make working with `Sequence` much more straightforward and allow us to
>>>> depend on the multi-pass property that is true in practice of the most
>>>> common models of `Sequence`.
>>>> 
>>>> But I agree that we should give much more careful consideration to
>>>> finite / infinite generally and for..in specifically.
>>>> 
>>>> Now that I have been thinking about the finite / infinite distinction
>>>> more closely I have begun to notice a lot of code that is written
>>>> generically using `Sequence` where a for..in loop is really what is
>>>> required, however the “finite sequence” precondition is not explicitly
>>>> stated.  Interestingly, this is the case with the standard library’s
>>>> eager `map` (but not the case with `dropLast` which explicitly notes
>>>> the precondition).  I have been somewhat surprised to realize how
>>>> common this “bug” is (i.e. not stating a precondition).  I think we
>>>> have gotten away with it thus far because the sequences most people
>>>> use most of the time in practice are finite.  But that doesn’t mean we
>>>> should accept this as good enough - IMO it is way to easy to forget to
>>>> document this precondition (and obviously easier for users to overlook
>>>> than preconditions that are actually encoded in the type system,
>>>> violations of which are caught at compile time).
>>>> 
>>>> The fact that this pattern is so pervasive is what I meant when I said
>>>> for..in “naturally” requires a finite sequence.
>>>> 
>>>> IMO it’s better to encode preconditions in the type system when that
>>>> is practical, and especially when the precondition is shared by a vast
>>>> majority of code written using a particular construct (in this case a
>>>> for..in loop written using the most generic for..in-able protocol).
>>>> 
>>>> I think the safest solution is to take the position that writing an
>>>> infinite loop is relatively uncommon and is a more “advanced”
>>>> technique, and thus should be done explicitly.  Do people really write
>>>> infinite loops often enough that the convenience of using for..in when
>>>> writing infinite loops outweighs the safety benefit of preventing
>>>> accidental infinite loops?  I haven’t seen a compelling argument for
>>>> this.
>>> 
>>> Good questions.  I'd also add: “do infinite sequences come up often
>>> enough that accidentally looping on them forever is a problem?”
>> 
>> That’s a good question as well.  In practice the frequency of infinite
>> sequences likely depends on the domain.
>> 
>> IMO this falls into the same category as “do single pass sequences
>> come up often enough that attempting to iterate over them twice is a
>> problem.  To paraphrase your previous post:
>> 
>>> Today, people see a beautiful, simple protocol (Sequence) to which many
>>> things conform. They don't recognize that there's a semantic restriction
>>> (you can’t assume it is finite!) on it, so they write libraries
>>> of functions that may iterate a Sequence to termination.  They test
>>> their libraries with the most commonly-available Sequences, e.g. Arrays
>>> and Ranges, which happen to be finite.  Their tests pass!  But their
>>> constraints are wrong, their whole model of how to write generic code
>>> over sequences is wrong, and some of their code is wrong.
>>> 
>>> IMO this is a problematic programming model.
>> 
>> I definitely don’t mean to put words in your mouth here, but the
>> logical structure of the argument appears identical to me regardless
>> of which issue it is applied to.  I am only trying to make that point.
> 
> Oh, I fully agree.  We're *in* this discussion because neither
> single-pass nor infinite sequences come up very often.  It raises the
> question of whether the conceptual framework ought to accomodate them at
> all, and if they should be included, how they should be represented.

We’ve considered good practical examples of both.  They exist and people use them.  I think we’re better off having a common vocabulary for them rather than requiring everyone who uses these concepts to invent their own.  

I agree that the question of how best to represent them is crucial.

> 
> The quotation you cited above isn't arguing that single-pass sequences
> are important enough to represent.  It is trying to point out that the
> refinement relationship between single- and multipass sequences has an
> undesirable effect.  I'm certain the same thing could occur for infinite
> and finite sequences.

I’m not sure I agree.  Neither are clearly modeled in the current library so I don’t think we know what the impact would be if they were.  I’m willing to bet that people would choose the correct constraints if they were available (modeled accurately in the standard library, well documented, etc).

What both of these *definitely* have in common is that they are purely semantic requirements that cannot be stated explicitly as requirements using syntax of the language.  You have reminded us many times that protocols should not just be about syntax, but should also emphasize semantics.  We shouldn’t shy away from introducing a protocol that doesn’t add any syntactic requirements if it is necessary to capture important semantic requirements.

> 
> FWIW, Max pointed out to me on Friday that Scala's concept for
> possibly-single-pass sequences is called “TraversibleOnce.”  I think
> that name goes a long way to solving the problem.  I'm not sure how we'd
> apply the same idea to finite/infinite sequences, though.

I’ve was thinking about how to best represent these while mowing the lawn this afternoon and had an interesting thought.  If we add one additional protocol and pin down some of the semantics that have been discussed in this thread we could have a pretty straightforward model that captures the various semantics using a protocol in each of 4 quadrants:

                        Single Pass                               Multi Pass

                                                                           
Potentially                                                           Sequence
Infinite              IteratorProtocol  <-------------------------|   makeIterator()
                                  |                                                |
                                  |                                        Collection
Finite                FiniteIteratorProtocol  <-------------------  makeIterator()


The semantic refinement on `Sequence` is that it must now produce a new iterator every time one is requested (and they must all produce identical sequences of values as long as the sequence was not mutated in between calls to `makeIterator`).

The semantic refinement on `Collection` is that it must now be finite and its `Iterator` must conform to `FiniteIteratorProtocol` rather than `IteratorProtocol`.

`FiniteIteratorProtocol` is the new protocol which introduces the semantic requirement that calls to `next` must “eventually” produce `nil`.  “Eventually” might be a bit fuzzy but should carry the intent that your program won’t hang or be killed by the OS (i.e. diverge) if you iterate until `nil` is reached.

You mentioned in another thread that for..in wouldn’t necessarily have to use a single protocol (or refinement relationship).  Under this model I would propose that we make for..in operate on top of `FiniteIteratorProtocol`, but also have the ability to get the iterator from a `Collection`.  (I’m using still advocating for restricting the for..in sugar to finite iteration but the same idea would work at the “potentially infinite” level using `IteratorProtocol and `Sequence`).  The only change from current state (aside from the finite restriction) is to allow you to directly provide an iterator to the for..in loop rathe than requiring the compiler to get a new one via a call to `makeIterator`.

If we do adopt something like this model, I think a couple of guidelines are clear regardless of whether for..in works exclusively with finite concepts or also admits potentially infinite ones.  All code that requires terminating iteration should be written with `Collection` or `FiniteIteratorProtocol`, not `Sequence` or `IteratorProtocol`.  Following from this, all finite sequences should conform to `Collection`, not just `Sequence` so it can continue to work with all code that requires finite sequences.  One advantage of restricting for..in to the finite concepts is that it encourages people to follow this guideline so that their types will be usable with for..in, finite constraints, etc.

In order to make it easy to follow the second guideline, the library should strive to make it as easy as possible to implement forward only `Collection` if you can vend an iterator and guarantee it is finite.  Ideally it would be no more difficult than it is to implement `Sequence` today.  Further customization of the implementation would be done as an optimization, not just to achieve basic capabilities.

One other topic that came up is whether the single pass concepts should be required to have reference semantics.  If we go in that direction it is clear that in the previous model `IteratorProtocol` would have a class requirement which would be inherited by `FiniteIteratorProtocol`.

However, I am not convinced we should require reference semantics here.  This requirement doesn’t exist today and I haven’t heard anyone say it has directly been a problem.  The problem seems to be the due to the fact that `Sequence` is missing a semantic requirement everyone expects.  I don’t know the rationale that was originally behind that decision and am curious.  Is it because for..in works via `Sequence` and it was desirable to have for..in work with single pass sequences?  The above suggested tweak to how for..in addresses that.

Further, consider an algorithm which has multiple “choices” available and would like to be able to take advantage of backtracking.  This can be achieved trivially with value semantic iterators by simply making a copy at any time during iteration and returning to the copy if a dead end is reached.  You could sort of mimic this with a reference semantic iterator by conforming the iterator itself to `Sequence` or `Collection` and calling `makeIterator` to “clone” it.  Obviously this would only be possible with intimate knowledge of the iterator and would usually not be possible retroactively, whereas it would be trivial to take advantage of iterators that have value semantics.

-Matthew

> 
>>>> If we adopt that position then for..in would need to be built on top
>>>> of a guaranteed finite construct.  This would allow programmers to
>>>> continue writing generic code agains the most generic for..in-able
>>>> construct while eliminating a precondition that is often (usually?)
>>>> unstated and likely unconsidered.
>>>> 
>>>> If we do decide to move forward with infinite for..in loops I think we
>>>> need to establish strong guidance around how to properly write generic
>>>> code with these protocols.  Should such code really be constrained to
>>>> `Collection` rather than `Sequence` (i.e. should a potentially
>>>> infinite `Sequence` have an eager map)?  If this is the guidance,
>>>> should it be paired with guidance that all finite sequences should
>>>> conform to `Collection`?  Or is it sufficient to just educate
>>>> developers about this issue and expect people to document the “finite
>>>> Sequence” precondition when the constraint is `Sequence` rather than
>>>> `Collection`?
>>>> 
>>>> I hope we will give serious consideration to these questions while
>>>> this topic is open for discussion.
>>>> 
>>>> -Matthew
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Jon
>>>>> 
>>>>>> on Wed Jun 22 2016, David Waite <david-AT-alkaline-solutions.com <http://david-at-alkaline-solutions.com/>
>>>>>> <http://david-at-alkaline-solutions.com/ <http://david-at-alkaline-solutions.com/> <http://david-at-alkaline-solutions.com/ <http://david-at-alkaline-solutions.com/>>>> wrote:
>>>>>> 
>>>>>>>> On Jun 22, 2016, at 2:57 PM, Dave Abrahams via swift-evolution
>>>>>>>> <swift-evolution at swift.org <http://swift.org/> <http://swift.org/ <http://swift.org/>>
>>>>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>>>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>>>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> <Ahem> “Iterators,” please.
>>>>>>> 
>>>>>>> That makes me happy - for some reason I thought it was still GeneratorProtocol
>>>>>>> 
>>>>>>>>> destructively, but such Generators would not conform to the needs of
>>>>>>>>> Sequence. As such, the most significant impact would be the inability
>>>>>>>>> to use such Generators in a for..in loop, 
>>>>>>>> 
>>>>>>>> Trying to evaluate this statement, it's clear we're missing lots of
>>>>>>>> detail here:
>>>>>>>> 
>>>>>>>> * Would you remove Sequence?
>>>>>>>> * If so, what Protocol would embody “for...in-able?”
>>>>>>> No, I would just remove the allowance in the documentation and API
>>>>>>> design for a destructive/consuming iteration. Sequence would be the
>>>>>>> interface to getting access to repeatable iteration, without the need
>>>>>>> for meeting the other requirements for Collection.
>>>>>> 
>>>>>> That would be wrong unless there exist substantial examples of a
>>>>>> multipass Sequence that *can't* meet the other requirements of
>>>>>> Collection without loss of efficiency.  And since I can write an adaptor
>>>>>> that turns any multipass sequence into a Collection, I think it's
>>>>>> trivial to prove that no such examples exist.
>>>>>> 
>>>>>> -- 
>>>>>> -Dave
>>>>> 
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> swift-evolution at swift.org
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>> 
>>> 
>>> -- 
>>> -Dave
>> 
> 
> -- 
> Dave

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20160627/123a15bc/attachment.html>


More information about the swift-evolution mailing list