[swift-evolution] [Pitch] Remove destructive consumption from Sequence
Dave Abrahams
dabrahams at apple.com
Fri Jul 1 21:55:12 CDT 2016
on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:
>> On Jul 1, 2016, at 6:32 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>>
>>
>> on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:
>>
>
>>>> On Jul 1, 2016, at 11:51 AM, Dave Abrahams <dabrahams at apple.com> wrote:
>>>>
>>>>
>>>> on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:
>>>>
>>>
>>>>>> On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>>>>>>
>>>>>>
>>>>>> on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:
>>>>>>
>>>>>
>>>>>>>> On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>>>
>>>>>>>> Swift is a language that embraces value semantics. Many common
>>>>>>>> iterators *can* be implemented with value semantics. Just because we
>>>>>>>> can’t implement *all* iterators with value semantics doesn’t mean we
>>>>>>>> should require them to have reference semantics. It just means you
>>>>>>>> can’t *assume* value semantics when working with iterators in generic
>>>>>>>> code unless / until we have a way to specify a value semantics
>>>>>>>> constraint. That’s not necessarily a bad thing especially when it
>>>>>>>> leaves the door open to interesting future possibilities.
>>>>>>>>
>>>>>>>> -Matthew
>>>>>>>
>>>>>>> I'm kind of undecided about this personally. I think one of the
>>>>>>> problems with Swift is that the only indication that you have a
>>>>>>> reference type is that you can declare it as a constant yet still
>>>>>>> call mutating methods upon it; that isn't a very reliable way of
>>>>>>> identifying one, however. This may be more of a GUI/IDE issue though,
>>>>>>> in that something being a class isn't always obvious at a glance.
>>>>>>>
>>>>>>> I wonder, could we somehow force iterators stored in variables to be
>>>>>>> passed via inout? This would make it pretty clear that you're using
>>>>>>> the same iterator and not a copy in all cases, encouraging you to
>>>>>>> obtain another if you really do need to perform multiple passes.
>>>>>>
>>>>>> I'm going to push single-pass iteration on the stack briefly and talk
>>>>>> about the topic that's been under discussion here: infinite multipass
>>>>>> sequences.
>>>>>>
>>>>>> ## Fitting “Infinite Multipass” Into the Model
>>>>>>
>>>>>> It remains to be decided whether it's worth doing, but if it's to
>>>>>> happen
>>>>>
>>>>> I definitely think it’s worth doing.
>>>>
>>>> Opinions are nice, but rationales are better. How will we understand
>>>> *why* it's worth doing?
>>>
>>> I agree.
>>>
>>> The rationale has been discussed quite a bit already in this thread.
>>> The current protocols do not provide the semantics many people are
>>> assuming in their code, leading to a lot of code that is incorrect
>>> despite the fact that it usually works in practice.
>>>
>>> This is especially frequent in the case of the finite assumption.
>>> This assumption is so common it seems very wise to me to encode it as
>>> a semantic requirement in a protocol.
>>>
>>> IMO these are problems worth addressing, especially now that we have a
>>> good handle on what a solution would look like.
>>>
>>>>
>>>>> I really appreciate the attention that the library team has given to
>>>>> this.
>>>>>
>>>>>> , the standard library team thinks the right design is roughly
>>>>>> this:
>>>>>>
>>>>>> /// A multipass sequence that may be infinite
>>>>>> protocol Collection {
>>>>>>
>>>>>>   // Only eager algorithms that can terminate available here
>>>>>>   func index(where predicate: (Element) -> Bool) -> Index
>>>>>>
>>>>>>   // all lazy algorithms available here
>>>>>>   var lazy: ...
>>>>>>
>>>>>>   var startIndex: Index
>>>>>>   var endIndex: Index // possibly not reachable from startIndex
>>>>>>
>>>>>>   associatedtype SubSequence : Collection
>>>>>>   // do we need an associated FiniteSubsequence, e.g. for prefixes?
>>>>>> }
>>>>>>
>>>>>> protocol FiniteCollection : Collection {
>>>>>>
>>>>>>   // All eager algorithms available here
>>>>>>   func map(...) -> ...
>>>>>>   var count: ...
>>>>>> }
>>>>>>
>>>>>> protocol BidirectionalCollection : Collection { ... }
>>>>>>
>>>>>> protocol RandomAccessCollection : BidirectionalCollection { ... }
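>>>>>>
>>>>>> To make "may be infinite" concrete, here is a toy model of the
>>>>>> natural numbers (just a sketch, written against today's Collection
>>>>>> requirements, with names invented for illustration) whose endIndex
>>>>>> is never reached in practice:
>>>>>>
>>>>>>   struct Naturals : Collection {
>>>>>>     var startIndex: Int { return 0 }
>>>>>>     var endIndex: Int { return Int.max } // effectively unreachable
>>>>>>     subscript(i: Int) -> Int { return i }
>>>>>>     func index(after i: Int) -> Int { return i + 1 }
>>>>>>   }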
>>>>>
>>>>> Does this design entirely break with the relationship between
>>>>> collections and iterators (dropping `makeIterator` as a protocol
>>>>> requirement)? If so, would for..in (over collections) be built on top
>>>>> of indices and use `formIndex(after:)`? Would it require a finite
>>>>> collection (unless we add `until` to the language and then allow
>>>>> `for..in..until` to work with infinite collections)?
>>>>
>>>> All of these points are up for discussion.
>>>
>>> Cool. I think the collection for..in has some nice advantages, but
>>> since it sounds like we’ll probably keep Iterator around it might be
>>> best to take the approach of making them both work.
>>>
>>> You already know that I would prefer to see the current for..in built
>>> on finite sequences and allow for..in..until to be used with infinite
>>> sequences if we add that in the future. :)
>>>
>>>> John McCall pointed out to
>>>> me that an index-based for..in would make it possible to implement
>>>>
>>>> for inout x in y { mutate(&x) }
>>>
>>> That would be very nice!
>>>
>>> I think it might also increase performance. I don’t know exactly how
>>> for..in is implemented today, but the implementation of
>>> IndexingIterator compares position to endIndex. If for..in is also
>>> checking the optional for nil, that’s an extra comparison. We
>>> shouldn't need to construct the optional in the first place with an
>>> index-based for..in. Maybe optimizations like this already exist?
>>> But even if they do, it seems like they wouldn’t be possible in some
>>> cases where the type of the sequence isn’t statically known.
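>>>
>>> To illustrate (the exact lowering is up to the compiler, and `use`
>>> is just a placeholder): today's for..in desugars to roughly the
>>> first form below, while an index-based for..in could emit the
>>> second, which never constructs an Optional at all:
>>>
>>>   var iterator = sequence.makeIterator()
>>>   while let element = iterator.next() {
>>>     use(element)
>>>   }
>>>
>>>   var index = collection.startIndex
>>>   while index != collection.endIndex {
>>>     use(collection[index])
>>>     collection.formIndex(after: &index)
>>>   }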
>>>
>>>>
>>>>> Would we still retain `IndexingIterator` even if we break the
>>>>> relationship in the protocol requirements?
>>>>
>>>> Yes: it should be possible to implement Collection algorithms in terms
>>>> of Iterator algorithms, and IndexingIterator provides the means. That
>>>> said, I think the makeIterator requirement does little harm, especially
>>>> when it can be defaulted for Collections.
>>>
>>> I like this answer.
>>>
>>>>
>>>>> Would it still be possible to do things like zip a multi-pass sequence
>>>>> with a single-pass sequence (assuming we keep single-pass sequences or
>>>>> add them back eventually)? This seems like a use case worth
>>>>> supporting in some way.
>>>>
>>>> Yes. If you can create an Iterator from a Collection, and you can zip
>>>> Iterators, you can do this.
>>>
>>> Yes, of course. I’m glad we would keep this relationship intact.
>>>
>>>>
>>>>> One subtle change I think this implies is that things like
>>>>> `LazyFilterSequence` can implement `makeIterator` with constant
>>>>> complexity, deferring the O(N) complexity to the first call to `next`.
>>>>
>>>> I don't believe that's a difference, though I could be wrong.
>>>
>>> You’re right, I was wrong. `LazyFilterSequence` just constructs an
>>> iterator and returns it. `LazyFilterCollection` has to loop until it
>>> finds the first item matching the predicate in its `startIndex`
>>> implementation. The part I was missing is that `IndexingIterator`
>>> gets the `startIndex` in its initializer.
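>>>
>>> FWIW, that O(N) `startIndex` looks something like this (a sketch
>>> assuming stored properties named `_base` and `_predicate`; the
>>> actual stdlib internals may differ):
>>>
>>>   var startIndex: Index {
>>>     var index = _base.startIndex
>>>     // advance past elements that don't satisfy the predicate
>>>     while index != _base.endIndex && !_predicate(_base[index]) {
>>>       _base.formIndex(after: &index)
>>>     }
>>>     return index
>>>   }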
>>>
>>>>
>>>>> `startIndex` for `LazyFilterCollection` currently has O(N) complexity.
>>>>> The complexity of a complete iteration doesn’t change and probably
>>>>> isn’t a big deal, but it’s worth noting.
>>>>
>>>> Filtered collection views always require a bit of hand-waving around
>>>> performance guarantees; I don't think that changes.
>>>>
>>>>> I’ve been looking at some code that wraps a sequence and considering
>>>>> how it would be impacted. With iterators it looks like this:
>>>>>
>>>>> guard let element = base.next()
>>>>> else { return nil }
>>>>>
>>>>> With collections and indices it would look something like this:
>>>>>
>>>>> base.formIndex(after: &index)
>>>>> guard index != base.endIndex
>>>>> else { return endIndex }
>>>>>
>>>>> let element = base[index]
>>>>>
>>>>> That’s not too bad but it is more verbose.
>>>>
>>>> Sequence today is a single-pass thing. If you are wrapping Sequence
>>>> today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
>>>> to deal with indices.
>>>>
>>>>> If we’re going to push people towards collections and indices we
>>>>> should try to make common patterns like “update the iteration state
>>>>> and return the next element if there is one” simpler.
>>>>
>>>> That's IndexingIterator.
>>>
>>> Cool, I wrote this thinking that was going away.
>>>
>>>>
>>>>> This could be accomplished with an extension method along these lines:
>>>>>
>>>>> guard let element = base.formIndex(after: &index, .returningOptionalElement)
>>>>> else { return endIndex }
>>>>>
>>>>> With an implementation something like:
>>>>>
>>>>> enum FormIndexResult {
>>>>>   case returningOptionalElement
>>>>> }
>>>>>
>>>>> extension Collection {
>>>>>   func formIndex(after i: inout Index,
>>>>>                  _ result: FormIndexResult) -> Iterator.Element? {
>>>>>     formIndex(after: &i)
>>>>>     return i == endIndex ? nil : self[i]
>>>>>   }
>>>>> }
>>>>>
>>>>> This would provide similar functionality to `IndexingIterator` without
>>>>> coupling the storage of `elements` and `position` (which is important
>>>>> if you’re wrapping a collection and need to wrap the collection and
>>>>> its indices independently).
>>>>
>>>> I'm afraid I don't understand. Could you be more explicit about what
>>>> you have in mind?
>>>
>>> The idea was to provide functionality similar to `IndexingIterator` in
>>> the sense that the following code would provide equivalent functionality
>>> `iterator.next()` but expressed in terms of a collection and an index:
>>>
>>> let optionalElement = myCollection.formIndex(after: &myIndex, .returningOptionalElement)
>>>
>>> vs
>>>
>>> let optionalElement = myIterator.next()
>>>
>>> The single case enum is just there to provide a label that
>>> differentiates the overload.
>>>
>>> If we’re keeping IndexingIterator this probably isn’t necessary. I
>>> still have a use case for it but it is rather obscure.
>>
>> I can imagine wanting a design like the above for cases where
>> implementing the endIndex requires adding an extra bit of state, e.g. in
>>
>> struct LazyPrefix<Base : Collection> : Collection {
>>   init(_ base: Base, where predicate: (Base.Iterator.Element) -> Bool)
>>   ...
>> }
>>
>> you don't want to traverse the base collection eagerly just to come up
>> with an endIndex, so you store an optional Base.Index in
>> LazyPrefix.Index, which is nil for the endIndex. In these cases, index
>> comparison is less efficient than it might otherwise be.
>>
>> But my answer for these cases is simple: use a specialized Iterator
>> that will be more efficient than IndexingIterator. Is there a reason
>> that doesn't work for your case?
>
> I would index a custom iterator in my case,
I don't know what it means to index an iterator.
> but I am talking about code that would live inside the implementation
> of `index(after:)` in the `Collection` conformance of a wrapper
> `Collection` that bears some resemblance to `LazyFlattenCollection`.
> In this case you are receiving your `Index` which wraps `Base.Index`.
I don't understand, but maybe it doesn't matter. FWIW, I'm pretty
confident we could write a specialized Collection protocol that, given
some equatable State and a next() method, supplied all the basic
Collection requirements, for the cases where the Iterator is the most
efficient means of traversal, e.g.
protocol IteratingCollection : Collection {
  associatedtype IterationState : Equatable
  func next(state: inout IterationState) -> Iterator.Element?
  func startState() -> IterationState
}

extension IteratingCollection {
  // all the collection requirements implemented in terms
  // of startState() and next(...)
}
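
For concreteness, here's a toy stand-in (hypothetical names, written
against today's Collection rather than the protocol above) showing the
idea: the index is little more than the equatable iteration state (made
Comparable to satisfy Collection), and index(after:) plays the role of
next(...):

struct CountdownIndex : Comparable {
  var remaining: Int // the iteration state: how many elements are left
}
func == (lhs: CountdownIndex, rhs: CountdownIndex) -> Bool {
  return lhs.remaining == rhs.remaining
}
func < (lhs: CountdownIndex, rhs: CountdownIndex) -> Bool {
  return lhs.remaining > rhs.remaining // fewer remaining == farther along
}

struct Countdown : Collection {
  let start: Int // counts start, start-1, ..., 1
  var startIndex: CountdownIndex { return CountdownIndex(remaining: start) }
  var endIndex: CountdownIndex { return CountdownIndex(remaining: 0) }
  subscript(i: CountdownIndex) -> Int { return i.remaining }
  func index(after i: CountdownIndex) -> CountdownIndex {
    return CountdownIndex(remaining: i.remaining - 1)
  }
}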
> Like I said, this is a pretty obscure case and I would never suggest
> including it on the grounds of a use case that is only relevant to
> `index(after:)` implementations in wrapper collections. :) I brought
> it up because I thought you might have been suggesting a more drastic
> change that removed the collection iterators.
>>
>>>> On second thought, I believe it is important to have a way to support
>>>> existing “partially formed” multipass sequences that don't expose
>>>> copying or equality for their iteration states.
>>>
>>> Can you provide examples of these? I’m having difficulty thinking of
>>> one.
>>
>> NSSet is an example.
>>
>>>> Iterator is the right way to do that. So I think we need to keep
>>>> Iterator around.
>>>
>>> I don’t have any objection to keeping it either. :) Hopefully we’d
>>> still be able to improve the design in the future if / when new
>>> enabling language features come along.
>>>
>>>>
>>>>> In the meantime, people would be able to implement their own protocol
>>>>> for single-pass sequences. What they would lose is for..in as well as
>>>>> the standard library algorithms. I’m not sure how many people this
>>>>> would impact or how big the impact would be for them. We have seen a
>>>>> couple of examples in this discussion, but probably not enough to
>>>>> assess the overall impact.
>>>>>
>>>>> One thing you don’t mention here is a distinction between finite and
>>>>> infinite single-pass sequences (iterators). I don’t know if the
>>>>> finite / infinite distinction is as important here, but wanted to
>>>>> point it out. Obviously if we remove support for single-pass sequences
>>>>> now we could defer that discussion until we’re ready to bring back
>>>>> support for them.
>>>>
>>>> There are a few possible answers I can think of:
>>>>
>>>> 1. Do the “obvious” thing and create a separate protocol for finite
>>>> single-pass sequences
>>>>
>>>> 2. Decide that the combination of infinite and single-pass is rare
>>>> enough (/dev/urandom, temperature sensor) that it's better to just
>>>> ask people handling them to be careful and not, e.g., try to “count”
>>>> them.
>>>>
>>>> 3. Decide that everything on a single-pass sequence is lazy. Since you
>>>> can only take a single pass anyway, people won't observe their
>>>> closures being called more often than necessary, which was the main
>>>> motivator for making map, filter, et al. eager on collections without
>>>> an explicit .lazy.
>>>>
>>>> Implications of #3:
>>>>
>>>> * Any “partially-formed” multipass sequences (modeling only Iterator)
>>>> would be free to expose an accurate underestimatedCount, thereby
>>>> optimizing the process of copying into an array. The lazy filter
>>>> Iterator adaptor would have an underestimatedCount of 0.
>>>>
>>>> * All algorithms that require multiple passes, such as sorted(), would
>>>> be unavailable on Iterator. You'd have to construct an Array (or
>>>> other MutableCollection) and sort that in-place. Of course,
>>>> constructing an Array from an Iterator could still go on forever if
>>>> the Iterator turned out to be infinite, which means that, at some
>>>> level, #3 is just a refinement of #2 that makes it less error-prone.
>>>
>>> Do you lean towards any of these?
>>
>> Yes, #3.
>>
>> We can always make the few operations that have to be eager—such as
>> Array construction from an Iterator—explicit with a label or something:
>>
>> Array(claimingFiniteness: someIterator)
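>>
>> E.g., something like this (the initializer name is just a strawman):
>>
>>   extension Array {
>>     /// Eagerly drains `iterator`, trusting the caller's claim that
>>     /// it is finite.
>>     init<I : IteratorProtocol>(claimingFiniteness iterator: I)
>>       where I.Element == Element {
>>       self.init()
>>       var iterator = iterator
>>       while let element = iterator.next() {
>>         append(element)
>>       }
>>     }
>>   }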
>
> This makes sense. Finite single-pass iterators can always be added in
> the future if compelling use cases emerge. We’re not taking anything
> away.
>
> All of the code I have looked at that makes a finite assumption would
> be converted to require `Collection` in the new model.
I think you mean FiniteCollection, yes?
--
Dave