[swift-evolution] [Pitch] Remove destructive consumption from Sequence

Wed Jun 22 20:12:08 CDT 2016

> On Jun 22, 2016, at 7:27 PM, Brent Royal-Gordon <brent at architechies.com> wrote:
> 
> I think you're tremendously overcomplicating the design.

This isn’t intended as a proposed design.  It is intended to start a discussion exploring the design space.

> 
>> If we’re going to consider alternative designs it is worth considering the semantic space available.  For the sake of discussion, here is a model that captures the various semantics that exist (the names are just strawmen):
>> 
>>                          Iterable 
>>                          /          \
>>                         /             \
>>                        /               \
>>   FiniteIterable                 MultipassIterable
>>                       \                 /
>>                         \              / 
>>                          \            /
>>                         Sequence
>>                                |
>>                                |
>>                         Collection
>> 
>> `Iterable` corresponds to the current `Sequence` - no semantics beyond iteration are required.  Infinite, single-pass “sequences” may conform.  
> 
> Okay, so let's start by undoing that renaming to make clear which semantics in this hierarchy are actually new and which ones are already part of the standard library.
> 
>                        Sequence 
>                          /          \
>                         /             \
>                        /               \
>   FiniteSequence         MultipassSequence
>                       \                 /
>                         \              / 
>                          \            /
>               FiniteMultipassSequence
>                                |
>                                |
>                         Collection

Yes, I should have done it this way to begin with.

> 
>> `for..in` naturally requires `FiniteIterable`,
> 
> Why? There's nothing conceptually incoherent about looping over an infinite sequence. Even ignoring errors, you can `break` out, you can `exit()` from an interior method, you can abort or fail a precondition, or you can just let it loop forever. Forbidding potentially-infinite for loops `for` loops is an arbitrary limitation.

The argument for doing this is that `for..in` is syntactic sugar and looping over an infinite sequence using it is often going to be a mistake.  I think it’s reasonable to suggest that it be a little be more difficult to write an infinite loop in order to prevent one from doing so accidentally.  The position you and Dave have taken is also reasonable.

> 
>> but does not require the `MultipassIterable`.
>> 
>> There are many interesting infinite `MultipassIterable` types.  These include any dynamically generated sequence, such as a mathematical sequence (even numbers, odd numbers, etc).  This is also what the existing `Sequence` would become if we drop support for destructive sequences and do nothing else (note: it would still be possible to accidentally write a `for..in` loop over an infinite sequence).
> 
> Okay, but all of these can be represented as infinite Collections.
> 
> Let me explain. If a Sequence is truly multipass—that is, you can iterate over it many times and it will always return the same elements—then it must look something like this:
> 
> 	struct MyElement { … }	
> 	
> 	struct MyState {
> 		…
> 		init() { … }
> 		mutating func formNextElement(in: MySequence) { … }
> 		var currentElement(in: MySequence) -> MyElement? { … }
> 	}
> 
> 	struct MySequence: MultipassSequence {
> 		struct Iterator: IteratorProtocol {
> 			var sequence: MySequence
> 			var state: MyState
> 			
> 			init(sequence: MySequence, initialState: MyState) {
> 				self.sequence = sequence
> 				self.state = initialState
> 			}
> 			
> 			mutating func next() -> MyElement? {
> 				defer { someState.formNextElement(in: sequence) }
> 				return someState.currentElement(in: sequence)
> 			}
> 		}
> 		
> 		func formIterator() -> MyIterator {
> 			return MyIterator(sequence: self, initialState: MyState())
> 		}
> 	}
> 
> Now, the pieces may *appear* to be different—the Iterator may not actually need a reference to the Sequence, the state may be stored in many properties instead of just one, the code to iterate and get the current element may be directly in `next()` instead of in methods `next()` calls—but any multipass Sequence and Iterator could be refactored to match this pattern, so all multipass Sequences and Iterators are equivalent to it.
> 
> But you can convert MySequence into a possibly infinite MyCollection like so:
> 
> 	struct MyCollection: PossiblyInfiniteCollection {
> 		enum Index: Comparable {
> 			var offset: Int
> 			var state: MyState?
> 		}
> 				
> 		var startIndex: Index {
> 			return Index(offset: 0, state: MyState())
> 		}
> 		var endIndex: Index {
> 			return Index(offset: Int.max, state: nil)
> 		}
> 		
> 		func formIndex(after index: inout Index) {
> 			index.state!.formNextElement(in: self)
> 			if index.state!.currentElement(in: self) == nil {
> 				index.offset = Int.max
> 			}
> 			else {
> 				index.offset! += 1
> 			}
> 		}
> 		
> 		subscript(index: Index) -> MyElement {
> 			return index.state!.currentElement(in: self)
> 		}
> 	}
> 	
> 	func == (lhs: MyCollection.Index, rhs: MyCollection.Index) -> Bool {
> 		return lhs.offset == rhs.offset
> 	}
> 	
> 	func < (lhs: MyCollection.Index, rhs: MyCollection.Index) -> Bool {
> 		return lhs.offset < rhs.offset
> 	}
> 
> In other words, whatever state (other than the sequence itself) the Iterator needs to keep in its stored properties can instead be stored inside an index, and whatever logic the Iterator needs to implement its `next()` method can instead be split between `formIndex(after:)` and `subscript(_:)`. Any MultipassSequence can be written as a PossiblyInfiniteCollection if you're willing to put in the effort. And this is a boon to its multipass-ness, because you can not only start over again from the beginning, but start over from *any* step in the iteration.
> 
> If any MultipassSequence can be written as a PossiblyInfiniteCollection, then we can eliminate another protocol:
> 
>                        Sequence 
>                          /          \
>                         /             \
>                        /               \
>   FiniteSequence         PossiblyInfiniteCollection
>                       \                 /
>                         \              / 
>                          \            /
>                         Collection
> 
> Now we can move any members which may not terminate early (like `map` or `last`) or are ill-defined when infinite (`count`) into `FiniteSequence` and `Collection`, while leaving any which often do terminate early (like `first` or `index(of:)`) in `Sequence` and `PossiblyInfiniteCollection`.
> 
> Is this necessary? I don't really know. Arguably, we could simply provide this hierarchy:
> 
>                        Sequence 
>                          /          \
>                         /             \
>                        /               \
>      IteratorProtocol      Collection
> 
> Where Iterators must be treated as potentially infinite, Collections are always finite, and you are asked (or, with a "closed protocol" feature, forced) to conform to one or the other, rather than to Sequence directly. Then the non-infinite-safe APIs like `map` can move onto `Collection` and we'll have covered most of the use cases which matter.

Thanks!  This is exactly the kind of discussion I intended to encourage with my post.  

This looks pretty good to me but it doesn’t address the issue that initiated this thread - removing destructive consumption (non-multipass) from Sequence.  How do envision that fitting in?

Can you also elaborate on what you envision `IteratorProtocol` would look like in this picture?  You depict it refining `Sequence`.  That is not the current design in the standard library so it isn’t immediately clear to me.

-Matthew

> 
> -- 
> Brent Royal-Gordon
> Architechies
>