[swift-evolution] [Pitch] Remove destructive consumption from Sequence

Wed Jun 22 13:36:25 CDT 2016

Today, a Sequence differs from a Collection in that:

- A sequence can be infinitely or indefinitely sized, or may require an O(n) operation to count its values. A collection has a finite number of elements, and that fixed size is exposed as an O(1) or O(n) operation via ‘count’.

- A collection is indexable, with those indices being usable for various operations including forming subsets, comparisons, and manual iteration

- A sequence may or may not be destructive, where a destructive sequence consumes elements during traversal, making them unavailable on subsequent traversals. Collection operations are required to be non-destructive

I would like to pitch removing this third distinction: the option for destructive sequences.
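
To make the distinction concrete, here is a minimal sketch of a destructive sequence. It is written in the current Swift spelling, where what this post calls a Generator is an IteratorProtocol. The DrainingQueue type is hypothetical and exists only for illustration; it is not a standard library type.

```swift
// Hypothetical single-pass sequence: every iterator it hands out
// removes elements from the same shared storage.
final class DrainingQueue<Element>: Sequence {
    private var storage: [Element]

    init(_ elements: [Element]) {
        storage = elements
    }

    func makeIterator() -> AnyIterator<Element> {
        // Each call to next() pops the head of the shared storage,
        // so traversal consumes the queue itself.
        return AnyIterator {
            self.storage.isEmpty ? nil : self.storage.removeFirst()
        }
    }
}

let queue = DrainingQueue([1, 2, 3])
let firstPass = Array(queue)    // drains the queue
let secondPass = Array(queue)   // nothing is left to visit

// An Array, being a Collection, yields the same elements every time.
let values = [1, 2, 3]
let firstCopy = Array(values)
let secondCopy = Array(values)
```

Both types satisfy Sequence today, yet only the second gives the same answer on a second traversal, and nothing in the type tells a generic algorithm which of the two it has been handed.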

My main motivation for proposing this is the potential for developer confusion. As noted in one of the previous threads on the naming of the map, flatMap, filter, etc. methods on Sequence, Sequence has a naming constraint not typical of the rest of the Swift standard library: many of its methods may or may not be destructive. Naming methods in extensions on Sequence is therefore challenging, as the names must not imply immutability.

It would still be possible to have Generators that operate destructively, but such Generators would not meet the requirements of Sequence. The most significant impact would be the inability to use such Generators in a for..in loop. One could make the case for a lower-level Iterable-style interface for requesting destructive-or-nondestructive generators, for the benefit of generic algorithms that do not care whether the data they are given is consumed as they do their work. I do not make that case here; instead, I plan to make the case that destructive generators would be a rare beast.
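
As a rough sketch of what that would look like in practice, a destructive generator that does not conform to Sequence can still be driven by hand with a while loop. The LineReader type below is a hypothetical stand-in for a real single-pass source, and the code uses the current Swift name IteratorProtocol for what this post calls a Generator.

```swift
// Hypothetical single-pass source. It conforms to IteratorProtocol only,
// so it cannot be used directly in a for..in loop.
struct LineReader: IteratorProtocol {
    private var pending: ArraySlice<String>

    init(lines: [String]) {
        pending = ArraySlice(lines)
    }

    // Returns the next line and removes it from the pending input.
    mutating func next() -> String? {
        return pending.popFirst()
    }
}

var reader = LineReader(lines: ["alpha", "beta"])
var seen: [String] = []

// Manual iteration takes the place of for..in.
while let line = reader.next() {
    seen.append(line)
}
```

The cost is one explicit call to next() per element; in exchange, the call site makes it visible that the source is being consumed.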

From the Swift project documentation at https://github.com/apple/swift/blob/d95921e5a838d7cc450f6fbc2072bd1a5be95e24/docs/SequencesAndCollections.rst#sequences

"Because this construct is generic, s could be

	• an array
	• a set
	• a linked list
	• a series of UI events
	• a file on disk
	• a stream of incoming network packets
	• an infinite series of random numbers
	• a user-defined data structure
	• etc."

The disruption of such a change on this list of examples would likely be:

- The series of UI events from a UI framework likely indicates a queue, where iterating over a generator would consume items from the head of that queue.

However, this is not typically how events are handled in such systems. Instead, events are often delivered via an event loop dispatch, registered callbacks invoked by a thread pool, or a reactive mechanism. Such a stream of incoming UI events would likely be blocking, so a signaling method for new events would still be needed at the queue level. Given that UI events are already usually serialized onto a single thread, a queue at the application level adds complexity on top of the event queue already in use at the runloop or kernel level.

- A file on disk would likely be iterated as a series of UInt8 values, Characters, String lines, or other pattern matches. If built at a low enough level, such as on top of NSInputStream, this would also represent reading a file from a URL.

In this case there are three example behaviors I’d like to call attention to:
1. The developer wants to operate on the file as a set of data, in which case one would expect a Data value, String, or [String] to be created to represent the scenarios above.
2. The developer wants to parse a structured format, in which case such iteration behaviors would likely be insufficient.
3. The developer wants to iterate on the input data as a stream, without waiting for the data to fully arrive or retaining it in memory. I suspect there is little overlap between this sort of developer and the developer who wants a framework-provided String-per-line iteration.

- Streams of incoming network packets build on the two previous points:
Like UI events, a stream of incoming network packets may be better suited to an event dispatch or reactive mechanism. Like file access, a stream of incoming network packets is likely to require processing beyond what is easily obtainable with a for..in loop.

Likewise, data can be segmented across network packets at the application level, so a for..in loop may have to carry over the contents of previous packets in order to process a single framed network message. It is also more likely that a network connectivity issue would disrupt the packets, requiring either additional error recovery processes to be defined around such a for..in loop, or an interface similar to reactive observables where stream close and errors can be represented as part of the iterated state.
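
The framing problem can be sketched as follows. The newline-delimited format and the sample packets are assumptions made purely for illustration, and an in-memory array stands in for a real network stream.

```swift
// Assumed sample data: two newline-terminated messages split
// across three packets.
let packets = ["HEL", "LO\nWOR", "LD\n"]

var buffer = ""               // bytes carried over from earlier packets
var messages: [String] = []   // complete framed messages

for packet in packets {
    buffer += packet
    // Extract every complete message now available in the buffer.
    while let newline = buffer.firstIndex(of: "\n") {
        messages.append(String(buffer[..<newline]))
        buffer = String(buffer[buffer.index(after: newline)...])
    }
}
```

The loop body cannot treat each packet in isolation: the buffer has to outlive each iteration, and this sketch has no place to report a closed or failed stream.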

- It is unlikely that a for..in loop would be over a random number sequence. 

However, if you did want to use a random number sequence in a for..in loop, a ‘random’ number sequence is reproducible if it has the same seed.

Non-repeating behavior does not require consuming bytes except where the ‘random’ sequence needs to be reproducible as a whole, such as gaming and cryptography applications. New ‘random’ data can be had by simply iterating a new random number generator with a new random seed. 
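
A minimal sketch of such a seeded, non-destructive sequence follows. SeededRandomSequence is a hypothetical type, and the linear congruential generator inside it (using the constants from Knuth's MMIX generator) is only a stand-in for a real algorithm; it is not meant to be suitable for any serious use.

```swift
// Hypothetical infinite sequence of pseudo-random values. Each call to
// makeIterator() restarts from the stored seed, so traversal is repeatable.
struct SeededRandomSequence: Sequence {
    let seed: UInt64

    func makeIterator() -> AnyIterator<UInt64> {
        var state = seed
        return AnyIterator { () -> UInt64? in
            // Illustrative LCG step with wrapping arithmetic.
            state = state &* 6364136223846793005 &+ 1442695040888963407
            return state
        }
    }
}

let numbers = SeededRandomSequence(seed: 42)
let firstRun = Array(numbers.prefix(3))
let secondRun = Array(numbers.prefix(3))   // same values as firstRun

// New 'random' data comes from a new seed, not from consuming the old sequence.
let otherRun = Array(SeededRandomSequence(seed: 43).prefix(3))
```

Because the state lives in the iterator rather than in the sequence, iterating twice never disturbs the sequence value itself.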

However, iterating over an external random source like /dev/random would no longer be allowed via a for..in loop, because multiple iterations would yield different data.

-DW
