[swift-users] Collection underestimatedCount() does _?

Sat Mar 19 17:33:45 CDT 2016

> 1: that the standard API requires that 
> every Sequence type can be instantiated so as to replicate the contents of another? The whole value of Sequence to the examples is that it should be impossible. 

No, it's the other way around. It should be possible to copy any (finite) Sequence into an Array, or any other type that declares it can be instantiated from a Sequence. The source of a Sequence could be anything, including data from external sources outside the program's control (like your /dev/random or wireless data examples).

> 2: that the standard API requires of every Sequence that it should serve the expectation of the supposedly-uncoupled API of Array that it is free to buffer the entire contents of any Sequence of the same Element? Need it or not? Bad news for memory and performance.

I believe there's a default implementation on Sequence as well (which would return 0), so it functions more as an opt-in mechanism to help Array and similar buffering mechanisms estimate the size of the buffer they should allocate. (Remember that greedy `map` and friends all use Array, so they all use this kind of preallocation logic, too.)

> The answer may be that if the user knows enough to build a Sequence for which laziness is indispensable, she should have better sense than to rely on those two things, required or not.

Yes. Basically, the standard library doesn't model the difference between an infinite (or otherwise unwise-to-handle-non-lazily) sequence and a finite sequence. It expects you to know whether a particular sequence is going to be too large to handle greedily and refrain from doing so.

I consider this a weakness in the standard library and would prefer to see a more specific protocol hierarchy, perhaps along the lines of:

	/// Represents any series of elements which can be iterated over. Something which is 
	/// merely Iterable may not be finite and, in any case, is probably not wise to use in
	/// greedy algorithms. It also may not be possible to iterate over more than once.
	protocol Iterable {
		associatedtype Iterator: InteratorProtocol
		func makeIterator() -> Iterator

		associatedtype Subset: Iterable where Subset.Iterator.Element == Iterator.Element
	}
	extension Iterable {
		// only APIs which are either lazy or operate on the start of the series
	}

	/// Represents any finite series of elements which can be iterated over. Something which 
	/// is merely a Sequence may not be possible to iterate over more than once.
	protocol Sequence: Iterable {
		func underestimatedCount() -> Int
		associatedtype Subset: Sequence where Subset.Iterator.Element == Iterator.Element
	}
	extension Sequence {
		// adds greedy and tail-operating APIs, including `count`
	}

	/// Represents any finite, repeatable series of elements which can be iterated over and 
	/// looked up by index.
	protocol Collection: Sequence {
		associatedtype Index: ForwardIndex
		associatedtype Iterator = IndexingIterator<Self>

		var startIndex: Index { get }
		var endIndex: Index { get }
		subscript(position: Index) -> Iterator.Element { get }

		associatedtype Subset: Collection where Subset.Iterator.Element == Iterator.Element = Slice<Self>
		subscript(bounds: Range<Index>) -> Subset { get }
	}

The `for` loop would take an `Iterable` (since you can exit early from a `for` loop), but most APIs that are currently constrained to `Sequence` would not change to `Iterable`, unless they happened to operate lazily or only on the start of the series. That would keep you from accidentally passing an infinite, or might-as-well-be-infinite, sequence to a function that expected to be able to read the whole thing.

I might write a swift-evolution proposal to this effect someday, but right now the Collection protocols are being reworked and I doubt the relevant people have the bandwidth to work on Sequence too.

-- 
Brent Royal-Gordon
Architechies