[swift-evolution] Proposal: CollectionType.cycle property for an infinite sequence

Kevin Ballard kevin at sb.org
Wed Dec 30 15:39:50 CST 2015

Refactoring like that can always be done later; introducing infinite
sequences now shouldn't make it any harder to refactor the protocols. If
anything, it will provide more practical experience with how infinite
sequences interact with the current protocol hierarchy that would help
guide the design of any refactoring (or, perhaps more importantly, help
determine if such a refactoring is worthwhile).

A big concern I have with refactoring like that is you'd rarely ever be
able to actually bound an algorithm on FiniteSequenceType, because there
will always be ways of constructing sequences that the compiler can't
tell if it's infinite. Trivial example is `ary.cycle.takeWhile(pred)`.
This is infinite if `pred(elt)` is true for every element of `ary` and
finite otherwise. But there's tons of other ways of creating such
indeterminate sequences. Bounding algorithms (such as `map(..) -> [T]`
or `reduce()`) on finite sequences only would be rather limiting, as
users who know their sequence must be finite but can't prove it to the
compiler would be unable to use methods that are in fact perfectly safe.

What you could do with such a refactoring is optimize certain algorithms
if they use a more precise sequence type, but I'm not actually sure what
sort of optimizations you can make with your proposed protocol
hierarchy. In fact, the only real optimizations that come to mind are
optimizing when you know you're working with a CycleSequence, because
for various algorithms you really only have to iterate the underlying
elements once (e.g. assuming a pure predicate, CycleSequence can
implement contains() safely, although we don't actually have pure in the
language right now so that's not necessarily a safe assumption; also,
contains() isn't a protocol method, just an extension method, so we
can't actually do that anyway).

On that note, core language team, anyone know offhand why SequenceType
has a bunch of extension methods that aren't part of the protocol? For
example, contains(_ predicate:). The only real reason I can think of is
to shrink the protocol witness table, but surely that's not particularly
meaningful. I warrant that contains(_ predicate:) doesn't really have
any reason to be overridden by anything except sequences that knowingly
repeat elements (e.g. CycleSequence and Repeat), and even that's only if
you assume the predicate is pure, but there's some other methods that
make sense to override on some sequences (like minElement(_
isOrderedBefore:) for any sequence that has a defined ordering).

-Kevin Ballard

On Wed, Dec 30, 2015, at 11:01 AM, plx via swift-evolution wrote:
>> On Dec 29, 2015, at 6:38 PM, Dave Abrahams
>> <dabrahams at apple.com> wrote:
>>> On Dec 29, 2015, at 3:39 PM, plx via swift-evolution <swift-
>>> evolution at swift.org> wrote:
>>>> On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution <swift-
>>>> evolution at swift.org> wrote:
>>>> On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:
>>>>> Personally I’d say this should be a -1 for standard-library
>>>>> inclusion.
>>>>> Swift’s not really built to handle infinite sequences right now;
>>>>> until they are handed better by the standard library convenience
>>>>> methods for creating them shouldn’t be in the standard library.
>>>> As far as I can tell, the only way in which it's "not really built"
>>>> to handle this is that there are multiple constructs that attempt
>>>> to eagerly consume an entire sequence, and using these with an
>>>> infinite sequence would end up looping forever. But I don't
>>>> consider that to be a particularly serious problem. You could "fix"
>>>> it by refactoring SequenceType into two protocols, SequenceType
>>>> (for possibly-infinite sequences) and FiniteSequenceType (for known-
>>>> finite sequences) and then going over the entire standard library
>>>> and updating various spots to use FiniteSequenceType, except this
>>>> would be very limiting (sequences that are not known if they're
>>>> infinite to the compiler could still be valid for various eager
>>>> algorithms if the programmer knows it will be finite in practice).
>>> Indeed, on my wishlist I would like to see the standard protocols
>>> refactored to something more like this:
>>> SequenceType // can be iterated FiniteSequenceType : SequenceType,
>>> // of finite length StableSequenceType : SequenceType, // can be re-
>>> iterated identically CollectionType : StableSequenceType,
>>> FiniteSequenceType, (etc.) // otherwise as it is now
>> This is interesting. A few concerns:
>> First, we have tried not to create any distinct protocols with
>> identical syntactic requirements, because we think it makes the world
>> clearer; we think people are more likely to assign incorrect
>> protocols when all the operations they want are available, but don’t
>> have the right semantics.  That isn’t to say we shouldn’t start doing
>> it, but it would be a break from the past.
>> Higher protocol granularity has a high comprehensibility cost.
>> Distinguishing protocols based on semantic requirements alone may
>> make the library harder to understand.  I’ve heard some people’s
>> heads have exploded from simply encountering CollectionType.
>> Next, it’s a principle of generic programming that every protocol
>> should be justified by both algorithms that exploit its requirements
>> (e.g. extension methods) and some real-world models that can’t
>> reasonably also conform to a more-refined protocol.  For example, we
>> have a ForwardIndexType because a singly-linked list has multipass
>> forward traversal and can’t do bidirectional traversal.  In order to
>> evaluate any proposal for new protocols, we’d need to see all of
>> these things.
> Speaking frankly, I’d say that there are few benefits to a refactoring
> along the lines sketched above until/unless you want to have solid
> support for things like a product-sequence or product-collection; you
> would gain *significant* advantages from such a refactored hierarchy
> in such scenarios, but I can’t think of any meaningful improvement the
> refactoring would offer in the essentially-linear case. (Note that
> this is distinct from examples of concrete models that are naturally
> e.g. finite-but-not-stable and vice-versa; they exist, but the rest is
> already long enough as it is).
> A full example is beyond my time-budget here, but I can give the
> flavor of where it makes a difference somewhat quickly.
> Consider a hypothetical
> `ProductSequence2<A:SequenceType,B:SequenceType>` that enumerates the
> cartesian product of two sequences (in an unspecified order); the
> naive implementation has problems with non-stable sequences and with
> infinite-sequences, but I’ll only discuss the infinite case b/c it’s
> more relevant here.
> When either—or both—of the sequences are infinite, the order-of-
> iteration has actual observable consequences; as a concrete example,
> if you want this to work out:
> ProductSequence2([1,2].cycle,[1,2].cycle).contains((2,2)) == true
> …then we have this:
> - if `a` is finite and `b` is not, you want to iterate like (a0,b0),
>   (a1,b0), (a2, b0) … , (a0, b1), (a1, b1), …, etc
> - if `b` is finite and `a` is not, you want to iterate like (a0,b0),
>   (a0,b1), (a0, b2) … , (a1, b0), (a1, b1), …, etc
> - if neither `a` nor `b` is finite, you want to iterate like (a0, b0),
>   (a1,b0), (a0,b1), (a2, b0), (a1, b1), (a0, b2), … etc
> …as with those orderings you *will* eventually reach each pair in the
> product (which seems strictly superior to an iteration order which
> will leave many members of the product forever un-visited).
> Importantly, the third ordering is inefficient in the general case,
> and thus isn’t a suitable default; you’d only want it in places where
> you’re *intentionally* using infinite series.
> This is a concrete example of the sort of thing that leaves me
> preferring that the standard library *not* contain infinite sequences
> at this time: such sequences *highly* benefit from special handling,
> but there’s no good way at this time to generically provide such
> handling in a general context within the bounds of the current
> language and standard library .
> If there’s a realistic chance of such a refactoring being accepted if
> sufficiently-well-proposed I can prepare something, but I’m admittedly
> skeptical given the likely magnitude of the associated changes. In
> particular, the API on types like `ForwardIndexType` is rather
> unfortunate for things like a hypothetical product-collection
> scenario; since such products are among the stronger motivating uses,
> this means it’s a protocol refactoring + a lot of API changes to the
> existing standard library, and probably making things clunkier in the
> common / linear cases.
>>> …but can understand the wish to not overly-complicate the basic
>>> protocol hierarchy (and also to avoid baking-in nice-to-have, but
>>> impossible-to-really-enforce semantic requirements; I’d trust the
>>> standard library to use them properly, but not typical 3rd party
>>> code, somewhat defeating the purpose).
>> Well, I guess I should have read ahead and most of my lecture above
>> was needless!  I’m posting it anyway because I think it spells out
>> some important principles we’ll need to refer to later.
>>> Everything else is a difference of outlook; we agree on the facts
>>> and differ in interpretation.
>>> I consider concrete types that adopt a protocol only to simply call
>>> `fatalError` for most of the protocol methods to be pathological —
>>> often useful, but still pathological — and thus far the standard
>>> library doesn’t contain any such pathological types (to my
>>> knowledge).
>>> `cycle` is useful but not useful enough to be the standard library’s
>>> first “pathological” type, so it’s still a -1 as proposed.
>>> This is nothing specific to `cycle` and my opinion here could change
>>> were the language or the standard library to evolve in various ways.
>>>>> You’d also want to call `fatalError` for at least `reduce`,
>>>>> `reverse`, `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`,
>>>>> and `forEach`.
>>>> You can only do it for the ones defined in the protocol, not ones
>>>> defined in extensions. This means map, filter, forEach, and suffix.
>>>> split is actually perfectly implementable for a CycleSequence,
>>>> although it does need a custom implementation. split is bounded by
>>>> at most Int.max splits, which means it is guaranteed to terminate
>>>> even for an infinite sequence (although the final subsequence does
>>>> need to be infinite[1]). Even if there are no separators in the
>>>> cycle, it can just return the cycle itself.
>>>> [1] Interestingly, the current implementation actually dumps the
>>>>     remainder into an Array and returns that (wrapped in
>>>>     AnySequence), which is curious because it would be perfectly
>>>>     legal for it to just wrap the generator up in AnySequence and
>>>>     return that instead. I'm tempted to submit a PR to change that
>>>>     now, as it just seems like unnecessary work to use an array.
>>>>> `startsWith` and `elementsEqual` and `lexicographicComparison` are
>>>>> all broken if you call them like e.g. `self.startsWith(self)`.
>>>> That's true, but what do you really expect when you're calling them
>>>> with two infinite sequences? It's not so much that they're broken
>>>> as it is that you're creating an infinite loop without any way to
>>>> break out. And FWIW, lexicographicalCompare is actually something
>>>> you might want to explicitly support on infinite sequences if you
>>>> know your sequences aren't equal and want to find out which one is
>>>> less than the other.
>>>> There are plenty of ways to shoot yourself in the foot. I don't
>>>> think infinite sequences are really the big problem you're making
>>>> them out to be.
>>>>> You can conceivably implement a non-crashing `contains`,
>>>>> `minElement` and `maxElement` on `CycleSequence` by calling
>>>>> through to the wrapped collection, but that’ll seemingly evaporate
>>>>> as soon as your `CycleSequence` winds up hidden inside an
>>>>> `AnySequence`.
>>>> You can't override those anyway in a generic context, because
>>>> they're not members of the protocol, they're just extensions. You
>>>> could implement them such that your implementation is called when
>>>> working on the concrete CycleSequence type, but I'm not sure if
>>>> that's a great idea to do that when the actual behavior differs
>>>> from calling it generically on SequenceType (well, triggering a
>>>> fatalError() instead of an infinite loop is fine because they're
>>>> both Bottom, but returning a valid result in one context and
>>>> looping infinitely in the other seems bad).
>>>> Of course, these methods could actually be moved into the protocol
>>>> itself, which would let us override them. I'm not entirely sure why
>>>> they aren't in the protocol to begin with, I guess because there's
>>>> not much need for overriding these outside of infinite sequences
>>>> (well, finite sorted sequences could provide an optimized
>>>> min/maxElement, and an optimized version of contains(_:
>>>> Self.Generator.Element), but maybe there's tradeoffs to doing this,
>>>> e.g. maybe there's some reason why having a large protocol witness
>>>> table is a bad idea).
>>> I don’t think `contains` or `minElement/maxElement` *can* be part of
>>> the protocol (in the sense of overridable) at this time (they
>>> require `Element` satisfy certain type constraints), but they
>>> certainly should be if the type system someday would support that.
>>>>> Which illustrates why this is a -1 for me; there's nothing wrong
>>>>> with the functionality in isolation and there’s nothing wrong with
>>>>> infinite sequences, but the standard library should play well with
>>>>> itself, and this wouldn’t play well with the rest of the standard
>>>>> library.
>>>> Ultimately, there's not much difference between an infinite
>>>> sequence and a sequence of Int.max elements. The latter is finite,
>>>> but it's so massive (especially on 64-bit) that any kind of eager
>>>> processing is going to hit the same problems as an infinite
>>>> sequence. Every problem you describe will be a problem with the
>>>> simple sequence `(0..<Int.max)` as well.
>>>> -Kevin Ballard
>>>>> That opinion could change as the language changes or the standard
>>>>> library evolves.
>>>>>> On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution
>>>>>> <swift-evolution at swift.org> wrote:
>>>>>> ## Introduction
>>>>>> Add a new property `cycle` to CollectionType that returns an
>>>>>> infinite SequenceType that yields the elements of the collection
>>>>>> in a loop.
>>>>>> ## Motivation
>>>>>> It's sometimes useful to be able to have an infinite sequence.
>>>>>> For example, `CollectionOfOne(x).cycle` could be used to have an
>>>>>> infinite sequence of a single element (similar to Repeat but
>>>>>> without a count). A common use for infinite sequences is zipping
>>>>>> with a finite sequence. As far as I'm aware, the stdlib does not
>>>>>> currently provide any way to create such an infinite sequence.
>>>>>> ## Proposed solution
>>>>>> Extend CollectionType with a new property `cycle` that yields a
>>>>>> type that conforms to SequenceType. This sequence yields each
>>>>>> element of the collection in an infinite loop.
>>>>>> ## Detailed design
>>>>>> 2 new types would be added:
>>>>>> struct CycleSequence<Base : CollectionType> : LazySequenceType {
>>>>>> ... } struct CycleGenerator<Base : CollectionType> :
>>>>>> GeneratorType { ... }
>>>>>> CollectionType would be extended with a property:
>>>>>> extension CollectionType {  public var cycle: CycleSequence<Self>
>>>>>> { get } }
>>>>>> This is an extension of CollectionType instead of SequenceType
>>>>>> because it requires a multi-pass sequence (and SequenceType does
>>>>>> not provide that guarantee). The returned type conforms to
>>>>>> SequenceType instead of CollectionType because there is no
>>>>>> possible `endIndex` that satisfies the requirement of being
>>>>>> reachable from `startIndex` by zero or more applications of
>>>>>> `successor()`.
>>>>>> Because the default eager versions of map and filter will execute
>>>>>> forever on an infinite sequence, CycleSequence conforms to
>>>>>> LazySequenceType instead of SequenceType in order to provide lazy
>>>>>> versions of those functions. Additionally, it will provide
>>>>>> implementations of the eager versions that simply trigger a
>>>>>> fatalError(), as the alternative is an infinite loop that
>>>>>> consumes more and more memory.
>>>>>> ## Impact on existing code
>>>>>> None
>>>>>> -Kevin Ballard
>>>>>> _______________________________________________
>>>>>> swift-evolution mailing list swift-evolution at swift.org
>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>> _______________________________________________
>>>>> swift-evolution mailing list swift-evolution at swift.org
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>> _______________________________________________
>>>> swift-evolution mailing list swift-evolution at swift.org
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>  _______________________________________________
>>> swift-evolution mailing list swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
> _________________________________________________
> swift-evolution mailing list swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20151230/781c8465/attachment.html>

More information about the swift-evolution mailing list