[swift-evolution] [Proposal] Change guarantee for GeneratorType.next() to always return nil past end

Sun Mar 6 19:58:33 CST 2016

On Sat, Mar 5, 2016, at 09:12 PM, Dave Abrahams via swift-evolution wrote:
> 
> on Sat Mar 05 2016, Kevin Ballard <swift-evolution at swift.org> wrote:
> 
> > On Thu, Mar 3, 2016, at 03:24 AM, Patrick Pijnappel via swift-evolution wrote:
> >> Hmm I see.
> >>
> >> Do we have any example cases where returning nil repeatedly would
> >> require extra branches or state?
> >
> > Yes. My proposed .takeWhile() and .dropWhile() sequence adaptors
> > (https://github.com/apple/swift-evolution/pull/95) would hit this case.
> > Both of those adaptors would need to keep around extra state and in
> > order to keep returning nil.
> 
> Are there realistic cases where this state and check would produce
> measurable overhead?

I don't know, but at a bare minimum this is extra complexity on the implementation side.

> Do you have an implementation somewhere I could look at?

No I don't. I didn't want to spend the time implementing this when I wasn't sure if the proposal would even be accepted.

> > My preferred solution for this case is to add a new Generator adaptor
> > called a FuseGenerator, with a convenience method .fuse(). All this
> > adaptor does is include the extra state in order to ensure it keeps
> > returning nil forever. This way Generators don't have to keep the state
> > for that guarantee, and the majority case where client codes doesn't
> > rely on this guarantee doesn't need the check either, and in the rare
> > case where this guarantee is important all the user has to do is call
> > .fuse() on the generator and use the result of that instead.
> 
> Between the lack of predictability being addressed here and the addition
> of a rarely-used additional component, that's a lot of complexity in the
> library; is there a clear benefit?

The added complexity is just a single extra GeneratorType adaptor and corresponding method. But it reduces complexity elsewhere, as GeneratorType implementations no longer have to worry about their post-nil behavior (unless the type itself wishes to make specific guarantees about that). Many generators won't notice because they'll naturally hit a fixed point when they return nil (e.g. indexing generators won't increment their index at that point), but generator adaptors, or generators that represent some external source of data, won't have to keep around the extra state to track this anymore.

And on the caller side, my claim is that there are very few people who end up writing code that invokes post-nil behavior to begin with (I believe the most common way to hit this case is when writing your own GeneratorType, where you have to decide what your generator's post-nil behavior is). The callers who care about getting nil forever can just use the `.fuse()` adaptor, and leaving this as implementation-defined gives the flexibility of providing other behaviors. Not only is this more flexible for existing generators, but it may allow other generator-like types to adopt the GeneratorType protocol when they couldn't have easily done so otherwise (e.g. my previously-mentioned idea of a FIFO queue that explicitly doesn't want to return nil forever could adopt GeneratorType, but if generators must return nil forever then the FIFO queue can't adopt it, and the author may choose to not go through the effort of writing a separate generator on the off chance that someone wants to use the FIFO queue in a context that needs a generator).

> > All that said, I would be strongly in favor of dropping the language
> > about triggering a precondition failure. I'd prefer to leave it as implementation-
> > defined behavior, which an encouragement to keep returning nil if it's
> > easy to do so. A benefit of this is Generators could opt to explicitly
> > define their post-nil behavior, e.g. TakeWhileGenerator could explicitly
> > document that after it has returned nil, subsequent calls to .next()
> > will continue to consume the underlying generator and return another
> > stream of elements terminating in `nil` (with the caveat that if the
> > underlying generator is exhausted then behavior depends on the
> > underlying generator's post-nil behavior). Granted, this probably isn't
> > useful in most cases, but it could be useful upon occasion as a way to
> > lazily split a sequence without building intermediate data structures
> > (assuming that the underlying generator is fused or defines its post-nil
> > behavior as returning nil forever).
> 
> TakeWhileGenerator could also provide a method that returns the
> TakeWhileGenerator for the next segment.  Since you have to know that
> you've got a TakeWhileGenerator to take advantage of this special
> behavior, a different interface is just as useful and probably results
> in clearer code.  

Clever, but it's still additional complexity. I rather like the fact that this described behavior just naturally occurs with the TakeWhileGenerator when implemented in the simplest way possible.

> > FWIW, Rust uses precisely the solution I've described here (and in fact
> > I'm responsible for its std::iter::Fuse iterator). It defines
> > Iterator::next() such that calling .next() after it has returned None
> > may or may not return more elements (but Iterators are not supposed to
> > assert in this case, they should always return something). And it has
> > the .fuse() convenience method that returns a std::iter::Fuse iterator
> > that provides the always-returns-None guarantee. And in practice, almost
> > nobody ever has to actually use .fuse(), since almost nobody writes
> > algorithms that cares about the behavior after next() returns None (and
> > in the rare case where they do, they're typically using some concrete
> > Iterator that has defined behavior, as opposed to working on arbitrary
> > Iterators).
> 
> This is about balancing performance, flexibility, simplicity of
> specification, and predictability.  Rust has a very C++-like approach to
> zero-overhead abstractions, in the sense that it's willing to introduce
> many small wrinkles to avoid even the theoretical possibility of a
> performance penalty.  Having cut my language/library-design teeth in the
> C++ world, I understand that approach really well, but I have also seen
> it do some damage (some of which I was personally responsible for!) and
> I want to be more conservative with Swift.  In short, I'm still leaning
> towards specifying post-nil behavior, and I'd want to have more evidence of
> the value of leaving it unspecified before doing so.

That's a fair point. But I think the important sentence from my comparison to Rust is "And in practice, almost nobody ever has to actually use .fuse(), …". In my experience, very few people ever write code that invokes post-nil behavior, and so I don't like the fact that everyone has to pay for the cost involved in tracking the extra state needed for post-nil behavior.

-Kevin Ballard