[swift-users] 2 x lazy collection bugs

Dmitri Gribenko gribozavr at gmail.com
Tue Feb 9 01:54:04 CST 2016


On Mon, Feb 8, 2016 at 9:28 PM, Howard Lovatt <howard.lovatt at gmail.com> wrote:
> Thanks for the reply Dmitri - comments inline below:
>
>   -- Howard.
>
> On 9 February 2016 at 16:17, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>>
>> On Mon, Feb 8, 2016 at 8:56 PM, Howard Lovatt via swift-users
>> <swift-users at swift.org> wrote:
>> > Hi,
>> >
>> > I think I have found a couple of bugs in lazy collections; just checking
>> > before filling. The following code is a filter map version of the Sieve
>> > of
>> > Eratosthenes:
>> >
>> > func lazyFilterMapForEachLoop(limit: Int = 9) -> [Int] {
>> >
>> >     var possibles = Array(count: limit + 1, repeatedValue: true)
>> >
>> >     return (2 ... limit).lazy.filter { i in // Has to be lazy and
>> > sequential
>> > so that `possibles` is updated before it is used
>>
>> Before looking at your questions below, I want to note that this is
>> not a supported way of using the lazy subsystem.  The issue is that
>> you seem to be relying on the evaluation order.  It is not guaranteed,
>> even if you can reliably reproduce a certain evaluation order in the
>> current version of the library.
>
>
> Not sure where it says that it won't be sequential - do you have a
> reference?

I'm sorry, it is hard to provide a reference to internal discussions
that we had when we were designing these things, so please take my
word for it :)

If the documentation isn't saying anything about this, then it should
be improved to be more explicit.

The same is actually true even for the non-lazy map() and filter() --
you shouldn't be assuming the evaluation order for eager functions
either.  It is just that in practice it is only efficient to build the
result array in exactly one pass, from start to the end, so the
implementation is unlikely to change.  We also realize that many
people are likely to be accidentally relying on the evaluation order
there, so it would be hard to change it.  But still, it is not a
guarantee.

>> > The filter closure is called twice per trial integer! First without
>> > preceding to the map stage at all, i.e. all values are filtered out!
>> > Second
>> > time through the numbers it does proceed to the map stage as you would
>> > expect.
>>
>> Again, remember that you are using the *lazy* subsystem.  The
>> evaluation order of the closures you are passing to the operations is
>> not guaranteed.  Closures can be evaluated once, more than once, or
>> not at all, and in the order that the library choses.  Closures are
>> required to be pure, so you are not allowed to observe how many times
>> and in which order they were actually evaluated.
>>
> Not sure where it says it can be called more than one - the filter might be
> a very expensive operation like a database lookup!
>
>>
>> The reason why the closure is evaluated twice is because you end up
>> calling an eager map(), which tries to reserve the array capacity
>> suitable for the result.  To do that, map() calls 'count' on the lazy
>> filter collection, which triggers the computation the first time.
>>
> map on a lazy collection results in another lazy collection, so I don't see
> this.

There are two overloads -- one from CollectionType protocol, that
returns an Array, and another one from lazy, that returns a lazy
collection.  Both are visible, the second is preferred.  Since you
used the map() function in a position that requires the Array return
type to match with the function's return type (via the return
statement), the lazy overload can't match, so the type checker
selected the first one.

>> It is an open question though, whether calling the closure twice is
>> profitable here, and what can we do about it.  The idea though is that
>> those reallocations can sometimes be quite expensive when the closure
>> is cheap to execute.  OTOH, when the closure is expensive to evaluate,
>> calling it twice is a loss.
>>
>> > It produces an "Index out of range" error despite the fact that maximum
>> > number processed is 9 which is an allowable index of `possibles`.
>>
>> Are you sure you didn't mean to use ..< instead of ...?
>
>
> No I meant 2 ... limit, the array is of size limit + 1.

Oh, right, now I see the issue.  The issue again caused by the
non-purity of the closure passed to filter().  The first time the
closure always says true, which promises a large filtered collection.
But then, during the second pass, it starts rejecting elements it
previously accepted, and delivers a collection smaller than promised.

> PS The equivalent code in Java works as expected.

It is hard to say what the equivalent Java code would be, since Java
does not have a Swift lazy subsystem.  Java has streams, but it would
be wrong to assume that the two operate according to the same exact
rules.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/


More information about the swift-users mailing list