[swift-dev] LazyFilterCollection is not a Collection

Wed May 11 16:46:58 CDT 2016

On Mon, May 9, 2016 at 4:16 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> On Mon, May 9, 2016 at 11:46 AM, Rob Napier <robnapier at gmail.com> wrote:
> > It violates the performance requirements.
> > CollectionType.count requires O(1) access if Index is a
> > RandomAccessIndexType.
>
> Hi Rob,
>
> We don't have RandomAccessIndexType anymore.
>
>
I don't understand how this addresses any of the original points.
LazyFilterIndex explicitly notes that it breaks the performance promises of
its protocol:

NOTE
> The performance of advancing a LazyFilterIndex depends on how sparsely
> the filtering predicate is satisfied, and may not offer the usual
> performance given by models of ForwardIndexType.

And also LazyFilterCollection:

NOTE
> The performance of advancing a LazyFilterIndex depends on how sparsely
> the filtering predicate is satisfied, and may not offer the usual
> performance given by models of ForwardIndexType. Be aware, therefore,
> that general operations on LazyFilterCollection instances may not have
> the documented complexity.

I believe there should be a very high performance bar before breaking a
protocol's promises. The fact that filter predicates must be both pure and
very fast is not obvious, and this can have a cascading effect of calling
filters an ever-increasing number of times. For example:

var c1 = 0

var c2 = 0

var c3 = 0

let xs = Array(1...1000).lazy

    .filter { _ in c1 += 1; return true }

    .filter { _ in c2 += 1; return true }

    .filter { _ in c3 += 1; return true }

_ = Array(xs)

print(c1, c2, c3, c1+c2+c3) // *7986 3996 2000 13982*

I don't think the caller would expect that these filters would be called a
total of 14k times (rather than 3k).

This is rough, and I just tossed in Xcode and hit "Test" (so it's very
possible that there's a mistake here due to oversimplifying), but I was
unable to find any combination of count or number of filters (including one
filter and up to 1M element arrays) where the current stdlib is faster than
the sequence version:

extension LazySequenceType {

    @warn_unused_result

    public func seqFilter(

        predicate: (Elements.Generator.Element) -> Bool

        ) -> LazyFilterSequence<Elements> {

        return LazyFilterSequence(

            self.elements, whereElementsSatisfy: predicate)

    }

}

class testperfTests: XCTestCase {

    func testPerformanceSeq() {

        self.measureBlock {

            let xs = Array(1...1000).lazy

                .seqFilter { _ in true }

                .seqFilter { _ in true }

                .seqFilter { _ in true }

            _ = Array(xs)

        }

    }

    func testPerformanceCollection() {

        self.measureBlock {

            let xs = Array(1...1000).lazy

                .filter { _ in true }

                .filter { _ in true }

                .filter { _ in true }

            _ = Array(xs)

        }

    }

}

seqFilter was reliably over an order of magnitude faster than filter.  I
tested this with various filters (to demonstrate actually removing some
elements), on Mac and iOS, with between 1 and 3 filters, and with sizes
between 1000 and 1M. The non-collection was always faster in my tests.

I don't see this as a performance slam dunk that justifies breaking the
performance promises of CollectionType. Is there more to the story that I'm
missing?

-Rob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20160511/1ee1f8b1/attachment.html>