[swift-evolution] [RFC] New collections model: collections advance indices

Tue Mar 8 11:49:53 CST 2016

on Tue Mar 08 2016, plx <swift-evolution at swift.org> wrote:

>> On Mar 3, 2016, at 3:28 PM, Dmitri Gribenko <gribozavr at gmail.com>
>> wrote:
>> 
>> On Thu, Mar 3, 2016 at 9:56 AM, plx via swift-evolution
>> <swift-evolution at swift.org> wrote:
>>> # General Remarks: Great!
>
>>>  Thanks for sharing this proposal; it's a big change, but will be a big improvement once it's in place, and it's encouraging to see the team is willing to undertake changes of such scale.
>>> 
>>> I'm not sure how much discussion you'll actually manage to scare up b/c the issues this proposal addresses are *not* common, but are nevertheless rather significant when encountered.
>>> 
>>> E.G.: it's nice to be able to create simple chain/product/etc. collection-combinators which can, themselves, be collections, without winding up with indices indices forced to choose between *either* holding a back-reference to retrieve various startIndex/endIndex values as-needed *or* carrying around a complete set of all needed startIndex/endIndex values.
>>
>> Agreed!  We already have that problem in the lazy flatMap collection.
>> I don’t have any negative feedback that isn’t subsumed by the next section.
>>> 
>>> # Concern: Linearity Baked-In
>>> 
>>> Even with this change, I have some concern that the proposed
>>> protocol hierarchy from `Collection` onward feels like it has a
>>> foreseeable lack of generality due to how strongly "linear" the
>>> design wants `Collection` to be.
>>>
>>> Is this the right time to raise such concerns (e.g. in-scope for
>> this discussion)?
>>  We can definitely dive into more details about this.  One thing
>> that I would want to understand is whether this non-linearity
>> concept could be added later without a re-design.
>
> Sorry for the belated reply; I had to think for a bit about this one!
>
> I’m not entirely sure, in part b/c I don’t have a ready-to-go
> alternative design to propose.
>
> What I had in mind was to try and "reserve room” for alternative
> indexing schemes to be added later without undue awkwardness.
>
> However, when I went looking for real-world examples of such
> alternative indexing schemes it seems it’s just not something anyone
> actually does, and there’s probably a reason for that!
>
> I think it’s possible to adjust the protocol hierarchy to “reserve
> room”—remove `Indexable`’s methods from `Collection`, add them back to
> a new `ForwardCollection` between `Collection` and
> `BidirectionalCollection`—but it’d only make sense to do that if you
> expect to have use for that room in the future.

This is something we expressly don't do in generic programming.
Protocols (concepts) are spawned only by the existence of real-world
use-cases, and enough of them to make the generality worthwhile.

>> Could you provide more information about the linearity issues that
>> you have in mind?  Are you thinking of something like Segmented
>> Iterators and Hierarchical Algorithms [1]?
>> 
>> [1] http://lafstern.org/matt/segmented.pdf
>
> That’s a neat paper! It also seems limited, insofar as it only
> demonstrated an improved implementation of `fill` (which is somewhat
> suggestive vis-a-vis the suspicions outlined above).

Yes, it's only a small proof-of-concept.  That said, it can easily be
extended to many other algorithms.  The biggest challenges are those
algorithms where you may have lock-step traversal of structures with
different segment sizes.  I'd probably first try to handle that by
building a zip collection that has the union of all segment boundaries
in its component collections, and then build the algorithm on a simpler
segmented algorithm operating on a single collection.

> What I *was* thinking of was something generalizing the chapter on
> “Bifurcate Coordinates” in Stepanov’s “Elements of Programming” (this
> approach seems easily generalizable, but that doesn’t seem to have
> been taken up anywhere).

Oh, cool!  I confess I never made it to that chapter.

I think the biggest problem I see with that concept (after 5 seconds of
skimming, mind you) is that it doesn't accomodate linear arrays (of more
than two elements) at the leaves.  My impression is that modern
high-performance data structures almost always have linear leaves
because of locality and avoiding allocation overhead.

> My thoughts were that as collections get stronger semantics, there’s
> an increasing disconnect between the semantics of the collection and
> the semantics of a “linear” index. 

I think it's just a question of how interesting the data structure is.
But a linear view of vertices or edges is still useful even for a
generalized graph, e.g.
http://www.boost.org/doc/libs/1_60_0/libs/graph/doc/table_of_contents.html

> Working up to it:
>
> Arrays and array-like collections are truly “linear”, and are
> well-modeled by “linear” indices as-per the proposal; there’s a rich
> collection of generic algorithms that work on such collections and
> their indices, and it’s nice to have them in Swift.
>
> Sets and dictionaries (and similar) are what I guess you could call
> “flat” (or “almost-linear”) collections: iteration aside, many
> index-based generic algorithms don’t work (particular mutating
> algorithms), 

That has nothing at all to do with linearity.

> and even where such algorithms *do work* it tends to be like this:
>
>   let stringSet: Set<String> = [“a”, “b”, “c”, “d” ] let bPrefix =
> stringSet.prefixThrough(stringSet.indexOf(“b”)!)
>
> …where the operation is *well-defined* but doesn’t have much to do
> with the semantics of `Set`.

It's not about semantics, but about data structures.  Consider that you
can have a sorted set with the same basic semantics as Set, but for
which the linear traversal order is much more meaningful.

> Collections with even-stronger semantics (tree-shaped collections,
> order-maintaining collections, geometric collections, etc.) will
> likely have even further semantic disconnections!

I disagree about trees and order-maintaining collections, which tend to
overlap.

> As an example, one thing I’m working on is *basically* a “set" of
> `(Value,CGRect)` pairs (with the rects possibly-overlapping each
> other); it keeps its contents organized so you can traverse its
> contents sorted by any combination of edge (north/south/east/west) and
> order (ascending/descending).

Nifty.  Sort of like a 2D interval tree?

> The way I have been structuring it is that each of those traversals is
> basically like this:
>
>   extension CustomCollection {
>
>     /// Returns a collection providing a specific, read-only, ordered
> “view” of the collection's contents.  var byAscendingNorthernEdges:
> AscendingNorthernEdgeView { get }
>
>   }
>
> …and although I *did* make `CustomCollection` itself into a
> `CollectionType`, I can’t think of any scenario in which I’d ever
> intentionally want to use its `Index` for anything other than
> iteration.

You can keep track of where an element is in your tree, and use indices
to remove or update it.

> Similarly, consider something like the tree-index situation vis-a-vis
> generic algorithms like `binarySearch`, `upperBound`, `lowerBound`,
> and `equalRange`:
>
> - if the tree *isn’t* ordered-as-required the generic algorithms won’t
> work b/c the precondition isn’t being met 

Uninteresting, IMO

> - if the tree *is* ordered-as-required the algorithms *will* work, but
> an implementation using linear-indices would likely be sub-optimal
> vis-a-vis implementations that exploit the tree-structure (start at
> root, follow appropriate branch, etc.)

Yes, that's what the Austern paper is about.

> …all of which leads me to wonder a bit about the general usefulness of
> such linear indices outside of iteration.

Hmm, a rich family of general, useful, and practically efficient
algorithms have been built on these concepts (e.g. the STL,
http://www.boost.org/doc/libs/1_60_0/libs/libraries.htm#Algorithms, ...)

It's possible to gain efficiency with more sophisticated approaches like
those proposed by Austern, but IMO there's plenty of evidence that the
model works outside of iterations.

> To be clear, there’s nothing in all of the above that has anything
> specifically to do with “Collections move Indices”; it’s all about
> potential future directions.
>
> Even then, it seems like most other languages stick with what I’ve
> been calling a “linear” model and their “fancier" collections *are*
> collections (as per their basic collection-API) and then use
> additional API to get at the “fancier” functionality, so it’s not
> clear there’s a real need to do different here.

I'm all for generalizing and optimizing, but there are still lots of
linear collections out there that people need to process, and any model
we eventually come up with has to work for those, *without* undue pain.

-- 
-Dave