[swift-evolution] [Review] SE-0065 A New Model for Collections and Indices

Dave Abrahams dabrahams at apple.com
Mon Apr 18 16:52:00 CDT 2016


on Thu Apr 14 2016, plx <swift-evolution at swift.org> wrote:

>> On Apr 14, 2016, at 12:12 PM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
>>>> 
>>>> No, you can't, at least not usefully.  An Index that's at the end of one
>>>> collection is in the middle of another, or with a suitably-modified version
>>>> of the same collection.  
>
>>> 
>>> Sure, in certain concrete scenarios it’s possible for one collection’s
>>> indices to have such relationships to some other collection.
>>> 
>>> But, what of it? 
>>> 
>>> In a generic context you can’t assume this; 
>> 
>> That's incorrect.  A slice's indices are *documented* as having a
>> particular relationship to those of the thing it was sliced from.  This
>> applies everywhere.  A dictionary's keys and values use the same indices
>> as the dictionary itself, and have a correspondence.
>
> You’re right, of course; I rarely use slices and completely overlooked
> them.
>
> I also phrased it badly, b/c what I was trying to express is that code
> like the below is (I think?) unlikely to work generically:
>
>   extension Collection where Element:Equatable {
>
>      // plz don’t do this
>      func hasValueMismatch(with other: Self, at index: Index) -> Bool {
>        return self[index] != other[index]
>      }
>
>      // plz don’t do this either
>     func hasValueMismatch<K:Collection where K.Index == Index, K.Element == Self.Element>(with other: K, at index: Index) -> Bool { 
>       return self[index] != other[index]
>     }
>
>   }
>
> …(you would’t write the above anyway, but it illustrates the kind of
> "generic context" I had in mind when I wrote it).

Yes, I understand that you weren't thinking of the cases where indices
from different collections *do* have a relationship in a generic context
:-).

>>> in a concrete context you naturally have more information.
>>> 
>>> Slices would become problematic, I’ll grant.
>>> 
>>>> var x = [1, 2]
>>>> let i = x.index(1, stepsFrom: x.startIndex)
>>>> x.removeLast()
>>>> x[i]           // fatal error: Index out of range
>>> 
>>> Indices can become invalid; this imposes preconditions. I don’t get
>>> it.
>> 
>> My point is that whether i is at the end or not cannot be encoded in i.
>
> I see the miscommunication, now. Of course you can’t encode that.

How can this be a miscommunication?  Your examples from earlier in the
thread include this:

  enum SaferIndex<T:Comparable> {
  case Position(T)
  case End
  }

which clearly does try to encode that information in the index.

> I’ve put a couple examples down below as a last effort at
> communicating what I’m getting at it.
>
>>>> The converse is also true: subscripting on a collection's endIndex is
>>>> sometimes just fine, even with no mutation in sight.
>>>> 
>>>> let a = (0..<10).reversed()
>>>> print(Array(a))      // “[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]”
>>>> 
>>>> let b = a.prefix(9)
>>>> print(Array(b))      // “[9, 8, 7, 6, 5, 4, 3, 2, 1]”
>>>> 
>>>> print(a[b.endIndex]) // “0” (correct, supported behavior)
>>> 
>>> I believe we are back to “subscripting one collection with *another*
>>> collection's `endIndex`, no?
>> 
>> Totally legit, as mentioned above.  a.prefix(9) returns a slice of a.
>> 
>>> Are there any circumstances where a collection *can* be
>>> usefully-subscripted with its *own* `endIndex`?
>> 
>> var a = [1]
>> let i = a.endIndex
>> a.append(2)
>> print(a[i]) // “2”
>> 
>>>> 
>>>> Of course,
>>>> 
>>>> b[b.endIndex]        // As a matter of QOI: fatal error: out of bounds: index >= endIndex
>>>> 
>>>>> 
>>>>> It would’ve been awkward to do this under the previous status quo—e.g. even for
>>>>> arrays your indices would have to have a back-reference to get the count, and
>>>>> thus couldn’t be plain integers—but the collection will now always be present to
>>>>> provide such info.
>>>>> 
>>>>> Cons:
>>>>> 
>>>>> - more overhead than “bare” indices
>>>>> - doesn’t address invalidation (but what does, really?)
>>>>> 
>>>>> Pros:
>>>>> 
>>>>> - easier in some ways to handle things like e.g 0…Int.max
>>>>> - the endIndex equivalent *never* invalidates 
>>>>> - compile-time help for end-index checking
>>>>> 
>>>>> Overall this *would* bring the treatment of indices closer to that for `?`—e.g.,
>>>>> redefine the core type to omit the `nil`-like value, 
>>>> 
>>>> Sorry, but that's the opposite of what `?` is doing: it *adds* a nil
>>>> value.  
>>> 
>>> …I must have been unclear.
>>> 
>>> Step 1: Define T* = { "all memory addresses” (nil included) }
>>> Step 2: Define T = T* \ { nil } (e.g. "non-null pointers")
>>> 
>>> …is what I was trying to summarize via “redefine the core type to omit
>>> the `nil`-like value” (which is the important part here).
>> 
>> Sorry, that's still unclear to me.  I just don't see what you're getting
>> at.
>> 
>>> Anyways, having `endIndex` directly inhabit the same type as the
>>> “good” indices has some pros and some cons; it’s not an IMHO one-sided
>>> situation as with `nil`.
>> 
>> Maybe, but my point is that many things in the current model are
>> incompatible with the other arrangement.  If you wanted to change the
>> arrangement, you'd need to re-think the current model from the ground
>> up, including index invalidation, how algorithms interact, the
>> relationship of slices to the thing they're sliced from, etc...
>> 
>> So what you're suggesting is an interesting hypothesis, but to me it's
>> not by any means obviously workable.
>
> You’re completely right about slices. I’ll provide a couple concrete examples before addressing the rest.
>
> Here are three collection-combinators (or adapters I think you’d call them):
>
>   // Collection with elements of A, then elements of B.
>   struct ChainCollection<A:Collection,B:Collection> : Collection {
>     let a: A; let b: B;
>   }
>
>   // Collection with elements `(a,b)` for each pair in the cartesian product of `A` and `B`.
>   struct ProductCollection<A:Collection,B:Collection> : Collection {
>     let a: A; let b: B;
>   }
>
>   // Collection with adjacent elements from A
>   struct AdjacentElementCollection<A:Collection> : Collection { 
>     let a: A
>   }
>
> …each of which I’ve declared `: Collection` but each which will still need some suitable `Index` implementation.
>
> Here’s one way to write these indices (henceforth, the V1 indices):
>
>   // `endIndex` will be .InB(b.endIndex)
>   enum ChainCollectionIndex<A:Collection,B:Collection> {
>     // Precondition: the index isn’t `a.endIndex`.
>     case InA(A.Index)
>     case InB(B.Index) 
>   }
>
>   // `endIndex` will have both `.aIndex` and `.bIndex` equal to their “source”'s `endIndex`
>   struct ProductCollectionIndex<A:Collection,B:Collection> {
>     let aIndex: A.Index
>     let bIndex: B.Index
>   }
>
>   // `endIndex` will be both of these set to `a.endIndex`
>   // - all other situations expect `upper` is `lower`’s successor, and both != `endIndex`
>   struct AdjacentElementCollectionIndex<A:Collection> {
>     let lower: A.Index
>     let upper: A.Index
>   }
>
> …(I trust the index-manipulation boilerplate is easy to fill-in).
>
> There’s absolutely nothing wrong with the above! Each of these types
> has the capability to represent the “good” indices, and also has a
> reasonable way to represent `endIndex` values.
>
> But, they could also be written like so (henceforth, the V2 indices):
>
>   enum ChainCollectionIndex<A:Collection,B:Collection> {
>     // Precondition: the index isn’t `a.endIndex`.
>     case InA(A.Index)
>     // Precondition: the index isn’t `b.endIndex`.
>     case InB(B.Index)
>     // `endIndex` sentinel
>     case End
>   }
>
>   enum ProductCollectionIndex<A:Collection,B:Collection> {
>     // Precondition: neither index is the source collection’s `endIndex`
>     case Item(A.Index, B.Index)
>     // `endIndex` sentinel
>     case End
>   }
>
>   enum AdjacentElementCollectionIndex<A:Collection> {
>     // Precondition: `upper` is `lower`’s successor, *both* are != `a.endIndex`
>     case Adjacency(lower: A.Index, upper: A.Index)
>     // `endIndex` sentinel
>     case End
>   }
>
> …each of which is essentially the V1 version, except now with a
> dedicated `endIndex` value tacked-on. 

Again you're encoding “I'm at the end” in the index, which as we've
agreed does not work.

> Tacking on the dedicated `endIndex` isn’t necessary, but at least for
> me taking V2-style approaches has been very advantageous.
>
> Most of the advantage has been from being able to enforce stricter
> invariants on the non-endIndex indices, 

I don't see how anything is lost in terms of the ability to enforce
invariants, if the collection handles index movement.

> which generally makes the code simpler and also easier to reason
> about; I’m also fortunate in that I’m not working on the standard
> library, and thus can choose how heavily to weight “time to correct
> implementation” vis-a-vis “proximity to maximum possible efficiency”.
>
> The above indices are drawn from what are admittedly simple
> combinators/adaptors, but they feel like representative examples for
> me; even for fancier, actually-custom things it’s almost always been
> much more natural to go with a V2-style index (and sometimes no
> natural V1-style approach even exists).
>
> I’ve done a fair amount of experimenting with the model implied
> above—moving `endIndex` into a dedicated sentinel, etc.—and although
> it’s pretty easy overall to translate back and forth between the two
> models, the “alternative” approach *at best* comes out a wash...at
> best.
>
> For the most part, invalidation is about the same between the two
> models for all non-end-index—for such indices, the same mutations
> invalidate the same indices under either approach.
>
> What *is* different between the two is that a cached `endIndex` will never become valid due to mutation, e.g. this won’t happen:
>
>   // status quo:
>   var items = [“a”, “b”]
>   let cachedEndIndex = items.endIndex // this is just `2`
>   items.append[“c”]
>   items[cachedEndIndex] // returns “c"
>
>   // dedicated `endIndex`:
>   var items = [“a”, “b”]
>   let cachedEndIndex = items.endIndex
>   items.append[“c”]
>   items[cachedEndIndex] // goes boom, b/c `cachedEndIndex` is still the `endIndex`

So one special index value “moves automatically as the collection changes” and
all others do not.  Doesn't seem like an advantage to me.  It's easy to
imagine having subtle bugs due to this difference. 

> …and things like this wouldn't work the same way:
>
>   // status quo:
>   var items = [“a”, “b”]
>   let cachedEndIndex = items.endIndex // this is just `2`
>   items.insert(“c”, at:  cachedEndIndex) // [“a”, “b”, “c”]
>   items.insert(“d”, at:  cachedEndIndex) // [“a”, “b”, “d”, “c”]
>
>   // dedicated `endIndex`
>   var items = [“a”, “b”]
>   let cachedEndIndex = items.endIndex // this is `.End`
>   items.insert(“c”, at:  cachedEndIndex) // [“a”, “b”, “c”]
>   items.insert(“d”, at:  cachedEndIndex) // [“a”, “b”, “c", “d”]
>
> …because the end-index sentinel would *always* refer to the logical
> end of the collection.

Yeah, that difference seems like an active problem to me.  

> Slices would truly be problematic here. For the rest, it’s hard to see
> insurmountable difficulties—especially given the ease of converting
> between the two approaches, given access to the collection—but I’d
> expect it to be far clunkier and a tad slower.

Not sure what you're saying would be clunkier.  I can tell you that
having a special-case index state for “past the end” is a great way to
introduce both clunky special-case code to manage the differences in
representation and behavior, and associated inefficiency.

>>> On the one hand, in my own experience so far, it’s definitely been the
>>> case that most custom collections I’d done have had indices that’re
>>> effectively the `SaferIndex` above; it’s been rather rare that there’s
>>> been a natural “1 past the rest” value to use of the same type as is
>>> used to describe the position of a “good” index.
>>> 
>>>> Seriously, just because Swift has Optionals and they're useful for
>>>> safety in some scenarios (compared with allowing everything to be
>>>> nullable) does not mean that it's going to be “Swiftier” to apply a
>>>> similar pattern everywhere.
>>>> 
>>>>> use an enum to reintroduce that value when necessary—than to `!`.
>>>>> 
>>>>> I don’t think the above is an *improvement* over the proposal, but it’s a route
>>>>> that could have been taken.
>>>> 
>>>> I believe it would be hard to make such a design work at all, and if you
>>>> could make it work I think you'd end up with exactly the problem this
>>>> proposal aims to solve: references inside indices.  So, I don't think
>>>> it's even a possibility, really.
>>> 
>>> I can’t say I see the impossibility. I definitely have experienced the
>>> clunkiness.
>>> 
>>> This is getting too involved for a hypothetical I was explaining, but
>>> not advocating.
>> 
>> I will happily agree to drop this topic :-)
>
> It’s dropped, now; I only felt the need to reply one more time b/c I
> could tell I’d previously failed to communicate clearly-enough.

Agh.  I had forgotten my offer to drop this and now I've written this
whole reply, which might have some value so sending.  But I promise I
won't answer again!

>> 
>>> This proposal and the new design is a good design!
>> 
>> Thanks!
>
> I really do mean it! Just look at those examples and think of how many
> redundant back-references they used to need...
>
>> 
>>>>> 
>>>>> 
>>>>>       To help illustrate the claim, here’s a strawman “safe” API—for
>>>>>       illustration
>>>>>       only, not advocacy!—that would be safer and thus perhaps more “Swift-y”:
>>>>> 
>>>>>   I think there's a prevalent misunderstanding (IOW, I don't mean to
>>>>>   single out this post or this poster) about what “safe” means in Swift
>>>>>   and what the features of a Swifty API are and should be. This
>>>>>   is a big topic worthy of much more time than I can devote here, but
>>>>>   here's a thought to start with:
>>>>> 
>>>>>   A Swifty API helps you reason effectively about the correctness of your
>>>>>   code, and in part that means we provide enough preconditions on
>>>>>   arguments to avoid complicating result types, and code to handle
>>>>>   results, with optional-ness.
>>>>> 
>>>>>   -- 
>>>>>   Dave
>>>>> 
>>>>>   _______________________________________________
>>>>>   swift-evolution mailing list
>>>>>   swift-evolution at swift.org
>>>>>   https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>> 
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> swift-evolution at swift.org
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>> 
>>>> -- 
>>>> Dave
>>>> 
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> swift-evolution at swift.org
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>> 
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>> 
>> -- 
>> Dave
>> 
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

-- 
Dave



More information about the swift-evolution mailing list