[swift-evolution] Add ability to validate collection indices

Dave Abrahams dabrahams at apple.com
Fri Dec 23 22:25:42 CST 2016


on Fri Dec 23 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:

>>>> Because I don't see any way to use it in correct code.  At least, none
>>>> that can't be more-usefully and reliably provided by returning more
>>>> index information from mutating methods.
>>>
>>> Ok. I agree. All the indices must be invalidated if the collection
>>> object is mutated.
>>
>> I didn't say that, and in the general case, I don't agree.
>
> Sorry, I misunderstood your point here before. Now I see what you were
> talking about. Thanks for the clarification.
>
>> The best way to do it would be to write the evolution proposal that lays
>> out the general index invalidation rules and the guarantees we should
>> give for the various data structures, explaining why those guarantees
>> are the right ones.  If we can get the proposal passed, the issue should
>> be totally clear for everyone :-)
>
> Yes, I think the same way. I want to end up with a proposal, but I
> have no insight into existing plans about evolution of Collection
> protocol. That's why I decided to start from the question to mailing
> list.

Speaking for myself, I have no plans to change the fundamental model
with respect to index invalidation.  We reworked indexing for Swift 3
and I think we've settled in a pretty good place.

I think, to complete the Collection model for performance purposes, we
need to add (something like) these requirements:

  associatedtype Segments: Collection 
    where Segments.Iterator.Element : Collection,
      Segments.Iterator.Elements.Iterator.Element == Iterator.Element

  var segments: Segments? { get }

(for more information on that, see http://lafstern.org/matt/segmented.pdf)

But as I say, those changes shouldn't really impact index invalidation.

>> Collection indices really must be stable when the collection is sorted, for
>> example.  It is only structure-changing operations that can invalidate
>> indices, and even then that's only true for some collections and some
>> operations.
>
> Yes, it's a subtle thing I was talking about. Indices are not stable
> if we are sorting a Set, for example. 

You can't sort a set :-)

> Therefore, we have a set of collections keeping indices safe after
> sorting and a set of collections which don't.
>
> Let's take a look at very simple collection, or, as common example, a
> set of well-known collections. Here is (incomplete) list:
>  * Array
>  * Forward List (single-linked)
>  * Set
>
> What is a notion of an index for each of them? It's pretty obvious
> what is the index when we are talking about Array. But what is the
> index when we are talking about Set? Would it be different for Hashed
> set (current default implementation in Swift stdlib) and Tree-based
> set? 
> What if we make it using skip-list as a base collection?


As I've said in text you quoted below, in its most general form, an
index describes a path through a tree.  

> Forward list is a pretty simple data structure. Is it a collection?
> Yes, indeed! 

Sure, informally it's a collection.  Whether or not it conforms to
`Collection` depends on how you design it.

> But does it have an index? 

The simplest index would be a pointer to a node in the list.  However
note that such a list couldn't have value semantics.

> I don't know, because notion of an index is unclear for me. We could
> define index for an element in the list as a number of strides from
> the head node. Hence the index is of integer type.

No, we couldn't, because subscripting is required to be O(1).

> The same notion is applicable to all collections. We could define an
> index for a Set as a number of iterator steps from the beginning to
> the node with an element. But it does kill all performance.
>
>> We should document the minimum guarantees provided
>> by all Collections and also any stronger guarantees provided by specific
>> Collections.
>
> I have tried to sum up all I knew about different collections:
>
> * Unordered collections, based on hash function, may invalidate any
> number of indices during any mutation, because:
>     a) any mutation may cause rehashing,


No; changing the value associated with a dictionary key doesn't
invalidate anything.

>     b) current implementation may invalidate indices of elements with
> collided hashValue;
> * Random-access collections like Array, or double-ended queue, may
> invalidate indices only during removal operation. 

It may really depend how you represent indices.  A deque can be more
efficient with indices that consist of a pair of pointers (or a pair of
offsets) than with a single integer.  I've seen it implemented with a
circular buffer for the root of the “tree.”  Different implementation
choices can result in different invalidation behaviors.  The question
is, which implementation choices are worth accomodating?

> Note, that invalidated index (or indices) may point to other element;
> * Forward and double-linked lists invalidate only index of deleted element;

This is in general true of data structures that store one element per
allocation (e.g. a red-black tree).  Note that in redesigning the
indexing model we intentionally decided that the most important data
structures to accomodate all have some contiguous storage (e.g. a
B-tree) and that the specific goal of fitting linked lists into the
model could be sacrificed if necessary to acheive other goals.  That's
why, for example, I'm pretty sure, you can't build a linked list
Collection with value semantics.

> * Tree-based collections may or may not invalidate indices due to
> insert operation, depending on the internal algorithm of any
> particular collection;
> * Some complex collections, like LRU or LFU cache may invalidate
> indices during get subscript operation.

No, we can't have read operations invalidating indices.  No generic
algorithms would work reliably on such a “Collection”.  One might as
well not make the data structure conform to the protocol.

>
>
> There is no any common way as well as no common meaning of an index.

There *is* a common meaning of an index.  It is a position in the data
structure.  The rest of its meaning is given by the fact that
Collections are multipass and the indices of all elements are reachable
from startIndex by applying index(after:) zero or more times.

> So, the possible minimum guarantee: any particular index may be
> invalidated as a result of any operation.
> If we drop complex containers out of Collections list, we could
> restrict the rule: any particular index may be invalidated as a result
> of any mutating operation.

That's not strong enough.  Any MutableCollection can be sorted.  Sorting
wouldn't work if indices were invalidated by the assignment operations
used in sorting.

> Restriction for RandomAccessCollection: all indices in range
> (startIndex..<endIndex) are and remain valid during collection
> lifetime.
>
> Also there is a bunch of non-trivial rules for any particular
> collection. Now, there are like three or four collections in stdlib.
> Is there any more in future plans?

Yes, we anticipate adding more some day.  But more importantly, the
Collection protocols are designed to cover a range of plausible data
structures both inside and outside the standard library, and the index
invalidation rules must both:

1. allow important algorithms—in the standard library (such as sort),
   and those that may be written by others—to work, and

2. be implementable for a wide range of important concrete collection
   models.

>
>
> - Alex Komnin
>
> P.S. Sorry, if I am acting little nosey, but I am in the middle of
> implementing some additional collections in Swift for my project, and
> I would prefer to make them as close to canonical stdlib
> implementation as it possible.
>
> On Tue, Dec 20, 2016 at 4:01 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>>
>> on Tue Dec 20 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:
>>
>>>> Because I don't see any way to use it in correct code.  At least, none
>>>> that can't be more-usefully and reliably provided by returning more
>>>> index information from mutating methods.
>>>
>>> Ok. I agree. All the indices must be invalidated if the collection
>>> object is mutated.
>>
>> I didn't say that, and in the general case, I don't agree.  Collection
>> indices really must be stable when the collection is sorted, for
>> example.  It is only structure-changing operations that can invalidate
>> indices, and even then that's only true for some collections and some
>> operations.
>>
>> Note that this means that the copy-on-write step for
>> conceptually-non-structure-changing mutations may not be allowed to
>> change the data structure's physical structure, if that would require
>> invalidating indices.
>>
>>> I just thought about making it straight and obvious how to check if
>>> index is still valid for simple collections like Array. Returning
>>> additional index information from mutating methods is a subtle thing,
>>> because some methods do not change the object layout, for example
>>> append does not change validity of indices;
>>
>> It certainly could.  Conceptually, in its most general form, a
>> Collection is some kind of tree (which must be of limited depth to avoid
>> violating big-O expectations, but memory limitations pretty much take
>> care of that---yes, this requires a small amount of handwaving).  In its
>> most general form, an index describes a path through that tree.
>> Appending to an in-memory B-tree could require the allocation of a new
>> root node, which changes the paths to all elements in the tree, thereby
>> invalidating all indices.
>>
>> To avoid this effect we'd either have to require the tree to become
>> unbalanced, which is too great a big-O killer, or that it be a
>> RandomAccessCollection that could be indexed by Ints, which in itself
>> could hurt the performance of linear traversals.
>>
>>> other methods may invalidate a lot or, even, all indices, for example,
>>> replaceSubrange.
>>>
>>> I just want to understand the purpose of indices and what behavior is
>>> intended for them. It's unclear for me now.
>>
>> I hope the above helps.
>>
>>> I just see a bunch of ways to shoot myself in the foot and looking for
>>> a way to change that disposition.
>>
>> The best way to do it would be to write the evolution proposal that lays
>> out the general index invalidation rules and the guarantees we should
>> give for the various data structures, explaining why those guarantees
>> are the right ones.  If we can get the proposal passed, the issue should
>> be totally clear for everyone :-)
>>
>>> As I see it, the problem is: there is no way to find if index is valid
>>> for collection. Obvious way of checking if index is within the
>>> (startIndex..<endIndex) bounds does not work.
>>> So, I know only three possible solutions to the problem:
>>>
>>> 1. provide a way to check if index is valid;
>>
>>
>> There's always c.indices.contains(i), though it's potentially very
>> inefficient and may not tell you what you want to know (at least, not if
>> you want to know the index points to the same element you thought it
>> did).
>>
>> But the bigger question is, what would your program do differently if it
>> discovered that an index it wanted to use was invalid with respect to a
>> collection?
>>
>>> 2. state in documentation that all indices of any collection become
>>> invalid after each mutation;
>>
>> That's too broad; as noted above, index invalidation only relates to
>> structural mutation.  We should document the minimum guarantees provided
>> by all Collections and also any stronger guarantees provided by specific
>> Collections.
>>
>>> 3. state in documentation that accessing elements of collection via
>>> index after mutation is unspecified behavior.
>>
>> Definitely not; you wouldn't be able to write any interesting generic
>> mutating algorithms if that were the case.
>>
>>> Am I right? Did I miss something?
>>>
>>> I would prefer option #1, if there are no other feasible solutions to
>>> the problem.
>>>
>>> Thanks.
>>> - Alex Komnin
>>>
>>> P.S. I just noticed that our discussion fell out of swift-evolution
>>> list. Added it back.
>>>
>>> On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>>>>
>>>> on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:
>>>>
>>>>>> I don't personally agree with #3, FWIW
>>>>> Why? I thought it may be useful.
>>>>
>>>> Because I don't see any way to use it in correct code.  At least, none
>>>> that can't be more-usefully and reliably provided by returning more
>>>> index information from mutating methods.
>>>>
>>>>> - Alex Komnin
>>>>>
>>>>> On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>>>>>> I don't personally agree with #3, FWIW
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work at gmail.com> wrote:
>>>>>>>
>>>>>>> Ok. Through the discussion I've come to the following:
>>>>>>>
>>>>>>> 1. collection invalidates all indices on any mutation as a common solution;
>>>>>>> 2. collection may keep some indices valid if it is feasible;
>>>>>>> 3. there are should be a way to check if index is valid;
>>>>>>> 4. mutating methods should return index of inserted/replaced element.
>>>>>>>
>>>>>>> Did I miss anything?
>>>>>>>
>>>>>>> - Alex Komnin
>>>>>>>
>>>>>>> On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution
>>>>>>> <swift-evolution at swift.org> wrote:
>>>>>>>>
>>>>>>>>> on Fri Dec 16 2016, Daniel Vollmer <swift-evolution at swift.org> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>>>>
>>>>>>>>> [snip]
>>>>>>>>>
>>>>>>>>>> What do you think? I feel, like we should discuss it before I
>>>>>>>>>> formalize it as a proposal.
>>>>>>>>>
>>>>>>>>> I think this is a fruitless attempt, as even if the indices are still valid,
>>>>>>>>> they may not refer to the same elements they referenced before the mutation.
>>>>>>>>>
>>>>>>>>> Of course, mutating methods should state whether the invalidate existing
>>>>>>>>> indices, but I think that’s about the best that can be reasonably
>>>>>>>>> done.
>>>>>>>>
>>>>>>>> We can do a bit more.  For example, RangeReplaceableCollection's
>>>>>>>> replaceRange should return the range in the new collection state
>>>>>>>> corresponding to the elements that were replaced.
>>>>>>>>
>>>>>>>> --
>>>>>>>> -Dave
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> swift-evolution mailing list
>>>>>>>> swift-evolution at swift.org
>>>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>
>>>> --
>>>> -Dave
>>
>> --
>> -Dave

-- 
-Dave


More information about the swift-evolution mailing list