[swift-evolution] Add ability to validate collection indices
Alexey Komnin
interfere.work at gmail.com
Fri Dec 23 12:32:27 CST 2016
>>> Because I don't see any way to use it in correct code. At least, none
>>> that can't be more-usefully and reliably provided by returning more
>>> index information from mutating methods.
>>
>> Ok. I agree. All the indices must be invalidated if the collection
>> object is mutated.
>
> I didn't say that, and in the general case, I don't agree.
Sorry, I misunderstood your point here before. Now I see what you were
talking about. Thanks for the clarification.
> The best way to do it would be to write the evolution proposal that lays
> out the general index invalidation rules and the guarantees we should
> give for the various data structures, explaining why those guarantees
> are the right ones. If we can get the proposal passed, the issue should
> be totally clear for everyone :-)
Yes, I think the same way. I want to end up with a proposal, but I
have no insight into existing plans about evolution of Collection
protocol. That's why I decided to start from the question to mailing
list.
> Collection indices really must be stable when the collection is sorted, for
> example. It is only structure-changing operations that can invalidate
> indices, and even then that's only true for some collections and some
> operations.
Yes, it's a subtle thing I was talking about. Indices are not stable
if we are sorting a Set, for example. Therefore, we have a set of
collections keeping indices safe after sorting and a set of
collections which don't.
Let's take a look at very simple collection, or, as common example, a
set of well-known collections. Here is (incomplete) list:
* Array
* Forward List (single-linked)
* Set
What is a notion of an index for each of them? It's pretty obvious
what is the index when we are talking about Array. But what is the
index when we are talking about Set? Would it be different for Hashed
set (current default implementation in Swift stdlib) and Tree-based
set? What if we make it using skip-list as a base collection?
Forward list is a pretty simple data structure. Is it a collection?
Yes, indeed! But does it have an index? I don't know, because notion
of an index is unclear for me. We could define index for an element in
the list as a number of strides from the head node. Hence the index
is of integer type.
The same notion is applicable to all collections. We could define an
index for a Set as a number of iterator steps from the beginning to
the node with an element. But it does kill all performance.
> We should document the minimum guarantees provided
> by all Collections and also any stronger guarantees provided by specific
> Collections.
I have tried to sum up all I knew about different collections:
* Unordered collections, based on hash function, may invalidate any
number of indices during any mutation, because:
a) any mutation may cause rehashing,
b) current implementation may invalidate indices of elements with
collided hashValue;
* Random-access collections like Array, or double-ended queue, may
invalidate indices only during removal operation. Note, that
invalidated index (or indices) may point to other element;
* Forward and double-linked lists invalidate only index of deleted element;
* Tree-based collections may or may not invalidate indices due to
insert operation, depending on the internal algorithm of any
particular collection;
* Some complex collections, like LRU or LFU cache may invalidate
indices during get subscript operation.
There is no any common way as well as no common meaning of an index.
So, the possible minimum guarantee: any particular index may be
invalidated as a result of any operation.
If we drop complex containers out of Collections list, we could
restrict the rule: any particular index may be invalidated as a result
of any mutating operation.
Restriction for RandomAccessCollection: all indices in range
(startIndex..<endIndex) are and remain valid during collection
lifetime.
Also there is a bunch of non-trivial rules for any particular
collection. Now, there are like three or four collections in stdlib.
Is there any more in future plans?
- Alex Komnin
P.S. Sorry, if I am acting little nosey, but I am in the middle of
implementing some additional collections in Swift for my project, and
I would prefer to make them as close to canonical stdlib
implementation as it possible.
On Tue, Dec 20, 2016 at 4:01 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>
> on Tue Dec 20 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:
>
>>> Because I don't see any way to use it in correct code. At least, none
>>> that can't be more-usefully and reliably provided by returning more
>>> index information from mutating methods.
>>
>> Ok. I agree. All the indices must be invalidated if the collection
>> object is mutated.
>
> I didn't say that, and in the general case, I don't agree. Collection
> indices really must be stable when the collection is sorted, for
> example. It is only structure-changing operations that can invalidate
> indices, and even then that's only true for some collections and some
> operations.
>
> Note that this means that the copy-on-write step for
> conceptually-non-structure-changing mutations may not be allowed to
> change the data structure's physical structure, if that would require
> invalidating indices.
>
>> I just thought about making it straight and obvious how to check if
>> index is still valid for simple collections like Array. Returning
>> additional index information from mutating methods is a subtle thing,
>> because some methods do not change the object layout, for example
>> append does not change validity of indices;
>
> It certainly could. Conceptually, in its most general form, a
> Collection is some kind of tree (which must be of limited depth to avoid
> violating big-O expectations, but memory limitations pretty much take
> care of that---yes, this requires a small amount of handwaving). In its
> most general form, an index describes a path through that tree.
> Appending to an in-memory B-tree could require the allocation of a new
> root node, which changes the paths to all elements in the tree, thereby
> invalidating all indices.
>
> To avoid this effect we'd either have to require the tree to become
> unbalanced, which is too great a big-O killer, or that it be a
> RandomAccessCollection that could be indexed by Ints, which in itself
> could hurt the performance of linear traversals.
>
>> other methods may invalidate a lot or, even, all indices, for example,
>> replaceSubrange.
>>
>> I just want to understand the purpose of indices and what behavior is
>> intended for them. It's unclear for me now.
>
> I hope the above helps.
>
>> I just see a bunch of ways to shoot myself in the foot and looking for
>> a way to change that disposition.
>
> The best way to do it would be to write the evolution proposal that lays
> out the general index invalidation rules and the guarantees we should
> give for the various data structures, explaining why those guarantees
> are the right ones. If we can get the proposal passed, the issue should
> be totally clear for everyone :-)
>
>> As I see it, the problem is: there is no way to find if index is valid
>> for collection. Obvious way of checking if index is within the
>> (startIndex..<endIndex) bounds does not work.
>> So, I know only three possible solutions to the problem:
>>
>> 1. provide a way to check if index is valid;
>
>
> There's always c.indices.contains(i), though it's potentially very
> inefficient and may not tell you what you want to know (at least, not if
> you want to know the index points to the same element you thought it
> did).
>
> But the bigger question is, what would your program do differently if it
> discovered that an index it wanted to use was invalid with respect to a
> collection?
>
>> 2. state in documentation that all indices of any collection become
>> invalid after each mutation;
>
> That's too broad; as noted above, index invalidation only relates to
> structural mutation. We should document the minimum guarantees provided
> by all Collections and also any stronger guarantees provided by specific
> Collections.
>
>> 3. state in documentation that accessing elements of collection via
>> index after mutation is unspecified behavior.
>
> Definitely not; you wouldn't be able to write any interesting generic
> mutating algorithms if that were the case.
>
>> Am I right? Did I miss something?
>>
>> I would prefer option #1, if there are no other feasible solutions to
>> the problem.
>>
>> Thanks.
>> - Alex Komnin
>>
>> P.S. I just noticed that our discussion fell out of swift-evolution
>> list. Added it back.
>>
>> On Mon, Dec 19, 2016 at 8:30 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>>>
>>> on Mon Dec 19 2016, Alexey Komnin <interfere.work-AT-gmail.com> wrote:
>>>
>>>>> I don't personally agree with #3, FWIW
>>>> Why? I thought it may be useful.
>>>
>>> Because I don't see any way to use it in correct code. At least, none
>>> that can't be more-usefully and reliably provided by returning more
>>> index information from mutating methods.
>>>
>>>> - Alex Komnin
>>>>
>>>> On Mon, Dec 19, 2016 at 4:20 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>>>>> I don't personally agree with #3, FWIW
>>>>>
>>>>>
>>>>>
>>>>>> On Dec 19, 2016, at 3:01 AM, Alexey Komnin <interfere.work at gmail.com> wrote:
>>>>>>
>>>>>> Ok. Through the discussion I've come to the following:
>>>>>>
>>>>>> 1. collection invalidates all indices on any mutation as a common solution;
>>>>>> 2. collection may keep some indices valid if it is feasible;
>>>>>> 3. there are should be a way to check if index is valid;
>>>>>> 4. mutating methods should return index of inserted/replaced element.
>>>>>>
>>>>>> Did I miss anything?
>>>>>>
>>>>>> - Alex Komnin
>>>>>>
>>>>>> On Mon, Dec 19, 2016 at 6:10 AM, Dave Abrahams via swift-evolution
>>>>>> <swift-evolution at swift.org> wrote:
>>>>>>>
>>>>>>>> on Fri Dec 16 2016, Daniel Vollmer <swift-evolution at swift.org> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>> On 16 Dec 2016, at 14:56, Alexey Komnin via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>>>
>>>>>>>> [snip]
>>>>>>>>
>>>>>>>>> What do you think? I feel, like we should discuss it before I
>>>>>>>>> formalize it as a proposal.
>>>>>>>>
>>>>>>>> I think this is a fruitless attempt, as even if the indices are still valid,
>>>>>>>> they may not refer to the same elements they referenced before the mutation.
>>>>>>>>
>>>>>>>> Of course, mutating methods should state whether the invalidate existing
>>>>>>>> indices, but I think that’s about the best that can be reasonably
>>>>>>>> done.
>>>>>>>
>>>>>>> We can do a bit more. For example, RangeReplaceableCollection's
>>>>>>> replaceRange should return the range in the new collection state
>>>>>>> corresponding to the elements that were replaced.
>>>>>>>
>>>>>>> --
>>>>>>> -Dave
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> swift-evolution mailing list
>>>>>>> swift-evolution at swift.org
>>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>
>>> --
>>> -Dave
>
> --
> -Dave
More information about the swift-evolution
mailing list