[swift-evolution] [Proposal Draft] Provide Custom Collections for Dictionary Keys and Values

Dave Abrahams dabrahams at apple.com
Thu Oct 13 00:43:56 CDT 2016


on Tue Oct 11 2016, Nate Cook <swift-evolution at swift.org> wrote:

> Introduction
>
> This proposal addresses significant unexpected performance gaps when using dictionaries. It
> introduces type-specific collections for a Dictionary instance's keys and values properties.
>
> New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access
> to dictionary values, enabling updates to be performed in-place and allowing copy-on-write
> optimization of stored values.

Doing this is a good idea for resilience reasons, if nothing else, as it
will allow us to adjust the implementations of these collections without
breaking clients.

>  <https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation
>
> This proposal address two problems:
>
> The Dictionary type keys implementation is inefficient, because
> LazyMapCollection doesn't know how to forward lookups to the
> underlying dictionary storage.

I don't see why traversing a collection of keys should require
“lookups,” at least in the normal meaning of the word, which refers to
locating a hash entry given a key.

> Dictionaries do not offer value-mutating APIs. 

Subscript is a value-mutating API:

  var b = [1:"one"]
  b[1]! += "!"
  print(b[1]) // "one!"
  
> The mutating key-based subscript wraps values in an Optional. This
> prevents types with copy-on-write optimizations from recognizing they
> are singly referenced.  

Yes, that's an efficiency problem.  I'm not making promises here, but I
have good reasons to think we'll address this issue by fixing the way
inout works.  Specifically, we'd prevent (statically where possible)
aliased accesses to references that are participating in inout access
and thus be able to avoid incrementing their reference counts in these
scenarios.

> This proposal uses the following [String: [Int]] dictionary to
> demonstrate these problems:
>
> var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
>  <https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient
> dict.keys Search
>
> Swift coders normally test key membership using nil checks or underscored optional bindings:
>
> if dict["one"] != nil {
>     // ...
> }
> if let _ = dict["one"] {
>     // ...
> }
> These approaches provide the expected performance of a dictionary lookup but they read neither well
> nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this
> approach requires a linear search through a dictionary's keys to find a match.
>
> if dict.keys.contains("one") {
>     // ...
> }

Heh.  dict.keys could get you a Set<Key>

> A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).
>
>  <https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient
> Value Mutation
>
> Dictionary values can be modified through the keyed subscript by
> direct reassignment or by using optional chaining. Both of these
> statements append 1 to the array stored by the key "one":
>
> // Direct re-assignment
> dict["one"] = (dict["one"] ?? []) + [1]
>
> // Optional chaining
> dict["one"]?.append(1)
>
> Both approaches present problems. The first is complex and hard to
> read. The second ignores the case where "one" is not a key in the
> dictionary. 

IMO the right way to phrase this is that the semantics of the first
approach are much more commonly useful than those of the second.

> It forces its check into a higher branch and encourages forced
> unwrapping. 

I don't think that's a valid argument against anything.  There's nothing
wrong with forced unwrapping.

> Furthermore, neither approach allows the array to grow in place. They
> introduce an unnecessary copy of the array's contents even though dict
> is the sole holder of its storage.

Yes.  Here's the horrible-but-reasonably-efficient way to do this today:

 var x: [Int]? = []
 swap(&x, &dict["one"])
 x = x ?? Array(minimumCapacity: 1)
 x!.append(1)
 swap(&x, &dict["one"])

> Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at
> a particular index would almost certainly modify its hash value, rendering the index incorrect. This
> violates the requirements of the MutableCollection protocol.
>
>  <https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed
> Solution
>
> This proposal adds a custom collection for the keys and values
> dictionary properties. This follows the example set by String, which
> presents multiple views of its contents. A new DictionaryKeys
> collection introduces efficient key lookup, while a new
> DictionaryValues collection provides a mutable collection interface to
> dictionary values.
>
> These changes introduce a simple and efficient way of checking whether
> a dictionary includes a key:
>
> // Performant
> if dict.keys.contains("one") {
>     // ...
> }
>
> As a mutable collection, values enables modification without copies or
> clumsy code:
>
> if let i = dict.index(forKey: "one") {
>     dict.values[i].append(1)  // no copy here
> } else {
>     dict["one"] = [1]
> }

Well, I find that a bit clumsy still, but it's better than what we've
got.  There's no reason dict.values shoudn't be a MutableCollection.

> Both the keys and values collections share the same index type as
> Dictionary. This allows the above sample to be rewritten as:
>
> // Using `dict.keys.index(of:)`
> if let i = dict.keys.index(of: "one") {
>     dict.values[i].append(1)
> } else {
>     dict["one"] = [1]
> }
>  <https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed
> design

That is a good point.  It makes me inclined to deprecate index(forKey:)

> The standard library introduces two new collection types:
> DictionaryKeys and DictionaryValues.  A Dictionary's keys and values
> property types change from LazyMapCollection to these new types.  The
> new collection types are not directly constructable. They are
> presented only as views into a dictionary.
>
> struct Dictionary<Key: Hashable, Value>: ... {
>     var keys: DictionaryKeys<Key, Value> { get }
>     var values: DictionaryValues<Key, Value> { get set }
>
>     // Remaining declarations
> }
>
> /// A collection view of a dictionary's keys.
> struct DictionaryKeys<Key: Hashable, Value>: Collection {
>     typealias Index = DictionaryIndex<Key, Value>
>     subscript(i: Index) -> Key { get }
>
>     // Other `Collection` requirements
> }
>
> /// A mutable collection view of a dictionary's values.
> struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
>     typealias Index = DictionaryIndex<Key, Value>
>     subscript(i: Index) -> Value { get set }
>
>     // Other `Collection` requirements
> }
>
> A sample implementation of this proposal can be found in this branch
> <https://github.com/natecook1000/swift/tree/nc-dictionary>.
>
>  <https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact
> on existing code
>
> The performance improvements of using the new DictionaryKeys type and
> the mutability of the DictionaryValuescollection are both additive in
> nature.
>
> Most uses of these properties are transitory in nature. Adopting this
> proposal should not produce a major impact on existing code. The only
> impact on existing code exists where a program explicitly specifies
> the type of a dictionary's keysor values property. The fix is to
> change the specified type.
>
>  <https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives
> considered
>
> The Generics Manifesto
> <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md>
> lists nested generics as a goal. This could impact the naming and
> structure of these new collection types.
>
> Instead of DictionaryKeys<Key, Value> and 
> DictionaryValues<Key, Value>, these types 
> could be Dictionary<Key, Value>.Keys and
> Dictionary<Key, Value>.Values. However, because many types in the
> standard library may be revisited once such a feature is available
> (indices, iterators, etc.), the current lack of nesting shouldn't
> prevent consideration of this proposal.
>
> It could be possible to add additional compiler features that manage
> mutation through existing key-based subscripting without the
> copy-on-write problems of the current implementation. I don't know
> enough about how that would be implemented to speak to its feasibility
> or level of effort. Such a feature would reduce the need for a mutable
> DictionaryValues collection.

-- 
-Dave



More information about the swift-evolution mailing list