[swift-users] Problem with mutable views and COW

Fri Nov 18 17:10:14 CST 2016

> On 18 Nov 2016, at 20:18, John McCall <rjmccall at apple.com> wrote:
> 
>> 
>> On Nov 18, 2016, at 7:40 AM, Karl <razielim at gmail.com> wrote:
>> 
>> 
>>> On 18 Nov 2016, at 13:05, Adrian Zubarev via swift-users <swift-users at swift.org> wrote:
>>> 
>>> Hi there,
>>> 
>>> I just can’t get my head around mutable views and COW.
>>> 
>>> Here is a small example:
>>> 
>>> final class Storage {
>>>      
>>>     var keys: [String] = []
>>>     var values: [Int] = []
>>> }
>>> 
>>> public struct Document {
>>>      
>>>     var _storageReference: Storage
>>>      
>>>     public init() {
>>>          
>>>         self._storageReference = Storage()
>>>     }
>>>      
>>>     public init(_ values: DocumentValues) {
>>>          
>>>         self._storageReference = values._storageReference
>>>     }
>>>      
>>>     public var values: DocumentValues {
>>>          
>>>         get { return DocumentValues(self) }
>>>          
>>>         set { self = Document(newValue) }
>>>     }
>>> }
>>> 
>>> public struct DocumentValues : MutableCollection {
>>>      
>>>     unowned var _storageReference: Storage
>>>      
>>>     init(_ document: Document) {
>>>          
>>>         self._storageReference = document._storageReference
>>>     }
>>>      
>>>     public var startIndex: Int {
>>>          
>>>         return self._storageReference.keys.startIndex
>>>     }
>>>      
>>>     public var endIndex: Int {
>>>          
>>>         return self._storageReference.keys.endIndex
>>>     }
>>>      
>>>     public func index(after i: Int) -> Int {
>>>          
>>>         return self._storageReference.keys.index(after: i)
>>>     }
>>>      
>>>     public subscript(position: Int) -> Int {
>>>          
>>>         get { return _storageReference.values[position] }
>>>          
>>>         set { self._storageReference.values[position] = newValue } // That will break COW
>>>     }
>>> }
>>> First of all the _storageReference property is unowned because I wanted to check the following:
>>> 
>>> var document = Document()
>>> 
>>> print(CFGetRetainCount(document._storageReference)) //=> 2
>>> print(isKnownUniquelyReferenced(&document._storageReference)) // true
>>> 
>>> var values = document.values
>>> 
>>> print(CFGetRetainCount(values._storageReference)) //=> 2
>>> print(isKnownUniquelyReferenced(&values._storageReference)) // false
>>> Why is the second check false, even if the property is marked as unowned for the view?
>>> 
>>> Next up, I don’t have an idea how to correctly COW optimize this view. Assume the following scenario:
>>> 
>>> Scenario A:
>>> 
>>> var document = Document()
>>> 
>>> // just assume we already added some values and can mutate safely on a given index
>>> // mutation in place
>>> document.values[0] = 10  
>>> VS:
>>> 
>>> Scenario B:
>>> 
>>> var document = Document()
>>> 
>>> let copy = document
>>> 
>>> // just assume we already added some values and can mutate safely on a given index
>>> // mutation in place
>>> document.values[0] = 10 // <--- this should only mutate `document` but not `copy`
>>> We could change the subscript setter on the mutable view like this:
>>> 
>>> set {
>>>              
>>>     if !isKnownUniquelyReferenced(&self._storageReference) {
>>>                  
>>>         self._storageReference = ... // clone
>>>     }
>>>     self._storageReference.values[position] = newValue
>>> }
>>> There is only one problem here. We’d end up cloning the storage every time, because as shown in the very first example, even with unowned the function isKnownUniquelyReferenced will return false for scenario A.
>>> 
>>> Any suggestions? 
>>> 
>>> PS: In general I also wouldn’t want to use unowned because the view should be able to outlive its parent.
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Adrian Zubarev
>>> Sent with Airmail
>>> 
>> 
>> 
>> This is kind of an invalid/unsafe design IMO; DocumentValues may escape the scope of the Document and the underlying storage may be deallocated.
>> 
>> Instead, I’d recommend a function:
>> 
>> func withDocumentValues<T>(_ invoke: (inout DocumentValues) -> T) -> T {
>>     var view = DocumentValues(self)
>>     defer { _fixLifetime(view) }
>>     return invoke(&view)
>> }
>> 
>> (unfortunately, this isn’t completely safe because somebody could still copy the DocumentValues from their closure, the same way you can copy the pointer from String’s withCString, but that’s a limitation of Swift right now)
>> 
>> CC: John McCall, because I read his suggestion in the thread about contiguous memory/borrowing that we could have a generalised @noescape. In this example, you would want the DocumentValues parameter in the closure to be @noescape.
> 
> I think you guys understand this stuff, but let me talk through it, and I hope it will be illuminating about where we're thinking of taking the language.
> 
> In value semantics, you expect something like:
>   let values = document.values
> to produce an independent value, and mutations of it shouldn't affect the original document value.
> 
> But there is a situation where values aren't independent, which is when one value is just a projected component of another.  In Swift, this is (currently, at least) always expressed with properties and subscripts. So when you write:
>   document.values.mutateInSomeWay()
> this is expected to actually change the document.  So it makes language sense for views like "values" to be expressed in this way; the only question is whether that can be done efficiently while still providing a satisfactory level of safety etc.
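
To make the two cases concrete, here is a minimal sketch. SimpleDocument is a made-up stand-in whose values is a plain stored array, which is an assumption for illustration and not the Document type from the thread:

```swift
// Hypothetical stand-in for Document: `values` is a plain stored array.
struct SimpleDocument {
    var values: [Int] = [1, 2, 3]
}

func projectionDemo() -> (document: [Int], independent: [Int]) {
    var document = SimpleDocument()

    // Copying the property out yields an independent value.
    var independent = document.values
    independent[0] = 99            // does not touch `document`

    // Mutating through the property path changes the document itself.
    document.values[1] = 10

    return (document.values, independent)
}
// projectionDemo() returns (document: [1, 10, 3], independent: [99, 2, 3])
```
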
> 
> When a property is actually stored directly in a value, Swift allows direct access to it (although for subscripts this mechanism is not currently documented or exposed, intentionally).  This sort of direct access is optimal, but it's not general enough for use cases like views and slices because the slice value doesn't actually exist anywhere; it needs to be created.  We do allow properties to be defined with get / set, but there are problems with that, which are exactly what you're seeing: slice values need to assert ownership of the underlying data if they're going to be used as independent values, but they also need to not assert ownership so that they don't interfere with copy-on-write.  get / set isn't good enough for this because get is used to both derive an independent value (which should assert ownership) and initiate a mutation (which should not).  The obvious solution is to allow a third accessor to be provided which is used when a value is mutated, as opposed to just copied (get) or overwritten wholesale (set).  We're still working out various ideas for how this will look at the language level.
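
The double duty of get can be observed directly. The following is only a sketch: AccessLog and LoggedDocument are hypothetical names, and values is computed purely so that the accessor calls become visible.

```swift
// Reference-type log so the non-mutating getter can record calls.
final class AccessLog {
    var events: [String] = []
}

// Hypothetical type; `values` is computed only to make accessor calls visible.
struct LoggedDocument {
    let log = AccessLog()
    private var stored: [Int] = [1, 2, 3]

    var values: [Int] {
        get {
            log.events.append("get")
            return stored
        }
        set {
            log.events.append("set")
            stored = newValue
        }
    }
}

func accessorDemo() -> [String] {
    var document = LoggedDocument()
    let snapshot = document.values   // copy out: getter only
    document.values[0] = 10          // in-place mutation: getter, then setter
    _ = snapshot
    return document.log.events
}
// accessorDemo() returns ["get", "get", "set"]
```

The getter cannot tell the first call (an independent copy) from the second (the start of a mutation), which is the gap a third accessor would fill.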
> 
> So that would be sufficient to allow DocumentValues to store either a strong or an unowned reference to the storage, depending on how the property is being used.  However, that creates the problem that, like with Karl's solution, the value can be copied during the mutation, and the user would expect that to create an independent value, i.e. to promote an unowned reference to strong.  The most general solution for this is to provide some sort of "copy constructor" feature which would be used to create an independent value.  But that's a pretty large hammer to pull out for this nail.
> 
> A third problem is that the original document can be copied and/or mutated during the projection of the DocumentValues, leaving the copy / the view in a potentially inconsistent state.  But this is a problem that we expect to thoroughly solve with the ownership system, which will statically (or dynamically when necessary) prevent simultaneous conflicting accesses to a value.
> 
> In the meantime, I think the best alternative is to
>   - allow the view to hold either an unowned or owned reference and
>   - create a callback-based accessor like Karl's and document that copies from the value are not permitted
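
One way this interim approach might look, as a sketch under stated assumptions and not a definitive implementation: the view holds a strong reference for simplicity, the uniqueness check runs in Document before the view is created, and withValues, makeUnique, append, value(at:), storageID and init(copying:) are names invented here.

```swift
final class Storage {
    var keys: [String] = []
    var values: [Int] = []

    init() {}
    init(copying other: Storage) {
        keys = other.keys
        values = other.values
    }
}

// The view holds a strong reference here for simplicity (an assumption; the
// thread also considers unowned). It must not be copied out of the closure.
struct DocumentValues {
    let storage: Storage

    subscript(position: Int) -> Int {
        get { return storage.values[position] }
        nonmutating set { storage.values[position] = newValue }
    }
}

struct Document {
    private var storage = Storage()

    // Identity of the current storage object, exposed only to observe cloning.
    var storageID: ObjectIdentifier { return ObjectIdentifier(storage) }

    private mutating func makeUnique() {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(copying: storage)
        }
    }

    mutating func append(key: String, value: Int) {
        makeUnique()
        storage.keys.append(key)
        storage.values.append(value)
    }

    func value(at position: Int) -> Int { return storage.values[position] }

    // Callback-based accessor: uniqueness is checked before the view exists,
    // so the view's own reference cannot force a clone.
    mutating func withValues<T>(_ body: (inout DocumentValues) -> T) -> T {
        makeUnique()
        var view = DocumentValues(storage: storage)
        return body(&view)
    }
}

// Scenario A: no other copy exists, so the storage is mutated in place.
func scenarioA() -> Bool {
    var document = Document()
    document.append(key: "a", value: 1)
    let before = document.storageID
    document.withValues { $0[0] = 10 }
    return document.storageID == before && document.value(at: 0) == 10
}

// Scenario B: a copy exists, so the storage is cloned and `copy` is untouched.
func scenarioB() -> (document: Int, copy: Int) {
    var document = Document()
    document.append(key: "a", value: 1)
    let copy = document
    document.withValues { $0[0] = 10 }
    return (document.value(at: 0), copy.value(at: 0))
}
```

Nothing here stops a caller from copying the view out of the closure, so the "no copies" rule would still have to be documented rather than enforced.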
> 
> On a purely technical level:
>>> print(isKnownUniquelyReferenced(&values._storageReference)) // false
>>> Why is the second check false, even if the property is marked as unowned for the view?
>>> 
> 
> A function taking an "inout T" expects to be passed an l-value for an ordinary (strong) reference.  Swift makes this work when passing an unowned or weak reference by passing a temporary variable holding a temporarily-promoted strong reference.  That's usually good, but it's wrong for isKnownUniquelyReferenced, and even more unfortunately, I don't think there's any supported way to make this work in the current compiler; you need language support.
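
For contrast, the check behaves as one would hope when it is handed an l-value for an ordinary strong stored reference. A small sketch, with Buffer and Owner as hypothetical names; the unowned case is deliberately left out, since by the explanation above it cannot be made to work:

```swift
final class Buffer {
    var values: [Int] = []
}

// Hypothetical owner with an ordinary strong stored reference.
struct Owner {
    var buffer = Buffer()
}

func uniquenessDemo() -> (alone: Bool, shared: Bool) {
    var first = Owner()
    // Only `first` refers to the buffer.
    let alone = isKnownUniquelyReferenced(&first.buffer)

    // Copying the struct copies the reference, so two owners now exist.
    let second = first
    let shared = withExtendedLifetime(second) {
        isKnownUniquelyReferenced(&first.buffer)
    }
    return (alone, shared)
}
// uniquenessDemo() returns (alone: true, shared: false)
```
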
> 
> John.

That’s very illuminating. 

For pure ‘views’, I would actually approach it in a different way (and I wrote a bit about this a while back on the lists): those additional views should not be separate, concrete types with pointers back to the data; they should be protocols on a single type which owns the data. This would simplify situations like String’s various lazily-encoding UTF(8/16/32) views and make it easier to write generic code.
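
For reference, this is roughly what the existing approach looks like from the outside: each encoding is reached through a separate view type (String.UTF8View, String.UTF16View, String.UnicodeScalarView). The helper viewCounts is a made-up name, and text.count is the spelling in later Swift versions (at the time of writing it would be text.characters.count):

```swift
// Counts the same string through each of String's separate view types.
func viewCounts(of text: String) -> (utf8: Int, utf16: Int, scalars: Int, characters: Int) {
    return (text.utf8.count, text.utf16.count, text.unicodeScalars.count, text.count)
}
// viewCounts(of: "h\u{E9}llo") returns (utf8: 6, utf16: 5, scalars: 5, characters: 5)
```
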

In C, we had one big global namespace. With C++, types could contain function members and became their own namespaces. I believe protocols in Swift should be separate namespaces below their types, and you should have unlimited freedom to name protocol members how you like, as well as conform to multiple overlapping protocols. For example, in my dreams String would be implemented something like this (just UTF8/16, for brevity):

protocol UTF8Sequence : Sequence where Element == UTF8CodePoint {}
protocol UTF16Sequence : Sequence where Element == UTF16CodePoint {}

struct String { /* ... */ }

extension String : UTF8Sequence {
    struct Iterator { /* UTF8-encoding iterator */ }
    func makeIterator() -> Iterator { return Iterator(_core) }
}

extension String : UTF16Sequence {
    struct Iterator { /* UTF16-encoding iterator */ }
    func makeIterator() -> Iterator { return Iterator(_core) }
}

type(of: "hello".UTF8Sequence.makeIterator())  // String.UTF8Sequence.Iterator
type(of: "hello".UTF16Sequence.makeIterator()) // String.UTF16Sequence.Iterator

The members which are bound to protocols would have that protocol mangled into their names, so we’d have: String.UTF8Sequence.makeIterator() -> String.UTF8Sequence.Iterator.
This is only possible because each conformance to UTF8Sequence lives in a parallel universe with its own version of Sequence and any other inherited protocols; String.UTF8Sequence actually is the protocol witness table itself, and we are explicitly building up all of its requirements in its own little bubble. This allows us to do some pretty neat things, like represent the fact that every UTF8Sequence is also viewable as a sequence of characters:

protocol CharacterSequence : Sequence where Element = Character {}

protocol UTF8Sequence : CharacterSequence, Sequence where Element == UTF8CodePoint {}
extension UTF8Sequence { /* default implementation of CharacterSequence, which turns UTF8 codepoints -> Characters */ }

String, however, would implement CharacterSequence natively. So String’s UTF8Sequence.CharacterSequence witness table could redirect to those more efficient implementations, which don't encode to UTF8 as a middle-man:

extension String : UTF8Sequence {

    // We need to implement:
    // - Sequence (where Iterator.Element is a UTF8CodePoint) - required
    // - Sequence (where Iterator.Element is a Character)     - optionally, since there is a default

    struct Iterator { /* UTF8-encoding iterator */ }
    func makeIterator() -> Iterator { return Iterator(_core) }

    // The type-checker could probably figure out which Iterator we mean here, but in cases where it can’t,
    // we disambiguate by explicitly saying which conformance it belongs to.

    typealias CharacterSequence.Iterator = String.CharacterSequence.Iterator
    func CharacterSequence.makeIterator() -> String.CharacterSequence.Iterator { return self.CharacterSequence.makeIterator() }
}

func chant(_ thingToChant: CharacterSequence) {
    for character in thingToChant {
        print("Give me a \(character)!")
    }
}

chant("hello")              // String is a CharacterSequence, so String.CharacterSequence exists
chant(myCustomUTF8Sequence) // A UTF8Sequence is a CharacterSequence, so MyCustomUTF8Sequence.CharacterSequence exists (maybe pointing to the default witnesses, maybe not)

The unfortunate thing about this is that it can be a bit verbose at the usage-site. For the common case (i.e. everything today) where protocols don’t overlap, the compiler could easily disambiguate:

protocol A { func doSomething() }
extension A { func doSomethingElse() }

struct MyStruct : A {
    func doSomething() { … }
}

let aThing = MyStruct()
aThing.doSomething()     // Compiler expands this to ‘aThing.A.doSomething()’
aThing.doSomethingElse() // Compiler expands this to ‘aThing.A.doSomethingElse()’
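
For comparison, a sketch of how overlapping requirements behave in Swift as it stands, as far as I know: when two protocols ask for a member with the same signature, a single implementation serves both, and there is no supported way to give each protocol its own. Greeter, Labeler and Badge are hypothetical names.

```swift
protocol Greeter { func describe() -> String }
protocol Labeler { func describe() -> String }

// Hypothetical type conforming to two protocols with an overlapping requirement.
struct Badge: Greeter, Labeler {
    // One implementation serves as the witness for both protocols.
    func describe() -> String { return "badge" }
}

func viaGreeter<T: Greeter>(_ value: T) -> String { return value.describe() }
func viaLabeler<T: Labeler>(_ value: T) -> String { return value.describe() }
// viaGreeter(Badge()) and viaLabeler(Badge()) both return "badge"
```
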

Since String could conform to CharacterSequence in multiple ways (natively, or via any of the UTF8/16 sequences), any algorithms we write in protocol extensions would be ambiguous. I’m not sure how to solve that one.

- Karl
