[swift-dev] copy-on-write proposal

Dave Abrahams dabrahams at apple.com
Thu Oct 20 10:51:41 CDT 2016


We might want to leave some room in the design for a shared atomic cache reference to live in the buffer, FWIW. It would have to be mutable even when the buffer was multiply-referenced 

Sent from my moss-covered three-handled family gradunza

> On Oct 20, 2016, at 8:41 AM, Erik Eckstein <eeckstein at apple.com> wrote:
> 
> 
>>> On Oct 19, 2016, at 6:36 PM, Andrew Trick via swift-dev <swift-dev at swift.org> wrote:
>>> 
>>> 
>>> On Oct 19, 2016, at 10:13 AM, Dave Abrahams via swift-dev <swift-dev at swift.org> wrote:
>>> 
>>> 
>>> on Tue Oct 18 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote:
>>> 
>>>>> On Oct 17, 2016, at 10:21 AM, Dave Abrahams <dabrahams at apple.com> wrote:
>>>>> 
>>>>> 
>>>>> on Mon Oct 17 2016, Erik Eckstein <eeckstein-AT-apple.com> wrote:
>>>>> 
>>>> 
>>>>>>> On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev <swift-dev at swift.org> wrote:
>>>>>>> 
>>>>>>> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org <http://swift-dev-at-swift.org/>> wrote:
>>>>>>> 
>>>>>>>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev <swift-dev at swift.org> wrote:
>>>>>>>>> 
>>>>>>>>> This is a proposal for representing copy-on-write buffers in
>>>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily
>>>>>>>>> depends on how we move forward with SIL ownership.
>>>>>>>>> <CopyOnWrite.rst>
>>>>>>>>> If you have any comments, please let me know.
>>>>>>>> 
>>>>>>>> The SIL-level design seems sensible to me at a glance. At the language
>>>>>>>> level, I think it would make more sense to treat this as an attribute
>>>>>>>> on class types rather than on properties in structs using the class. I
>>>>>>>> don't think many people reuse class definitions as both shared
>>>>>>>> reference types and as value type payloads, 
>>>>>>> 
>>>>>>> Foundation does, or would if they could.
>>>>>>> 
>>>>>>>> but beyond that, I think that making it an attribute of classes would
>>>>>>>> put us into a better position to leverage the borrow model to enforce
>>>>>>>> the "mutable-only-when-unique" aspect of COW implementations. John
>>>>>>>> alluded to this in the "SIL address types and borrowing" thread:
>>>>>>>> 
>>>>>>>>> I wonder if it would make more sense to make copy-on-write buffer
>>>>>>>>> references a move-only type, so that as long as you were just
>>>>>>>>> working with the raw reference (as opposed to the CoW aggregate,
>>>>>>>>> which would remain copyable) it wouldn't get implicitly copied
>>>>>>>>> anymore.  You could have mutable and immutable buffer reference
>>>>>>>>> types, both move-only, and there could be a consuming checkUnique
>>>>>>>>> operation on the immutable one that, I dunno, returned an Either of
>>>>>>>>> the mutable and immutable versions.
>>>>>>>>> 
>>>>>>>>> For CoW aggregates, you'd need some @copied attribute on the field
>>>>>>>>> to make sure that the CoW attribute was still copyable.  Within the
>>>>>>>>> implementation of the type, though, you would be projecting out the
>>>>>>>>> reference immediately, and thereafter you'd be certain that you were
>>>>>>>>> borrowing / moving it around as appropriate.
>>>>>>>> 
>>>>>>>> If 'copy-on-write' were a trait on classes, then we could distinguish
>>>>>>>> unique and nonunique references to the class. A unique reference would
>>>>>>>> act like a move-only type to prevent accidental loss of uniqueness. 
>>>>>>> 
>>>>>>> +1
>>>>>>> 
>>>>>>>> We can also allow a copy-on-write class to have "mutating" methods,
>>>>>>>> and only allow mutation on unique references. It seems to me like,
>>>>>>>> exploring this direction, we could also come up with a way for the
>>>>>>>> high-level value-semantics operations on the struct to statically
>>>>>>>> indicate which methods are known to leave the value's buffers in a
>>>>>>>> unique state, or which return values that are uniquely owned, which
>>>>>>>> would give the optimizer more ability to avoid uniqueness checks
>>>>>>>> across calls without relying on inlining and IPO.
>>>>>>> 
>>>>>>> That's pretty cool.  However, I think there's nothing to prevent any
>>>>>>> mutating method from storing a copy of self in a global, so I think we'd
>>>>>>> need some participation from the programmer (either an agreement not to
>>>>>>> do that, or an explicit claim of uniqueness on exit) in order to
>>>>>>> identify operations that create/preserve uniqueness.
>>>>>> 
>>>>>> If a mutating reference (like self in a mutating method) is move-only
>>>>>> then you would not be able to “copy” it to a global.
>>>>> 
>>>>> Yes, a reference to a move-only type would work for this purpose.
>>>>> 
>>>>> 
>>>>>>> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev <swift-dev at swift.org> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote:
>>>>>>> 
>>>>>>>> This is a proposal for representing copy-on-write buffers in
>>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily
>>>>>>>> depends on how we move forward with SIL ownership.
>>>>>>>> 
>>>>>>>> :orphan:
>>>>>>>> 
>>>>>>>> .. highlight:: sil
>>>>>>>> 
>>>>>>>> ===================================
>>>>>>>> Copy-On-Write Representation in SIL
>>>>>>>> ===================================
>>>>>>>> 
>>>>>>>> .. contents::
>>>>>>>> 
>>>>>>>> Overview
>>>>>>>> ========
>>>>>>>> 
>>>>>>>> This document proposes:
>>>>>>>> 
>>>>>>>> - An ownership attribute to define copy-on-write (COW) buffers in Swift data
>>>>>>>> types.
>>>>>>>> 
>>>>>>>> - A representation of COW buffers in SIL so that optimizations can take benefit
>>>>>>>> of it.
>>>>>>>> 
>>>>>>>> The basic idea is to enable the SIL optimizer to reason about COW data types
>>>>>>>> in the same way as a programmer can do.
>>>>>>>> This means: a COW buffer can only be modified by its owning SIL value, because
>>>>>>>> either it's uniquely referenced or the buffer is copied before modified.
>>>>>>>> 
>>>>>>>> .. note::
>>>>>>>>  In the following the term "buffer" refers to a Swift heap object.
>>>>>>>>  It can be any heap object, not necessarily a “buffer” with e.g. tail-allocated elements.
>>>>>>>> 
>>>>>>>> COW Types
>>>>>>>> =========
>>>>>>>> 
>>>>>>>> The basic structure of COW data types can be simplified as follows::
>>>>>>>> 
>>>>>>>>  class COWBuffer {
>>>>>>>>    var someData: Int
>>>>>>>>    ...
>>>>>>>>  }
>>>>>>>> 
>>>>>>>>  struct COWType {
>>>>>>>>    var b : COWBuffer
>>>>>>>> 
>>>>>>>>    mutating func change_it() {
>>>>>>>>      if (!isUniquelyReferenced(b)) {
>>>>>>>>        b = copy_buffer(b)
>>>>>>>>      }
>>>>>>>>      b.someData = ...
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> Currently the COW behavior of such types is just defined by their implementation.
>>>>>>>> But there is no representation of this special behavior in the SIL.
>>>>>>>> So the SIL optimizer has no clue about it and cannot take advantage of it.
>>>>>>>> 
>>>>>>>> For example::
>>>>>>>> 
>>>>>>>>  func foo(arr : [Int]) {
>>>>>>>>    x = arr[0]
>>>>>>>>    opaque_function()
>>>>>>>>    y = arr[0] // can RLE replace this with y = x?
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> If opaque_function() wants to change the contents of the array buffer it first
>>>>>>>> has to copy it. 
>>>>>>> 
>>>>>>> ...or determine that it's uniquely-referenced.
>>>>>> 
>>>>>> In this specific example, if opqaue_function holds a reference to arr’s buffer, the buffer is not
>>>>>> uniquely-referenced.
>>>>> 
>>>>> Right.
>>>>> 
>>>>>>> 
>>>>>>>> But the optimizer does not know it so it has to conservatively assume
>>>>>>>> that opaque_function() will write to the location of arr[0].
>>>>>>>> 
>>>>>>>> Copy-on-write Ownership Attribute
>>>>>>>> =================================
>>>>>>>> 
>>>>>>>> This section proposes an ownership attribute to define a copy-on-write buffer.
>>>>>>>> 
>>>>>>>> Swift Syntax
>>>>>>>> ------------
>>>>>>>> 
>>>>>>>> A COW buffer reference can be defined with a new ownership attribute for the
>>>>>>>> buffer variable declaration (similar to “weak” and “unowned”)::
>>>>>>>> 
>>>>>>>>  struct COWType {
>>>>>>>>    copy_on_write var b : COWBuffer
>>>>>>>> 
>>>>>>>>    // ...
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> The ``copy_on_write`` attribute is purely used for optimization purposes.
>>>>>>>> It does not change the semantics of the program.
>>>>>>> 
>>>>>>> Presumably, it changes what code you can execute on `b` without invoking
>>>>>>> traps or undefined behavior.  Otherwise, the optimizer wouldn't be able
>>>>>>> to do anything differently to take advantage of the annotation.
>>>>>> 
>>>>>> That’s true.
>>>>>> 
>>>>>>> What are the rules for writing code that uses `copy_on_write`?
>>>>>> 
>>>>>> See below ("The rules for using ``copy_on_write`` and the built-ins are:”)
>>>>> 
>>>>> Yeah, I got there, eventually.  But just saying “doesn't change
>>>>> semantics” at this point in the proposal leaves a gap, because it does
>>>>> change semantic *requirements*.  You should mention that.
>>>>> 
>>>>>>>> .. note::
>>>>>>>> 
>>>>>>>> “copy_on_write” is a  working title. TODO: decide on the name.
>>>>>>>> Maybe it should be a @-attribute, like @copy_on_write?
>>>>>>>> Another question is if we should open this attribute for the public or just
>>>>>>>> use it internally in the library, because violating the implied rules
>>>>>>>> (see below) could break memory safety.
>>>>>>>> 
>>>>>>>> Implementation
>>>>>>>> --------------
>>>>>>>> 
>>>>>>>> The ``copy_on_write`` references can be represented in the AST as a special
>>>>>>>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented.
>>>>>>>> The canonical type of a ``CopyOnWriteStorageType`` would be the referenced
>>>>>>>> buffer class type.
>>>>>>>> 
>>>>>>>> In SIL the buffer reference will have type::
>>>>>>>> 
>>>>>>>>  $@sil_cow COWBuffer
>>>>>>>> 
>>>>>>>> where ``COWBuffer`` is the type of the referenced heap object.
>>>>>>>> 
>>>>>>>> Two conversion instructions are needed to convert from a ``@sil_cow`` reference
>>>>>>>> type to a regular reference type::
>>>>>>>> 
>>>>>>>>  cow_to_ref
>>>>>>>>  ref_to_cow
>>>>>>>> 
>>>>>>>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``.
>>>>>>>> 
>>>>>>>> For example the SIL code for::
>>>>>>>> 
>>>>>>>>  var c: COWType
>>>>>>>>  let x = c.b.someData
>>>>>>>> 
>>>>>>>> would be::
>>>>>>>> 
>>>>>>>>  %1 = struct_extract %1 : COWType, #COWType.b
>>>>>>>>  %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>>>>>>>  %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData
>>>>>>>>  %4 = load %3 : $*Int
>>>>>>>> 
>>>>>>>> The ``ref_to_cow`` instruction is needed to store a new buffer reference into a
>>>>>>>> COW type.
>>>>>>>> 
>>>>>>>> COW Buffers and the Optimizer
>>>>>>>> =============================
>>>>>>>> 
>>>>>>>> A reference to a COW buffer gives the optimizer additional information:
>>>>>>>> 
>>>>>>>> *A buffer, referenced by a @sil_cow reference is considered to be immutable
>>>>>>>> during the lifetime of the reference.*
>>>>>>> 
>>>>>>> This seems like much too broad a rule to allow inplace mutations of
>>>>>>> uniquely referenced buffers.
>>>>>> 
>>>>>> The point is that all mutations must be guarded by an is_unique, which
>>>>>> takes the _address_ of the buffer reference as argument.
>>>>>> And the optimizer considers this instruction as a potential write to the buffer reference.
>>>>>> The effect is that the lifetime of a buffer reference (as a SIL value)
>>>>>> will not outlive a is_unique - regardless if this is inside a called
>>>>>> function or inlined.
>>>>> 
>>>>> I don't see how that allows me to mutate a uniquely referenced buffer held
>>>>> in a @sil_cow reference, given what you wrote above.
>>>> 
>>>> You would not be able to get a reference to a mutable buffer by
>>>> reading the COW type’s @sil_cow field.  Instead you would only get
>>>> such a reference as a result of the is_unique instruction/builtin. Or,
>>>> of course, by creating a new buffer.
>>>> 
>>>> I’m not sure if this was the question, though.
>>> 
>>> I think it just comes down to precise phrasing.  AFAICT, what you really
>>> mean to say is something like
>>> 
>>>  A buffer cannot be directly mutated through a @sil_cow reference;
>>>  instead one must mutate it indirectly via the result of is_unique or
>>>  start_unique.
> 
> Exactly, that’s what I wanted to say.
> 
>>> 
>>> Saying that the buffer is “considered to be immmutable during the
>>> lifetime of the reference” could be taken to mean that the compiler will
>>> assume no mutations of the buffer can occur while the reference exists.
>>> IIUC you are not planning to formally end the reference's lifetime at
>>> the moment is_unique/start_unique returns.
>> 
>> To clarify: I proposed an alternate approach in which the @sil_cow reference is only mutable during the Array’s @inout scope—to be automatically enforced by the compiler once @inout scopes are enforced. But the text in question is not referring to that approach, so your comments are on target.
> 
> After thinking about Joe’s suggestion (having the cow attribute on the class type and make a reference to that type move-only), I’m more inclined to go with the isUnique builtin. If such a reference can only be returned by isUnique, it is really guaranteed that only a uniquely referenced buffer can be mutated. With the inout approach, the programmer is not forced to make the uniqueness check before modifying the buffer.
> 
>> -Andy 
>> 
>>>> 
>>>> Plus: we will have an explicit conversion instruction (start_unique) to convert an immutable
>>>> reference to a mutable referece.
>>>> A SIL optimization can replace an is_unique with this instruction if it can prove that the reference
>>>> is already unique at that point.
>>>> 
>>>>> 
>>>>> 
>>>>>>> Unless you mean the reference is
>>>>>>> immutable, rather than the storage being referred to by it.
>>>>>>> 
>>>>>>>> This means any address derived from a ``cow_to_ref`` instruction can be
>>>>>>>> considered to point to immutable memory.
>>>>>>>> 
>>>>>>>> Some examples of optimizations which will benefit from copy-on-write
>>>>>>>> representation in SIL:
>>>>>>>> 
>>>>>>>> - Redundant load elimination
>>>>>>>> 
>>>>>>>> RLE can assume that opaque code does not modify a COW buffer.
>>>>>>> 
>>>>>>> How do you distinguish “opaque code” from “code that is meant to
>>>>>>> modify the buffer and might do so in place if it's uniquely-referenced?”
>>>>>> 
>>>>>> Again, the is_unique which takes the address of the reference, will
>>>>>> guarantee that during the lifetime of a buffer there are no
>>>>>> modifications of the buffer.
>>>>> 
>>>>> Again, that sounds like it rules out inplace modification of uniquely
>>>>> referenced buffers.
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>> Example::
>>>>>>>> 
>>>>>>>>    %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>>>>>>>    %3 = ref_element_addr %2 : $COWBuffer, #someData
>>>>>>>>    %4 = load %3 : $*Int
>>>>>>>>    %5 = apply %foo()                        // Cannot overwrite memory location %3
>>>>>>>>    %6 = load %3 : $*Int                     // Can be replaced by %4
>>>>>>>> 
>>>>>>>> Currently we do some ad-hoc optimizations for array, based on semantics,
>>>>>>>> like array count propagation. These hacks would not be needed
>>>>>>>> anymore.
>>>>>>> 
>>>>>>> W0000000000000000000000t.
>>>>>>> 
>>>>>>>> Note that it’s not required to check if a ``cow_to_ref`` reference (or a
>>>>>>>> projected address) escapes. Even if it escapes, it will reference immutable
>>>>>>>> memory.
>>>>>>>> 
>>>>>>>> - CSE, loop hoisting
>>>>>>>> 
>>>>>>>> Similar to RLE: the optimizer can assume that opaque code cannot modify a
>>>>>>>> COW buffer
>>>>>>> 
>>>>>>> Same question here as above, then.
>>>>>>>> 
>>>>>>>> - ARC optimization
>>>>>>>> 
>>>>>>>> Knowing that some opaque code cannot overwrite a reference in the COW buffer
>>>>>>>> can remove retain/release pairs across such code::
>>>>>>>> 
>>>>>>>>    %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>>>>>>>    %3 = ref_element_addr %2 : $COWBuffer, #someRef
>>>>>>>>    %4 = load_strong %3 : $*MyClass          // Can do a load_strong [guarantee]
>>>>>>>>    %5 = apply %foo()                        // Cannot overwrite someRef and dealloc the object
>>>>>>>>    // Use %4
>>>>>>>>    destroy_value %4 : $MyClass
>>>>>>>> 
>>>>>>>> Scoping instructions
>>>>>>>> --------------------
>>>>>>>> 
>>>>>>>> To let the optimizer reason about the immutability of the COW buffer, it is
>>>>>>>> important to *bind* the lifetime of the buffer content to the lifetime of the
>>>>>>>> buffer reference. For example::
>>>>>>>> 
>>>>>>>>  %b1 = load %baddr : $@sil_cow COWBuffer  // load the buffer reference
>>>>>>>>  // load something from %b1
>>>>>>>>  %a = apply %foo(%baddr : $@sil_cow COWBuffer)
>>>>>>>>  %b2 = load %baddr : $@sil_cow COWBuffer  // load the buffer reference again
>>>>>>>>  // load something from %b2
>>>>>>>> 
>>>>>>>> The question is: can RLE forward the load of the buffer reference and replace
>>>>>>>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies the
>>>>>>>> buffer.
>>>>>>>> 
>>>>>>>> To enforce this restriction, the scope of any buffer modification must be
>>>>>>>> enclosed in a pair of SIL instructions. Those instructions define the scope
>>>>>>>> of the mutation. Both instructions take the *address* of the buffer
>>>>>>>> reference as operand and act as a potential write to the buffer reference. 
>>>>>>>> 
>>>>>>>> The purpose of the scoping instructions is to strictly separate the liferanges
>>>>>>>> of references to an immutable buffer and references to the mutable buffer.
>>>>>>> 
>>>>>>> Looks reasonable.
>>>>>>> 
>>>>>>>> The following example shows why the scoping instructions (specifically the
>>>>>>>> end-of-scope instruction) are required to prevent loop-hoisting from
>>>>>>>> interleaving mutable and immutable liferanges::
>>>>>>>> 
>>>>>>>>  // there should be a begin-of-scope %baddr
>>>>>>>>  %mut_b = load %baddr
>>>>>>>>  store %x to %mut_b    // modification of the buffer
>>>>>>>>  // there should be a end-of-scope %baddr
>>>>>>>> 
>>>>>>>>  loop {
>>>>>>>>    %b = load %baddr
>>>>>>>>    %y = load %b        // load from the buffer
>>>>>>>>    ...
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> If there is no end-of-scope instruction, loop hoisting could do::
>>>>>>>> 
>>>>>>>>  %mut_b = load %baddr
>>>>>>>>  %b = load %baddr        // moved out of the loop
>>>>>>>>  store %x to %mut_b
>>>>>>>> 
>>>>>>>>  loop {
>>>>>>>>    %y = load %b
>>>>>>>>    ...
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> Now the optimizer assumes that ``%b`` references an immutable buffer, so it could
>>>>>>>> also hoist the load::
>>>>>>>> 
>>>>>>>>  %mut_b = load %baddr
>>>>>>>>  %b = load %baddr
>>>>>>>>  %y = load %b          // Wrong! Will be overwritten by the following store
>>>>>>>>  store %x to %mut_b
>>>>>>>> 
>>>>>>>>  loop {
>>>>>>>>    ...
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The following sections describe two alternatives to implement the scoping.
>>>>>>>> 
>>>>>>>> Scoping Alternative 1: Explicit Built-ins
>>>>>>>> -----------------------------------------
>>>>>>>> 
>>>>>>>> SIL instructions
>>>>>>>> ^^^^^^^^^^^^^^^^
>>>>>>>> 
>>>>>>>> The existing ``is_unique`` instruction is changed to a terminator instruction::
>>>>>>>> 
>>>>>>>>  bb0:
>>>>>>>>    is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2  // %0 is the address of the COWBuffer reference
>>>>>>>>  bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique reference. Physically identical to "load %0”
>>>>>>>>    // usually empty
>>>>>>>>    br bb3(%1 : $COWBuffer)
>>>>>>>>  bb2:                  // the false-block
>>>>>>>>    // usually contains:
>>>>>>>>    %2 = apply %copy_buffer
>>>>>>>>    %3 = cow_to_ref %2
>>>>>>>>    store_strong %3 to %0 : $*@sil_cow COWBuffer
>>>>>>>>    br bb3(%2 : $COWBuffer)
>>>>>>>>  bb3(%4 : $COWBuffer):
>>>>>>>>    // Modify the buffer referenced by %4
>>>>>>>>    // ...
>>>>>>>> 
>>>>>>>> The end-of-scope instruction is::
>>>>>>>> 
>>>>>>>>  end_unique_addr %0 : $*COWBuffer
>>>>>>>> 
>>>>>>>> It is important that the references to the unique buffers (``%1``, ``%2``) must
>>>>>>>> not outlive ``end_unique_addr``. In most cases this can be check by the SIL
>>>>>>>> verifier.
>>>>>>>> 
>>>>>>>> The two instructions must be paired properly but not necessarily in the
>>>>>>>> same function.
>>>>>>>> 
>>>>>>>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to
>>>>>>>> separate the lifetimes of mutable and immutable accesses to the COW buffer.
>>>>>>>> Both instructions take an address to the COW buffer reference and are
>>>>>>>> considered as potential stores to the reference.
>>>>>>>> This makes sure that the SIL optimizer cannot mix-up buffer reference lifetimes
>>>>>>>> across these instructions.
>>>>>>>> For example, RLE cannot combine two buffer loads which are interleaved with
>>>>>>>> a ``is_unique_addr_br``::
>>>>>>>> 
>>>>>>>>  %1 = load_strong %0 : $*@sil_cow COWBuffer
>>>>>>>>  // do something with %1
>>>>>>>>>>>>>>>>  is_unique_addr_br %0 : $*@sil_cow COWBuffer
>>>>>>>>>>>>>>>>  %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this with %1
>>>>>>>> 
>>>>>>>> Another important thing is that the COW buffer can only be mutated by using the
>>>>>>>> reference of the ``is_unique_addr_br`` true-block argument.
>>>>>>>> The COW buffer cannot be modified by simply loading/extracting the reference
>>>>>>>> from the COWType.
>>>>>>>> Example::
>>>>>>>> 
>>>>>>>> %1 = load_strong %0 : $*COWBuffer
>>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData
>>>>>>>> store %7 : $Int to %3 : $*Int            // Violation!
>>>>>>>> 
>>>>>>>> Most obvious violations to this constraint can be catched by the SILVerifier.
>>>>>>>> 
>>>>>>>> The ``_addr`` variants of the instructions also have a non-addr counterpart::
>>>>>>>> 
>>>>>>>>  is_unique_br %0 : $COWBuffer, bb1, bb2.  // consumes %0 and produces the true-block arg as owned
>>>>>>>> 
>>>>>>>>  %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned
>>>>>>>> 
>>>>>>>> These instructions are generated by Mem2reg (or a similar optimization)
>>>>>>>> in case the COW value is stored (in a temporary alloc_stack location)
>>>>>>>> just for the sake of passing an address to ``is_unique_addr_br`` and
>>>>>>>> ``end_unique_addr``.
>>>>>>>> For example in the following code, where the COW data is passed as-value and
>>>>>>>> all the mutating functions are inlined::
>>>>>>>> 
>>>>>>>>  func foo(arr : [Int], x: Int) {
>>>>>>>>    arr[0] = 27
>>>>>>>>>>>>>>>>    y = arr[x]
>>>>>>>>>>>>>>>>  }
>>>>>>>> 
>>>>>>>> Finally it’s probably a good idea to add an instruction for converting an
>>>>>>>> immutable reference to a mutable reference::
>>>>>>>> 
>>>>>>>>  %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : $COWBuffer as owned
>>>>>>>> 
>>>>>>>> which is basically just a simpler representation of the following pattern::
>>>>>>>> 
>>>>>>>>  bb0:
>>>>>>>>    is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2
>>>>>>>>  bb1(%1 : $COWBuffer):
>>>>>>>>    … // main control flow continues here
>>>>>>>>  bb2:
>>>>>>>>    unreachable
>>>>>>>> 
>>>>>>>> An optimizations, which eliminate uniqueness checks, would replace a
>>>>>>>> ``is_unique_br`` by a ``start_unique``.
>>>>>>>> 
>>>>>>>> Built-ins
>>>>>>>> ^^^^^^^^^
>>>>>>>> 
>>>>>>>> A COW type implementor can generate the new instructions by using a set of built-ins::
>>>>>>>> 
>>>>>>>>  func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType?
>>>>>>>>  func endUnique<BufferType>(_ buffer: inout BufferType)  
>>>>>>>> 
>>>>>>>> For example::
>>>>>>>> 
>>>>>>>>  struct COWType {
>>>>>>>>    copy_on_write var b : COWBuffer
>>>>>>>> 
>>>>>>>>    mutating func makeMutable() -> COWBuffer {
>>>>>>>>      if let uniqueBuffer = isUnique(&self.b) {
>>>>>>>>        return uniqueBuffer
>>>>>>>>      }
>>>>>>>>      let copiedBuffer = copyBuffer(self.b)
>>>>>>>>      self.b = copiedBuffer
>>>>>>>>      return copiedBuffer
>>>>>>>>    }
>>>>>>>> 
>>>>>>>>    mutating func setSomeData(x: Int) {
>>>>>>>>      let uniqueBuffer = makeMutable()
>>>>>>>>      uniqueBuffer.someData = x
>>>>>>>>      endUnique(&self.b)
>>>>>>>>    }
>>>>>>>>  }
>>>>>>> 
>>>>>>> This seems reasonable, but it also looks like the compiler could do the
>>>>>>> `endUnique` dance for us based, e.g., on the mutability of methods.  
>>>>>> 
>>>>>> I agree, that would be ideal, e.g. the compiler could insert the endUnique at the end of an inout
>>>>>> scope.
>>>>>> 
>>>>>>> 
>>>>>>>> The ``isUnique`` built-in returns an optional unique buffer reference.
>>>>>>>> Physically this is the COW buffer which is passed as the inout argument.
>>>>>>>> The result is nil if the buffer is not uniquely referenced.
>>>>>>>> In this case usually the original buffer is copied and the reference to the
>>>>>>>> copy is written back to the original buffer reference location
>>>>>>>> (``self.b = copiedBuffer``).
>>>>>>>> Starting at the point of the write-back, the reference to the copy also becomes
>>>>>>>> a unique buffer reference.
>>>>>>>> 
>>>>>>>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` pattern which
>>>>>>>> constructs the Optional in the successor blocks. Using ``isUnique`` in an
>>>>>>>> if-let (as shown above) will end up in two consecutive CFG "diamonds".
>>>>>>>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` diamond.
>>>>>>>> 
>>>>>>>> .. note::
>>>>>>>> This makes the definition of the unique buffer location lifetime a little bit
>>>>>>>> problematic, because the false-branch of ``isUnique`` is not equivalent to
>>>>>>>> the false-branch of the ``is_unique_addr_br`` instruction (before SimplifyCFG
>>>>>>>> can do its job).
>>>>>>> 
>>>>>>> I don't know what the implications of these diamonds and the problem
>>>>>>> described above might be, FWIW.
>>>>>>> 
>>>>>>>> The rules for using ``copy_on_write`` and the built-ins are:
>>>>>>>> 
>>>>>>>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily in the
>>>>>>>> same function.
>>>>>>>> 
>>>>>>>> 2. The COW buffer may only be mutated by using the unique buffer reference.
>>>>>>>> 
>>>>>>>> 3. The COW buffer must not be mutated outside the ``isUnique`` - ``endUnique``
>>>>>>>> pair.
>>>>>>>> 
>>>>>>>> 4. During the lifetime of the unique buffer reference, the original COW buffer
>>>>>>>> reference must not be used in any way, e.g. for reading from the buffer.
>>>>>>>> 
>>>>>>>> Note that the lifetime of the unique buffer reference does not include the
>>>>>>>> part between the begin of the ``isUnique`` false-branch and the write-back
>>>>>>>> of the copy. This means is okay to read from the buffer (using ``self.b``)
>>>>>>>> for the purpose of copying.
>>>>>>>> 
>>>>>>>> Examples::
>>>>>>>> 
>>>>>>>>  mutating func setSomeData(x: Int) {
>>>>>>>>    let uniqueBuffer = makeMutable()
>>>>>>>>    uniqueBuffer.someData = x
>>>>>>>>    // violates rule 1
>>>>>>>>  }
>>>>>>>> 
>>>>>>>>  mutating func setSomeData(x: Int) {
>>>>>>>>    makeMutable()
>>>>>>>>    self.b.someData = x // violates rule 2
>>>>>>>>    endUnique(&self.b)
>>>>>>>>  }
>>>>>>>> 
>>>>>>>>  mutating func setSomeData(x: Int) {
>>>>>>>>    let uniqueBuffer = makeMutable()
>>>>>>>>    uniqueBuffer.someData = x
>>>>>>>>    endUnique(&self.b)
>>>>>>>>    uniqueBuffer.someData = 27 // violates rule 3
>>>>>>>>  }
>>>>>>>> 
>>>>>>>>  mutating func incrementSomeData() {
>>>>>>>>    let uniqueBuffer = makeMutable()
>>>>>>>>    uniqueBuffer.someData = self.b.someData + 1 // violates rule 4
>>>>>>>>    endUnique(&self.b)
>>>>>>>>  }
>>>>>>> 
>>>>>>> It would be instructive to write down the *correct* code for these
>>>>>>> operations.
>>>>>> 
>>>>>> added to my todo list.
>>>>>> 
>>>>>>> 
>>>>>>>> The intention of the rules is to ensure that there is no overlap of a
>>>>>>>> "read-only" life-range with a "mutable" life-range of the buffer reference.
>>>>>>>> It’s the responsibility of the implementor to follow the rules.
>>>>>>>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can
>>>>>>>> statically detect rule violations in obvious cases (with inter-procedural
>>>>>>>> analysis maybe even in most cases).
>>>>>>>> 
>>>>>>>> This approach would require to change some of the internals of our
>>>>>>>> current COW data structures in the stdlib (Array, Dictionary, etc.).
>>>>>>>> For example, the Array make_mutable semantic functions currently do not return
>>>>>>>> the unique buffer.
>>>>>>> 
>>>>>>> No big deal.
>>>>>>> 
>>>>>>>> Scoping Alternative 2: Implicit Inout Scopes
>>>>>>>> --------------------------------------------
>>>>>>>> 
>>>>>>>> There is an idea (proposal?) to change the representation of inout variables
>>>>>>>> in SIL. This is independent of this proposal, but can be helpful for the
>>>>>>>> purpose of defining the scope of a COW mutation.
>>>>>>>> 
>>>>>>>> The basic idea is that SILGen inserts scoping instructions for *all* inout
>>>>>>>> variables. And those scoping instructions can be used to define the mutating
>>>>>>>> scope of a COW buffer.
>>>>>>>> 
>>>>>>>> The scoping instructions which are inserted by SILGen for an inout scope are::
>>>>>>>> 
>>>>>>>>  begin_exclusive
>>>>>>>>  end_exclusive
>>>>>>>> 
>>>>>>>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those instructions take the
>>>>>>>> address of the inout variable as argument. For the optimizer those instructions
>>>>>>>> look like potential writes to the inout variable.
>>>>>>>> 
>>>>>>>> The implementor of a COW type has to follow the rule that the COW buffer may
>>>>>>>> only be modified in mutating functions of the COW type. But this is the case
>>>>>>>> anyway because any modification needs a uniqueness check and this can only be
>>>>>>>> done in mutating functions.
>>>>>>>> 
>>>>>>>> Example::
>>>>>>>> 
>>>>>>>>  // > mutating func setSomeData(x: Int) {
>>>>>>>>  // Accepts a unique reference to the array value (avoiding refcount operations)
>>>>>>>>  sil @setSomeData : $(Int, @inout Array) -> () {
>>>>>>>>  bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0
>>>>>>>> 
>>>>>>>>  // >   makeMutable() (inlined)
>>>>>>>>  // Forward the unique reference to the `self` array value, still avoiding refcount operations.
>>>>>>>>  // Begin the inlined exclusive scope (could be trivially removed).
>>>>>>>>  begin_exclusive %arrayref : $*Array<T> // Begin scope #1
>>>>>>>> 
>>>>>>>>  // >    if !isUnique(&self._storage) {
>>>>>>>>  // Extract a unique inout reference to the class reference to the array storage.
>>>>>>>>  // This begins the isUnique() argument's exclusive scope. The memory is already exclusive
>>>>>>>>  // but the scope helps ensure this is the only alias to _storage.
>>>>>>>>  %arrayref._storageref = struct_element_addr [exclusive] %arrayref, #Array._storage
>>>>>>>> 
>>>>>>>>  // Uniqueness checking requires an inout reference to the class reference.
>>>>>>>>  // The is_unique instruction does not need to create a new storage reference.
>>>>>>>>  // It's only purpose is to check the RC count, ensure that the checked reference
>>>>>>>>  // is inout, and prevent the inout scope from being optimized away.
>>>>>>>>  %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>>>>>>>> 
>>>>>>>>  // End the isUnique argument's exclusive scope (can also be trivially removed).
>>>>>>>>  end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>>>>>>>> 
>>>>>>>>  br %isuniq, bb_continue, bb_slow
>>>>>>>> 
>>>>>>>>  bb_slow:
>>>>>>>>  // >      self._storage = copyBuffer(self._storage)
>>>>>>>>  // Produce a new class reference to storage with verifiably unique RC semantics.
>>>>>>>>  %copied_storage_class = alloc_ref ...
>>>>>>>>  // A begin/end exclusive scope is implicit in store [assign].
>>>>>>>>  store [assign] %copied_storage_class to %arrayref._storageref
>>>>>>>>  br bb_continue
>>>>>>>> 
>>>>>>>>  bb_continue:
>>>>>>>> 
>>>>>>>>  // This marks the end of makeMutable's inout `self` scope. Because Array
>>>>>>>>  // contains a "copy_on_write" property, the SIL verifier needs to
>>>>>>>>  // prove that %arrayref.#_storage has not escaped at this point. This
>>>>>>>>  // is equivalent to checking that %arrayref itself is not copied, and
>>>>>>>>  // checking each projection of the "copy_on_write" storage property
>>>>>>>>  // (%arrayref._storageref) is not copied. Or, if any copies are present,
>>>>>>>>  // they must be consumed within this scope.
>>>>>>>>  end_exclusive %arrayref : $*Array<T> // End scope #1
>>>>>>>> 
>>>>>>>>  // >    self._storage.someData = x
>>>>>>>>  // An _addr instruction with one load/store use doesn't really need its own scope.
>>>>>>>>  %arrayref._storageref = struct_element_addr %arrayref, #Array._storage
>>>>>>>> 
>>>>>>>>  // ARC optimization can promote this to a borrow, replacing strong_release with end_borrow.
>>>>>>>>  %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow ArrayStorage
>>>>>>>>  %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow ArrayStorage
>>>>>>>> 
>>>>>>>>  // Write some data into the CoW buffer.
>>>>>>>>  // (For simplicity, pretend ArrayStorage has a "someData" field).
>>>>>>>>  // A single-use _addr instruction, so no scope.
>>>>>>>>  %somedata_addr = ref_element_addr %arrayref._storage, #someData
>>>>>>>>  // A store with an implicit [exclusive] scope.
>>>>>>>>  store [assign] %x to %somedata_addr
>>>>>>>> 
>>>>>>>>  strong_release %arrayref._storage : $*ArrayStorage<T>
>>>>>>>> 
>>>>>>>>  // End the isUnique argument's exclusive scope.
>>>>>>>>  // The same verification is needed here, but the inner scope would be eliminated.
>>>>>>>>  end_exclusive %arrayref : $*Array<T> // End scope #0
>>>>>>>> 
>>>>>>>> In general this approach looks more "user-friendly" than the first
>>>>>>>> alternative.
>>>>>>> 
>>>>>>> Well, I can't really tell, because you haven't shown the Swift code that
>>>>>>> generates this SIL.
>>>>>>> 
>>>>>>>> But it depends on implementing the general feature to insert the inout
>>>>>>>> scoping instructions.  Also, we still have to think through all the
>>>>>>>> details of this approach.
>>>>>>> 
>>>>>>> FWIW, I am convinced we will need (and get) a stricter inout model that
>>>>>>> would be conducive to inserting the scoping instructions.
>>>>>>> 
>>>>>>> 
>>>>>>>> Dependency between a buffer reference to the scope-begin
>>>>>>>> --------------------------------------------------------
>>>>>>> 
>>>>>>> You can only have a dependency between two things, but as phrased “a
>>>>>>> buffer reference to the scope-begin” sounds like one thing.  s/to/and/
>>>>>>> would fix it.
>>>>>>> 
>>>>>>>> With both alternatives there is no explicit dependency from a buffer reference
>>>>>>>> to a scope-begin instruction::
>>>>>>>> 
>>>>>>>>  %b_cow = load %baddr
>>>>>>>>  %b = cow_to_ref %b_cow
>>>>>>>>  %x = load %b             // No dependency between this...
>>>>>>>>  ...
>>>>>>>>  begin_exclusive %baddr   // ... and this instruction.
>>>>>>>>  ...
>>>>>>>> 
>>>>>>>> So in theory the optimizer is free to reschedule the instructions::
>>>>>>>> 
>>>>>>>>  %b_cow = load %baddr
>>>>>>>>  %b = cow_to_ref %b_cow
>>>>>>>>  ...
>>>>>>>>  begin_exclusive %baddr
>>>>>>>>  %x = load %b             // Wrong! Buffer could be modified here
>>>>>>>>  ...
>>>>>>>> 
>>>>>>>> We still have to figure out how to cope with this.
>>>>>>>> 
>>>>>>>> - We could add an end-of-lifetime instruction for a COW buffer reference, which
>>>>>>>> the optimizer may not move over a begin-of-scope instruction.
>>>>>>>> 
>>>>>>>> - Or we just define the implicit rule for the optimizer that any use of a COW
>>>>>>>> reference may not be moved over a begin-of-scope instruction.
>>>>>>>> 
>>>>>>>> Preconditions
>>>>>>>> =============
>>>>>>>> 
>>>>>>>> To benefit from COW optimizations in the stdlib Array, Set and Dictionary data
>>>>>>>> structures we first need eager bridging, meaning getting rid of the bridged
>>>>>>>> buffer. 
>>>>>>> 
>>>>>>> As you know I'm very much in favor of eager bridging, but I don't see
>>>>>>> why this would be dependent on it.
>>>>>> 
>>>>>> We could use copy_on_write with eager bridging, but I don’t think it will give any benefits to the
>>>>>> optimizer.
>>>>>> For example, the SIL code to get from an Array to a native
>>>>>> ContiguousArrayStorage reference is pretty hard to understand for the
>>>>>> optimizer (involves low level bit operations, etc.).
>>>>> 
>>>>> It wouldn't need to do low-level bit operations if our enums were
>>>>> capable/controllable enough.  I'm just saying, there's no reason we
>>>>> couldn't give the optimizer something to work with that has higher level
>>>>> semantics than what we currently do.
>>>>> 
>>>>>>>> At least for Array this is implemented as low-level bit operations and
>>>>>>>> optimizations cannot reason about it (e.g. finding a reasonable
>>>>>>>> RC-root for the buffer reference).
>>>>>>>> 
>>>>>>>> Another thing is that we currently cannot use ``copy_on_write`` for Array
>>>>>>>> because of pinning. Array pins it’s buffer when passing an element address to
>>>>>>>> an inout parameter. This allows the array buffer to be modified even if its
>>>>>>>> reference count is > 1. With ``copy_on_write``, a programmer could break memory
>>>>>>>> safety when violating the inout rule. Example::
>>>>>>>> 
>>>>>>>>  var arr = [MyClass()]  // a global array
>>>>>>>> 
>>>>>>>>  foo(&arr[0])        // Pins the buffer of arr during the call
>>>>>>>> 
>>>>>>>>  func foo(_ x: inout MyClass) -> Int {
>>>>>>>>    let b = arr       // The ref-count of the buffer is not incremented, because it is pinned!
>>>>>>>>    let r = b[0]      // optimizer removes the retain of r because it thinks the following code cannot modify b
>>>>>>>>    arr.removeAll()   // does not copy the array buffer and thus de-allocates r
>>>>>>>>    return r.i        // use-after-free!
>>>>>>>>  }
>>>>>>> 
>>>>>>> I only know of one way to resolve inout and pinning:
>>>>>>> 
>>>>>>> * Semantically, references are replaced with a trap value when entering
>>>>>>> an inout context so that all inout values are provably unique
>>>>>>> references in the absence of unsafe code.  We drop pinning and provide
>>>>>>> explicit operations that provide simultaneous lvalue accesses to
>>>>>>> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices.
>>>>>>> 
>>>>>>> If there are other ideas out there, I'd like to hear them.  If not, we
>>>>>>> should probably decide that this is what we're doing so that we can move
>>>>>>> forward without this looming uncertainty.
>>>>>>> 
>>>>>>> -- 
>>>>>>> -Dave
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> swift-dev mailing list
>>>>>>> swift-dev at swift.org
>>>>>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> -Dave
>>>> 
>>>> _______________________________________________
>>>> swift-dev mailing list
>>>> swift-dev at swift.org
>>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>> 
>>> -- 
>>> -Dave
>>> 
>>> _______________________________________________
>>> swift-dev mailing list
>>> swift-dev at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-dev
>> 
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-dev
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20161020/c23591c8/attachment-0001.html>


More information about the swift-dev mailing list