[swift-dev] copy-on-write proposal
Erik Eckstein
eeckstein at apple.com
Mon Oct 17 10:58:13 CDT 2016
On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev <swift-dev at swift.org> wrote:
> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org <http://swift-dev-at-swift.org/>> wrote:
>
>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev <swift-dev at swift.org> wrote:
>>>
>>> This is a proposal for representing copy-on-write buffers in
>>> SIL. Actually it’s still a draft for a proposal. It also heavily
>>> depends on how we move forward with SIL ownership.
>>> <CopyOnWrite.rst>
>>> If you have any comments, please let me know.
>>
>> The SIL-level design seems sensible to me at a glance. At the language
>> level, I think it would make more sense to treat this as an attribute
>> on class types rather than on properties in structs using the class. I
>> don't think many people reuse class definitions as both shared
>> reference types and as value type payloads,
>
> Foundation does, or would if they could.
>
>> but beyond that, I think that making it an attribute of classes would
>> put us into a better position to leverage the borrow model to enforce
>> the "mutable-only-when-unique" aspect of COW implementations. John
>> alluded to this in the "SIL address types and borrowing" thread:
>>
>>> I wonder if it would make more sense to make copy-on-write buffer
>>> references a move-only type, so that as long as you were just
>>> working with the raw reference (as opposed to the CoW aggregate,
>>> which would remain copyable) it wouldn't get implicitly copied
>>> anymore. You could have mutable and immutable buffer reference
>>> types, both move-only, and there could be a consuming checkUnique
>>> operation on the immutable one that, I dunno, returned an Either of
>>> the mutable and immutable versions.
>>>
>>> For CoW aggregates, you'd need some @copied attribute on the field
>>> to make sure that the CoW attribute was still copyable. Within the
>>> implementation of the type, though, you would be projecting out the
>>> reference immediately, and thereafter you'd be certain that you were
>>> borrowing / moving it around as appropriate.
>>
>> If 'copy-on-write' were a trait on classes, then we could distinguish
>> unique and nonunique references to the class. A unique reference would
>> act like a move-only type to prevent accidental loss of uniqueness.
>
> +1
>
>> We can also allow a copy-on-write class to have "mutating" methods,
>> and only allow mutation on unique references. It seems to me like,
>> exploring this direction, we could also come up with a way for the
>> high-level value-semantics operations on the struct to statically
>> indicate which methods are known to leave the value's buffers in a
>> unique state, or which return values that are uniquely owned, which
>> would give the optimizer more ability to avoid uniqueness checks
>> across calls without relying on inlining and IPO.
>
> That's pretty cool. However, I think there's nothing to prevent any
> mutating method from storing a copy of self in a global, so I think we'd
> need some participation from the programmer (either an agreement not to
> do that, or an explicit claim of uniqueness on exit) in order to
> identify operations that create/preserve uniqueness.
If a mutating reference (like self in a mutating method) is move-only then you would not be able to “copy” it to a global.
>
>
> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev <swift-dev at swift.org> wrote:
>
>
> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote:
>
>> This is a proposal for representing copy-on-write buffers in
>> SIL. Actually it’s still a draft for a proposal. It also heavily
>> depends on how we move forward with SIL ownership.
>>
>> :orphan:
>>
>> .. highlight:: sil
>>
>> ===================================
>> Copy-On-Write Representation in SIL
>> ===================================
>>
>> .. contents::
>>
>> Overview
>> ========
>>
>> This document proposes:
>>
>> - An ownership attribute to define copy-on-write (COW) buffers in Swift data
>> types.
>>
>> - A representation of COW buffers in SIL so that optimizations can take benefit
>> of it.
>>
>> The basic idea is to enable the SIL optimizer to reason about COW data types
>> in the same way as a programmer can do.
>> This means: a COW buffer can only be modified by its owning SIL value, because
>> either it's uniquely referenced or the buffer is copied before modified.
>>
>> .. note::
>> In the following the term "buffer" refers to a Swift heap object.
>> It can be any heap object, not necessarily a “buffer” with e.g. tail-allocated elements.
>>
>> COW Types
>> =========
>>
>> The basic structure of COW data types can be simplified as follows::
>>
>> class COWBuffer {
>> var someData: Int
>> ...
>> }
>>
>> struct COWType {
>> var b : COWBuffer
>>
>> mutating func change_it() {
>> if (!isUniquelyReferenced(b)) {
>> b = copy_buffer(b)
>> }
>> b.someData = ...
>> }
>> }
>>
>> Currently the COW behavior of such types is just defined by their implementation.
>> But there is no representation of this special behavior in the SIL.
>> So the SIL optimizer has no clue about it and cannot take advantage of it.
>>
>> For example::
>>
>> func foo(arr : [Int]) {
>> x = arr[0]
>> opaque_function()
>> y = arr[0] // can RLE replace this with y = x?
>> }
>>
>> If opaque_function() wants to change the contents of the array buffer it first
>> has to copy it.
>
> ...or determine that it's uniquely-referenced.
In this specific example, if opqaue_function holds a reference to arr’s buffer, the buffer is not uniquely-referenced.
>
>> But the optimizer does not know it so it has to conservatively assume
>> that opaque_function() will write to the location of arr[0].
>>
>> Copy-on-write Ownership Attribute
>> =================================
>>
>> This section proposes an ownership attribute to define a copy-on-write buffer.
>>
>> Swift Syntax
>> ------------
>>
>> A COW buffer reference can be defined with a new ownership attribute for the
>> buffer variable declaration (similar to “weak” and “unowned”)::
>>
>> struct COWType {
>> copy_on_write var b : COWBuffer
>>
>> // ...
>> }
>>
>> The ``copy_on_write`` attribute is purely used for optimization purposes.
>> It does not change the semantics of the program.
>
> Presumably, it changes what code you can execute on `b` without invoking
> traps or undefined behavior. Otherwise, the optimizer wouldn't be able
> to do anything differently to take advantage of the annotation.
That’s true.
> What are the rules for writing code that uses `copy_on_write`?
See below ("The rules for using ``copy_on_write`` and the built-ins are:”)
>
>> .. note::
>>
>> “copy_on_write” is a working title. TODO: decide on the name.
>> Maybe it should be a @-attribute, like @copy_on_write?
>> Another question is if we should open this attribute for the public or just
>> use it internally in the library, because violating the implied rules
>> (see below) could break memory safety.
>>
>> Implementation
>> --------------
>>
>> The ``copy_on_write`` references can be represented in the AST as a special
>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented.
>> The canonical type of a ``CopyOnWriteStorageType`` would be the referenced
>> buffer class type.
>>
>> In SIL the buffer reference will have type::
>>
>> $@sil_cow COWBuffer
>>
>> where ``COWBuffer`` is the type of the referenced heap object.
>>
>> Two conversion instructions are needed to convert from a ``@sil_cow`` reference
>> type to a regular reference type::
>>
>> cow_to_ref
>> ref_to_cow
>>
>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``.
>>
>> For example the SIL code for::
>>
>> var c: COWType
>> let x = c.b.someData
>>
>> would be::
>>
>> %1 = struct_extract %1 : COWType, #COWType.b
>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData
>> %4 = load %3 : $*Int
>>
>> The ``ref_to_cow`` instruction is needed to store a new buffer reference into a
>> COW type.
>>
>> COW Buffers and the Optimizer
>> =============================
>>
>> A reference to a COW buffer gives the optimizer additional information:
>>
>> *A buffer, referenced by a @sil_cow reference is considered to be immutable
>> during the lifetime of the reference.*
>
> This seems like much too broad a rule to allow inplace mutations of
> uniquely referenced buffers.
The point is that all mutations must be guarded by an is_unique, which takes the _address_ of the buffer reference as argument.
And the optimizer considers this instruction as a potential write to the buffer reference.
The effect is that the lifetime of a buffer reference (as a SIL value) will not outlive a is_unique - regardless if this is inside a called function or inlined.
> Unless you mean the reference is
> immutable, rather than the storage being referred to by it.
>
>> This means any address derived from a ``cow_to_ref`` instruction can be
>> considered to point to immutable memory.
>>
>> Some examples of optimizations which will benefit from copy-on-write
>> representation in SIL:
>>
>> - Redundant load elimination
>>
>> RLE can assume that opaque code does not modify a COW buffer.
>
> How do you distinguish “opaque code” from “code that is meant to
> modify the buffer and might do so in place if it's uniquely-referenced?”
Again, the is_unique which takes the address of the reference, will guarantee that during the lifetime of a buffer there are no modifications of the buffer.
>
>> Example::
>>
>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>> %3 = ref_element_addr %2 : $COWBuffer, #someData
>> %4 = load %3 : $*Int
>> %5 = apply %foo() // Cannot overwrite memory location %3
>> %6 = load %3 : $*Int // Can be replaced by %4
>>
>> Currently we do some ad-hoc optimizations for array, based on semantics,
>> like array count propagation. These hacks would not be needed
>> anymore.
>
> W0000000000000000000000t.
>
>> Note that it’s not required to check if a ``cow_to_ref`` reference (or a
>> projected address) escapes. Even if it escapes, it will reference immutable
>> memory.
>>
>> - CSE, loop hoisting
>>
>> Similar to RLE: the optimizer can assume that opaque code cannot modify a
>> COW buffer
>
> Same question here as above, then.
>>
>> - ARC optimization
>>
>> Knowing that some opaque code cannot overwrite a reference in the COW buffer
>> can remove retain/release pairs across such code::
>>
>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>> %3 = ref_element_addr %2 : $COWBuffer, #someRef
>> %4 = load_strong %3 : $*MyClass // Can do a load_strong [guarantee]
>> %5 = apply %foo() // Cannot overwrite someRef and dealloc the object
>> // Use %4
>> destroy_value %4 : $MyClass
>>
>> Scoping instructions
>> --------------------
>>
>> To let the optimizer reason about the immutability of the COW buffer, it is
>> important to *bind* the lifetime of the buffer content to the lifetime of the
>> buffer reference. For example::
>>
>> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference
>> // load something from %b1
>> %a = apply %foo(%baddr : $@sil_cow COWBuffer)
>> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference again
>> // load something from %b2
>>
>> The question is: can RLE forward the load of the buffer reference and replace
>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies the
>> buffer.
>>
>> To enforce this restriction, the scope of any buffer modification must be
>> enclosed in a pair of SIL instructions. Those instructions define the scope
>> of the mutation. Both instructions take the *address* of the buffer
>> reference as operand and act as a potential write to the buffer reference.
>>
>> The purpose of the scoping instructions is to strictly separate the liferanges
>> of references to an immutable buffer and references to the mutable buffer.
>
> Looks reasonable.
>
>> The following example shows why the scoping instructions (specifically the
>> end-of-scope instruction) are required to prevent loop-hoisting from
>> interleaving mutable and immutable liferanges::
>>
>> // there should be a begin-of-scope %baddr
>> %mut_b = load %baddr
>> store %x to %mut_b // modification of the buffer
>> // there should be a end-of-scope %baddr
>>
>> loop {
>> %b = load %baddr
>> %y = load %b // load from the buffer
>> ...
>> }
>>
>> If there is no end-of-scope instruction, loop hoisting could do::
>>
>> %mut_b = load %baddr
>> %b = load %baddr // moved out of the loop
>> store %x to %mut_b
>>
>> loop {
>> %y = load %b
>> ...
>> }
>>
>> Now the optimizer assumes that ``%b`` references an immutable buffer, so it could
>> also hoist the load::
>>
>> %mut_b = load %baddr
>> %b = load %baddr
>> %y = load %b // Wrong! Will be overwritten by the following store
>> store %x to %mut_b
>>
>> loop {
>> ...
>> }
>>
>>
>> The following sections describe two alternatives to implement the scoping.
>>
>> Scoping Alternative 1: Explicit Built-ins
>> -----------------------------------------
>>
>> SIL instructions
>> ^^^^^^^^^^^^^^^^
>>
>> The existing ``is_unique`` instruction is changed to a terminator instruction::
>>
>> bb0:
>> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the address of the COWBuffer reference
>> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique reference. Physically identical to "load %0”
>> // usually empty
>> br bb3(%1 : $COWBuffer)
>> bb2: // the false-block
>> // usually contains:
>> %2 = apply %copy_buffer
>> %3 = cow_to_ref %2
>> store_strong %3 to %0 : $*@sil_cow COWBuffer
>> br bb3(%2 : $COWBuffer)
>> bb3(%4 : $COWBuffer):
>> // Modify the buffer referenced by %4
>> // ...
>>
>> The end-of-scope instruction is::
>>
>> end_unique_addr %0 : $*COWBuffer
>>
>> It is important that the references to the unique buffers (``%1``, ``%2``) must
>> not outlive ``end_unique_addr``. In most cases this can be check by the SIL
>> verifier.
>>
>> The two instructions must be paired properly but not necessarily in the
>> same function.
>>
>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to
>> separate the lifetimes of mutable and immutable accesses to the COW buffer.
>> Both instructions take an address to the COW buffer reference and are
>> considered as potential stores to the reference.
>> This makes sure that the SIL optimizer cannot mix-up buffer reference lifetimes
>> across these instructions.
>> For example, RLE cannot combine two buffer loads which are interleaved with
>> a ``is_unique_addr_br``::
>>
>> %1 = load_strong %0 : $*@sil_cow COWBuffer
>> // do something with %1
>> …
>> is_unique_addr_br %0 : $*@sil_cow COWBuffer
>> …
>> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this with %1
>>
>> Another important thing is that the COW buffer can only be mutated by using the
>> reference of the ``is_unique_addr_br`` true-block argument.
>> The COW buffer cannot be modified by simply loading/extracting the reference
>> from the COWType.
>> Example::
>>
>> %1 = load_strong %0 : $*COWBuffer
>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>> %3 = ref_element_addr %2 : $COWBuffer, #someData
>> store %7 : $Int to %3 : $*Int // Violation!
>>
>> Most obvious violations to this constraint can be catched by the SILVerifier.
>>
>> The ``_addr`` variants of the instructions also have a non-addr counterpart::
>>
>> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces the true-block arg as owned
>>
>> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned
>>
>> These instructions are generated by Mem2reg (or a similar optimization)
>> in case the COW value is stored (in a temporary alloc_stack location)
>> just for the sake of passing an address to ``is_unique_addr_br`` and
>> ``end_unique_addr``.
>> For example in the following code, where the COW data is passed as-value and
>> all the mutating functions are inlined::
>>
>> func foo(arr : [Int], x: Int) {
>> arr[0] = 27
>> …
>> y = arr[x]
>> …
>> }
>>
>> Finally it’s probably a good idea to add an instruction for converting an
>> immutable reference to a mutable reference::
>>
>> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : $COWBuffer as owned
>>
>> which is basically just a simpler representation of the following pattern::
>>
>> bb0:
>> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2
>> bb1(%1 : $COWBuffer):
>> … // main control flow continues here
>> bb2:
>> unreachable
>>
>> An optimizations, which eliminate uniqueness checks, would replace a
>> ``is_unique_br`` by a ``start_unique``.
>>
>> Built-ins
>> ^^^^^^^^^
>>
>> A COW type implementor can generate the new instructions by using a set of built-ins::
>>
>> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType?
>> func endUnique<BufferType>(_ buffer: inout BufferType)
>>
>> For example::
>>
>> struct COWType {
>> copy_on_write var b : COWBuffer
>>
>> mutating func makeMutable() -> COWBuffer {
>> if let uniqueBuffer = isUnique(&self.b) {
>> return uniqueBuffer
>> }
>> let copiedBuffer = copyBuffer(self.b)
>> self.b = copiedBuffer
>> return copiedBuffer
>> }
>>
>> mutating func setSomeData(x: Int) {
>> let uniqueBuffer = makeMutable()
>> uniqueBuffer.someData = x
>> endUnique(&self.b)
>> }
>> }
>
> This seems reasonable, but it also looks like the compiler could do the
> `endUnique` dance for us based, e.g., on the mutability of methods.
I agree, that would be ideal, e.g. the compiler could insert the endUnique at the end of an inout scope.
>
>> The ``isUnique`` built-in returns an optional unique buffer reference.
>> Physically this is the COW buffer which is passed as the inout argument.
>> The result is nil if the buffer is not uniquely referenced.
>> In this case usually the original buffer is copied and the reference to the
>> copy is written back to the original buffer reference location
>> (``self.b = copiedBuffer``).
>> Starting at the point of the write-back, the reference to the copy also becomes
>> a unique buffer reference.
>>
>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` pattern which
>> constructs the Optional in the successor blocks. Using ``isUnique`` in an
>> if-let (as shown above) will end up in two consecutive CFG "diamonds".
>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` diamond.
>>
>> .. note::
>> This makes the definition of the unique buffer location lifetime a little bit
>> problematic, because the false-branch of ``isUnique`` is not equivalent to
>> the false-branch of the ``is_unique_addr_br`` instruction (before SimplifyCFG
>> can do its job).
>
> I don't know what the implications of these diamonds and the problem
> described above might be, FWIW.
>
>> The rules for using ``copy_on_write`` and the built-ins are:
>>
>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily in the
>> same function.
>>
>> 2. The COW buffer may only be mutated by using the unique buffer reference.
>>
>> 3. The COW buffer must not be mutated outside the ``isUnique`` - ``endUnique``
>> pair.
>>
>> 4. During the lifetime of the unique buffer reference, the original COW buffer
>> reference must not be used in any way, e.g. for reading from the buffer.
>>
>> Note that the lifetime of the unique buffer reference does not include the
>> part between the begin of the ``isUnique`` false-branch and the write-back
>> of the copy. This means is okay to read from the buffer (using ``self.b``)
>> for the purpose of copying.
>>
>> Examples::
>>
>> mutating func setSomeData(x: Int) {
>> let uniqueBuffer = makeMutable()
>> uniqueBuffer.someData = x
>> // violates rule 1
>> }
>>
>> mutating func setSomeData(x: Int) {
>> makeMutable()
>> self.b.someData = x // violates rule 2
>> endUnique(&self.b)
>> }
>>
>> mutating func setSomeData(x: Int) {
>> let uniqueBuffer = makeMutable()
>> uniqueBuffer.someData = x
>> endUnique(&self.b)
>> uniqueBuffer.someData = 27 // violates rule 3
>> }
>>
>> mutating func incrementSomeData() {
>> let uniqueBuffer = makeMutable()
>> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4
>> endUnique(&self.b)
>> }
>
> It would be instructive to write down the *correct* code for these
> operations.
added to my todo list.
>
>> The intention of the rules is to ensure that there is no overlap of a
>> "read-only" life-range with a "mutable" life-range of the buffer reference.
>> It’s the responsibility of the implementor to follow the rules.
>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can
>> statically detect rule violations in obvious cases (with inter-procedural
>> analysis maybe even in most cases).
>>
>> This approach would require to change some of the internals of our
>> current COW data structures in the stdlib (Array, Dictionary, etc.).
>> For example, the Array make_mutable semantic functions currently do not return
>> the unique buffer.
>
> No big deal.
>
>> Scoping Alternative 2: Implicit Inout Scopes
>> --------------------------------------------
>>
>> There is an idea (proposal?) to change the representation of inout variables
>> in SIL. This is independent of this proposal, but can be helpful for the
>> purpose of defining the scope of a COW mutation.
>>
>> The basic idea is that SILGen inserts scoping instructions for *all* inout
>> variables. And those scoping instructions can be used to define the mutating
>> scope of a COW buffer.
>>
>> The scoping instructions which are inserted by SILGen for an inout scope are::
>>
>> begin_exclusive
>> end_exclusive
>>
>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those instructions take the
>> address of the inout variable as argument. For the optimizer those instructions
>> look like potential writes to the inout variable.
>>
>> The implementor of a COW type has to follow the rule that the COW buffer may
>> only be modified in mutating functions of the COW type. But this is the case
>> anyway because any modification needs a uniqueness check and this can only be
>> done in mutating functions.
>>
>> Example::
>>
>> // > mutating func setSomeData(x: Int) {
>> // Accepts a unique reference to the array value (avoiding refcount operations)
>> sil @setSomeData : $(Int, @inout Array) -> () {
>> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0
>>
>> // > makeMutable() (inlined)
>> // Forward the unique reference to the `self` array value, still avoiding refcount operations.
>> // Begin the inlined exclusive scope (could be trivially removed).
>> begin_exclusive %arrayref : $*Array<T> // Begin scope #1
>>
>> // > if !isUnique(&self._storage) {
>> // Extract a unique inout reference to the class reference to the array storage.
>> // This begins the isUnique() argument's exclusive scope. The memory is already exclusive
>> // but the scope helps ensure this is the only alias to _storage.
>> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, #Array._storage
>>
>> // Uniqueness checking requires an inout reference to the class reference.
>> // The is_unique instruction does not need to create a new storage reference.
>> // It's only purpose is to check the RC count, ensure that the checked reference
>> // is inout, and prevent the inout scope from being optimized away.
>> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>>
>> // End the isUnique argument's exclusive scope (can also be trivially removed).
>> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>>
>> br %isuniq, bb_continue, bb_slow
>>
>> bb_slow:
>> // > self._storage = copyBuffer(self._storage)
>> // Produce a new class reference to storage with verifiably unique RC semantics.
>> %copied_storage_class = alloc_ref ...
>> // A begin/end exclusive scope is implicit in store [assign].
>> store [assign] %copied_storage_class to %arrayref._storageref
>> br bb_continue
>>
>> bb_continue:
>>
>> // This marks the end of makeMutable's inout `self` scope. Because Array
>> // contains a "copy_on_write" property, the SIL verifier needs to
>> // prove that %arrayref.#_storage has not escaped at this point. This
>> // is equivalent to checking that %arrayref itself is not copied, and
>> // checking each projection of the "copy_on_write" storage property
>> // (%arrayref._storageref) is not copied. Or, if any copies are present,
>> // they must be consumed within this scope.
>> end_exclusive %arrayref : $*Array<T> // End scope #1
>>
>> // > self._storage.someData = x
>> // An _addr instruction with one load/store use doesn't really need its own scope.
>> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage
>>
>> // ARC optimization can promote this to a borrow, replacing strong_release with end_borrow.
>> %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow ArrayStorage
>> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow ArrayStorage
>>
>> // Write some data into the CoW buffer.
>> // (For simplicity, pretend ArrayStorage has a "someData" field).
>> // A single-use _addr instruction, so no scope.
>> %somedata_addr = ref_element_addr %arrayref._storage, #someData
>> // A store with an implicit [exclusive] scope.
>> store [assign] %x to %somedata_addr
>>
>> strong_release %arrayref._storage : $*ArrayStorage<T>
>>
>> // End the isUnique argument's exclusive scope.
>> // The same verification is needed here, but the inner scope would be eliminated.
>> end_exclusive %arrayref : $*Array<T> // End scope #0
>>
>> In general this approach looks more "user-friendly" than the first
>> alternative.
>
> Well, I can't really tell, because you haven't shown the Swift code that
> generates this SIL.
>
>> But it depends on implementing the general feature to insert the inout
>> scoping instructions. Also, we still have to think through all the
>> details of this approach.
>
> FWIW, I am convinced we will need (and get) a stricter inout model that
> would be conducive to inserting the scoping instructions.
>
>
>> Dependency between a buffer reference to the scope-begin
>> --------------------------------------------------------
>
> You can only have a dependency between two things, but as phrased “a
> buffer reference to the scope-begin” sounds like one thing. s/to/and/
> would fix it.
>
>> With both alternatives there is no explicit dependency from a buffer reference
>> to a scope-begin instruction::
>>
>> %b_cow = load %baddr
>> %b = cow_to_ref %b_cow
>> %x = load %b // No dependency between this...
>> ...
>> begin_exclusive %baddr // ... and this instruction.
>> ...
>>
>> So in theory the optimizer is free to reschedule the instructions::
>>
>> %b_cow = load %baddr
>> %b = cow_to_ref %b_cow
>> ...
>> begin_exclusive %baddr
>> %x = load %b // Wrong! Buffer could be modified here
>> ...
>>
>> We still have to figure out how to cope with this.
>>
>> - We could add an end-of-lifetime instruction for a COW buffer reference, which
>> the optimizer may not move over a begin-of-scope instruction.
>>
>> - Or we just define the implicit rule for the optimizer that any use of a COW
>> reference may not be moved over a begin-of-scope instruction.
>>
>> Preconditions
>> =============
>>
>> To benefit from COW optimizations in the stdlib Array, Set and Dictionary data
>> structures we first need eager bridging, meaning getting rid of the bridged
>> buffer.
>
> As you know I'm very much in favor of eager bridging, but I don't see
> why this would be dependent on it.
We could use copy_on_write with eager bridging, but I don’t think it will give any benefits to the optimizer.
For example, the SIL code to get from an Array to a native ContiguousArrayStorage reference is pretty hard to understand for the optimizer (involves low level bit operations, etc.).
>
>> At least for Array this is implemented as low-level bit operations and
>> optimizations cannot reason about it (e.g. finding a reasonable
>> RC-root for the buffer reference).
>>
>> Another thing is that we currently cannot use ``copy_on_write`` for Array
>> because of pinning. Array pins it’s buffer when passing an element address to
>> an inout parameter. This allows the array buffer to be modified even if its
>> reference count is > 1. With ``copy_on_write``, a programmer could break memory
>> safety when violating the inout rule. Example::
>>
>> var arr = [MyClass()] // a global array
>>
>> foo(&arr[0]) // Pins the buffer of arr during the call
>>
>> func foo(_ x: inout MyClass) -> Int {
>> let b = arr // The ref-count of the buffer is not incremented, because it is pinned!
>> let r = b[0] // optimizer removes the retain of r because it thinks the following code cannot modify b
>> arr.removeAll() // does not copy the array buffer and thus de-allocates r
>> return r.i // use-after-free!
>> }
>
> I only know of one way to resolve inout and pinning:
>
> * Semantically, references are replaced with a trap value when entering
> an inout context so that all inout values are provably unique
> references in the absence of unsafe code. We drop pinning and provide
> explicit operations that provide simultaneous lvalue accesses to
> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices.
>
> If there are other ideas out there, I'd like to hear them. If not, we
> should probably decide that this is what we're doing so that we can move
> forward without this looming uncertainty.
>
> --
> -Dave
>
> _______________________________________________
> swift-dev mailing list
> swift-dev at swift.org
> https://lists.swift.org/mailman/listinfo/swift-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20161017/671af6dd/attachment.html>
More information about the swift-dev
mailing list