<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""></div><div class=""><div class="">On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev <<a href="mailto:swift-dev@swift.org" class="">swift-dev@swift.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br class=""></div></div><blockquote type="cite" class=""><div class=""><div class=""><span class="" style="float: none; display: inline !important;">on Thu Oct 13 2016, Joe Groff <</span><a href="http://swift-dev-at-swift.org/" class="">swift-dev-AT-swift.org</a><span class="" style="float: none; display: inline !important;">> wrote:</span><br class=""><br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev <<a href="mailto:swift-dev@swift.org" class="">swift-dev@swift.org</a>> wrote:<br class=""><br class="">This is a proposal for representing copy-on-write buffers in<br class="">SIL. Actually it’s still a draft for a proposal. It also heavily<br class="">depends on how we move forward with SIL ownership.<br class=""><CopyOnWrite.rst><br class="">If you have any comments, please let me know.<br class=""></blockquote><br class="">The SIL-level design seems sensible to me at a glance. At the language<br class="">level, I think it would make more sense to treat this as an attribute<br class="">on class types rather than on properties in structs using the class. I<br class="">don't think many people reuse class definitions as both shared<br class="">reference types and as value type payloads, <br class=""></blockquote><br class=""><span class="" style="float: none; display: inline !important;">Foundation does, or would if they could.</span><br class=""><br class=""><blockquote type="cite" class="">but beyond that, I think that making it an attribute of classes would<br class="">put us into a better position to leverage the borrow model to enforce<br class="">the "mutable-only-when-unique" aspect of COW implementations. John<br class="">alluded to this in the "SIL address types and borrowing" thread:<br class=""><br class=""><blockquote type="cite" class="">I wonder if it would make more sense to make copy-on-write buffer<br class="">references a move-only type, so that as long as you were just<br class="">working with the raw reference (as opposed to the CoW aggregate,<br class="">which would remain copyable) it wouldn't get implicitly copied<br class="">anymore. You could have mutable and immutable buffer reference<br class="">types, both move-only, and there could be a consuming checkUnique<br class="">operation on the immutable one that, I dunno, returned an Either of<br class="">the mutable and immutable versions.<br class=""><br class="">For CoW aggregates, you'd need some @copied attribute on the field<br class="">to make sure that the CoW attribute was still copyable. Within the<br class="">implementation of the type, though, you would be projecting out the<br class="">reference immediately, and thereafter you'd be certain that you were<br class="">borrowing / moving it around as appropriate.<br class=""></blockquote><br class="">If 'copy-on-write' were a trait on classes, then we could distinguish<br class="">unique and nonunique references to the class. A unique reference would<br class="">act like a move-only type to prevent accidental loss of uniqueness. <br class=""></blockquote><br class=""><span class="" style="float: none; display: inline !important;">+1</span><br class=""><br class=""><blockquote type="cite" class="">We can also allow a copy-on-write class to have "mutating" methods,<br class="">and only allow mutation on unique references. It seems to me like,<br class="">exploring this direction, we could also come up with a way for the<br class="">high-level value-semantics operations on the struct to statically<br class="">indicate which methods are known to leave the value's buffers in a<br class="">unique state, or which return values that are uniquely owned, which<br class="">would give the optimizer more ability to avoid uniqueness checks<br class="">across calls without relying on inlining and IPO.<br class=""></blockquote><br class=""><span class="" style="float: none; display: inline !important;">That's pretty cool. However, I think there's nothing to prevent any</span><br class=""><span class="" style="float: none; display: inline !important;">mutating method from storing a copy of self in a global, so I think we'd</span><br class=""><span class="" style="float: none; display: inline !important;">need some participation from the programmer (either an agreement not to</span><br class=""><span class="" style="float: none; display: inline !important;">do that, or an explicit claim of uniqueness on exit) in order to</span><br class=""><span class="" style="float: none; display: inline !important;">identify operations that create/preserve uniqueness.</span><br class=""></div></div></blockquote><div class=""><br class=""></div><div class="">If a mutating reference (like self in a mutating method) is move-only then you would not be able to “copy” it to a global.</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class=""></div></div><br class=""></blockquote><div><blockquote type="cite" class=""><div class="">On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev <<a href="mailto:swift-dev@swift.org" class="">swift-dev@swift.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class=""><br class="">on Tue Oct 11 2016, Erik Eckstein <<a href="http://swift-dev-AT-swift.org" class="">swift-dev-AT-swift.org</a>> wrote:<br class=""><br class=""><blockquote type="cite" class="">This is a proposal for representing copy-on-write buffers in<br class="">SIL. Actually it’s still a draft for a proposal. It also heavily<br class="">depends on how we move forward with SIL ownership.<br class=""><br class="">:orphan:<br class=""><br class="">.. highlight:: sil<br class=""><br class="">===================================<br class="">Copy-On-Write Representation in SIL<br class="">===================================<br class=""><br class="">.. contents::<br class=""><br class="">Overview<br class="">========<br class=""><br class="">This document proposes:<br class=""><br class="">- An ownership attribute to define copy-on-write (COW) buffers in Swift data<br class=""> types.<br class=""><br class="">- A representation of COW buffers in SIL so that optimizations can take benefit<br class=""> of it.<br class=""><br class="">The basic idea is to enable the SIL optimizer to reason about COW data types<br class="">in the same way as a programmer can do.<br class="">This means: a COW buffer can only be modified by its owning SIL value, because<br class="">either it's uniquely referenced or the buffer is copied before modified.<br class=""><br class="">.. note::<br class=""> In the following the term "buffer" refers to a Swift heap object.<br class=""> It can be any heap object, not necessarily a “buffer” with e.g. tail-allocated elements.<br class=""><br class="">COW Types<br class="">=========<br class=""><br class="">The basic structure of COW data types can be simplified as follows::<br class=""><br class=""> class COWBuffer {<br class=""> var someData: Int<br class=""> ...<br class=""> }<br class=""><br class=""> struct COWType {<br class=""> var b : COWBuffer<br class=""><br class=""> mutating func change_it() {<br class=""> if (!isUniquelyReferenced(b)) {<br class=""> b = copy_buffer(b)<br class=""> }<br class=""> b.someData = ...<br class=""> }<br class=""> }<br class=""><br class="">Currently the COW behavior of such types is just defined by their implementation.<br class="">But there is no representation of this special behavior in the SIL.<br class="">So the SIL optimizer has no clue about it and cannot take advantage of it.<br class=""><br class="">For example::<br class=""><br class=""> func foo(arr : [Int]) {<br class=""> x = arr[0]<br class=""> opaque_function()<br class=""> y = arr[0] // can RLE replace this with y = x?<br class=""> }<br class=""><br class="">If opaque_function() wants to change the contents of the array buffer it first<br class="">has to copy it. <br class=""></blockquote><br class="">...or determine that it's uniquely-referenced.<br class=""></div></div></blockquote><div><br class=""></div><div>In this specific example, if opqaue_function holds a reference to arr’s buffer, the buffer is not uniquely-referenced.</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class=""><blockquote type="cite" class="">But the optimizer does not know it so it has to conservatively assume<br class="">that opaque_function() will write to the location of arr[0].<br class=""><br class="">Copy-on-write Ownership Attribute<br class="">=================================<br class=""><br class="">This section proposes an ownership attribute to define a copy-on-write buffer.<br class=""><br class="">Swift Syntax<br class="">------------<br class=""><br class="">A COW buffer reference can be defined with a new ownership attribute for the<br class="">buffer variable declaration (similar to “weak” and “unowned”)::<br class=""><br class=""> struct COWType {<br class=""> copy_on_write var b : COWBuffer<br class=""><br class=""> // ...<br class=""> }<br class=""><br class="">The ``copy_on_write`` attribute is purely used for optimization purposes.<br class="">It does not change the semantics of the program.<br class=""></blockquote><br class="">Presumably, it changes what code you can execute on `b` without invoking<br class="">traps or undefined behavior. Otherwise, the optimizer wouldn't be able<br class="">to do anything differently to take advantage of the annotation.</div></div></blockquote><div><br class=""></div><div>That’s true.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="">What are the rules for writing code that uses `copy_on_write`?<br class=""></div></div></blockquote><div><br class=""></div><div>See below ("The rules for using ``copy_on_write`` and the built-ins are:”)</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class=""><blockquote type="cite" class="">.. note::<br class=""><br class=""> “copy_on_write” is a working title. TODO: decide on the name.<br class=""> Maybe it should be a @-attribute, like @copy_on_write?<br class=""> Another question is if we should open this attribute for the public or just<br class=""> use it internally in the library, because violating the implied rules<br class=""> (see below) could break memory safety.<br class=""><br class="">Implementation<br class="">--------------<br class=""><br class="">The ``copy_on_write`` references can be represented in the AST as a special<br class="">``StorageType``, just like how ``unowned`` and ``weak`` is represented.<br class="">The canonical type of a ``CopyOnWriteStorageType`` would be the referenced<br class="">buffer class type.<br class=""><br class="">In SIL the buffer reference will have type::<br class=""><br class=""> $@sil_cow COWBuffer<br class=""><br class="">where ``COWBuffer`` is the type of the referenced heap object.<br class=""><br class="">Two conversion instructions are needed to convert from a ``@sil_cow`` reference<br class="">type to a regular reference type::<br class=""><br class=""> cow_to_ref<br class=""> ref_to_cow<br class=""><br class="">Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``.<br class=""><br class="">For example the SIL code for::<br class=""><br class=""> var c: COWType<br class=""> let x = c.b.someData<br class=""><br class="">would be::<br class=""><br class=""> %1 = struct_extract %1 : COWType, #COWType.b<br class=""> %2 = cow_to_ref %1 : $@sil_cow COWBuffer<br class=""> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData<br class=""> %4 = load %3 : $*Int<br class=""><br class="">The ``ref_to_cow`` instruction is needed to store a new buffer reference into a<br class="">COW type.<br class=""><br class="">COW Buffers and the Optimizer<br class="">=============================<br class=""><br class="">A reference to a COW buffer gives the optimizer additional information:<br class=""><br class=""> *A buffer, referenced by a @sil_cow reference is considered to be immutable<br class=""> during the lifetime of the reference.*<br class=""></blockquote><br class="">This seems like much too broad a rule to allow inplace mutations of<br class="">uniquely referenced buffers. </div></div></blockquote><div><br class=""></div><div>The point is that all mutations must be guarded by an is_unique, which takes the _address_ of the buffer reference as argument.</div><div>And the optimizer considers this instruction as a potential write to the buffer reference.</div><div>The effect is that the lifetime of a buffer reference (as a SIL value) will not outlive a is_unique - regardless if this is inside a called function or inlined.</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""> Unless you mean the reference is<br class="">immutable, rather than the storage being referred to by it.<br class=""><br class=""><blockquote type="cite" class="">This means any address derived from a ``cow_to_ref`` instruction can be<br class="">considered to point to immutable memory.<br class=""><br class="">Some examples of optimizations which will benefit from copy-on-write<br class="">representation in SIL:<br class=""><br class="">- Redundant load elimination<br class=""><br class=""> RLE can assume that opaque code does not modify a COW buffer.<br class=""></blockquote><br class="">How do you distinguish “opaque code” from “code that is meant to<br class="">modify the buffer and might do so in place if it's uniquely-referenced?”<br class=""></div></div></blockquote><div><br class=""></div><div>Again, the is_unique which takes the address of the reference, will guarantee that during the lifetime of a buffer there are no modifications of the buffer.</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class=""><blockquote type="cite" class=""> Example::<br class=""><br class=""> %2 = cow_to_ref %1 : $@sil_cow COWBuffer<br class=""> %3 = ref_element_addr %2 : $COWBuffer, #someData<br class=""> %4 = load %3 : $*Int<br class=""> %5 = apply %foo() // Cannot overwrite memory location %3<br class=""> %6 = load %3 : $*Int // Can be replaced by %4<br class=""><br class=""> Currently we do some ad-hoc optimizations for array, based on semantics,<br class=""> like array count propagation. These hacks would not be needed<br class=""> anymore.<br class=""></blockquote><br class="">W0000000000000000000000t.<br class=""><br class=""><blockquote type="cite" class=""> Note that it’s not required to check if a ``cow_to_ref`` reference (or a<br class=""> projected address) escapes. Even if it escapes, it will reference immutable<br class=""> memory.<br class=""><br class="">- CSE, loop hoisting<br class=""><br class=""> Similar to RLE: the optimizer can assume that opaque code cannot modify a<br class=""> COW buffer<br class=""></blockquote><br class="">Same question here as above, then.<br class=""><blockquote type="cite" class=""><br class="">- ARC optimization<br class=""><br class=""> Knowing that some opaque code cannot overwrite a reference in the COW buffer<br class=""> can remove retain/release pairs across such code::<br class=""><br class=""> %2 = cow_to_ref %1 : $@sil_cow COWBuffer<br class=""> %3 = ref_element_addr %2 : $COWBuffer, #someRef<br class=""> %4 = load_strong %3 : $*MyClass // Can do a load_strong [guarantee]<br class=""> %5 = apply %foo() // Cannot overwrite someRef and dealloc the object<br class=""> // Use %4<br class=""> destroy_value %4 : $MyClass<br class=""><br class="">Scoping instructions<br class="">--------------------<br class=""><br class="">To let the optimizer reason about the immutability of the COW buffer, it is<br class="">important to *bind* the lifetime of the buffer content to the lifetime of the<br class="">buffer reference. For example::<br class=""><br class=""> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference<br class=""> // load something from %b1<br class=""> %a = apply %foo(%baddr : $@sil_cow COWBuffer)<br class=""> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference again<br class=""> // load something from %b2<br class=""><br class="">The question is: can RLE forward the load of the buffer reference and replace<br class="">``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies the<br class="">buffer.<br class=""><br class="">To enforce this restriction, the scope of any buffer modification must be<br class="">enclosed in a pair of SIL instructions. Those instructions define the scope<br class="">of the mutation. Both instructions take the *address* of the buffer<br class="">reference as operand and act as a potential write to the buffer reference. <br class=""><br class="">The purpose of the scoping instructions is to strictly separate the liferanges<br class="">of references to an immutable buffer and references to the mutable buffer.<br class=""></blockquote><br class="">Looks reasonable.<br class=""><br class=""><blockquote type="cite" class="">The following example shows why the scoping instructions (specifically the<br class="">end-of-scope instruction) are required to prevent loop-hoisting from<br class="">interleaving mutable and immutable liferanges::<br class=""><br class=""> // there should be a begin-of-scope %baddr<br class=""> %mut_b = load %baddr<br class=""> store %x to %mut_b // modification of the buffer<br class=""> // there should be a end-of-scope %baddr<br class=""><br class=""> loop {<br class=""> %b = load %baddr<br class=""> %y = load %b // load from the buffer<br class=""> ...<br class=""> }<br class=""><br class="">If there is no end-of-scope instruction, loop hoisting could do::<br class=""><br class=""> %mut_b = load %baddr<br class=""> %b = load %baddr // moved out of the loop<br class=""> store %x to %mut_b<br class=""><br class=""> loop {<br class=""> %y = load %b<br class=""> ...<br class=""> }<br class=""><br class="">Now the optimizer assumes that ``%b`` references an immutable buffer, so it could<br class="">also hoist the load::<br class=""><br class=""> %mut_b = load %baddr<br class=""> %b = load %baddr<br class=""> %y = load %b // Wrong! Will be overwritten by the following store<br class=""> store %x to %mut_b<br class=""><br class=""> loop {<br class=""> ...<br class=""> }<br class=""><br class=""><br class="">The following sections describe two alternatives to implement the scoping.<br class=""><br class="">Scoping Alternative 1: Explicit Built-ins<br class="">-----------------------------------------<br class=""><br class="">SIL instructions<br class="">^^^^^^^^^^^^^^^^<br class=""><br class="">The existing ``is_unique`` instruction is changed to a terminator instruction::<br class=""><br class=""> bb0:<br class=""> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the address of the COWBuffer reference<br class=""> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique reference. Physically identical to "load %0”<br class=""> // usually empty<br class=""> br bb3(%1 : $COWBuffer)<br class=""> bb2: // the false-block<br class=""> // usually contains:<br class=""> %2 = apply %copy_buffer<br class=""> %3 = cow_to_ref %2<br class=""> store_strong %3 to %0 : $*@sil_cow COWBuffer<br class=""> br bb3(%2 : $COWBuffer)<br class=""> bb3(%4 : $COWBuffer):<br class=""> // Modify the buffer referenced by %4<br class=""> // ...<br class=""><br class="">The end-of-scope instruction is::<br class=""><br class=""> end_unique_addr %0 : $*COWBuffer<br class=""><br class="">It is important that the references to the unique buffers (``%1``, ``%2``) must<br class="">not outlive ``end_unique_addr``. In most cases this can be check by the SIL<br class="">verifier.<br class=""><br class="">The two instructions must be paired properly but not necessarily in the<br class="">same function.<br class=""><br class="">The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to<br class="">separate the lifetimes of mutable and immutable accesses to the COW buffer.<br class="">Both instructions take an address to the COW buffer reference and are<br class="">considered as potential stores to the reference.<br class="">This makes sure that the SIL optimizer cannot mix-up buffer reference lifetimes<br class="">across these instructions.<br class="">For example, RLE cannot combine two buffer loads which are interleaved with<br class="">a ``is_unique_addr_br``::<br class=""><br class=""> %1 = load_strong %0 : $*@sil_cow COWBuffer<br class=""> // do something with %1<br class=""> …<br class=""> is_unique_addr_br %0 : $*@sil_cow COWBuffer<br class=""> …<br class=""> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this with %1<br class=""><br class="">Another important thing is that the COW buffer can only be mutated by using the<br class="">reference of the ``is_unique_addr_br`` true-block argument.<br class="">The COW buffer cannot be modified by simply loading/extracting the reference<br class="">from the COWType.<br class="">Example::<br class=""><br class=""> %1 = load_strong %0 : $*COWBuffer<br class=""> %2 = cow_to_ref %1 : $@sil_cow COWBuffer<br class=""> %3 = ref_element_addr %2 : $COWBuffer, #someData<br class=""> store %7 : $Int to %3 : $*Int // Violation!<br class=""><br class="">Most obvious violations to this constraint can be catched by the SILVerifier.<br class=""><br class="">The ``_addr`` variants of the instructions also have a non-addr counterpart::<br class=""><br class=""> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces the true-block arg as owned<br class=""><br class=""> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned<br class=""><br class="">These instructions are generated by Mem2reg (or a similar optimization)<br class="">in case the COW value is stored (in a temporary alloc_stack location)<br class="">just for the sake of passing an address to ``is_unique_addr_br`` and<br class="">``end_unique_addr``.<br class="">For example in the following code, where the COW data is passed as-value and<br class="">all the mutating functions are inlined::<br class=""><br class=""> func foo(arr : [Int], x: Int) {<br class=""> arr[0] = 27<br class=""> …<br class=""> y = arr[x]<br class=""> …<br class=""> }<br class=""><br class="">Finally it’s probably a good idea to add an instruction for converting an<br class="">immutable reference to a mutable reference::<br class=""><br class=""> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : $COWBuffer as owned<br class=""><br class="">which is basically just a simpler representation of the following pattern::<br class=""><br class=""> bb0:<br class=""> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2<br class=""> bb1(%1 : $COWBuffer):<br class=""> … // main control flow continues here<br class=""> bb2:<br class=""> unreachable<br class=""><br class="">An optimizations, which eliminate uniqueness checks, would replace a<br class="">``is_unique_br`` by a ``start_unique``.<br class=""><br class="">Built-ins<br class="">^^^^^^^^^<br class=""><br class="">A COW type implementor can generate the new instructions by using a set of built-ins::<br class=""><br class=""> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType?<br class=""> func endUnique<BufferType>(_ buffer: inout BufferType) <br class=""><br class="">For example::<br class=""><br class=""> struct COWType {<br class=""> copy_on_write var b : COWBuffer<br class=""><br class=""> mutating func makeMutable() -> COWBuffer {<br class=""> if let uniqueBuffer = isUnique(&self.b) {<br class=""> return uniqueBuffer<br class=""> }<br class=""> let copiedBuffer = copyBuffer(self.b)<br class=""> self.b = copiedBuffer<br class=""> return copiedBuffer<br class=""> }<br class=""><br class=""> mutating func setSomeData(x: Int) {<br class=""> let uniqueBuffer = makeMutable()<br class=""> uniqueBuffer.someData = x<br class=""> endUnique(&self.b)<br class=""> }<br class=""> }<br class=""></blockquote><br class="">This seems reasonable, but it also looks like the compiler could do the<br class="">`endUnique` dance for us based, e.g., on the mutability of methods. <br class=""></div></div></blockquote><div><br class=""></div><div>I agree, that would be ideal, e.g. the compiler could insert the endUnique at the end of an inout scope.</div><div><br class=""></div><blockquote type="cite" class=""><div class=""><div class=""><br class=""><blockquote type="cite" class="">The ``isUnique`` built-in returns an optional unique buffer reference.<br class="">Physically this is the COW buffer which is passed as the inout argument.<br class="">The result is nil if the buffer is not uniquely referenced.<br class="">In this case usually the original buffer is copied and the reference to the<br class="">copy is written back to the original buffer reference location<br class="">(``self.b = copiedBuffer``).<br class="">Starting at the point of the write-back, the reference to the copy also becomes<br class="">a unique buffer reference.<br class=""><br class="">The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` pattern which<br class="">constructs the Optional in the successor blocks. Using ``isUnique`` in an<br class="">if-let (as shown above) will end up in two consecutive CFG "diamonds".<br class="">Simplify-CFG can combine those into a single ``is_unique_addr_br`` diamond.<br class=""><br class="">.. note::<br class=""> This makes the definition of the unique buffer location lifetime a little bit<br class=""> problematic, because the false-branch of ``isUnique`` is not equivalent to<br class=""> the false-branch of the ``is_unique_addr_br`` instruction (before SimplifyCFG<br class=""> can do its job).<br class=""></blockquote><br class="">I don't know what the implications of these diamonds and the problem<br class="">described above might be, FWIW.<br class=""><br class=""><blockquote type="cite" class="">The rules for using ``copy_on_write`` and the built-ins are:<br class=""><br class="">1. ``isUnique`` must be paired with ``endUnique``, but not necessarily in the<br class=""> same function.<br class=""><br class="">2. The COW buffer may only be mutated by using the unique buffer reference.<br class=""><br class="">3. The COW buffer must not be mutated outside the ``isUnique`` - ``endUnique``<br class=""> pair.<br class=""><br class="">4. During the lifetime of the unique buffer reference, the original COW buffer<br class=""> reference must not be used in any way, e.g. for reading from the buffer.<br class=""><br class=""> Note that the lifetime of the unique buffer reference does not include the<br class=""> part between the begin of the ``isUnique`` false-branch and the write-back<br class=""> of the copy. This means is okay to read from the buffer (using ``self.b``)<br class=""> for the purpose of copying.<br class=""><br class="">Examples::<br class=""><br class=""> mutating func setSomeData(x: Int) {<br class=""> let uniqueBuffer = makeMutable()<br class=""> uniqueBuffer.someData = x<br class=""> // violates rule 1<br class=""> }<br class=""><br class=""> mutating func setSomeData(x: Int) {<br class=""> makeMutable()<br class=""> self.b.someData = x // violates rule 2<br class=""> endUnique(&self.b)<br class=""> }<br class=""><br class=""> mutating func setSomeData(x: Int) {<br class=""> let uniqueBuffer = makeMutable()<br class=""> uniqueBuffer.someData = x<br class=""> endUnique(&self.b)<br class=""> uniqueBuffer.someData = 27 // violates rule 3<br class=""> }<br class=""><br class=""> mutating func incrementSomeData() {<br class=""> let uniqueBuffer = makeMutable()<br class=""> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4<br class=""> endUnique(&self.b)<br class=""> }<br class=""></blockquote><br class="">It would be instructive to write down the *correct* code for these<br class="">operations.<br class=""></div></div></blockquote><div><br class=""></div><div>added to my todo list.</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class=""><blockquote type="cite" class="">The intention of the rules is to ensure that there is no overlap of a<br class="">"read-only" life-range with a "mutable" life-range of the buffer reference.<br class="">It’s the responsibility of the implementor to follow the rules.<br class="">But the compiler (a mandatory diagnostics pass and the SIL verifier) can<br class="">statically detect rule violations in obvious cases (with inter-procedural<br class="">analysis maybe even in most cases).<br class=""><br class="">This approach would require to change some of the internals of our<br class="">current COW data structures in the stdlib (Array, Dictionary, etc.).<br class="">For example, the Array make_mutable semantic functions currently do not return<br class="">the unique buffer.<br class=""></blockquote><br class="">No big deal.<br class=""><br class=""><blockquote type="cite" class="">Scoping Alternative 2: Implicit Inout Scopes<br class="">--------------------------------------------<br class=""><br class="">There is an idea (proposal?) to change the representation of inout variables<br class="">in SIL. This is independent of this proposal, but can be helpful for the<br class="">purpose of defining the scope of a COW mutation.<br class=""><br class="">The basic idea is that SILGen inserts scoping instructions for *all* inout<br class="">variables. And those scoping instructions can be used to define the mutating<br class="">scope of a COW buffer.<br class=""><br class="">The scoping instructions which are inserted by SILGen for an inout scope are::<br class=""><br class=""> begin_exclusive<br class=""> end_exclusive<br class=""><br class="">Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those instructions take the<br class="">address of the inout variable as argument. For the optimizer those instructions<br class="">look like potential writes to the inout variable.<br class=""><br class="">The implementor of a COW type has to follow the rule that the COW buffer may<br class="">only be modified in mutating functions of the COW type. But this is the case<br class="">anyway because any modification needs a uniqueness check and this can only be<br class="">done in mutating functions.<br class=""><br class="">Example::<br class=""><br class=""> // > mutating func setSomeData(x: Int) {<br class=""> // Accepts a unique reference to the array value (avoiding refcount operations)<br class=""> sil @setSomeData : $(Int, @inout Array) -> () {<br class=""> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0<br class=""><br class=""> // > makeMutable() (inlined)<br class=""> // Forward the unique reference to the `self` array value, still avoiding refcount operations.<br class=""> // Begin the inlined exclusive scope (could be trivially removed).<br class=""> begin_exclusive %arrayref : $*Array<T> // Begin scope #1<br class=""><br class=""> // > if !isUnique(&self._storage) {<br class=""> // Extract a unique inout reference to the class reference to the array storage.<br class=""> // This begins the isUnique() argument's exclusive scope. The memory is already exclusive<br class=""> // but the scope helps ensure this is the only alias to _storage.<br class=""> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, #Array._storage<br class=""><br class=""> // Uniqueness checking requires an inout reference to the class reference.<br class=""> // The is_unique instruction does not need to create a new storage reference.<br class=""> // It's only purpose is to check the RC count, ensure that the checked reference<br class=""> // is inout, and prevent the inout scope from being optimized away.<br class=""> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T><br class=""><br class=""> // End the isUnique argument's exclusive scope (can also be trivially removed).<br class=""> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T><br class=""><br class=""> br %isuniq, bb_continue, bb_slow<br class=""><br class=""> bb_slow:<br class=""> // > self._storage = copyBuffer(self._storage)<br class=""> // Produce a new class reference to storage with verifiably unique RC semantics.<br class=""> %copied_storage_class = alloc_ref ...<br class=""> // A begin/end exclusive scope is implicit in store [assign].<br class=""> store [assign] %copied_storage_class to %arrayref._storageref<br class=""> br bb_continue<br class=""><br class=""> bb_continue:<br class=""><br class=""> // This marks the end of makeMutable's inout `self` scope. Because Array<br class=""> // contains a "copy_on_write" property, the SIL verifier needs to<br class=""> // prove that %arrayref.#_storage has not escaped at this point. This<br class=""> // is equivalent to checking that %arrayref itself is not copied, and<br class=""> // checking each projection of the "copy_on_write" storage property<br class=""> // (%arrayref._storageref) is not copied. Or, if any copies are present,<br class=""> // they must be consumed within this scope.<br class=""> end_exclusive %arrayref : $*Array<T> // End scope #1<br class=""><br class=""> // > self._storage.someData = x<br class=""> // An _addr instruction with one load/store use doesn't really need its own scope.<br class=""> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage<br class=""><br class=""> // ARC optimization can promote this to a borrow, replacing strong_release with end_borrow.<br class=""> %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow ArrayStorage<br class=""> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow ArrayStorage<br class=""><br class=""> // Write some data into the CoW buffer.<br class=""> // (For simplicity, pretend ArrayStorage has a "someData" field).<br class=""> // A single-use _addr instruction, so no scope.<br class=""> %somedata_addr = ref_element_addr %arrayref._storage, #someData<br class=""> // A store with an implicit [exclusive] scope.<br class=""> store [assign] %x to %somedata_addr<br class=""><br class=""> strong_release %arrayref._storage : $*ArrayStorage<T><br class=""><br class=""> // End the isUnique argument's exclusive scope.<br class=""> // The same verification is needed here, but the inner scope would be eliminated.<br class=""> end_exclusive %arrayref : $*Array<T> // End scope #0<br class=""><br class="">In general this approach looks more "user-friendly" than the first<br class="">alternative.<br class=""></blockquote><br class="">Well, I can't really tell, because you haven't shown the Swift code that<br class="">generates this SIL.<br class=""><br class=""><blockquote type="cite" class="">But it depends on implementing the general feature to insert the inout<br class="">scoping instructions. Also, we still have to think through all the<br class="">details of this approach.<br class=""></blockquote><br class="">FWIW, I am convinced we will need (and get) a stricter inout model that<br class="">would be conducive to inserting the scoping instructions.<br class=""><br class=""><br class=""><blockquote type="cite" class="">Dependency between a buffer reference to the scope-begin<br class="">--------------------------------------------------------<br class=""></blockquote><br class="">You can only have a dependency between two things, but as phrased “a<br class="">buffer reference to the scope-begin” sounds like one thing. s/to/and/<br class="">would fix it.<br class=""><br class=""><blockquote type="cite" class="">With both alternatives there is no explicit dependency from a buffer reference<br class="">to a scope-begin instruction::<br class=""><br class=""> %b_cow = load %baddr<br class=""> %b = cow_to_ref %b_cow<br class=""> %x = load %b // No dependency between this...<br class=""> ...<br class=""> begin_exclusive %baddr // ... and this instruction.<br class=""> ...<br class=""><br class="">So in theory the optimizer is free to reschedule the instructions::<br class=""><br class=""> %b_cow = load %baddr<br class=""> %b = cow_to_ref %b_cow<br class=""> ...<br class=""> begin_exclusive %baddr<br class=""> %x = load %b // Wrong! Buffer could be modified here<br class=""> ...<br class=""><br class="">We still have to figure out how to cope with this.<br class=""><br class="">- We could add an end-of-lifetime instruction for a COW buffer reference, which<br class=""> the optimizer may not move over a begin-of-scope instruction.<br class=""><br class="">- Or we just define the implicit rule for the optimizer that any use of a COW<br class=""> reference may not be moved over a begin-of-scope instruction.<br class=""><br class="">Preconditions<br class="">=============<br class=""><br class="">To benefit from COW optimizations in the stdlib Array, Set and Dictionary data<br class="">structures we first need eager bridging, meaning getting rid of the bridged<br class="">buffer. <br class=""></blockquote><br class="">As you know I'm very much in favor of eager bridging, but I don't see<br class="">why this would be dependent on it.<br class=""></div></div></blockquote><div><br class=""></div><div>We could use copy_on_write with eager bridging, but I don’t think it will give any benefits to the optimizer.</div><div>For example, the SIL code to get from an Array to a native ContiguousArrayStorage reference is pretty hard to understand for the optimizer (involves low level bit operations, etc.).</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class=""><blockquote type="cite" class="">At least for Array this is implemented as low-level bit operations and<br class="">optimizations cannot reason about it (e.g. finding a reasonable<br class="">RC-root for the buffer reference).<br class=""><br class="">Another thing is that we currently cannot use ``copy_on_write`` for Array<br class="">because of pinning. Array pins it’s buffer when passing an element address to<br class="">an inout parameter. This allows the array buffer to be modified even if its<br class="">reference count is > 1. With ``copy_on_write``, a programmer could break memory<br class="">safety when violating the inout rule. Example::<br class=""><br class=""> var arr = [MyClass()] // a global array<br class=""><br class=""> foo(&arr[0]) // Pins the buffer of arr during the call<br class=""><br class=""> func foo(_ x: inout MyClass) -> Int {<br class=""> let b = arr // The ref-count of the buffer is not incremented, because it is pinned!<br class=""> let r = b[0] // optimizer removes the retain of r because it thinks the following code cannot modify b<br class=""> arr.removeAll() // does not copy the array buffer and thus de-allocates r<br class=""> return r.i // use-after-free!<br class=""> }<br class=""></blockquote><br class="">I only know of one way to resolve inout and pinning:<br class=""><br class="">* Semantically, references are replaced with a trap value when entering<br class=""> an inout context so that all inout values are provably unique<br class=""> references in the absence of unsafe code. We drop pinning and provide<br class=""> explicit operations that provide simultaneous lvalue accesses to<br class=""> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices.<br class=""><br class="">If there are other ideas out there, I'd like to hear them. If not, we<br class="">should probably decide that this is what we're doing so that we can move<br class="">forward without this looming uncertainty.<br class=""><br class="">-- <br class="">-Dave<br class=""><br class="">_______________________________________________<br class="">swift-dev mailing list<br class=""><a href="mailto:swift-dev@swift.org" class="">swift-dev@swift.org</a><br class="">https://lists.swift.org/mailman/listinfo/swift-dev<br class=""></div></div></blockquote></div><br class=""></body></html>