[swift-dev] [discussion notes] SIL address types and borrowing

Sat Oct 8 12:56:57 CDT 2016

> On Oct 8, 2016, at 10:09 AM, Karl <razielim at gmail.com> wrote:
> 
> Could you add this (and John’s previous writeup) to the docs in the repo?

Yeah, it’s unfortunate that design discussions are buried in a flood of email. On the flip side, I’ve checked in some premature design docs that are probably nonsense now. I’m currently preparing a type-safe memory model design doc to check in. After that I’ll probably work on a document for SIL SSA with address-only types, which should cover John’s writeup. I’ll have to work with Michael Gottesman and John McCall to get the SIL ownership docs checked in.

> I was reasonably far along with adding unowned optionals a while back, but got totally lost in SILGen.
> This info looks really valuable, but personally I find that with the mailing list format it’s hard to ever find this kind of stuff when I need it.
> 
> Thanks
> 
> Karl
> 
> P.S. going to pick up that unowned optional stuff soon, once I have time to read the docs about SILGen

There are SILGen docs somewhere?

-Andy

> 
>> On 8 Oct 2016, at 08:10, Andrew Trick via swift-dev <swift-dev at swift.org> wrote:
>> 
>> On swift-dev, John already sent out a great writeup on SIL SSA:
>> Representing "address-only" values in SIL.
>> 
>> While talking to John I also picked up a lot of insight into how
>> address types relate to SIL ownership and borrow checking. I finally
>> organized the information into these notes. This is not a
>> proposal. It's background information for those of us writing and
>> reviewing proposals. Just take it as a strawman for future
>> discussions. (There's also a good chance I'm getting something
>> wrong).
>> 
>> [My commentary in brackets.]
>> 
>> ** Recap of address-only.
>> 
>> Divide address-only types into two categories:
>> 1. By abstraction (compiler doesn't know the size).
>> 2. The type is "memory-linked", i.e. the address is significant at runtime.
>>    - weak references (anything that registers its address).
>>    - C++ objects (whose "this" address may be significant).
>>    - Anything with interior pointers.
>>    - Any shared-borrowed value of a type with "nonmutating" properties.
>>      ["nonmutating" properties allow mutation of state attached to a value.
>>       Rust atomics are an example.]
>> 
>> Address-only status will not be reflected in SIL types. SIL addresses should
>> only be used for formal memory (pointers, globals, class
>> properties, captures). We'll get to inout arguments later...
>> 
>> As with opaque types, when IRGen lowers a memory-linked borrowed type,
>> it needs to allocate storage.
>> 
>> Concern: SILGen has built-in tracking of managed values that automates
>> insertion of cleanups. Lowering address-only types after SILOpt would
>> require rediscovering that information based on CFG analysis. Is this
>> too heroic?
>> 
>> This was already described by John. Briefly recapping:
>> 
>> e.g. Constructing an Any that holds an Optional<X>:
>> 
>> We want initialization to happen in place, like this:
>> 
>> %0 = struct_element_addr .. #S.any
>> %1 = init_existential_addr %0, $*Any, $Optional<X>
>> %2 = inject_enum_data_addr %1, $Optional<X>.Some
>> apply @initX(%2)
>> 
>> SILValue initialization would look something like:
>> 
>> %0 = apply @initX()
>> %1 = enum #Optional.Some, %0 : $X
>> %2 = existential %1 : $Any
>> 
>> [I'm not sure we actually want to represent an existential container
>> this way, but enum, yes.]
>> 
>> Lowering then requires discovering the storage structure bottom-up,
>> hoisting allocations, and inserting cleanups, as John explained.
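As a loose analogy only (Rust, since the notes already compare the model to Rust), the value-form construction above can be written as plain expressions: make an X, wrap it in Some, then erase it to an existential. Here Box<dyn Any> stands in for Swift's Any, and X and init_x are hypothetical placeholders; this is not how SIL lowering or IRGen actually works.

```rust
use std::any::Any;

// Hypothetical stand-in for the note's X.
#[derive(Debug, PartialEq)]
pub struct X {
    pub id: u32,
}

// Stand-in for @initX: returns an owned X by value.
pub fn init_x() -> X {
    X { id: 7 }
}

// Value-form construction, mirroring the SSA sequence:
//   %0 = apply @initX()
//   %1 = enum #Optional.Some, %0
//   %2 = existential %1 : $Any
// Box::new is where storage for the type-erased value gets chosen,
// roughly the job the notes assign to address lowering.
pub fn make_any() -> Box<dyn Any> {
    let x = init_x();
    let opt: Option<X> = Some(x);
    Box::new(opt)
}
```

The nesting is expressed bottom-up as values; deciding where the bytes live is deferred to a later step, which is the lowering problem described above.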
>> 
>> Side note: Before lowering, something like alloc_box would directly
>> take its initial value.
>> 
>> ** SILFunction calling convention.
>> 
>> For ownership analysis, there's effectively no difference between the
>> value/address forms of argument ownership:
>> 
>> @owned                     / @in
>> @guaranteed                / @in_guaranteed
>> return                     / @out
>> @owned arg + @owned return / @inout
>> 
>> Regardless of the representation we choose for @inout, @in/@out will
>> now be scalar types. SILFunction will maintain the distinction between
>> @owned/@in etc. based on whether the type is address-only. We need
>> this for reabstraction, but it only affects the function type, not the
>> calling convention.
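To make the last row of that table concrete, here is a small Rust sketch (Rust used only as an analogy; the function names are hypothetical). Consuming an owned argument and returning an owned result has the same observable effect as mutating through an exclusive reference.

```rust
// "@owned arg + @owned return": the callee consumes the value and hands
// back a (possibly modified) owned value.
pub fn append_owned(mut v: Vec<i32>, x: i32) -> Vec<i32> {
    v.push(x);
    v
}

// "@inout": the callee mutates the caller's storage in place through
// an exclusive borrow.
pub fn append_inout(v: &mut Vec<i32>, x: i32) {
    v.push(x);
}
```

Both calls leave the caller with the same value; what differs is whether the value moves through the call or stays in the caller's storage.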
>> 
>> Rather than building a tuple, John prefers SIL support for anonymous
>> aggregates as "exploded values".
>> 
>> [I'm guessing because tuples are a distinct formal type with their own
>> convention and common ownership. This may need some discussion though.]
>> 
>> Example SIL function type:
>> 
>> $(@in P, @owned Q) -> (@owned R, @owned S, @out T, @out U)
>> 
>> %p = apply f: $() -> P
>> %q = apply g: $() -> Q
>> %exploded = apply h(%p, %q)
>> %r = project_exploded %exploded, #0 : $R
>> %s = project_exploded %exploded, #1 : $S
>> %t = project_exploded %exploded, #2 : $T
>> %u = project_exploded %exploded, #3 : $U
>> 
>> Exploded types require all their elements to be projected with their
>> own independent ownership.
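Rust has no exploded-value type, so the nearest analogy is a tuple that is destructured immediately. That is exactly the tuple form the notes set aside, so treat this only as an illustration of "each element projected with its own ownership". P, Q and h are hypothetical placeholders, and the @in/@out vs @owned distinction has no Rust counterpart.

```rust
// Hypothetical placeholder argument types.
pub struct P(pub String);
pub struct Q(pub String);

// Stand-in for h: $(@in P, @owned Q) -> (@owned R, @owned S, @out T, @out U).
// Every result element comes back by value.
pub fn h(p: P, q: Q) -> (String, String, Vec<u8>, Vec<u8>) {
    let r = p.0.clone();
    let s = q.0.clone();
    let t = p.0.into_bytes();
    let u = q.0.into_bytes();
    (r, s, t, u)
}
```

After `let (r, s, t, u) = h(...)`, each binding is an independent owner: one can be dropped while the others stay live.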
>> 
>> ** Ownership terminology.
>> 
>> Swift "owned"    = Rust values           = SIL @owned      = implicitly consumed
>> Swift "borrowed" = Rust immutable borrow = SIL @guaranteed = shared
>> Swift "inout"    = Rust mutable borrow   = SIL @inout      = unique
>> 
>> Swift "inout" syntax is already (nearly) sufficient.
>> 
>> "borrowed" may not need syntax on the caller side, just a way to
>> qualify parameters. Swift still needs syntax for returning a borrowed
>> value.
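The terminology table maps directly onto Rust parameter forms. A minimal sketch (function names are hypothetical):

```rust
// Swift "owned" ~ Rust by-value: the callee consumes the argument.
pub fn consume(s: String) -> usize {
    s.len() // s is dropped at the end of the callee
}

// Swift "borrowed" ~ Rust shared borrow: read-only, caller keeps ownership.
pub fn inspect(s: &str) -> usize {
    s.len()
}

// Swift "inout" ~ Rust mutable borrow: unique access for the call's duration.
pub fn modify(s: &mut String) {
    s.push('!');
}
```

One difference from the direction sketched above: Rust requires `&` and `&mut` at every call site, whereas the notes suggest "borrowed" may need no caller-side syntax.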
>> 
>> ** Representation of borrowed values.
>> 
>> Borrowed values represent some shared storage location.
>> 
>> We want some borrowed value references to be passed as SIL values, not SIL addresses:
>> - Borrowed class references should not be indirected.
>> - Optimize borrowing other small non-memory-linked types.
>> - Support capture promotion, and other SSA optimizations.
>> - Borrow CoW values directly.
>> 
>> [Address-only borrowed types will still be passed as SIL addresses (why not?)]
>> 
>> Borrowed types with potentially mutating properties must be passed by
>> SIL address because they are not actually immutable and their storage
>> location is significant.
>> 
>> Borrowed references have a scope and need an end-of-borrow marker.
>> 
>> [The end-of-borrow marker semantically changes the memory state, and
>> statically enforces non-overlapping memory states. It does not
>> semantically write-back a value. Borrowed values with mutating fields
>> are semantically modified in-place.]
>> 
>> [Regardless of whether borrowed references are represented as SIL
>> values or addresses, they must be associated with formal storage. That
>> storage must remain immutable at the language level (although it may
>> have mutating fields) and the value cannot be destroyed during the
>> borrowed scope].
>> 
>> [Trivial borrowed values can be demoted to copies so we can eliminate
>> their scope]
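Demoting a trivial borrow to a copy is what Rust does for Copy types, used here as an analogy. Counter is a hypothetical aggregate: copying the trivial field out ends any borrow of it, so the same field can be written immediately afterwards.

```rust
// Hypothetical aggregate with a trivial (Copy) field.
pub struct Counter {
    pub count: u64,
}

pub fn snapshot_then_bump(c: &mut Counter) -> u64 {
    // A copy of the trivial value, not a reference: no borrow of c.count
    // outlives this line.
    let before = c.count;
    // So the write is allowed. Had `let r = &c.count;` been kept alive
    // across this line, the borrow checker would reject it.
    c.count += 1;
    before
}
```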
>> 
>> [Anything borrowed from global storage (and not demoted to a copy)
>> needs its scope to be dynamically enforced. Borrows from local storage
>> are sufficiently statically enforced. However, in both cases the
>> optimizer must respect the static scope of the borrow.]
>> 
>> [I think borrowed values are effectively passed @guaranteed. The
>> end-of-borrow scope marker will then always be at the top-level
>> scope. You can't borrow in a caller and end its scope in the callee.]
>> 
>> ** Borrowed and inout scopes.
>> 
>> inout value references are also scoped. We'll get to their
>> representation shortly. Within an inout scope, memory is in an
>> exclusive state. No borrowed scopes may overlap with an inout scope,
>> which is to say, memory is either shared or exclusive.
>> 
>> We need a dynamic access flag for stored properties, even for simple
>> trivial types. That's the only way to provide a simple user model. At
>> least the flag doesn't need to be updated atomically, since we're not
>> detecting race conditions. Optimizations will come later; we should be
>> able to prove that some stored properties are never passed as inout.
>> 
>> The stored property flag needs to be a tri-state: owned, borrowed, exclusive.
>> 
>> The memory value can only be destroyed in the owned state.
>> 
>> The user may mark some storage locations as "unchecked" as an
>> opt-out. That doesn't change the optimizer's constraints. It simply
>> bypasses the runtime check.
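A minimal Rust model of the tri-state flag, assuming non-atomic updates as described above. All names are hypothetical. The reader count inside Borrowed is an added assumption, since the notes name only three states and do not say how overlapping shared borrows are tracked. The "unchecked" opt-out simply skips the runtime check, and the optimizer constraints it leaves in place are not modeled here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Access {
    Owned,
    Borrowed(u32), // assumption: count of live shared borrows
    Exclusive,
}

#[derive(Debug)]
pub struct AccessError;

pub struct StoredProperty {
    state: Access,
    unchecked: bool, // opt-out: bypass the dynamic check entirely
}

impl StoredProperty {
    pub fn new(unchecked: bool) -> Self {
        StoredProperty { state: Access::Owned, unchecked }
    }

    pub fn state(&self) -> Access {
        self.state
    }

    pub fn begin_borrow(&mut self) -> Result<(), AccessError> {
        if self.unchecked {
            return Ok(());
        }
        self.state = match self.state {
            Access::Owned => Access::Borrowed(1),
            Access::Borrowed(n) => Access::Borrowed(n + 1),
            Access::Exclusive => return Err(AccessError),
        };
        Ok(())
    }

    pub fn end_borrow(&mut self) {
        if self.unchecked {
            return;
        }
        self.state = match self.state {
            Access::Borrowed(1) => Access::Owned,
            Access::Borrowed(n) => Access::Borrowed(n - 1),
            other => other, // unbalanced end_borrow: ignored in this sketch
        };
    }

    pub fn begin_exclusive(&mut self) -> Result<(), AccessError> {
        if self.unchecked {
            return Ok(());
        }
        match self.state {
            Access::Owned => {
                self.state = Access::Exclusive;
                Ok(())
            }
            _ => Err(AccessError), // memory is either shared or exclusive
        }
    }

    pub fn end_exclusive(&mut self) {
        if !self.unchecked && self.state == Access::Exclusive {
            self.state = Access::Owned;
        }
    }

    // The value may only be destroyed in the owned state.
    pub fn can_destroy(&self) -> bool {
        self.state == Access::Owned
    }
}
```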
>> 
>> ** Ownership of loaded values.
>> 
>> [MikeG already explained possibilities of load ownership in
>> [swift-dev] [semantic-arc][proposal] High Level ARC Memory Operations]
>> 
>> For the sake of understanding the model, it's worth realizing that we
>> only need one form of load ownership: load_borrow. We don't
>> actually need an operation that loads an owned value out of formal
>> storage. This makes canonical sense because:
>> 
>> - Semantically, a load must at least be a borrow because the storage
>>   location's non-exclusive flag needs to be dynamically checked
>>   anyway, even if the value will be copied.
>> 
>> - Code motion in the SIL optimizer has to obey the same limitations
>>   within borrow scopes regardless of whether we fuse loads and copies
>>   (retains).
>> 
>> [For the purpose of semantic ARC, the copy_value would be the RC
>> root. The load and copy_value would effectively be "coupled" by the
>> static scope of the borrow. e.g. we would not want to move a release
>> inside the static scope of a borrow.]
>> 
>> [Purely in the interest of concise SIL, I still think we want a load [copy].]
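The "a load is at least a borrow" argument has a close analogue in Rust's std::cell::RefCell, which keeps a dynamically checked shared/exclusive flag on its storage. In this sketch, load_copy is a hypothetical name, and the mapping to load_borrow / copy_value / end_borrow is only an analogy: the flag is checked even though the caller only wants a copy, and the copy happens inside the borrow scope.

```rust
use std::cell::RefCell;

pub fn load_copy(storage: &RefCell<Vec<i32>>) -> Vec<i32> {
    let borrowed = storage.borrow(); // ~ load_borrow: checks the flag, enters shared state
    let copy = borrowed.clone();     // ~ copy_value: produces the owned result
    drop(borrowed);                  // ~ end_borrow: shared state ends here
    copy
}
```

Once the borrow ends, the copy is independent of the storage, so exclusive access succeeds again. While a shared borrow is live, it is refused.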
>> 
>> ** SIL value ownership and aggregates
>> 
>> Operations on values:
>> 1. copy
>> 2. forward (move)
>> 3. borrow (share)
>> 
>> A copy or forward produces an owned value.
>> An owned value has a single consumer.
>> A borrow has static scope.
>> 
>> For simplicity, passing a bb argument only has move semantics (it
>> forwards the value). Later that can be expanded if needed.
>> 
>> We want to allow simultaneous access to independent subelements of a
>> fragile aggregate. We should be able to borrow one field while
>> mutating another.
>> 
>> Is it possible to forward a subelement within an aggregate? No. But we
>> can fully explode an owned aggregate into individual owned elements
>> and reconstruct the aggregate. This makes use of the @exploded type
>> feature described in the calling convention.
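Both aggregate operations described above can be illustrated in Rust (analogy only; Pair and its functions are hypothetical). The borrow checker accepts borrowing one field while mutating another because the field paths are disjoint. The second function follows the notes' approach of fully exploding an owned aggregate into owned elements and rebuilding it.

```rust
#[derive(Debug, PartialEq)]
pub struct Pair {
    pub name: String,
    pub items: Vec<i32>,
}

// Borrow one field while mutating another.
pub fn push_name_len(p: &mut Pair) {
    let name: &String = &p.name;     // shared borrow of p.name
    p.items.push(name.len() as i32); // exclusive access to p.items at the same time
}

// Explode an owned aggregate into independently owned elements,
// consume one, and reconstruct the aggregate.
pub fn rename(p: Pair, new_name: &str) -> Pair {
    let Pair { name, items } = p;
    drop(name);
    Pair { name: new_name.to_string(), items }
}
```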
>> 
>> [I don't think forwarding a subelement is useful anyway except for
>> modeling @inout semantics...]
>> 
>> That leads us to this question: Does an @inout value reference have
>> formal storage (thus a SIL address) or is it just a convention for
>> passing owned SSA values?
>> 
>> ** World 1: SSA @inout
>> 
>> Projecting an element produces a new SILValue. Does this SILValue have
>> its own ownership associated with its lifetime, or is it derived
>> from its parent object by looking through projections?
>> 
>> Either way, projecting any subelement requires reconstructing the
>> entire aggregate in SIL, through all nesting levels. This will
>> generate a massive amount of SILValues. Superficially they all need
>> their own storage.
>> 
>> [We could claim that projections don't need storage, but that only
>> solves one side of the problem.]
>> 
>> [I argue that this actually obscures the producer/consumer
>> relationship, which is the opposite of the intention of moving to
>> SSA. Projecting subelements for mutation fundamentally doesn't make
>> sense. It does make sense to borrow a subelement (not for
>> mutation). It also makes sense to project a mutable storage
>> location. The natural way to project a storage location is by
>> projecting an address...]
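The contrast between the two worlds shows up even in a tiny Rust sketch (analogy only; Outer and Inner are hypothetical). Changing one nested field in pure value style means rebuilding every enclosing level. Projecting a mutable location, the rough counterpart of chained struct_element_addr, touches only the field itself.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Inner {
    pub x: i32,
    pub y: i32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Outer {
    pub inner: Inner,
    pub tag: u8,
}

// Value style: new values at every nesting level, then reconstruct.
pub fn set_x_by_value(o: Outer, new_x: i32) -> Outer {
    let Outer { inner, tag } = o;
    let Inner { x: _, y } = inner;
    Outer { inner: Inner { x: new_x, y }, tag }
}

// Address style: project a mutable location and write through it.
pub fn set_x_in_place(o: &mut Outer, new_x: i32) {
    let slot: &mut i32 = &mut o.inner.x;
    *slot = new_x;
}
```

The value version grows with nesting depth and aggregate width. The address version names only the location being mutated.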
>> 
>> ** World 2: @inout formal storage
>> 
>> In this world, @inout references continue to have SILType $*T with
>> guaranteed exclusive access.
>> 
>> Memory state can be:
>> - uninitialized
>> - holds an owned value
>>   - has exclusive access
>>   - has shared access
>> 
>> Expected transitions that need to be handled:
>>   - must become uninitialized
>>   - must become initialized
>>   - must preserve initialization state
>> 
>> We need to mark initializers with some "must initialize" marker,
>> similar to how we mark deinitializers [this isn't clear to me yet].
>> 
>> We could give address types qualifiers to distinguish the memory state
>> of their pointee (uninitialized, shared, exclusive). Addresses
>> themselves could be pseudo-linear types. This would provide the same
>> use-def guarantees as the SSA @inout approach, but producing a new
>> address each time memory changes state would also be complicated and
>> cumbersome (though not as bad as SSA).
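The "qualified address, new address on every state change" idea resembles Rust's typestate pattern. This is a hypothetical sketch: Addr, initialize, begin_shared, end_shared and deinitialize are invented names, and the Option plus expect() calls exist only to keep it in safe Rust. The compile-time part is that each method exists only on the matching state, and each transition consumes the old address.

```rust
use std::marker::PhantomData;

// Marker types for the memory states named above.
pub struct Uninit;
pub struct Shared;
pub struct Exclusive;

// State-qualified address: the state lives only in the type.
pub struct Addr<'a, T, S> {
    slot: &'a mut Option<T>,
    _state: PhantomData<S>,
}

impl<'a, T, S> Addr<'a, T, S> {
    // Re-tag the same storage with a new state, consuming the old address.
    fn retag<S2>(self) -> Addr<'a, T, S2> {
        Addr { slot: self.slot, _state: PhantomData }
    }
}

impl<'a, T> Addr<'a, T, Uninit> {
    pub fn new(slot: &'a mut Option<T>) -> Self {
        assert!(slot.is_none(), "storage must start uninitialized");
        Addr { slot, _state: PhantomData }
    }

    // "must become initialized"
    pub fn initialize(self, value: T) -> Addr<'a, T, Exclusive> {
        *self.slot = Some(value);
        self.retag()
    }
}

impl<'a, T> Addr<'a, T, Exclusive> {
    pub fn get_mut(&mut self) -> &mut T {
        self.slot.as_mut().expect("initialized")
    }

    pub fn begin_shared(self) -> Addr<'a, T, Shared> {
        self.retag()
    }

    // "must become uninitialized": move the value out.
    pub fn deinitialize(self) -> (T, Addr<'a, T, Uninit>) {
        let value = self.slot.take().expect("initialized");
        (value, self.retag())
    }
}

impl<'a, T> Addr<'a, T, Shared> {
    // Read-only access: there is no get_mut in the Shared state.
    pub fn get(&self) -> &T {
        self.slot.as_ref().expect("initialized")
    }

    pub fn end_shared(self) -> Addr<'a, T, Exclusive> {
        self.retag()
    }
}
```

Every transition rebinds a new address value, which also illustrates why the notes call this approach cumbersome.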
>> 
>> [[
>> We didn't talk about the alternative, but presumably exclusive
>> vs. shared scope would be delimited by pseudo memory operations as
>> such:
>> 
>> %a = alloc_stack
>> 
>> begin_exclusive %a
>> apply foo(%a) // must be marked an initializer?
>> end_exclusive %a
>> 
>> begin_shared %a
>> apply bar(%a) // immutable access
>> end_shared %a
>> 
>> dealloc_stack %a
>> 
>> Values loaded from shared memory also need to be scoped. They must be
>> consumed within the shared region. e.g.
>> 
>> %a2 = ref_element_addr
>> 
>> %x = load_borrow %a2
>> 
>> end_borrow %x, %a2
>> 
>> It makes sense to me that a load_borrow would implicitly transition
>> memory to shared state, and end_borrow would implicitly return memory
>> to an owned state. If the address type is already ($* @borrowed T), then
>> memory would remain in the shared state.
>> ]]
>> 
>> For all sorts of analysis and optimization, from borrow checking to
>> CoW to ARC, we really need aliasing guarantees. Knowing we have a
>> unique address to a location is about as good as having an owned
>> value.
>> 
>> To get this guarantee we need to structurally guarantee
>> unique addresses.
>> 
>> [Is there a way to do this without making all the element_addr
>> operations scoped?]
>> 
>> With aliasing guarantees, verification should be able to statically
>> prove that most formal storage locations are properly initialized and
>> uninitialized (pseudo-linear type) by inspecting the memory
>> operations.
>> 
>> Likewise, we can verify the shared vs. exclusive states.
>> 
>> Representing @inout with addresses doesn't really add features to
>> SIL. In any case, SIL address types are still used for
>> formal storage. Exclusive access through any of the following
>> operations must be guaranteed dynamically:
>> 
>> - ref_element_addr
>> - global_addr
>> - pointer_to_address
>> - alloc_stack
>> - project_box
>> 
>> We end up with these basic SIL Types:
>> 
>> $T = owned value
>> 
>> $@borrowed T = shared value
>> 
>> $*T = exclusively accessed
>> 
>> $* @borrowed T = shared access
>> 
>> [I think the non-address @borrowed type is only valid for concrete
>> types that the compiler knows are not memory-linked? This can be used
>> to avoid passing borrowed values indirectly for arrays and other
>> small, free-to-copy values].
>> 
>> [We obviously need to work through concrete examples before we can
>> claim to have a real design.]
>> 
>> -Andy
>> 
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-dev
> 
