[swift-dev] [discussion notes] SIL address types and borrowing

Mon Oct 10 19:35:54 CDT 2016

> On Oct 8, 2016, at 10:09 AM, Karl via swift-dev <swift-dev at swift.org> wrote:
> 
> Could you add this (and John’s previous writeup) to the docs in the repo?
> 
> I was reasonably along the way to adding unowned optionals a while back but got totally lost in SILGen.
> This info looks really valuable, but personally I find that with the mailing list format it’s hard to ever find this kind of stuff when I need it.
> 
> Thanks
> 
> Karl
> 
> P.S. going to pick up that unowned optional stuff soon, once I have time to read the docs about SILGen

I am not sure it is appropriate to put this sort of thing in the docs directory. As Andy explicitly mentioned, this document is not a proposal or a plan of record. It is a record of an in-person side discussion between two individuals. In the past, notes from such side conversations were not shared with the wider group of developers, which led to siloed knowledge and obscured the design process.

Eliminating such problems is the intention behind sending out these notes, not providing finalized proposals for placement in the docs directory.

Michael

> 
>> On 8 Oct 2016, at 08:10, Andrew Trick via swift-dev <swift-dev at swift.org> wrote:
>> 
>> On swift-dev, John already sent out a great writeup on SIL SSA:
>> Representing "address-only" values in SIL.
>> 
>> While talking to John I also picked up a lot of insight into how
>> address types relate to SIL ownership and borrow checking. I finally
>> organized the information into these notes. This is not a
>> proposal. It's background information for those of us writing and
>> reviewing proposals. Just take it as a strawman for future
>> discussions. (There's also a good chance I'm getting something
>> wrong).
>> 
>> [My commentary in brackets.]
>> 
>> ** Recap of address-only.
>> 
>> Divide address-only types into two categories:
>> 1. By abstraction (compiler doesn't know the size).
>> 2. The type is "memory-linked". i.e. the address is significant at runtime.
>>    - weak references (anything that registers its address).
>>    - C++ this.
>>    - Anything with interior pointers.
>>    - Any shared-borrowed value of a type with "nonmutating" properties.
>>      ["nonmutating" properties allow mutation of state attached to a value.
>>       Rust atomics are an example.]
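A rough illustration of the last bullet, written in Rust since that is the analogy given. `Counter`, `record_hit`, and `demo_shared_mutation` are hypothetical names, not part of any Swift or SIL design. The sketch only shows that state attached to a value can change through two simultaneous shared borrows, which is why the identity of the storage location matters for such types.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical type whose state can change through a shared borrow.
struct Counter {
    hits: AtomicUsize,
}

/// Takes a shared (immutable) borrow, yet still mutates state attached to the value.
fn record_hit(c: &Counter) -> usize {
    // fetch_add returns the previous value, so add 1 to report the new count.
    c.hits.fetch_add(1, Ordering::Relaxed) + 1
}

/// Two shared borrows of the same storage are live at once and both mutate it.
fn demo_shared_mutation() -> usize {
    let c = Counter { hits: AtomicUsize::new(0) };
    let a = &c;
    let b = &c;
    record_hit(a);
    record_hit(b)
}
```

If such a value were copied into a temporary instead of being passed by address, the two borrows would update different counters, so a by-value representation would not preserve the behavior.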
>> 
>> Address-only will not be reflected in SIL types. SIL addresses should
>> only be used for formal memory (pointers, globals, class
>> properties, captures). We'll get to inout arguments later...
>> 
>> As with opaque types, when IRGen lowers a memory-linked borrowed type,
>> it needs to allocate storage.
>> 
>> Concern: SILGen has built-in tracking of managed values that automates
>> insertion of cleanups. Lowering address-only types after SILOpt would
>> require rediscovering that information based on CFG analysis. Is this
>> too heroic?
>> 
>> This was already described by John. Briefly recapping:
>> 
>> e.g. Constructing Optional<Any>
>> 
>> We want initialization to happen in place, as such:
>> 
>> %0 = struct_element_addr .. #S.any
>> %1 = init_existential_addr %0, $*Any, $Optional<X>
>> %2 = inject_enum_data_addr %1, $Optional<X>.Some
>> apply @initX(%2)
>> 
>> SILValue initialization would look something like:
>> 
>> %0 = apply @initX()
>> %1 = enum #Optional.Some, %0 : $X
>> %2 = existential %1 : $Any
>> 
>> [I'm not sure we actually want to represent an existential container
>> this way, but enum, yes.]
>> 
>> Lowering now requires discovering the storage structure, bottom-up,
>> hoisting allocation, inserting cleanups as John explained.
>> 
>> Side note: Before lowering, something like alloc_box would directly
>> take its initial value.
>> 
>> ** SILFunction calling convention.
>> 
>> For ownership analysis, there's effectively no difference between the
>> value/address forms of argument ownership:
>> 
>> @owned          / @in
>> @guaranteed     / @in_guaranteed
>> return          / @out
>> @owned arg
>> + @owned return / @inout
>> 
>> Regardless of the representation we choose for @inout, @in/@out will
>> now be scalar types. SILFunction will maintain the distinction between
>> @owned/@in etc. based on whether the type is address-only. We need
>> this for reabstraction, but it only affects the function type, not the
>> calling convention.
>> 
>> Rather than building a tuple, John prefers SIL support for anonymous
>> aggregates as "exploded values".
>> 
>> [I'm guessing because tuples are a distinct formal type with their own
>> convention and common ownership. This may need some discussion though.]
>> 
>> Example SIL function type:
>> 
>> $(@in P, @owned Q) -> (@owned R, @owned S, @out T, @out U)
>> 
>> %p = apply f: $() -> P
>> %q = apply g: $() -> Q
>> %exploded = apply h(%p, %q)
>> %r = project_exploded %exploded, #0 : $R
>> %s = project_exploded %exploded, #1 : $S
>> %t = project_exploded %exploded, #2 : $T
>> %u = project_exploded %exploded, #3 : $U
>> 
>> Exploded types require all their elements to be projected with their
>> own independent ownership.
>> 
>> ** Ownership terminology.
>> 
>> Swift "owned"    = Rust values           = SIL @owned      = implicitly consumed
>> Swift "borrowed" = Rust immutable borrow = SIL @guaranteed = shared
>> Swift "inout"    = Rust mutable borrow   = SIL @inout      = unique
>> 
>> Swift "inout" syntax is already (nearly) sufficient.
>> 
>> "borrowed" may not need syntax on the caller side, just a way to
>> qualify parameters. Swift still needs syntax for returning a borrowed
>> value.
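The Rust column of the terminology table above can be sketched as three parameter conventions. All function names here are hypothetical, and the mapping to Swift and SIL is only the analogy drawn in the table, not a statement about how either compiler implements it.

```rust
/// "owned": the callee receives the value and implicitly consumes it.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

/// "borrowed": a shared, immutable borrow. The caller keeps ownership.
fn inspect(v: &[i32]) -> i32 {
    v.iter().sum()
}

/// "inout": a unique, mutable borrow. No other access may overlap with it.
fn append(v: &mut Vec<i32>, x: i32) {
    v.push(x);
}

fn demo_conventions() -> (i32, usize) {
    let mut v = vec![1, 2, 3];
    append(&mut v, 4);       // exclusive access for the duration of the call
    let total = inspect(&v); // shared access for the duration of the call
    let len = consume(v);    // ownership moves, `v` is unusable afterwards
    (total, len)
}
```

Note that only the mutable borrow needs syntax at the call site in a Swift-like design (`&` for inout), which matches the remark that "borrowed" may only need a way to qualify parameters.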
>> 
>> ** Representation of borrowed values.
>> 
>> Borrowed values represent some shared storage location.
>> 
>> We want some borrowed value references to be passed as SIL values, not SIL addresses:
>> - Borrowed class references should not be indirected.
>> - Optimize borrowing other small non-memory linked types.
>> - Support capture promotion, and other SSA optimizations.
>> - Borrow CoW values directly.
>> 
>> [Address-only borrowed types will still be passed as SIL addresses (why not?)]
>> 
>> Borrowed types with potentially mutating properties must be passed by
>> SIL address because they are not actually immutable and their storage
>> location is significant.
>> 
>> Borrowed references have a scope and need an end-of-borrow marker.
>> 
>> [The end-of-borrow marker semantically changes the memory state, and
>> statically enforces non-overlapping memory states. It does not
>> semantically write-back a value. Borrowed values with mutating fields
>> are semantically modified in-place.]
>> 
>> [Regardless of whether borrowed references are represented as SIL
>> values or addresses, they must be associated with formal storage. That
>> storage must remain immutable at the language level (although it may
>> have mutating fields) and the value cannot be destroyed during the
>> borrowed scope].
>> 
>> [Trivial borrowed values can be demoted to copies so we can eliminate
>> their scope]
>> 
>> [Anything borrowed from global storage (and not demoted to a copy)
>> needs its scope to be dynamically enforced. Borrows from local storage
>> are sufficiently statically enforced. However, in both cases the
>> optimizer must respect the static scope of the borrow.]
>> 
>> [I think borrowed values are effectively passed @guaranteed. The
>> end-of-borrow scope marker will then always be at the top-level
>> scope. You can't borrow in a caller and end its scope in the callee.]
>> 
>> ** Borrowed and inout scopes.
>> 
>> inout value references are also scoped. We'll get to their
>> representation shortly. Within an inout scope, memory is in an
>> exclusive state. No borrowed scopes may overlap with an inout state,
>> which is to say, memory is either shared or exclusive.
>> 
>> We need a flag for stored properties, even for simple trivial
>> types. That's the only way to provide a simple user model. At least we
>> don't need this to be implemented atomically; we're not detecting race
>> conditions. Optimizations will come later. We should be able to prove
>> that some stored properties are never passed as inout.
>> 
>> The stored property flag needs to be a tri-state: owned, borrowed, exclusive.
>> 
>> The memory value can only be destroyed in the owned state.
>> 
>> The user may mark some storage locations as "unchecked" as an
>> opt-out. That doesn't change the optimizer's constraints. It simply
>> bypasses the runtime check.
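One way to picture the tri-state flag is as a small non-atomic state machine. This is a sketch under assumptions: `AccessState`, `StorageFlag`, `AccessConflict`, and the count of overlapping borrows are all invented here for illustration. The notes only say that the flag has three states and that destruction requires the owned state.

```rust
/// Hypothetical per-property flag: owned, borrowed (shared), or exclusive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AccessState {
    Owned,
    Borrowed(u32), // assumption: count overlapping shared borrows
    Exclusive,
}

#[derive(Debug, PartialEq, Eq)]
struct AccessConflict;

struct StorageFlag {
    state: AccessState,
}

impl StorageFlag {
    fn new() -> Self {
        StorageFlag { state: AccessState::Owned }
    }

    /// Shared borrows may overlap each other but not an exclusive access.
    fn begin_borrow(&mut self) -> Result<(), AccessConflict> {
        self.state = match self.state {
            AccessState::Owned => AccessState::Borrowed(1),
            AccessState::Borrowed(n) => AccessState::Borrowed(n + 1),
            AccessState::Exclusive => return Err(AccessConflict),
        };
        Ok(())
    }

    fn end_borrow(&mut self) -> Result<(), AccessConflict> {
        self.state = match self.state {
            AccessState::Borrowed(1) => AccessState::Owned,
            AccessState::Borrowed(n) if n > 1 => AccessState::Borrowed(n - 1),
            _ => return Err(AccessConflict),
        };
        Ok(())
    }

    /// Exclusive (inout) access requires that no other access is in progress.
    fn begin_exclusive(&mut self) -> Result<(), AccessConflict> {
        match self.state {
            AccessState::Owned => {
                self.state = AccessState::Exclusive;
                Ok(())
            }
            _ => Err(AccessConflict),
        }
    }

    fn end_exclusive(&mut self) -> Result<(), AccessConflict> {
        match self.state {
            AccessState::Exclusive => {
                self.state = AccessState::Owned;
                Ok(())
            }
            _ => Err(AccessConflict),
        }
    }

    /// The memory value can only be destroyed in the owned state.
    fn can_destroy(&self) -> bool {
        self.state == AccessState::Owned
    }
}
```

An "unchecked" storage location would simply skip these calls. The optimizer would still have to assume the same non-overlap rules hold.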
>> 
>> ** Ownership of loaded values.
>> 
>> [MikeG already explained possibilities of load ownership in
>> [swift-dev] [semantic-arc][proposal] High Level ARC Memory Operations]
>> 
>> For the sake of understanding the model, it's worth realizing that we
>> only need one form of load ownership: load_borrow. We don't
>> actually need an operation that loads an owned value out of formal
>> storage. This makes canonical sense because:
>> 
>> - Semantically, a load must at least be a borrow because the storage
>>   location's non-exclusive flag needs to be dynamically checked
>>   anyway, even if the value will be copied.
>> 
>> - Code motion in the SIL optimizer has to obey the same limitations
>>   within borrow scopes regardless of whether we fuse loads and copies
>>   (retains).
>> 
>> [For the purpose of semantic ARC, the copy_value would be the RC
>> root. The load and copy_value would effectively be "coupled" by the
>> static scope of the borrow. e.g. we would not want to move a release
>> inside the static scope of a borrow.]
>> 
>> [Purely in the interest of concise SIL, I still think we want a load [copy].]
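The "load is a borrow, the copy is separate" idea has a loose analogue in Rust's reference-counted pointers. In this sketch `Holder` and `load_then_copy` are hypothetical names, and `Rc::clone` stands in for a retain. It is an analogy only, not a description of how SIL lowers loads.

```rust
use std::rc::Rc;

/// Hypothetical formal storage holding a reference-counted value.
struct Holder {
    obj: Rc<String>,
}

/// Returns the strong count before and after the explicit copy.
fn load_then_copy(h: &Holder) -> (usize, usize) {
    let before = Rc::strong_count(&h.obj);
    // Analogue of load_borrow: a scoped, shared view of the stored reference.
    let borrowed: &Rc<String> = &h.obj;
    // Analogue of copy_value: a separate operation that produces an owned value.
    let owned = Rc::clone(borrowed);
    let during = Rc::strong_count(&owned);
    // `owned` is dropped here (the matching release), after the last use of `borrowed`.
    (before, during)
}
```

The borrow by itself never changes the count. Only the explicit copy does, and a fused "load [copy]" would just be shorthand for the two steps.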
>> 
>> ** SIL value ownership and aggregates
>> 
>> Operations on values:
>> 1. copy
>> 2. forward (move)
>> 3. borrow (share)
>> 
>> A copy or forward produces an owned value.
>> An owned value has a single consumer.
>> A borrow has static scope.
>> 
>> For simplicity, passing a bb argument only has move semantics (it
>> forwards the value). Later that can be expanded if needed.
>> 
>> We want to allow simultaneous access to independent subelements of a
>> fragile aggregate. We should be able to borrow one field while
>> mutating another.
>> 
>> Is it possible to forward a subelement within an aggregate? No. But we
>> can fully explode an owned aggregate into individual owned elements
>> and reconstruct the aggregate. This makes use of the @exploded type
>> feature described in the calling convention.
>> 
>> [I don't think forwarding a subelement is useful anyway except for
>> modeling @inout semantics...]
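Both points above have direct Rust counterparts, shown here as a sketch with a hypothetical `Pair` type: independent fields can be borrowed and mutated at the same time, and an owned aggregate can be fully exploded into owned elements and then rebuilt.

```rust
/// Hypothetical fragile aggregate with two independent fields.
struct Pair {
    name: String,
    count: u32,
}

/// Borrow one field while mutating another.
fn bump_while_reading(p: &mut Pair) -> usize {
    let name = &p.name;       // shared borrow of one subelement
    let count = &mut p.count; // exclusive borrow of an independent subelement
    *count += 1;
    name.len()
}

/// Explode an owned aggregate into owned elements, then reconstruct it.
fn explode_and_rebuild(p: Pair) -> Pair {
    let Pair { name, count } = p;   // every element is now independently owned
    let name = name.to_uppercase(); // consume and replace one element
    Pair { name, count }            // reconstruct the aggregate
}
```

The second function is the "explode and reconstruct" pattern. There is no step at which a single field is moved out while the rest of the aggregate stays usable as a whole.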
>> 
>> That leads us to this question: Does an @inout value reference have
>> formal storage (thus a SIL address) or is it just a convention for
>> passing owned SSA values?
>> 
>> ** World 1: SSA @inout
>> 
>> Projecting an element produces a new SILValue. Does this SILValue have
>> its own ownership associated with its lifetime, or is it derived
>> from its parent object by looking through projections?
>> 
>> Either way, projecting any subelement requires reconstructing the
>> entire aggregate in SIL, through all nesting levels. This will
>> generate a massive amount of SILValues. Superficially they all need
>> their own storage.
>> 
>> [We could claim that projections don't need storage, but that only
>> solves one side of the problem.]
>> 
>> [I argue that this actually obscures the producer/consumer
>> relationship, which is the opposite of the intention of moving to
>> SSA. Projecting subelements for mutation fundamentally doesn't make
>> sense. It does make sense to borrow a subelement (not for
>> mutation). It also makes sense to project a mutable storage
>> location. The natural way to project a storage location is by
>> projecting an address...]
>> 
>> ** World 2: @inout formal storage
>> 
>> In this world, @inout references continue to have SILType $*T with
>> guaranteed exclusive access.
>> 
>> Memory state can be:
>> - uninitialized
>> - holds an owned value
>>   - has exclusive access
>>   - has shared access
>> 
>> --- expected transitions need to be handled
>>   - must become uninitialized
>>   - must become initialized
>>   - must preserve initialization state
>> 
>> We need to mark initializers with some "must initialize" marker,
>> similar to how we mark deinitializers [this isn't clear to me yet].
>> 
>> We could give address types qualifiers to distinguish the memory state
>> of their pointee (uninitialized, shared, exclusive). Addresses
>> themselves could be pseudo-linear types. This would provide the same
>> use-def guarantees as the SSA @inout approach, but producing a new
>> address each time memory changes state would also be complicated and
>> cumbersome (though not as bad as SSA).
>> 
>> [[
>> We didn't talk about the alternative, but presumably exclusive
>> vs. shared scope would be delimited by pseudo memory operations as
>> such:
>> 
>> %a1 = alloc_stack
>> 
>> begin_exclusive %a1
>> apply foo(%a1) // must be marked an initializer?
>> end_exclusive %a1
>> 
>> begin_shared %a1
>> apply bar(%a1) // immutable access
>> end_shared %a1
>> 
>> dealloc_stack %a1
>> 
>> Values loaded from shared memory also need to be scoped. They must be
>> consumed within the shared region. e.g.
>> 
>> %a2 = ref_element_addr
>> 
>> %x = load_borrow %a2
>> 
>> end_borrow %x, %a2
>> 
>> It makes sense to me that a load_borrow would implicitly transition
>> memory to shared state, and end_borrow would implicitly return memory
>> to an owned state. If the address type is already ($* @borrow T), then
>> memory would remain in the shared state.
>> ]]
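For readers who want something runnable, Rust's `std::cell::RefCell` gives a dynamically checked approximation of these scopes. The mapping in the comments (`borrow_mut` as begin_exclusive, `borrow` as begin_shared, and so on) is an analogy made here, not something the notes propose, and `demo_scopes` is a hypothetical name.

```rust
use std::cell::RefCell;

/// Returns (shared scope blocked an exclusive access,
///          exclusive access allowed after the shared scope ended,
///          value loaded inside the shared scope).
fn demo_scopes() -> (bool, bool, i32) {
    let a = RefCell::new(0); // roughly: %a1 = alloc_stack

    {
        let mut w = a.borrow_mut(); // roughly: begin_exclusive %a1
        *w = 42;                    // initialize through the exclusive access
    }                               // roughly: end_exclusive %a1

    let blocked;
    let loaded;
    {
        let r = a.borrow();                    // roughly: begin_shared %a1
        blocked = a.try_borrow_mut().is_err(); // exclusive access must fail here
        loaded = *r;                           // the loaded value is used inside the scope
    }                                          // roughly: end_shared %a1

    let allowed_after = a.try_borrow_mut().is_ok();
    (blocked, allowed_after, loaded)
}
```

`RefCell` enforces the rule at runtime with a non-atomic flag. For local storage such as an `alloc_stack`, the notes expect the same property to be provable statically.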
>> 
>> For all sorts of analysis and optimization, from borrow checking to
>> CoW to ARC, we really need aliasing guarantees. Knowing we have a
>> unique address to a location is about as good as having an owned
>> value.
>> 
>> To get this guarantee we need to structurally guarantee
>> unique addresses.
>> 
>> [Is there a way to do this without making all the element_addr
>> operations scoped?]
>> 
>> With aliasing guarantees, verification should be able to statically
>> prove that most formal storage locations are properly initialized and
>> uninitialized (pseudo-linear type) by inspecting the memory
>> operations.
>> 
>> Likewise, we can verify the shared vs. exclusive states.
>> 
>> Representing @inout with addresses doesn't really add features to
>> SIL. In any case, SIL address types are still used for
>> formal storage. Exclusive access through any of the following
>> operations must be guaranteed dynamically:
>> 
>> - ref_element_addr
>> - global_addr
>> - pointer_to_address
>> - alloc_stack
>> - project_box
>> 
>> We end up with these basic SIL Types:
>> 
>> $T = owned value
>> 
>> $@borrowed T = shared value
>> 
>> $*T = exclusively accessed
>> 
>> $* @borrowed T = shared access
>> 
>> [I think the non-address @borrowed type is only valid for concrete
>> types that the compiler knows are not memory-linked? This can be used
>> to avoid passing borrowed values indirectly for arrays and other
>> small, free-to-copy values].
>> 
>> [We obviously need to work through concrete examples before we can
>> claim to have a real design.]
>> 
>> -Andy
>> 
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev at swift.org <mailto:swift-dev at swift.org>
>> https://lists.swift.org/mailman/listinfo/swift-dev
