[swift-dev] [discussion notes] SIL address types and borrowing

Sat Oct 8 01:10:52 CDT 2016

On swift-dev, John already sent out a great writeup on SIL SSA:
Representing "address-only" values in SIL.

While talking to John I also picked up a lot of insight into how
address types relate to SIL ownership and borrow checking. I finally
organized the information into these notes. This is not a
proposal. It's background information for those of us writing and
reviewing proposals. Just take it as a strawman for future
discussions. (There's also a good chance I'm getting something
wrong.)

[My commentary in brackets.]

** Recap of address-only.

Divide address-only types into two categories:
1. By abstraction (compiler doesn't know the size).
2. The type is "memory-linked", i.e. the address is significant at runtime.
   - weak references (anything that registers its address).
   - C++ this.
   - Anything with interior pointers.
   - Any shared-borrowed value of a type with "nonmutating" properties.
     ["nonmutating" properties allow mutation of state attached to a value.
      Rust atomics are an example.]

Address-only will not be reflected in SIL types. SIL addresses should
only be used for formal memory (pointers, globals, class
properties, captures). We'll get to inout arguments later...

As with opaque types, when IRGen lowers a memory-linked borrowed type,
it needs to allocate storage.

Concern: SILGen has built-in tracking of managed values that automates
insertion of cleanups. Lowering address-only types after SILOpt would
require rediscovering that information based on CFG analysis. Is this
too heroic?

This was already described by John. Briefly recapping:

e.g. Constructing Optional<Any>

We want initialization to be in-place, as such:

%0 = struct_element_addr .. #S.any
%1 = init_existential_addr %0, $*Any, $Optional<X>
%2 = inject_enum_data_addr %1, $Optional<X>.Some
apply @initX(%2)

SILValue initialization would look something like:

%0 = apply @initX()
%1 = enum #Optional.Some, %0 : $X
%2 = existential %1 : $Any

[I'm not sure we actually want to represent an existential container
this way, but enum, yes.]

Lowering now requires discovering the storage structure, bottom-up,
hoisting allocation, inserting cleanups as John explained.

Side note: Before lowering, something like alloc_box would directly
take its initial value.

** SILFunction calling convention.

For ownership analysis, there's effectively no difference between the
value/address forms of argument ownership:

@owned          / @in
@guaranteed     / @in_guaranteed
return          / @out
@owned arg
+ @owned return / @inout

Regardless of the representation we choose for @inout, @in/@out will
now be scalar types. SILFunction will maintain the distinction between
@owned/@in etc. based on whether the type is address-only. We need
this for reabstraction, but it only affects the function type, not the
calling convention.

Rather than building a tuple, John prefers SIL support for anonymous
aggregates as "exploded values".

[I'm guessing because tuples are a distinct formal type with their own
convention and common ownership. This may need some discussion though.]

Example SIL function type:

$(@in P, @owned Q) -> (@owned R, @owned S, @out T, @out U)

%p = apply f: $() -> P
%q = apply g: $() -> Q
%exploded = apply h(%p, %q)
%r = project_exploded %exploded, #0 : $R
%s = project_exploded %exploded, #1 : $S
%t = project_exploded %exploded, #2 : $T
%u = project_exploded %exploded, #3 : $U

Exploded types require all their elements to be projected with their
own independent ownership.

** Ownership terminology.

Swift "owned"    = Rust values           = SIL @owned      = implicitly consumed
Swift "borrowed" = Rust immutable borrow = SIL @guaranteed = shared
Swift "inout"    = Rust mutable borrow   = SIL @inout      = unique

Swift "inout" syntax is already (nearly) sufficient.

"borrowed" may not need syntax on the caller side, just a way to
qualify parameters. Swift still needs syntax for returning a borrowed
value.
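
Since the table above maps each Swift convention onto a Rust one, here
is a minimal Rust sketch of the three conventions side by side. The
function names are hypothetical and only illustrate the mapping; this
is an analogy, not a statement about how SIL will represent them.

```rust
// Hypothetical illustration; none of these names come from the notes.

/// "owned": the callee takes the value and implicitly consumes it.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

/// "borrowed" (shared): the callee may only read through the reference.
fn inspect(v: &[i32]) -> i32 {
    v.iter().sum()
}

/// "inout" (unique): the callee has exclusive, mutable access.
fn append_one(v: &mut Vec<i32>) {
    v.push(1);
}

/// Exercises the three conventions in order on one local value.
fn conventions_demo() -> (i32, usize) {
    let mut v = vec![1, 2, 3];
    let total = inspect(&v); // shared borrow; `v` is still usable afterwards
    append_one(&mut v);      // exclusive borrow; its scope ends with the call
    let len = consume(v);    // `v` is moved; any later use would not compile
    (total, len)
}
```

Note that only the owned case ends the caller's access to the value;
both borrow forms hand it back when their scope ends.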

** Representation of borrowed values.

Borrowed values represent some shared storage location.

We want some borrowed value references to be passed as SIL values, not SIL addresses:
- Borrowed class references should not be indirected.
- Optimize borrowing other small non-memory-linked types.
- Support capture promotion, and other SSA optimizations.
- Borrow CoW values directly.

[Address-only borrowed types will still be passed as SIL addresses (why not?)]

Borrowed types with potentially mutating properties must be passed by
SIL address because they are not actually immutable and their storage
location is significant.

Borrowed references have a scope and need an end-of-borrow marker.

[The end-of-borrow marker semantically changes the memory state, and
statically enforces non-overlapping memory states. It does not
semantically write-back a value. Borrowed values with mutating fields
are semantically modified in-place.]

[Regardless of whether borrowed references are represented as SIL
values or addresses, they must be associated with formal storage. That
storage must remain immutable at the language level (although it may
have mutating fields) and the value cannot be destroyed during the
borrowed scope].

[Trivial borrowed values can be demoted to copies so we can eliminate
their scope]

[Anything borrowed from global storage (and not demoted to a copy)
needs its scope to be dynamically enforced. Borrows from local storage
are sufficiently statically enforced. However, in both cases the
optimizer must respect the static scope of the borrow.]

[I think borrowed values are effectively passed @guaranteed. The
end-of-borrow scope marker will then always be at the top-level
scope. You can't borrow in a caller and end its scope in the callee.]

** Borrowed and inout scopes.

inout value references are also scoped. We'll get to their
representation shortly. Within an inout scope, memory is in an
exclusive state. No borrowed scopes may overlap with an inout scope,
which is to say, memory is either shared or exclusive.

We need a flag for stored properties, even for simple trivial
types. That's the only way to provide a simple user model. At least we
don't need this to be implemented atomically; we're not detecting race
conditions. Optimizations will come later. We should be able to prove
that some stored properties are never passed as inout.

The stored property flag needs to be a tri-state: owned, borrowed, exclusive.

The memory value can only be destroyed in the owned state.

The user may mark some storage locations as "unchecked" as an
opt-out. That doesn't change the optimizer's constraints. It simply
bypasses the runtime check.
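
To make the tri-state flag concrete, here is a rough Rust sketch of
the dynamic check, written as a plain (non-atomic) state machine. The
type and method names are hypothetical. The borrow count inside the
borrowed state is an assumption added so that several shared borrows
can overlap; the notes only name the three states.

```rust
// Hypothetical model of the per-property access flag; not a real runtime API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AccessState {
    Owned,
    Borrowed(u32), // assumption: count of active shared borrows
    Exclusive,
}

#[derive(Debug)]
struct AccessFlag {
    state: AccessState,
}

impl AccessFlag {
    fn new() -> Self {
        AccessFlag { state: AccessState::Owned }
    }

    /// Shared access is allowed unless an inout scope is active.
    fn begin_borrow(&mut self) -> Result<(), &'static str> {
        self.state = match self.state {
            AccessState::Owned => AccessState::Borrowed(1),
            AccessState::Borrowed(n) => AccessState::Borrowed(n + 1),
            AccessState::Exclusive => return Err("borrow overlaps an inout scope"),
        };
        Ok(())
    }

    fn end_borrow(&mut self) -> Result<(), &'static str> {
        self.state = match self.state {
            AccessState::Borrowed(1) => AccessState::Owned,
            AccessState::Borrowed(n) if n > 1 => AccessState::Borrowed(n - 1),
            _ => return Err("end_borrow without a matching borrow"),
        };
        Ok(())
    }

    /// Exclusive access is only allowed from the owned state.
    fn begin_exclusive(&mut self) -> Result<(), &'static str> {
        match self.state {
            AccessState::Owned => {
                self.state = AccessState::Exclusive;
                Ok(())
            }
            _ => Err("inout overlaps another access"),
        }
    }

    fn end_exclusive(&mut self) -> Result<(), &'static str> {
        match self.state {
            AccessState::Exclusive => {
                self.state = AccessState::Owned;
                Ok(())
            }
            _ => Err("end_exclusive without a matching inout scope"),
        }
    }

    /// The memory value can only be destroyed in the owned state.
    fn can_destroy(&self) -> bool {
        self.state == AccessState::Owned
    }
}
```

An "unchecked" storage location would simply skip these calls; the
static scopes the optimizer has to respect stay the same.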

** Ownership of loaded values.

[MikeG already explained possibilities of load ownership in
[swift-dev] [semantic-arc][proposal] High Level ARC Memory Operations]

For the sake of understanding the model, it's worth realizing that we
only need one form of load ownership: load_borrow. We don't
actually need an operation that loads an owned value out of formal
storage. This makes canonical sense because:

- Semantically, a load must at least be a borrow because the storage
  location's non-exclusive flag needs to be dynamically checked
  anyway, even if the value will be copied.

- Code motion in the SIL optimizer has to obey the same limitations
  within borrow scopes regardless of whether we fuse loads and copies
  (retains).

[For the purpose of semantic ARC, the copy_value would be the RC
root. The load and copy_value would effectively be "coupled" by the
static scope of the borrow. e.g. we would not want to move a release
inside the static scope of a borrow.]

[Purely in the interest of concise SIL, I still think we want a load [copy].]
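
As a loose analogy for "load is a borrow, the copy is separate", here
is a small Rust sketch using a reference-counted field. The names are
hypothetical, and Rc only stands in for a Swift class reference; the
comments mark which step plays the role of load_borrow and which plays
the role of copy_value.

```rust
use std::rc::Rc;

// Hypothetical stand-in for an object with a stored class reference.
struct Holder {
    field: Rc<String>,
}

/// Produces an owned reference from formal storage in two steps.
fn load_copy(h: &Holder) -> Rc<String> {
    // Roughly load_borrow: a shared, scoped view of the stored reference.
    let borrowed: &Rc<String> = &h.field;
    // Roughly copy_value: a retain that yields an independently owned value.
    Rc::clone(borrowed)
}
```

The owned result outlives the borrow, but the copy itself has to
happen while the borrow is still in scope.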

** SIL value ownership and aggregates

Operations on values:
1. copy
2. forward (move)
3. borrow (share)

A copy or forward produces an owned value.
An owned value has a single consumer.
A borrow has static scope.

For simplicity, passing a basic block (bb) argument only has move semantics (it
forwards the value). Later that can be expanded if needed.

We want to allow simultaneous access to independent subelements of a
fragile aggregate. We should be able to borrow one field while
mutating another.
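
Rust already allows this kind of disjoint-field access, so a minimal
sketch of the behavior we want looks like the following. The struct
and function are hypothetical.

```rust
// Hypothetical aggregate with two independent stored fields.
struct Pair {
    name: String,
    count: u32,
}

/// Holds a shared borrow of one field across a mutation of another.
fn bump_while_reading(p: &mut Pair) -> usize {
    let name = &p.name; // shared borrow of one field
    p.count += 1;       // mutation of a different field while `name` is live
    name.len()          // the shared borrow is still in use here
}
```

The borrow checker accepts this because the two accesses name
different fields of the same aggregate.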

Is it possible to forward a subelement within an aggregate? No. But we
can fully explode an owned aggregate into individual owned elements
and reconstruct the aggregate. This makes use of the @exploded type
feature described in the calling convention.
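
A rough Rust analogy for "explode and reconstruct", with hypothetical
names: the owned aggregate is consumed, each element becomes an
independently owned value, and a new aggregate is built from them.
Rust itself is more permissive than the rule above (it allows partial
moves), so this only illustrates the pattern.

```rust
// Hypothetical owned aggregate.
struct Aggregate {
    first: String,
    second: Vec<u8>,
}

/// Swaps out one element by exploding the aggregate and rebuilding it.
fn replace_first(agg: Aggregate, new_first: String) -> (Aggregate, String) {
    // Explode: `agg` is consumed and each element is now owned separately.
    let Aggregate { first, second } = agg;
    // Reconstruct: `second` is forwarded unchanged into the new aggregate.
    let rebuilt = Aggregate { first: new_first, second };
    (rebuilt, first)
}
```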

[I don't think forwarding a subelement is useful anyway except for
modeling @inout semantics...]

That leads us to this question: Does an @inout value reference have
formal storage (thus a SIL address) or is it just a convention for
passing owned SSA values?

** World 1: SSA @inout

Projecting an element produces a new SILValue. Does this SILValue have
its own ownership associated with its lifetime, or is it derived
from its parent object by looking through projections?

Either way, projecting any subelement requires reconstructing the
entire aggregate in SIL, through all nesting levels. This will
generate a massive amount of SILValues. Superficially they all need
their own storage.

[We could claim that projections don't need storage, but that only
solves one side of the problem.]

[I argue that this actually obscures the producer/consumer
relationship, which is the opposite of the intention of moving to
SSA. Projecting subelements for mutation fundamentally doesn't make
sense. It does make sense to borrow a subelement (not for
mutation). It also makes sense to project a mutable storage
location. The natural way to project a storage location is by
projecting an address...]

** World 2: @inout formal storage

In this world, @inout references continue to have SILType $*T with
guaranteed exclusive access.

Memory state can be:
- uninitialized
- holds an owned value
  - has exclusive access
  - has shared access

--- expected transitions need to be handled
  - must become uninitialized
  - must become initialized
  - must preserve initialization state

We need to mark initializers with some "must initialize" marker,
similar to how we mark deinitializers [this isn't clear to me yet].

We could give address types qualifiers to distinguish the memory state
of their pointee (uninitialized, shared, exclusive). Addresses
themselves could be pseudo-linear types. This would provide the same
use-def guarantees as the SSA @inout approach, but producing a new
address each time memory changes state would also be complicated and
cumbersome (though not as bad as SSA).

[[
We didn't talk about the alternative, but presumably exclusive
vs. shared scope would be delimited by pseudo memory operations as
such:

%a1 = alloc_stack

begin_exclusive %a1
apply foo(%a1) // must be marked an initializer?
end_exclusive %a1

begin_shared %a1
apply bar(%a1) // immutable access
end_shared %a1

dealloc_stack %a1

Values loaded from shared memory also need to be scoped. They must be
consumed within the shared region. e.g.

%a2 = ref_element_addr

%x = load_borrow %a2

end_borrow %x, %a2

It makes sense to me that a load_borrow would implicitly transition
memory to shared state, and end_borrow would implicitly return memory
to an owned state. If the address type is already ($* @borrow T), then
memory would remain in the shared state.
]]

For all sorts of analysis and optimization, from borrow checking to
CoW to ARC, we really need aliasing guarantees. Knowing we have a
unique address to a location is about as good as having an owned
value.

To get this guarantee we need to structurally guarantee
unique addresses.

[Is there a way to do this without making all the element_addr
operations scoped?]

With aliasing guarantees, verification should be able to statically
prove that most formal storage locations are properly initialized and
uninitialized (pseudo-linear type) by inspecting the memory
operations.
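
As a very rough sketch of what such a check might look like for a
single uniquely addressed location, assume each memory operation is
tagged with one of the three expected transitions listed earlier. All
names here are hypothetical. Treating "preserve" as legal in either
state is an assumption; a real verifier would also need to handle
control flow, not just a straight-line sequence.

```rust
// Hypothetical model: initialization state of one storage location.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InitState {
    Uninitialized,
    Initialized,
}

// Hypothetical tag on each memory operation: its expected transition.
#[derive(Debug, Clone, Copy)]
enum Transition {
    MustInitialize,
    MustDeinitialize,
    MustPreserve,
}

/// Walks a straight-line sequence of operations on one location and
/// returns the final state, or an error at the first invalid operation.
fn verify(start: InitState, ops: &[Transition]) -> Result<InitState, String> {
    let mut state = start;
    for (index, op) in ops.iter().enumerate() {
        state = match (state, *op) {
            (InitState::Uninitialized, Transition::MustInitialize) => InitState::Initialized,
            (InitState::Initialized, Transition::MustDeinitialize) => InitState::Uninitialized,
            // Assumption: a preserving operation leaves the state as it found it.
            (s, Transition::MustPreserve) => s,
            (s, bad) => {
                return Err(format!(
                    "operation {} ({:?}) is invalid in state {:?}",
                    index, bad, s
                ))
            }
        };
    }
    Ok(state)
}
```

This only works because the address is known to be unique; with
aliasing, another address could change the state between two
operations.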

Likewise, we can verify the shared vs. exclusive states.

Representing @inout with addresses doesn't really add features to
SIL. In any case, SIL address types are still used for
formal storage. Exclusive access through any of the following
operations must be guaranteed dynamically:

- ref_element_addr
- global_addr
- pointer_to_address
- alloc_stack
- project_box

We end up with these basic SIL Types:

$T = owned value

$@borrowed T = shared value

$*T = exclusively accessed

$* @borrowed T = shared access

[I think the non-address @borrowed type is only valid for concrete
types that the compiler knows are not memory-linked? This can be used
to avoid passing borrowed values indirectly for arrays and other
small, free-to-copy values].

[We obviously need to work through concrete examples before we can
claim to have a real design.]

-Andy
