[swift-dev] Proposal: SIL Ownership Model + Verifier

Michael Gottesman mgottesman at apple.com
Thu Dec 8 16:36:22 CST 2016


> On Dec 8, 2016, at 2:00 PM, John McCall <rjmccall at apple.com> wrote:
> 
>> On Dec 8, 2016, at 1:53 PM, Andrew Trick <atrick at apple.com <mailto:atrick at apple.com>> wrote:
>>> On Dec 7, 2016, at 11:25 PM, John McCall via swift-dev <swift-dev at swift.org <mailto:swift-dev at swift.org>> wrote:
>>> 
>>>> 
>>>> On Dec 7, 2016, at 2:13 PM, Michael Gottesman via swift-dev <swift-dev at swift.org <mailto:swift-dev at swift.org>> wrote:
>>>> 
>>>> This is a proposal for a new SIL Ownership Model and verifier. An online HTML version of the document is available here:
>>>> 
>>>> https://gottesmm.github.io/proposals/sil-ownership-model.html <https://gottesmm.github.io/proposals/sil-ownership-model.html>
>>>> 
>>>> and inline below.
>>>> 
>>>> Michael
>>>> 
>>>> ----
>>>> 
>>>> # Summary
>>>> 
>>>> This document defines a SIL ownership model and a compile time static verifier
>>>> for the model. This will allow for static compile time verification that a SIL
>>>> program satisfies all ownership model constraints.
>>>> 
>>>> The proposed SIL ownership model embeds ownership into SIL's SSA def-use
>>>> edges. This is accomplished by:
>>>> 
>>>> 1. Formalizing into SIL API's the distinction in between defs
>>>>    (e.g. `SILInstruction` and `SILArgument`) and the values produced by defs
>>>>    (e.g. `SILValue`). This is needed to model values as having ownership and
>>>>    defs as not having ownership. The main implication here is that the two can
>>>>    no longer be implicitly convertible and the API must enforce this.
>> 
>> This proposal is simultaneously proposing some specific API details
>> while avoiding both the motivation behind it and the eventual features
>> that it will support.
>> 
>> I think the ValueBase/Def renaming is a little
>> misunderstanding. "ValueDef" in the proposal was a placeholder (not a
>> perfect name) for some perceived need to have a common
>> SILArgument/SILInstruction base class. That's probably not necessary
>> in which case there's nothing to disagree about.
>> 
>> John's counter proposal solves that problem by repurposing ValueBase
>> to serve as a base class for value definitions (and then goes into
>> some detail on implementing multi-result instructions).
>> 
>>> We currently have two kinds of def: instructions and arguments.  Arguments
>>> always define a single value, and I don't see anything in your proposal that
>>> changes that.  And the idea that an instruction produces exactly one value is
>>> already problematic, because many don't produce a meaningful value at all.
>>> All that changes in your proposal is that certain instructions need to be able
>>> to produce multiple values.
>> 
>> That’s an obvious way of looking at it. I personally am most
>> interested in putting an end to the chronic problem of conflating
>> instructions and values which literally makes it impossible to have a
>> sane discussion about ownership. I insisted that SILInstruction not
>> derive from ValueBase, which probably led to the proposed
>> renaming. This probably comes across as confusing because the proposal
>> is deliberately side-stepping the issue of multi-result instructions.
>> 
>> What I really want to see at the proposal level is agreement on core concepts:
>> - A value has a type, ownership, one def, and 0..n uses
>> - Ownership rules are guaranteed simply by evaluating those 0..n uses.
>> - A value’s def/use is the point at which it is produced/consumed.
>> - Use/def points are demarcated by instructions or block boundaries
>> - An instruction produces values via results or consumes them via operands.
>> 
>> There shouldn't be any debate about these concepts, so why is there so
>> much confusion? I think caused by some bad choices in the SIL
>> interface, particularly as SILInstruction : ValueBase.
>> 
>> So we should define a SIL interface that reflects those core
>> concepts. We should be able to reimplement multi-result instructions
>> without affecting those interfaces and without rewritting SIL passes.
> 
> I agree.  I don't think this requires deep changes to how SIL nodes are typically
> worked with.  As long as single-result instructions can be transparently used as
> values, so that e.g. the result of SILBuilder::CreateBitCast can be used as a
> SILValue, the number of changes required should be quite modest.
> 
> Things like SILCloner will of course need some minor updates to handle the
> fact that a single instruction can have multiple uses, but that should be straightforward
> to do on top of the existing Value->Value mapping.

Just to be clear to the list, what happened here is that I was trying to put some of the initial work for multiple return values and clarifying ownership into this proposal and then I was going to continue with a separate proposal that actually proposed multiple return values and the rest of that API.

John: Lets use this thread to talk about the verifier part of the proposal. I have most of a multiple return value proposal complete that is very similar to what you are talking about. If it would be helpful, I can rip out the first part of this proposal if it would be easier for you to read.

Michael

> 
> John.
> 
>> 
>> -Andy
>> 
>>> 
>>> Moreover, the word "def" clearly suggests that it refers to the definition of a
>>> value that can be used, and that's how the term is employed basically everywhere.
>>> 
>>> So allow me to suggest that a much clearer way of saying what you're trying to
>>> say is that we need to distinguish between defs and instructions.  An instruction
>>> may have an arbitrary number of defs, possibly zero, and each def is a value
>>> that can be used.  (But the number of defs per instruction is known statically
>>> for most instructions, which is something we can use to make working with
>>> defs much less annoying.)
>>> 
>>> Also this stuff you're saying about values having ownership and defs not having
>>> ownership is, let's say, misleading; it only works if you're using a meaninglessly
>>> broad definition of ownership.  It would be better to simply say that the verification
>>> we want to do for ownership strongly encourages us to allow instructions to 
>>> introduce multiple defs whose properties can be verified independently.
>>> 
>>>> 2. Specifying a set of ownership kinds and specifying a method for mapping a
>>>>    `SILValue` to an ownership kind.
>>>> 3. Specifying ownership constraints on all `SILInstruction`s and `SILArgument`s
>>>>    that constrain what ownership kinds their operands and incoming values
>>>>    respectively can possess.
>>>> 4. Implementing a verifier to ensure that all `SILInstruction` and `SILArgument`
>>>>    are compatible with the ownership kind propagated by their operand
>>>>    `SILValue`s and that pseudo-linear dataflow constraints are maintained.
>>> 
>>> I'll tackle these other sections in another email.  Let's go one at a time.
>>> 
>>>> # Cleanly separating Value and Def APIs in SIL
>>>> 
>>>> All values in SIL are defined via an assignment statement of the form: `<foo> = <bar>`.
>>>> In English, we say `foo` is a value that is defined by the def
>>>> `bar`. Originally, these two concepts were distinctly represented by the classes
>>>> `SILValue` and `ValueBase`. All `ValueBase` defined a list of `SILValue` that
>>>> were related, but not equivalent to the defining `ValueBase`. With the decision
>>>> to represent multiple return values as instruction projections instead of as a
>>>> list of `SILValue`, this distinction in between a def and the values was lost,
>>>> resulting in `SILValue` being used interchangeably with `ValueBase` throughout
>>>> the swift codebase. This exposes a modeling issue when one attempts to add
>>>> ownership to SIL, namely that values have ownership, while the defs that define
>>>> the values do not. This implies that defs and values *should not* be
>>>> interchangeable.
>>> 
>>> Again, I do not understand what you're trying to say about ownership here.
>>> Just drop it.
>>> 
>>>> In order to model that values, not defs, have ownership, we separate the
>>>> `SILValue` and `ValueBase` APIs. This is done by:
>>> 
>>> Almost all of this is based on what I consider to be a really unfortunate use
>>> of the term "def".
>>> 
>>> Allow me to suggest this:
>>> 
>>> 1. You do not need to rename ValueBase.  Let's stick with the term "Value"
>>> instead of "Def" in the source code.
>>> 
>>> 2. There are only three subclasses of ValueBase for now:
>>>   - SILArgument
>>>   - SingleInstructionResult
>>>   - MultiInstructionResult
>>> This means that the Kind can probably be packed into the use-list pointer.
>>> 
>>> 3. SingleInstructionResult will only be used for instructions that are known to
>>> produce exactly one value.  All such instructions can derive from a single
>>> class, of which SingleInstructionResult can be a superclass.  This will make
>>> it possible to construct a SILValue directly from such instructions, as well as
>>> making it easy to support dyn_casting back to such instructions.
>>> 
>>> 4. MultiInstructionResult will have to store the result index as well as provide
>>> some way to get back to the instruction.  I think the most efficient representation
>>> here is to store an array of MultiInstructionResults immediately *before* the
>>> address point of the instruction, in reverse order, so that (this + ResultIndex + 1)
>>> gets you back to the instruction.  This also makes it possible to efficiently index
>>> into the results without loading the number of results (except for the
>>> bounds-checking assert, of course).
>>> 
>>> It will not be possible to construct a SILValue directly from one of these instructions;
>>> you'll have to ask the instruction for its Nth result.
>>> 
>>> It's not clear to me what dyn_casting a SILValue back to a multi-result instruction
>>> should do.  I guess it succeeds and then you just ask the instruction which
>>> result you had.
>>> 
>>> 5. SILInstruction has an accessor to return what's basically an ArrayRef of
>>> its instruction results.  It's not quite an ArrayRef because that would require
>>> type-punning, but you can make it work as efficiently as one.
>>> 
>>> John.
>>> 
>>> 
>>>> 
>>>> 1. Renaming `ValueBase` to `ValueDef`. This makes it clear from a naming
>>>> perspective that a `ValueDef` is not a value, but may define values.
>>>> 2. Eliminate all operator methods on `SILValue` that allow one to work with the
>>>> `ValueDef` API via a `SILValue` in favor of an explicit method for getting the
>>>> internal `ValueDef` of a `SILValue`, i.e.:
>>>> 
>>>>         class SILValue {
>>>>           ...
>>>>           ValueDef *operator->() const; // deleted
>>>>           ValueDef &operator*() const; // deleted
>>>>           operator ValueDef *() const; // deleted
>>>> 
>>>>           bool operator==(ValueDef *RHS) const; // deleted
>>>>           bool operator!=(ValueDef *RHS) const; // deleted
>>>>           ...
>>>>           ValueDef *SILValue::getDef() const; // new
>>>>           ...
>>>>         };
>>>> 
>>>> 3. Use private access control and friend classes to only allow for `SILValue`s
>>>>    to be constructed and vended by their defining `ValueDef` via a new method on
>>>>    `ValueDef`:
>>>> 
>>>>         class SILValue {
>>>>           friend class SILUndef; // new
>>>>           friend class SILArgument; // new
>>>>           friend class SILInstruction; // new
>>>>         public:
>>>>           SILValue(ValueDef *); // deleted
>>>>         private:
>>>>           SILValue(ValueDef *); // new
>>>>         };
>>>> 
>>>>         class ValueDef {
>>>>           ...
>>>>           SILValue getValue() const; // new
>>>>           ...
>>>>         };
>>>> 
>>>> To see how specific common code patterns change in light of these changes,
>>>> please see the [appendix](#changes-to-silvalue-api-for-sil-ownership).
>>>> 
>>>> # Values and ValueOwnershipKinds
>>>> 
>>>> Define `ValueOwnershipKind` as the enum with the following cases:
>>>> 
>>>>     enum class ValueOwnershipKind {
>>>>       Trivial,
>>>>       Unowned,
>>>>       Owned,
>>>>       Guaranteed,
>>>>     }
>>>> 
>>>> Our approach to mapping a `SILValue` to a `ValueOwnershipKind` is to use a
>>>> `SILVisitor` called `ValueOwnershipKindVisitor`. This works well since if one
>>>> holds `ValueKind` constant, `SILValue` have well defined ownership
>>>> constraints. Thus we can handle each case individually via the visitor. We use
>>>> SILNodes.def to ensure that all `ValueKind` have a defined visitor method. This
>>>> will ensure that when a new `ValueKind` is added, the compiler will emit a
>>>> warning that the visitor must be updated, ensuring correctness.
>>>> 
>>>> The visitor will be hidden in a *.cpp file and will expose its output via a new
>>>> API on `SILValue` :
>>>> 
>>>>     ValueOwnershipKind SILValue::getOwnershipKind() const;
>>>> 
>>>> Since the implementation of `SILValue::getOwnershipKind()` will be out of line,
>>>> none of the visitor code will be exposed to the rest of the compiler.
>>>> 
>>>> # ValueDefs and Ownership Constraints
>>>> 
>>>> In order to determine if a `SILInstruction`'s operands are SILValue that have
>>>> compatible ownership with a `SILInstruction`, we introduce a new API on
>>>> `SILInstruction` that returns the ownership constraint of the `i`th operand of
>>>> the `SILInstruction`:
>>>> 
>>>>     Optional<ValueOwnershipKind> SILInstruction::getOwnershipConstraint(unsigned i) const;
>>>> 
>>>> Specifically, it returns `.Some(OwnershipKind)` if there is a constraint or
>>>> `.None` if the `SILInstruction` will accept any `ValueOwnershipKind`
>>>> (e.g. `copy_value`). This API is sufficient since there are no `SILInstructions`
>>>> that accept only a subset of `ValueOwnershipKind`, all either accept a singular
>>>> ownership kind or all ownership kinds.
>>>> 
>>>> For `SILArgument`, we determine compatible ownership by introducing the concept
>>>> of ownership conventions. The reason to introduce these ownership conventions is
>>>> that it ensures that we do not need to perform any form of dataflow to determine
>>>> the convention of a `SILArgument` in a loop. Additionally, it allows for greater
>>>> readability in the IR and prevents the optimizer from performing implicit
>>>> conversions of ownership semantics on branch instructions, for instance,
>>>> converting a `switch_enum` from consuming a value at +1 to using a borrowed +0
>>>> parameter. In terms of API, we introduce a new method on `SILArgument`:
>>>> 
>>>>     ValueOwnershipKind SILArgument::getOwnershipConstraint() const;
>>>> 
>>>> which returns the required ownership constraint. In terms of representing these
>>>> ownership constraint conventions in textual SIL, we print out the ownership
>>>> constraint next to the specific argument in the block and the specific argument
>>>> in the given terminator. For details of how this will look with various
>>>> terminators, see the [appendix](#sil-argument-terminator-convention-examples).
>>>> 
>>>> # Verification of Ownership Semantics
>>>> 
>>>> Since our ownership model is based around SSA form, all verification of
>>>> ownership only need consider an individual value (`SILValue`) and the uses of
>>>> the def (`SILInstruction`, `SILArgument`). Thus for each def/use-set, we:
>>>> 
>>>> 1. **Verify Use Semantics**: All uses must be compatible with the def's
>>>>    `ValueOwnershipKind`.
>>>> 2. **Dataflow Verification**: Given any path P in the program and a `ValueDef`
>>>>    `V` along that path:
>>>>    a. There exists only one <a href="#footnote-0-lifetime-ending-use">"lifetime ending"</a> use of `V` along that path.
>>>>    b. After the lifetime ending use of `V`, there are no non-lifetime ending
>>>>       uses of `V` along `P`.
>>>> 
>>>> Since the dataflow verification requires knowledge of a subset of the uses, we
>>>> perform the use semantic verification first and then use the found uses for the
>>>> dataflow verification.
>>>> 
>>>> ## Use Verification
>>>> 
>>>> Just like with `SILValue`, individual `SILInstruction` kind's have well-defined
>>>> ownership semantics implying that we can use a `SILVisitor` approach here as
>>>> well. Thus we define a `SILVisitor` called
>>>> `OwnershipCompatibilityUseChecker`. This checker works by taking in a `SILValue`
>>>> and visiting all of the `SILInstruction` users of the `SILValue`. Each visitor
>>>> method returns a pair of booleans, the first stating whether or not the
>>>> ownership values were compatible and the second stating whether or not this
>>>> specific use should be considered a "lifetime ending" use. The checker then
>>>> stores the lifetime ending uses and non-lifetime ending uses in two separate
>>>> arrays for processing by the dataflow verifier.
>>>> 
>>>> ## Dataflow Verification
>>>> 
>>>> The dataflow verifier takes in as inputs the `SILValue` and lists of
>>>> lifetime-ending and non-lifetime ending uses. Since we are using SSA form, we
>>>> already know that our def must dominate all of our uses implying that a use can
>>>> never overconsume due to a def not being along a path. On the other hand, we
>>>> still must consider issues related to joint post-dominance namely:
>>>> 
>>>> 1. Double-Consume due to 2+ lifetime ending uses along a path. This implies
>>>>    proving that no lifetime ending use is reachable from another lifetime ending
>>>>    use.
>>>> 2. Use-After-Free due to a non-lifetime ending use not being joint-post
>>>>    dominated by the lifetime ending uses. This implies proving that all
>>>>    non-lifetime ending uses are joint post-dominated by the lifetime ending use
>>>>    set.
>>>> 3. Lifetime-Leaks due to the non-lifetime ending uses not joint-post dominating
>>>>    the lifetime ending uses.
>>>> 
>>>> Note how we must consider joint post-dominance in two different places. Thus we
>>>> must be able to prove joint post-dominance quickly and cheaply. To do so, we
>>>> take advantage of dominance implying that all uses are reachable from the def to
>>>> just needing to prove that when we walk the CFG up to prove reachability, that
>>>> any block on the reachability path does not have any successors that have not
>>>> been visited when we finish walking and visit the def.
>>>> 
>>>> The full code is in the [appendix](#full-dataflow-verification-algorithm).
>>>> 
>>>> <a id="footnote-0-lifetime-ending-use">[0]</a>: A use after which a def is no
>>>> longer allowed to be used in any way, e.g. `destroy_value`, `end_borrow`.
>>>> 
>>>> # Appendix
>>>> 
>>>> ## Changes to SILValue API for SIL Ownership
>>>> 
>>>> The common code pattern eliminated by removing implicit uses of `ValueDef`'s API
>>>> via a `SILValue` are as follows:
>>>> 
>>>>     SILValue V;
>>>>     ValueDef *Def;
>>>> 
>>>>     if (V != Def) { ... }
>>>>     if (V->getParentBlock()) { ... }
>>>> 
>>>> Our change, causes these code patterns to be rewritten in favor of the explicit:
>>>> 
>>>>     SILValue V;
>>>>     ValueDef *Def;
>>>> 
>>>>     if (V.getDef() != Def) { ... }
>>>>     if (V.getDef()->getParentBlock()) { ... }
>>>> 
>>>> In cases like the above, we want the to increase clarity through the usage of
>>>> verbosity. In other cases, this verboseness makes convenient APIs more difficult
>>>> to use. The main example here are the `isa` and `dyn_cast` APIs. We introduce
>>>> two helper functions that give the same convenience as before but added clarity by
>>>> making it clear that we are not performing an isa query on the value, but
>>>> instead the underlying def of the value:
>>>> 
>>>>     template <typename ParentTy>
>>>>     bool def_isa(SILValue V) { return isa<ParentTy>(V.getDef()); }
>>>>     template <typename ParentTy>
>>>>     ParentTy *def_dyn_cast(SILValue V) { return dyn_cast<ParentTy>(V.getDef()); }
>>>> 
>>>> Consider the following code using the old API,
>>>> 
>>>>     SILValue V;
>>>> 
>>>>     if (isa<ApplyInst>(V)) { ... }
>>>>     if (auto *PAI = dyn_cast<PartialApplyInst>(V)) { ... }
>>>> 
>>>> Notice how it seems like one is casting the SILValue as if it is a
>>>> ValueBase. This is due to the misleading usage of the implicit conversion from a
>>>> `SILValue` to the `SILValue`'s internal `ValueBase`. In comparison the new API
>>>> makes it absolutely clear that one is reasoning about the ValueKind of the
>>>> underlying def of the `SILValue`:
>>>> 
>>>>     SILValue V;
>>>> 
>>>>     if (def_isa<ApplyInst>(V)) { ... }
>>>>     if (auto *PAI = def_dyn_cast<PartialApplyInst>(V)) { ... }
>>>> 
>>>> Thus the convenience of the old API is maintained, clarity is improved, and the
>>>> conceptual API boundary is enforced.
>>>> 
>>>> The final change that we propose is eliminating the ability to construct a
>>>> `SILValue` from a `ValueDef` externally to the `ValueDef` itself. Allowing this
>>>> to occur violates the modeling notion that a `SILValue` is defined and thus is
>>>> dependent on the `ValueDef`. To implement this, we propose changing the
>>>> constructor `SILValue::SILValue(ValueDef *)` to have private instead of public
>>>> access control and declaring `ValueDef` subclasses as friends of
>>>> `SILValue`. This then allows the `ValueDef` to vend opaquely constructed
>>>> `SILValue`, but disallows external users of the API from directly creating
>>>> `SILValue` from `ValueDef`, enforcing the value/def distinction in our model.
>>>> 
>>>> ## SILArgument/Terminator OwnershipConvention Examples
>>>> 
>>>> We present here several examples of how ownership conventions look on
>>>> SILArguments and terminators.
>>>> 
>>>> First we present a simple branch and return example.
>>>> 
>>>>     class C { ... }
>>>> 
>>>>     sil @simple_branching : $@convention(thin) : @convention(thin) (@owned Builtin.NativeObject, @guaranteed C) -> @owned C {
>>>>     bb0(%0 : @owned $Builtin.NativeObject, %1 : @guaranteed $C):
>>>>       br bb1(%0 : @owned $Builtin.NativeObject)
>>>> 
>>>>     bb1(%1 : @owned $Builtin.NativeObject):
>>>>       strong_release %1 : $Builtin.NativeObject
>>>>       %2 = copy_value %0 : $C
>>>>       return %2 : @owned $C
>>>>     }
>>>> 
>>>> Now consider this loop example:
>>>> 
>>>>     sil @loop : $@convention(thin) : $@convention(thin) (@owned Optional<Builtin.NativeObject>) -> () {
>>>>     bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>>>       br bb1(%0 : @owned $Builtin.NativeObject)
>>>> 
>>>>     bb1(%1 : @owned $Builtin.NativeObject):
>>>>       %2 = alloc_object $C
>>>>       strong_release %1 : $Builtin.NativeObject
>>>>       %3 = unchecked_ref_cast %2 : $C to $Builtin.NativeObject
>>>>       cond_br %cond, bb1(%3 : @owned Builtin.NativeObject), bb2
>>>> 
>>>>     bb2:
>>>>       strong_release %3 : $Builtin.NativeObject
>>>>       %result = tuple()
>>>>       return %result : @trivial $()
>>>>     }
>>>> 
>>>> Note how it is absolutely clear what convention is being used when passing a
>>>> value to the phi node. No dataflow reasoning is required implying we can do a
>>>> simple pass over the CFG to prove correctness.
>>>> 
>>>> Now consider two examples of switches:
>>>> 
>>>>     sil @owned_switch_enum : $@convention(thin) : $@convention(thin) (@owned Optional<Builtin.NativeObject>) -> () {
>>>>     bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>>>       switch_enum %0 : @owned $Builtin.NativeObject, #Optional.none.enumelt: bb1, #Optional.some.enumelt.1: bb2
>>>> 
>>>>     bb1:
>>>>       br bb3
>>>> 
>>>>     bb2(%1 : @owned $Builtin.NativeObject):
>>>>       strong_release %1 : $Builtin.NativeObject
>>>>       br bb3
>>>> 
>>>>     bb3:
>>>>       %result = tuple()
>>>>       return %result : @trivial $()
>>>>     }
>>>> 
>>>>     sil @guaranted_converted_switch_enum : $@convention(thin) : $@convention(thin) (@owned Optional<Builtin.NativeObject>) -> () {
>>>>     bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>>>       %1 = begin_borrow %0 : $Optional<Builtin.NativeObject>
>>>>       switch_enum %1 : @guaranteed $Builtin.NativeObject, #Optional.none.enumelt: bb1, #Optional.some.enumelt.1: bb2
>>>> 
>>>>     bb1:
>>>>       br bb3
>>>> 
>>>>     bb2(%2 : @guaranteed $Builtin.NativeObject):
>>>>       %3 = function_ref @g_user : $@convention(thin) (@guaranteed Builtin.NativeObject) -> ()
>>>>       apply %3(%2) : $@convention(thin) (@guaranteed Builtin.NativeObject) -> ()
>>>>       br bb3
>>>> 
>>>>     bb3:
>>>>       end_borrow %1 from %0 : $Optional<Builtin.NativeObject>, $Optional<Builtin.NativeObject>
>>>>       %result = tuple()
>>>>       return %result : @trivial $()
>>>>     }
>>>> 
>>>> Notice how the lifetime is completely explicit in both cases, so the optimizer
>>>> can not treat the conversion of switch_enum from +1 to +0 implicitly.
>>>> 
>>>> ## Full Dataflow Verification Algorithm
>>>> 
>>>> Define the following book keeping data structures.
>>>> 
>>>>     // The worklist that we will use for our iterative reachability query.
>>>>     llvm::SmallVector<SILBasicBlock *, 32> Worklist;
>>>> 
>>>>     // The set of blocks with lifetime ending uses.
>>>>     llvm::SmallPtrSet<SILBasicBlock *, 8> BlocksWithLifetimeEndingUses;
>>>> 
>>>>     // The set of blocks with non-lifetime ending uses and the associated
>>>>     // non-lifetime ending use SILInstruction.
>>>>     llvm::SmallDenseMap<SILBasicBlock *, SILInstruction *, 8> BlocksWithNonLifetimeEndingUses;
>>>> 
>>>>     // The blocks that we have already visited.
>>>>     llvm::SmallPtrSet<SILBasicBlock *, 32> VisitedBlocks;
>>>> 
>>>>     // A list of successor blocks that we must visit by the time the algorithm terminates.
>>>>     llvm::SmallPtrSet<SILBasicBlock *, 8> SuccessorBlocksThatMustBeVisited;
>>>> 
>>>> Then for each non-lifetime ending use (found by the
>>>> `OwnershipCompatibilityUseChecker`), we add the block and its instruction to the
>>>> `BlocksWithNonLifetimeEndingUses`. There is a possibility of having multiple
>>>> non-lifetime ending uses in the same block, in such a case, we always take the
>>>> later value since we are performing a liveness-esque dataflow:
>>>> 
>>>>     for (SILInstruction *User : getNonLifetimeEndingUses()) {
>>>>       // First try to associate User with User->getParent().
>>>>       auto Result = BlocksWithNonLifetimeEndingUses.insert({User->getParent(), User});
>>>> 
>>>>       // If the insertion succeeds, then we know that there is no more work to
>>>>       // be done, so process the next use.
>>>>       if (Result.second)
>>>>         continue;
>>>> 
>>>>       // If the insertion fails, then we have at least two non-lifetime ending uses
>>>>       // in the same block. Since we are performing a liveness type of dataflow,
>>>>       // we only need the last non-lifetime ending use to show that all lifetime
>>>>       // ending uses post dominate both. Thus, see if Use is after
>>>>       // Result.first->second in the use list. If Use is not later, then we wish
>>>>       // to keep the already mapped value, not use, so continue.
>>>>       if (std::find(Result.first->second->getIterator(), Block->end(), User) ==
>>>>             Block->end()) {
>>>>         continue;
>>>>       }
>>>> 
>>>>       // At this point, we know that Use is later in the Block than
>>>>       // Result.first->second, so store Use instead.
>>>>       Result.first->second = User;
>>>>     }
>>>> 
>>>> Then we visit each each lifetime ending use and its parent block:
>>>> 
>>>>     for (SILInstruction *User : getLifetimeEndingUses()) {
>>>>       SILBasicBlock *UserParentBlock = User->getParent();
>>>>       ...
>>>>     }
>>>> 
>>>> We begin by adding `UserParentBlock` to the `BlocksWithLifetimeEndingUses`
>>>> set. If a block is already in the set (i.e. insert fails), then we know that (1)
>>>> has been violated and we error. If we never hit that condition, then we know
>>>> that all lifetime ending uses are in different blocks.
>>>> 
>>>>     if (!BlocksWithLifetimeEndingUses.insert(UserParentBlock).second) {
>>>>       double_consume_error(); // ERROR!
>>>>     }
>>>> 
>>>> Then we check if we previously saw any non-lifetime ending uses in
>>>> `UserParentBlock` by checking the map `BlocksWithNonLifetimeEndingUses`. If we do
>>>> find any such uses, we check if the lifetime ending use is earlier in the block
>>>> that the non-lifetime ending use. If so then (2) is violated and we
>>>> error. Once we know that (2) has not been violated, we remove
>>>> 
>>>>     auto Iter = BlocksWithNonLifetimeEndUses.find(UserParentBlock);
>>>>     if (Iter != BlocksWithNonLifetimeEndUses.end()) {
>>>>       // Make sure that the non-lifetime ending use is before the lifetime
>>>>       // ending use. Otherwise, we have a use after free.
>>>>       if (std::find(User->getIterator(), UserParentBlock->end(), Iter->second) !=
>>>>             UserParentBlock->end()) {
>>>>         use_after_free_error(); // ERROR!
>>>>       }
>>>> 
>>>>       // Erase the use since we know that it is properly joint post-dominated.
>>>>       BlocksWithNonLifetimeEndingUses.erase(Iter);
>>>>     }
>>>> 
>>>> Then we mark `UserParentBlock` as visited to ensure that we do not consider
>>>> UserParentBlock to be a successor we must visit and prime the worklist with the
>>>> predecessor blocks of `UserParentBlock`.
>>>> 
>>>>     // Then mark UserParentBlock as visited add all predecessors of
>>>>     // UserParentBlock to the worklist.
>>>>     VisitedBlocks.insert(UserParentBlock);
>>>>     copy(UserParentBlock->getPreds(), std::back_inserter(Worklist));
>>>> 
>>>> The entire block of code is provided in the appendix. Now we know that:
>>>> 
>>>> 1. All lifetime ending uses are in different basic blocks.
>>>> 2. All non-lifetime ending uses are either in different basic blocks.
>>>> 
>>>> We begin our worklist algorithm by popping off the next element in the worklist.
>>>> 
>>>>     // Until the worklist is empty.
>>>>     while (!Worklist.empty()) {
>>>>       // Grab the next block to visit.
>>>>       const SILBasicBlock *BB = Worklist.pop_back_val();
>>>>       ...
>>>>     }
>>>> 
>>>> Since every time we insert objects into the worklist, we first try to insert it
>>>> into the `VisitedBlock` list, we do not need to check if `BB` has been visited
>>>> yet, since elements can not be added to the worklist twice.
>>>> 
>>>> Then make sure that BB is not in `BlocksWithLifetimeEndingUses`. If BB is in
>>>> `BlocksWithLifetimeEndingUses`, then we know that there is a path going through
>>>> the def where the def is consumed twice, an error!
>>>> 
>>>>     if (BlocksWithLifetimeEndingUses.count(BB)) {
>>>>       double_consume_error(); // ERROR!
>>>>     }
>>>> 
>>>> Now that we know we are not double consuming, we need to update our data
>>>> structures. First if BB is contained in SuccessorBlocksThatMustBeVisited, we
>>>> remove it. This ensures when the algorithm terminates, we know that the path
>>>> through the successor was visited as apart of our walk.
>>>> 
>>>>     SuccessorBlocksThatMustBeVisited.erase(BB);
>>>> 
>>>> Then if BB is in BlocksWithNonLifetimeEndingUses, we remove it. This ensures
>>>> that when the algorithm terminates, we know that the non-lifetime ending uses
>>>> were properly joint post-dominated by the lifetime ending uses.
>>>> 
>>>>     BlocksWithNonLifetimeEndingUses.erase(BB);
>>>> 
>>>> Then add all successors of BB that we have not visited yet to the
>>>> `SuccessorBlocksThatMustBeVisited` set. This ensures that if we do not visit the
>>>> successor as apart of our CFG walk, at the end of the algorithm we will assert
>>>> that there is a leak.
>>>> 
>>>>     for (const SILBasicBlock *SuccBlock : BB->getSuccessorBlocks()) {
>>>>       // If we already visited the successor, there is nothing to do since
>>>>       // we already visited the successor.
>>>>       if (VisitedBlocks.count(SuccBlock))
>>>>         continue;
>>>> 
>>>>       // Otherwise, add the successor to our MustVisitBlocks set to ensure that
>>>>       // we assert if we do not visit it by the end of the algorithm.
>>>>       SuccessorBlocksThatMustBeVisited.insert(SuccBlock);
>>>>     }
>>>> 
>>>> Finally add all predecessors of BB that we have not visited yet to the Worklist
>>>> for processing. We use insert here to ensure that we only ever add a
>>>> VisitedBlock once to the Worklist:
>>>> 
>>>>     for (const SILBasicBlock *PredBlock : BB->getPredecessorBlocks()) {
>>>>       if (!VisitedBlocks.insert(PredBlock).second)
>>>>         continue;
>>>>       Worklist.push_back(PredBlock);
>>>>     }
>>>> 
>>>> This continues until we have exhausted the worklist. Once the worklist is
>>>> exhausted, we know that:
>>>> 
>>>> 1. If `SuccessorBlocksThatMustBeVisited` is non-empty, then the Blocks in the
>>>>    set are not joint post-dominated by the set of lifetime ending users implying
>>>>    a leak.
>>>> 2. If `BlockSwithNonLifetimeEndingUses` is non-empty, then there was a
>>>>    non-lifetime ending use that was not joint post-dominated by the lifetime
>>>>    ending use set. This implies a use-after free.
>>>> 
>>>> Thus we assert that both sets are empty and error accordingly.
>>>> 
>>>>     if (!SuccessorBlocksThatMustBeVisited.empty()) {
>>>>       leak_error(); // ERROR!
>>>>     }
>>>>     if (!BlocksWithNonLifetimeEndingUses.empty()) {
>>>>       use_after_free_error(); // ERROR!
>>>>     }
>>>> 
>>>> The full code:
>>>> 
>>>>     // The worklist that we will use for our iterative reachability query.
>>>>     llvm::SmallVector<SILBasicBlock *, 32> Worklist;
>>>> 
>>>>     // The set of blocks with lifetime ending uses.
>>>>     llvm::SmallPtrSet<SILBasicBlock *, 8> BlocksWithLifetimeEndingUses;
>>>> 
>>>>     // The set of blocks with non-lifetime ending uses and the associated
>>>>     // non-lifetime ending use SILInstruction.
>>>>     llvm::SmallDenseMap<SILBasicBlock *, SILInstruction *, 8> BlocksWithNonLifetimeEndingUses;
>>>> 
>>>>     // The blocks that we have already visited.
>>>>     llvm::SmallPtrSet<SILBasicBlock *, 32> VisitedBlocks;
>>>> 
>>>>     // A list of successor blocks that we must visit by the time the algorithm terminates.
>>>>     llvm::SmallPtrSet<SILBasicBlock *, 8> SuccessorBlocksThatMustBeVisited;
>>>> 
>>>>     for (SILInstruction *User : getLifetimeEndingUses()) {
>>>>       SILBasicBlock *UserParentBlock = User->getParent();
>>>>       if (!BlocksWithLifetimeEndingUses.insert(UserParentBlock).second) {
>>>>         double_consume_error(); // ERROR!
>>>>       }
>>>> 
>>>>       auto Iter = BlocksWithNonLifetimeEndUses.find(UserParentBlock);
>>>>       if (Iter != BlocksWithNonLifetimeEndUses.end()) {
>>>>         // TODO: Could this be cached.
>>>>         if (std::find(User->getIterator(), UserParentBlock->end(), Iter->second) !=
>>>>               UserParentBlock->end()) {
>>>>           use_after_free_error(); // ERROR!
>>>>         }
>>>>       }
>>>> 
>>>>       // Then add all predecessors of the user block to the worklist.
>>>>       VisitedBlocks.insert(UserParentBlock);
>>>>       copy(UserParentBlock->getPreds(), std::back_inserter(Worklist));
>>>>     }
>>>> 
>>>>     // Until the worklist is empty.
>>>>     while (!Worklist.empty()) {
>>>>       // Grab the next block to visit.
>>>>       const SILBasicBlock *BB = Worklist.pop_back_val();
>>>> 
>>>>       // Then check that BB is not a lifetime-ending use block. If it is error,
>>>>       // since we have an overconsume.
>>>>       if (BlocksWithLifetimeEndingUses.count(BB)) {
>>>>         double_consume_error(); // ERROR!
>>>>       }
>>>> 
>>>>       // Ok, now we know that we are not double-consuming. Update our data
>>>>       // structures.
>>>> 
>>>>       // First remove BB from the SuccessorBlocksThatMustBeVisited list. This
>>>>       // ensures that when the algorithm terminates, we know that BB was not the
>>>>       // beginning of a non-covered path to the exit.
>>>>       SuccessorBlocksThatMustBeVisited.erase(BB);
>>>> 
>>>>       // Then remove BB from BlocksWithNonLifetimeEndingUses so we know that
>>>>       // this block was properly joint post-dominated by our lifetime ending
>>>>       // users.
>>>>       BlocksWithNonLifetimeEndingUses.erase(BB);
>>>> 
>>>>       // Then add all successors of BB that we have not visited yet to the
>>>>       // SuccessorBlocksThatMustBeVisited set.
>>>>       for (const SILBasicBlock *SuccBlock : BB->getSuccessorBlocks()) {
>>>>         // If we already visited the successor, there is nothing to do since
>>>>         // we already visited the successor.
>>>>         if (VisitedBlocks.count(SuccBlock))
>>>>           continue;
>>>> 
>>>>         // Otherwise, add the successor to our MustVisitBlocks set to ensure that
>>>>         // we assert if we do not visit it by the end of the algorithm.
>>>>         SuccessorBlocksThatMustBeVisited.insert(SuccBlock);
>>>>       }
>>>> 
>>>>       // Finally add all predecessors of BB that have not been visited yet to
>>>>       // the worklist.
>>>>       for (const SILBasicBlock *PredBlock : BB->getPredecessorBlocks()) {
>>>>         // Try to insert the PredBlock into the VisitedBlocks list to ensure
>>>>         // that we only ever add a block once to the worklist.
>>>>         if (!VisitedBlocks.insert(PredBlock).second)
>>>>           continue;
>>>>         Worklist.push_back(PredBlock);
>>>>       }
>>>>     }
>>>> 
>>>>     if (!SuccessorBlocksThatMustBeVisited.empty()) {
>>>>       leak_error(); // ERROR!
>>>>     }
>>>>     if (!BlocksWithNonLifetimeEndingUses.empty()) {
>>>>       use_after_free_error(); // ERROR!
>>>>     }
>>>> 
>>>> ----
>>>> 
>>>>> On Dec 5, 2016, at 6:47 PM, Michael Gottesman <mgottesman at apple.com <mailto:mgottesman at apple.com>> wrote:
>>>>> 
>>>>>> 
>>>>>> On Dec 5, 2016, at 5:26 PM, Andrew Trick <atrick at apple.com <mailto:atrick at apple.com>> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Dec 5, 2016, at 3:55 PM, Michael Gottesman <mgottesman at apple.com <mailto:mgottesman at apple.com>> wrote:
>>>>>>> 
>>>>>>> Hello everyone.
>>>>>>> 
>>>>>>> This is an initial draft of a proposal for the SIL Ownership Model + Verifier. I am mainly looking for any high level concerns. If there are none, then I am fine moving this straight to swift-dev.
>>>>>>> 
>>>>>>> It is online here:
>>>>>>> 
>>>>>>> https://gottesmm.github.io/proposals/sil-ownership-model.html <https://gottesmm.github.io/proposals/sil-ownership-model.html>
>>>>>>> 
>>>>>>> Michael
>>>>>> 
>>>>>> 
>>>>>> Thanks! It looks good.
>>>>>> 
>>>>>> > `ValueDef` is not a value, but the def of a value.
>>>>>> 
>>>>>> Should be "but may define (or produce) values.".
>>>>> 
>>>>> Sure.
>>>>> 
>>>>>> 
>>>>>> > Since our ownership model is based around SSA form
>>>>>> 
>>>>>> There's an in-memory aspect as well. Ownership of local memory should be statically verified. We at least need a TBD section for this.
>>>>> 
>>>>> Ok.
>>>>> 
>>>>>> 
>>>>>> > This checker works by taking in a
>>>>>> `ValueDef` and visiting all of the `SILInstruction` users of the
>>>>>> `ValueDef`.
>>>>>> 
>>>>>> This line does not compute for me. A ValueDef does not have ownership. The verifier is simply checking constraints for each SILValue against all of its uses. At the verifier's level of abstraction, it doesn't matter who defined that value.
>>>>> 
>>>>> That is a typo. If you look earlier I was using explicitly SILValue.
>>>>> 
>>>>>> 
>>>>>> > The dataflow verifier takes in as inputs the `ValueDef` (i.e. def) and lists of lifetime-ending and non-lifetime ending uses.
>>>>>> 
>>>>>> Again, the verifier would not care about the uses of the ValueDef, it cares about uses of the SILValue. I think this is a misleading way to formulate the problem and explain the algorithm.
>>>>> 
>>>>> That is a typo from an earlier version of the algorithm. Everything should be about the SILValue.
>>>>> 
>>>>>> 
>>>>>> >  if (V.getDef()->getParentBlock()) { ... }
>>>>>> 
>>>>>> I see no reason we can't add getParentBlock() and other very common idioms to the SILValue API.
>>>>> 
>>>>> The question is how much should we make that a def only API.
>>>>> 
>>>>>> 
>>>>>> > if (auto *PAI = def_dyn_cast<PartialApplyInst>(V)) { ... }
>>>>>> 
>>>>>> It's not obvious to be that this is more readable than 
>>>>>> 
>>>>>> if (auto *PAI = dyn_cast<PartialApplyInst>(V.getDef())) { ... }
>>>>>> 
>>>>>> But I don't have a strong opinion on it.
>>>>> 
>>>>> it is just a little shorter. I am also not wedded to this. I was just trying to create a vision of what is possible to stimulate discussion.
>>>>> 
>>>>>> 
>>>>>> > The final change that we propose is eliminating the ability to construct a
>>>>>> `SILValue` from a `ValueDef` externally to the `ValueDef` itself.
>>>>>> 
>>>>>> Nice!
>>>>>> 
>>>>>> I didn't reread the "Full Dataflow Verification Algorithm" because I don't think it's changed much. I'll review it again after the final proposal.
>>>>> 
>>>>> It is exactly the same as the version I showed you before, except I followed your suggestion about putting the visit before things are added to the worklist rather than after it is popped off the worklist.
>>>>> 
>>>>> Let me fix this real quick.
>>>>> 
>>>>> Michael
>>>>> 
>>>>>> 
>>>>>> -Andy
>>>> 
>>>> 
>>>> _______________________________________________
>>>> swift-dev mailing list
>>>> swift-dev at swift.org <mailto:swift-dev at swift.org>
>>>> https://lists.swift.org/mailman/listinfo/swift-dev <https://lists.swift.org/mailman/listinfo/swift-dev>
>>> 
>>> _______________________________________________
>>> swift-dev mailing list
>>> swift-dev at swift.org <mailto:swift-dev at swift.org>
>>> https://lists.swift.org/mailman/listinfo/swift-dev <https://lists.swift.org/mailman/listinfo/swift-dev>
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20161208/47d4b51f/attachment-0001.html>


More information about the swift-dev mailing list