[swift-dev] Resilient dynamic dispatch ABI. Notes and mini-proposal.

Fri Feb 3 22:47:37 CST 2017

> On Feb 3, 2017, at 7:12 PM, Joe Groff via swift-dev <swift-dev at swift.org> wrote:
> Given that most open-coded resilient method lookup paths require an extra load dependency to grab the method offset before loading the method address itself, we might consider indirecting the vtables for each class, so that the top-level vtable contains [address of root class vtable, address of first child class vtable, etc.]. If a class hierarchy is fixed once exported (in other words, you can't insert a superclass into an inheritance chain without an ABI break), then the offset of each superclass's pointer in the top-level vtable would be statically known, and the offset into each second-level vtable could be statically known via sorting by availability. This somewhat matches how we lay out protocol witness tables, where each parent protocol's witness table is indirectly referenced instead of being inlined into the leaf witness table. (OTOH, method offsets can be cached and reused to dispatch the same method on different objects, whereas we would have to perform the load chain once per object per method with this approach.)

Great point.

I'm still uncomfortable with the idea of assuming that we can't insert a superclass into an inheritance chain.  This isn't an assumption that's otherwise necessary or even useful, unless we decide to start optimizing dynamic casts.

Assuming it's valid, some additional trade-offs that come to mind:
  - It adds a load dependency to non-resilient dispatch, which is probably what we should be optimizing for.  We have an easy answer when someone asks why their resilient dispatch is a bit slower.  We don't have easy ways to make non-resilient dispatch faster.
  - It still forces us to put every non-final method in the subtable.
  + It would be possible to share subtables between classes, e.g. when a subclass doesn't override anything from a particular parent.
  + It would allow us to put the subtables in constant memory.
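
To make the indirected layout and the subtable-sharing point concrete, here is a minimal Python simulation. It is only a sketch under stated assumptions: the class names (`Base`, `Child`, `Grandchild`), the method labels, and the `dispatch` helper are all hypothetical, and Python lists stand in for tables in memory.

```python
# Hypothetical model of an indirected vtable: each class's top-level vtable
# is a list of references to per-ancestor subtables, root class first.

def make_method(label):
    """Stand-in for a method implementation; calling it returns its label."""
    return lambda: label

# Subtable for the methods introduced by the root class `Base`.
base_subtable = [make_method("Base.draw"), make_method("Base.area")]

# `Child` overrides Base.draw, so it needs its own copy of the Base subtable,
# plus a subtable for the method it introduces itself.
child_base_subtable = [make_method("Child.draw"), base_subtable[1]]
child_own_subtable = [make_method("Child.resize")]

# `Grandchild` overrides nothing, so it can point at Child's subtables
# instead of copying them (the sharing opportunity mentioned above).
VTABLES = {
    "Base": [base_subtable],
    "Child": [child_base_subtable, child_own_subtable],
    "Grandchild": [child_base_subtable, child_own_subtable],
}

# Statically known constants on the client side (assumed fixed by the ABI).
BASE_DEPTH, DRAW_SLOT, AREA_SLOT = 0, 0, 1
CHILD_DEPTH, RESIZE_SLOT = 1, 0

def dispatch(class_name, depth, slot):
    """Two dependent loads: top-level vtable -> subtable -> method."""
    subtable = VTABLES[class_name][depth]
    return subtable[slot]
```

Note that every dispatch, resilient or not, pays for the extra `VTABLES[class_name][depth]` load in this model, which is the first trade-off listed above.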

John.

> 
> -Joe
> 
>> On Feb 2, 2017, at 6:57 PM, Andrew Trick <atrick at apple.com> wrote:
>> 
>> I'm following up on a resilient dynamic dispatch discussion kicked off by
>> Slava during a performance team meeting to summarize some key
>> points on public [swift-dev].
>> 
>> It's easy to get sidetracked by the details of dynamic
>> dispatch and various ways to generate code. I suggest approaching the
>> problem by focusing on the ABI aspects and flexibility the ABI affords
>> for future optimization. I'm including a proposal for one specific
>> approach (#3) that wasn't discussed yet.
>> 
>> ---
>> #1. (thunk export) The simplest, most flexible way to expose dispatch
>> across resilience boundaries is by exporting a single per-method entry
>> point. Future compilers could improve dispatch and gradually expose
>> more ABI details.
>> 
>> Cost: We're forced to export all those symbols in perpetuity.
>> 
>> [How costly the symbols really are is debatable. The symbol trie should
>> compress the names, so the size may be small, and they should be lazily
>> resolved, so the startup cost should be amortized.]
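
A minimal Python sketch of approach #1, assuming a hypothetical library with one method, `describe`. The names `_VTABLES`, `_DESCRIBE_SLOT`, and `shape_describe_thunk` are invented for illustration; the point is only that the client sees a single entry point per method and nothing about how dispatch happens behind it.

```python
# --- library side: everything prefixed with "_" is private to the library ---
_VTABLES = {
    "Shape": [lambda obj: "Shape.describe"],
    "Circle": [lambda obj: "Circle.describe"],
}
_DESCRIBE_SLOT = 0  # private detail; free to change between library versions

def shape_describe_thunk(obj):
    """The one exported entry point for `describe` (hypothetical name)."""
    vtable = _VTABLES[obj["isa"]]
    return vtable[_DESCRIBE_SLOT](obj)

# --- client side: a single call, no knowledge of vtables or offsets ---
def client_call(obj):
    return shape_describe_thunk(obj)
```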
>> 
>> ---
>> #2. (offset export) An alternative approach was proposed by JoeG a
>> while ago and revisited in the meeting yesterday. It involves a
>> client-side vtable offset lookup helper.
>> 
>> This allows more opportunity for micro-optimization on the client
>> side. This exposes the isa-based vtable mechanism as ABI. However, it
>> stops short of exposing the vtable layout itself. Guaranteeing vtable
>> dispatch may become a problem in the future because it forces an
>> explosion of metadata. It also has the same problem as #1 because the
>> framework must export a per-method symbol for the dispatch
>> offset. What's worse, the symbols need to be eagerly resolved (AFAIK).
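
For contrast with #1, approach #2 can be sketched the same way. This is an illustrative model only: `EXPORTED_OFFSETS` stands in for the per-method offset symbols the framework would export, and the class and method names are hypothetical.

```python
# --- library side ---
# The isa-based vtable mechanism is now visible to clients (part of the ABI).
VTABLES = {
    "Shape": [lambda obj: "Shape.area", lambda obj: "Shape.describe"],
    "Circle": [lambda obj: "Circle.area", lambda obj: "Shape.describe"],
}
# One exported symbol per method, holding that method's vtable offset.
EXPORTED_OFFSETS = {"Shape.area": 0, "Shape.describe": 1}

# --- client side: open-coded lookup ---
def client_call(obj, symbol):
    offset = EXPORTED_OFFSETS[symbol]     # load 1: the exported offset
    method = VTABLES[obj["isa"]][offset]  # load 2: the method address
    return method(obj)
```

The client never hard-codes a slot number, so the library can still reorder its vtable, but it can no longer stop using a vtable for these methods.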
>> 
>> ---
>> #3. (method index) This is an alternative that I've alluded to before,
>> but that was not discussed in yesterday's meeting. It trades off
>> exporting symbols against exposing vtable layout. I want
>> to focus on the direct cost of the ABI support and the flexibility of this
>> approach vs. approach #1 without arguing over how to micro-optimize
>> various dispatching schemes. Here's how it works:
>> 
>> The ABI specifies a sort function for public methods that gives each
>> one a per-class index. Version availability takes sort precedence, so
>> public methods can be added without affecting other
>> indices. [Apparently this is the same approach we're taking with
>> witness tables].
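
The index assignment described above can be sketched as follows. This is a hedged illustration, not the actual ABI rule: the `Method` record, the version tuples, and the tie-break by name within a version are all assumptions made for the example.

```python
from typing import NamedTuple

class Method(NamedTuple):
    name: str
    introduced: tuple  # version that introduced the method, e.g. (1, 0)

def assign_indices(methods):
    """Sort by availability first, then by name (assumed tie-break)."""
    ordered = sorted(methods, key=lambda m: (m.introduced, m.name))
    return {m.name: index for index, m in enumerate(ordered)}

# Version 1 of a hypothetical class exposes two public methods.
v1 = [Method("draw", (1, 0)), Method("area", (1, 0))]
# Version 2 adds `bounds`, which would sort before `draw` by name alone.
v2 = v1 + [Method("bounds", (2, 0))]

indices_v1 = assign_indices(v1)
indices_v2 = assign_indices(v2)
```

Because availability takes precedence, `bounds` lands after every version 1 method, so indices already baked into existing clients stay valid.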
>> 
>> As with #2 this avoids locking down the vtable format for now--in the
>> future we'll likely optimize it further. To avoid locking all methods
>> into the vtable mechanism, the offset can be tagged. The alternative
>> dispatch mechanism for tagged offsets will be hidden within the
>> class-defining framework.
>> 
>> This avoids the potential explosion of exported symbols--it's limited
>> to one per public class. It avoids explosion of metadata by allowing
>> alternative dispatch for some subset of methods. These tradeoffs can
>> be explored in the future, independent of the ABI.
>> 
>> ---
>> #3a. (offset table export) A single per-class entry point provides a
>> pointer to an offset table. [It can be optionally cached on the client
>> side].
>> 
>> method_index = immediate
>> { // common per-class method lookup
>>   isa = load[obj]
>>   isa = isa & @isa_mask
>>   offset = load[@class_method_table + method_index]
>>   if (isVtableOffset(offset))
>>     method_entry = load[isa + offset]
>>   else
>>     method_entry = @resolveMethodAddress(isa, @class_method_table, method_index)
>> }
>> call method_entry
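
The #3a lookup above can be exercised with a small Python simulation. Everything here is an assumption for illustration: the tag encoding (low bit set means "vtable offset"), the table contents, and the side table behind `resolve_method_address`. Strings stand in for method addresses, and the isa masking step is omitted because the model has no tagged pointers.

```python
VTABLE_TAG = 0x1  # assumed encoding: low bit set means "this is a vtable offset"

def encode_vtable_offset(slot):
    return (slot << 1) | VTABLE_TAG

def encode_non_vtable(method_index):
    return method_index << 1  # tag bit clear

def is_vtable_offset(entry):
    return (entry & VTABLE_TAG) == VTABLE_TAG

# --- library side ---
# Per-class offset table, indexed by the ABI-defined method index.
CLASS_METHOD_TABLE = [
    encode_vtable_offset(0),  # method index 0: area
    encode_vtable_offset(1),  # method index 1: draw
    encode_non_vtable(2),     # method index 2: bounds, dispatched another way
]
VTABLES = {
    "Shape": ["Shape.area", "Shape.draw"],
    "Circle": ["Circle.area", "Shape.draw"],
}
_SIDE_TABLE = {("Shape", 2): "Shape.bounds", ("Circle", 2): "Circle.bounds"}

def resolve_method_address(isa, table, method_index):
    """Alternative dispatch, hidden inside the class-defining framework."""
    return _SIDE_TABLE[(isa, method_index)]

# --- client side ---
def lookup(obj, method_index):
    isa = obj["isa"]
    entry = CLASS_METHOD_TABLE[method_index]
    if is_vtable_offset(entry):
        return VTABLES[isa][entry >> 1]
    return resolve_method_address(isa, CLASS_METHOD_TABLE, method_index)
```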
>> 
>> Cost - client code size: Worst case 3 instructions to dispatch vs 1
>> instruction for approach #1. Method lookups can be combined, so groups
>> of calls will be more compact.
>> 
>> Cost - library size: the offset tables themselves need to be
>> materialized on the framework side. I believe this can be done
>> statically in read-only memory, but that needs to be verified.
>> 
>> ABI: The offset table format and tag bit are baked into the ABI.
>> 
>> ---
>> #3b. (lazy resolution) Offset tables can be completely localized.
>> 
>> method_index = immediate
>> { // common per-class method lookup
>>   isa = load[obj]
>>   isa = isa & @isa_mask
>>   offset = load[@local_class_method_table + method_index]
>>   if (!isInitializedOffset(offset)) {
>>     offset = @resolveMethodOffset(@class_id, method_index)
>>     store [@local_class_method_table + method_index] = offset
>>   }
>>   if (isVtableOffset(offset))
>>     method_entry = load[isa + offset]
>>   else
>>     method_entry = @resolveMethodAddress(isa, @class_id, method_index)
>> }
>> call method_entry
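
A rough Python model of the #3b flow, mainly to show that the resolver runs once per method index and that later calls hit the client-local table. The representation of an offset as a `(kind, slot)` pair, the `resolve_calls` counter, and the table contents are assumptions for the sketch; strings again stand in for method addresses.

```python
UNINITIALIZED = None

# --- library side: only the two resolver entry points are visible to clients ---
_OFFSETS = {"Shape": [("vtable", 0), ("vtable", 1), ("other", None)]}
_VTABLES = {
    "Shape": ["Shape.area", "Shape.draw"],
    "Circle": ["Circle.area", "Shape.draw"],
}
_SIDE_TABLE = {("Shape", 2): "Shape.bounds", ("Circle", 2): "Circle.bounds"}
resolve_calls = {"count": 0}  # instrumentation for the example only

def resolve_method_offset(class_id, method_index):
    resolve_calls["count"] += 1
    return _OFFSETS[class_id][method_index]

def resolve_method_address(isa, class_id, method_index):
    return _SIDE_TABLE[(isa, method_index)]

# --- client side: a table private to this client module ---
CLASS_ID = "Shape"
local_class_method_table = [UNINITIALIZED] * 3

def lookup(obj, method_index):
    isa = obj["isa"]
    offset = local_class_method_table[method_index]
    if offset is UNINITIALIZED:
        # Slow path: ask the library once, then remember the answer locally.
        offset = resolve_method_offset(CLASS_ID, method_index)
        local_class_method_table[method_index] = offset
    kind, slot = offset
    if kind == "vtable":
        # The model reads the library's vtable directly, standing in for load[isa + offset].
        return _VTABLES[isa][slot]
    return resolve_method_address(isa, CLASS_ID, method_index)
```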
>> 
>> ABI: This avoids exposing the offset table format as ABI. All that's
>> needed is a symbol for the class, a single entry point for method
>> offset resolution, and a single entry point for non-vtable method
>> resolution.
>> 
>> Benefit: The library no longer needs to statically materialize
>> tables. Instead, they are initialized lazily in each client module.
>> 
>> Cost: Lazy initialization of local tables requires an extra check and
>> burns some code size.
>> 
>> ---
>> Caveat:
>> 
>> This is the first time I've thought through approach #3, and it hasn't
>> been discussed, so there are likely a few things I'm missing at the
>> moment.
>> 
>> ---
>> Side Note:
>> 
>> Regardless of the resilient dispatch mechanism, within a module the
>> dispatch mechanism should be implemented with thunks to avoid type
>> checking classes from other files and improve compile time in non-WMO
>> builds, as Slava requested.
>> 
>> -Andy