[swift-dev] Resilient dynamic dispatch ABI. Notes and mini-proposal.

Thu Feb 2 20:57:05 CST 2017

I'm following up on a resilient dynamic dispatch discussion kicked off by
Slava during a performance team meeting to summarize some key
points on public [swift-dev].

It's easy to get sidetracked by the details of dynamic
dispatch and the various ways to generate code. I suggest approaching
the problem by focusing on the ABI aspects and the flexibility the ABI
affords for future optimization. I'm including a proposal for one
specific approach (#3) that hasn't been discussed yet.

---
#1. (thunk export) The simplest, most flexible way to expose dispatch
across resilience boundaries is by exporting a single per-method entry
point. Future compilers could improve dispatch and gradually expose
more ABI details.

Cost: We're forced to export all those symbols in perpetuity.

[The cost of the symbols is questionable. The symbol trie should compress the
names, so the size may be small, and they should be lazily resolved,
so the startup cost should be amortized].
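
For illustration, here is a minimal C sketch of approach #1. The names
(Shape_area, ClassMetadata, square_class) are hypothetical and the
metadata layout is a stand-in, not the real Swift runtime layout. The
point is only that the client binds to one exported function per method
and never learns how the framework finds the implementation.

```c
#include <assert.h>

typedef struct Object Object;
typedef int (*AreaFn)(Object *self);

/* Framework-private layout (hypothetical): clients never depend on it. */
typedef struct ClassMetadata { AreaFn area; } ClassMetadata;
struct Object { const ClassMetadata *isa; int side; };

static int square_area(Object *self) { return self->side * self->side; }
static int double_square_area(Object *self) { return 2 * self->side * self->side; }

const ClassMetadata square_class = { square_area };
const ClassMetadata double_square_class = { double_square_area };

/* The single exported entry point for this method. The framework is
 * free to change what happens inside it in a later release. */
int Shape_area(Object *self) {
    assert(self && self->isa);
    return self->isa->area(self);
}
```

A client call is then a plain direct call to Shape_area(obj), which is
where the "1 instruction" client cost comes from.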

---
#2. (offset export) An alternative approach was proposed by JoeG a
while ago and revisited in the meeting yesterday. It involves a
client-side vtable offset lookup helper.

This allows more opportunity for micro-optimization on the client
side. This exposes the isa-based vtable mechanism as ABI. However, it
stops short of exposing the vtable layout itself. Guaranteeing vtable
dispatch may become a problem in the future because it forces an
explosion of metadata. It also has the same problem as #1 because the
framework must export a per-method symbol for the dispatch
offset. What's worse, the symbols need to be eagerly resolved (AFAIK).
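
A rough C sketch of approach #2, under the same caveats: every name
here (Shape_area_offset, call_area, the ClassMetadata fields) is made
up for illustration. The framework exports a per-method offset and the
client performs the isa-relative load itself.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Object Object;
typedef int (*Method)(Object *self);

/* Hypothetical class metadata; the client does not know this layout. */
typedef struct ClassMetadata {
    void *header;   /* stand-in for whatever precedes the vtable slots */
    Method area;
} ClassMetadata;
struct Object { const ClassMetadata *isa; int side; };

static int square_area(Object *self) { return self->side * self->side; }
const ClassMetadata square_class = { NULL, square_area };

/* Framework side: one exported offset symbol per method. */
const size_t Shape_area_offset = offsetof(ClassMetadata, area);

/* Client side: load isa, add the exported offset, load the entry, call. */
static int call_area(Object *obj) {
    const char *isa = (const char *)obj->isa;
    Method entry = *(const Method *)(isa + Shape_area_offset);
    return entry(obj);
}
```

Note that the client now hard-codes "method entry = load[isa + offset]",
which is exactly the part of the mechanism that becomes ABI.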

---
#3. (method index) This is an alternative that I've alluded to before,
but that was not discussed in yesterday's meeting. It makes a
tradeoff between exporting symbols and exposing vtable layout. I want
to focus on the direct cost of the ABI support and the flexibility of
this approach vs. approach #1, without arguing over how to
micro-optimize various dispatching schemes. Here's how it works:

The ABI specifies a sort function for public methods that gives each
one a per-class index. Version availability takes sort precedence, so
public methods can be added without affecting other
indices. [Apparently this is the same approach we're taking with
witness tables].
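
To make the sort rule concrete, here is a small C sketch. The actual
ABI sort key is not specified here, so this assumes the simplest
possible one: order by the version in which the method was introduced,
then by name. PublicMethod and method_index are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;
    int introduced;   /* assumed: library version that added the method */
} PublicMethod;

/* Availability takes precedence; name breaks ties within a version. */
static int compare_methods(const void *a, const void *b) {
    const PublicMethod *x = a, *y = b;
    if (x->introduced != y->introduced)
        return x->introduced < y->introduced ? -1 : 1;
    return strcmp(x->name, y->name);
}

/* Sorts the class's public methods and returns the index of `name`,
 * or -1 if the class has no such method. */
static int method_index(PublicMethod *methods, size_t count, const char *name) {
    qsort(methods, count, sizeof *methods, compare_methods);
    for (size_t i = 0; i < count; i++)
        if (strcmp(methods[i].name, name) == 0)
            return (int)i;
    return -1;
}
```

With methods "area" and "draw" introduced in version 1, adding "bounds"
in version 2 gives it index 2 and leaves "draw" at index 1. A purely
alphabetical sort would have moved "draw" to index 2 and broken existing
clients.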

As with #2 this avoids locking down the vtable format for now--in the
future we'll likely optimize it further. To avoid locking all methods
into the vtable mechanism, the offset can be tagged. The alternative
dispatch mechanism for tagged offsets will be hidden within the
class-defining framework.

This avoids the potential explosion of exported symbols--it's limited
to one per public class. It avoids explosion of metadata by allowing
alternative dispatch for some subset of methods. These tradeoffs can
be explored in the future, independent of the ABI.

---
#3a. (offset table export) A single per-class entry point provides a
pointer to an offset table. [It can be optionally cached on the client
side].

  method_index = immediate
  { // common per-class method lookup
    isa = load[obj]
    isa = isa & @isa_mask
    offset = load[@class_method_table + method_index]
    if (isVtableOffset(offset))
      method_entry = load[isa + offset]
    else
      method_entry = @resolveMethodAddress(isa, @class_method_table, method_index)
  }
  call method_entry

Cost - client code size: Worst case 3 instructions to dispatch vs 1
instruction for approach #1. Method lookups can be combined, so groups
of calls will be more compact.

Cost - library size: the offset tables themselves need to be
materialized on the framework side. I believe this can be done
statically in read-only memory, but that needs to be verified.

ABI: The offset table format and tag bit are baked into the ABI.
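
The #3a lookup can be sketched in C as follows. This is an assumption-
laden illustration, not a proposed encoding: the tag is taken to be the
low bit of the table entry, the isa mask is omitted, and all names
(Shape_method_table, resolve_method_address, side_table) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Object Object;
typedef int (*Method)(Object *self);

typedef struct ClassMetadata {
    Method vtable[1];      /* index 0 (area) is dispatched via the vtable */
    Method side_table[1];  /* index 1 (negate) uses a framework-private path */
} ClassMetadata;
struct Object { const ClassMetadata *isa; int value; };

static int shape_area(Object *self) { return self->value; }
static int shape_negate(Object *self) { return -self->value; }
static const ClassMetadata shape_class = { { shape_area }, { shape_negate } };

/* Assumed tag: low bit set means "not a vtable offset". */
#define NON_VTABLE_TAG ((uintptr_t)1)
static int is_vtable_offset(uintptr_t entry) { return (entry & NON_VTABLE_TAG) == 0; }

/* Framework side: per-class offset table, indexed by method index. */
const uintptr_t Shape_method_table[2] = {
    offsetof(ClassMetadata, vtable),  /* untagged byte offset from isa */
    NON_VTABLE_TAG,                   /* tagged: resolved by the framework */
};

/* Framework side: alternative dispatch, hidden from clients. */
Method resolve_method_address(const ClassMetadata *isa,
                              const uintptr_t *table, size_t index) {
    (void)table;
    assert(index == 1);
    return isa->side_table[0];
}

/* Client side: the common per-class lookup from the pseudocode above. */
static Method lookup(Object *obj, size_t method_index) {
    const ClassMetadata *isa = obj->isa;
    uintptr_t offset = Shape_method_table[method_index];
    if (is_vtable_offset(offset))
        return *(const Method *)((const char *)isa + offset);
    return resolve_method_address(isa, Shape_method_table, method_index);
}
```

The client only depends on the table entry encoding and the resolver's
signature. Which methods are tagged is the framework's decision and can
change between releases.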

---
#3b. (lazy resolution) Offset tables can be completely localized.

  method_index = immediate
  { // common per-class method lookup
    isa = load[obj]
    isa = isa & @isa_mask
    offset = load[@local_class_method_table + method_index]
    if (!isInitializedOffset(offset)) {
      offset = @resolveMethodOffset(@class_id, method_index)
      store offset, [@local_class_method_table + method_index]
    }
    if (isVtableOffset(offset))
      method_entry = load[isa + offset]
    else
      method_entry = @resolveMethodAddress(isa, @class_id, method_index)
  }
  call method_entry

ABI: This avoids exposing the offset table format as ABI. All that's
needed is a symbol for the class, a single entry point for method
offset resolution, and a single entry point for non-vtable method
resolution.

Benefit: The library no longer needs to statically materialize
tables. Instead they are initialized lazily in each client module.

Cost: Lazy initialization of local tables requires an extra check and
burns some code size.
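
Here is a comparable C sketch of #3b. Again, everything beyond the
pseudocode is an assumption: the entry encoding (bit 0 = initialized,
bit 1 = non-vtable, offset in the remaining bits), the integer class
id, and the resolve_offset_calls counter, which exists only to show
that resolution happens once per entry. Thread safety of the lazy store
is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Object Object;
typedef int (*Method)(Object *self);

typedef struct ClassMetadata {
    Method vtable[1];      /* index 0 (area) */
    Method side_table[1];  /* index 1 (negate), non-vtable dispatch */
} ClassMetadata;
struct Object { const ClassMetadata *isa; int value; };

static int shape_area(Object *self) { return self->value; }
static int shape_negate(Object *self) { return -self->value; }
static const ClassMetadata shape_class = { { shape_area }, { shape_negate } };

enum { SHAPE_CLASS_ID = 1 };
enum { INITIALIZED = 1, NON_VTABLE = 2 };  /* assumed entry flag bits */

/* Framework side: the two exported entry points. */
static int resolve_offset_calls = 0;  /* instrumentation for the sketch */

uintptr_t resolve_method_offset(int class_id, size_t index) {
    (void)class_id;
    resolve_offset_calls++;
    if (index == 0)
        return ((uintptr_t)offsetof(ClassMetadata, vtable) << 2) | INITIALIZED;
    return NON_VTABLE | INITIALIZED;
}

Method resolve_method_address(const ClassMetadata *isa, int class_id, size_t index) {
    (void)class_id;
    (void)index;
    return isa->side_table[0];
}

/* Client side: local table in zero-initialized storage, 0 = unresolved. */
static uintptr_t local_method_table[2];

static Method lookup(Object *obj, size_t index) {
    const ClassMetadata *isa = obj->isa;
    uintptr_t offset = local_method_table[index];
    if (!(offset & INITIALIZED)) {
        offset = resolve_method_offset(SHAPE_CLASS_ID, index);
        local_method_table[index] = offset;  /* cache for later calls */
    }
    if (!(offset & NON_VTABLE))
        return *(const Method *)((const char *)isa + (offset >> 2));
    return resolve_method_address(isa, SHAPE_CLASS_ID, index);
}
```

The extra branch on INITIALIZED is the code-size cost mentioned above.
In exchange, the framework ships no offset table at all.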

---
Caveat:

This is the first time I've thought through approach #3, and it hasn't
been discussed, so there are likely a few things I'm missing at the
moment.

---
Side Note:

Regardless of the resilient dispatch mechanism, within a module the
dispatch mechanism should be implemented with thunks to avoid type
checking classes from other files and improve compile time in non-WMO
builds, as Slava requested.

-Andy