On 3 Feb 2017, at 03:57, Andrew Trick via swift-dev <swift-dev@swift.org> wrote:

I'm following up on a resilient dynamic dispatch discussion kicked off by Slava during a performance team meeting, to summarize some key points on the public [swift-dev] list.

It's easy to get sidetracked by the details of dynamic dispatch and the various ways to generate code. I suggest approaching the problem by focusing on the ABI aspects and the flexibility the ABI affords for future optimization. I'm including a proposal for one specific approach (#3) that hasn't been discussed yet.

---
#1. (thunk export) The simplest, most flexible way to expose dispatch across resilience boundaries is to export a single per-method entry point. Future compilers could improve dispatch and gradually expose more ABI details.

Cost: We're forced to export all those symbols in perpetuity.

[The cost of the symbols is questionable. The symbol trie should compress the names, so the size may be small, and they should be lazily resolved, so the startup cost should be amortized].

---
#2. (offset export) An alternative approach was proposed by JoeG a while ago and revisited in the meeting yesterday. It involves a client-side vtable offset lookup helper.

This allows more opportunity for micro-optimization on the client side, but it exposes the isa-based vtable mechanism as ABI. However, it stops short of exposing the vtable layout itself. Guaranteeing vtable dispatch may become a problem in the future because it forces an explosion of metadata. It also has the same problem as #1: the framework must export a per-method symbol for the dispatch offset. What's worse, those symbols need to be eagerly resolved (AFAIK).

---
#3. (method index) This is an alternative that I've alluded to before but was not discussed in yesterday's meeting. It makes a tradeoff between exporting symbols and exposing vtable layout. I want to focus on the direct cost of the ABI support and the flexibility of this approach vs. approach #1, without arguing over how to micro-optimize various dispatching schemes. Here's how it works:

The ABI specifies a sort function for public methods that gives each one a per-class index. Version availability takes sort precedence, so public methods can be added without affecting other indices. [Apparently this is the same approach we're taking with witness tables].

As with #2, this avoids locking down the vtable format for now--in the future we'll likely optimize it further. To avoid locking all methods into the vtable mechanism, the offset can be tagged. The alternative dispatch mechanism for tagged offsets will be hidden within the class-defining framework.
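[For illustration only: a minimal sketch of the index-assignment idea. The class, method names, and availability versions below are invented; the actual sort key is whatever the ABI ends up specifying.]

    // Hypothetical example: availability sorts first, so methods added in a
    // later library version always receive higher indices than existing ones.
    open class Shape {
        public func area() -> Double { return 0 }       // shipped in v1 -> index 0
        public func perimeter() -> Double { return 0 }  // shipped in v1 -> index 1

        @available(macOS 10.13, *)
        public func draw() {}                           // added in v2 -> index 2
    }
    // A client compiled against v1 bakes in indices 0 and 1; adding draw() in
    // v2 appends index 2 without renumbering the methods the client already uses.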
Approach #3 avoids the potential explosion of exported symbols--it's limited to one per public class. It avoids the explosion of metadata by allowing alternative dispatch for some subset of methods. These tradeoffs can be explored in the future, independent of the ABI.

---
#3a. (offset table export) A single per-class entry point provides a pointer to an offset table. [It can optionally be cached on the client side].

  method_index = immediate
  { // common per-class method lookup
    isa = load[obj]
    isa = isa & @isa_mask
    offset = load[@class_method_table + method_index]
    if (isVtableOffset(offset))
      method_entry = load[isa + offset]
    else
      method_entry = @resolveMethodAddress(isa, @class_method_table, method_index)
  }
  call method_entry

Cost - client code size: Worst case, 3 instructions to dispatch vs. 1 instruction for approach #1. Method lookups can be combined, so groups of calls will be more compact.

Cost - library size: The offset tables themselves need to be materialized on the framework side. I believe this can be done statically in read-only memory, but that needs to be verified.

ABI: The offset table format and tag bit are baked into the ABI.

---
#3b. (lazy resolution) Offset tables can be completely localized.

  method_index = immediate
  { // common per-class method lookup
    isa = load[obj]
    offset = load[@local_class_method_table + method_index]
    if (!isInitializedOffset(offset)) {
      offset = @resolveMethodOffset(@class_id, method_index)
      store offset, [@local_class_method_table + method_index]
    }
    if (isVtableOffset(offset))
      method_entry = load[isa + offset]
    else
      method_entry = @resolveMethodAddress(isa, @class_id, method_index)
  }
  call method_entry

ABI: This avoids exposing the offset table format as ABI. All that's needed is a symbol for the class, a single entry point for method offset resolution, and a single entry point for non-vtable method resolution.

Benefit: The library no longer needs to statically materialize tables. Instead, they are initialized lazily in each client module.

Cost: Lazy initialization of local tables requires an extra check and burns some code size.
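[A rough model of the #3b client-side lookup, sketched in Swift for readability. All names here are invented for illustration; in practice this sequence would be emitted inline by the compiler, and the two resolvers would be entry points exported by the class-defining framework.]

    // Minimal model of the per-class lazy lookup in #3b (hypothetical names).
    let uninitializedOffset = Int.min   // sentinel: not yet resolved in this module
    let nonVtableTag = 1                // low bit set => not a plain vtable offset

    // Hypothetical entry points exported by the class-defining framework.
    func resolveMethodOffset(classID: Int, methodIndex: Int) -> Int {
        fatalError("provided by the class-defining framework")
    }
    func resolveMethodAddress(isa: UnsafeRawPointer, classID: Int, methodIndex: Int) -> UnsafeRawPointer {
        fatalError("provided by the class-defining framework")
    }

    // One lazily filled table per (client module, class).
    var localMethodTable = [Int](repeating: uninitializedOffset, count: 16)

    func methodEntry(isa: UnsafeRawPointer, classID: Int, methodIndex: Int) -> UnsafeRawPointer {
        var offset = localMethodTable[methodIndex]
        if offset == uninitializedOffset {
            // First use from this module: ask the framework and cache the result.
            offset = resolveMethodOffset(classID: classID, methodIndex: methodIndex)
            localMethodTable[methodIndex] = offset
        }
        if offset & nonVtableTag == 0 {
            // Untagged: ordinary vtable dispatch, load the entry at isa + offset.
            return isa.load(fromByteOffset: offset, as: UnsafeRawPointer.self)
        }
        // Tagged: the dispatch mechanism stays hidden behind the framework's resolver.
        return resolveMethodAddress(isa: isa, classID: classID, methodIndex: methodIndex)
    }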
---
Caveat:

This is the first time I've thought through approach #3, and it hasn't been discussed, so there are likely a few things I'm missing at the moment.

---
Side Note:

Regardless of the resilient dispatch mechanism, within a module the dispatch mechanism should be implemented with thunks to avoid type checking classes from other files and to improve compile time in non-WMO builds, as Slava requested.

-Andy
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

I have a question about the current dispatching behaviour with protocols and ‘Self’.

    protocol CustomEquatable {
        func equal(to: Self) -> Bool
    }

    open class Super: CustomEquatable {
        func equal(to: Super) -> Bool { print("super"); return false }
    }
    class Sub: Super {
        func equal(to: Sub) -> Bool { print("sub-sub"); return true }
        override func equal(to: Super) -> Bool { print("sub-super"); return true }
    }

    Sub().equal(to: Sub())     // sub-sub
    Sub().equal(to: Super())   // sub-super
    Super().equal(to: Sub())   // super
    Super().equal(to: Super()) // super

    (Sub() as Super).equal(to: Sub())            // sub-super — dynamically dispatched to callee’s type, not param
    (Sub() as Super).equal(to: (Sub() as Super)) // sub-super — as above

Currently, we dynamically dispatch on the callee’s type to find ‘Self’, but we don’t apply that consistently when dispatching to ‘Self’-typed parameters. Is that expected behaviour?

- Karl