[swift-dev] Resilient dynamic dispatch ABI. Notes and mini-proposal.
Andrew Trick
atrick at apple.com
Mon Feb 6 11:48:01 CST 2017
> On Feb 6, 2017, at 9:02 AM, Greg Parker <gparker at apple.com> wrote:
>
>>
>> On Feb 4, 2017, at 2:35 AM, Andrew Trick via swift-dev <swift-dev at swift.org <mailto:swift-dev at swift.org>> wrote:
>>
>>
>>> On Feb 3, 2017, at 9:37 PM, John McCall <rjmccall at apple.com <mailto:rjmccall at apple.com>> wrote:
>>>
>>>>> IV. The function that performs the lookup:
>>>>> IV1) is parameterized by an isa
>>>>> IV2) is not parameterized by an isa
>>>>> IV1 allows the same function to be used for super-dispatch but requires extra work to be inlined at the call site (possibly requiring a chain of resolution function calls).
>>>>
>>>> In my first message I was trying to accomplish IV1. But IV2 is simpler
>>>> and I can't see a fundamental advantage to IV1.
>>>
>>> Well, you can use IV1 to implement super dispatch (+ sibling dispatch, if we add it)
>>> by passing in the isa of either the superclass or the current class. IV2 means
>>> that the dispatch function is always based on the isa from the object, so those
>>> dispatch schemes need something else to implement them.
>>>
>>>> Why would it need a lookup chain?
>>>
>>> Code size, because you might not want to inline the isa load at every call site.
>>> So, for a normal dispatch, you'd have an IV2 function (defined client-side?)
>>> that just loads the isa and calls the IV1 function (defined by the class).
>>
>> Right. Looks like I wrote the opposite of what I meant. The important thing to me is that the vtable offset load + check is issued in parallel with the isa load. I was originally pushing IV2 for this reason, but now think that optimization could be entirely lazy via a client-side cache.
>
> Is this client-side cache per-image or per-callsite?
Per-image, with up to one cache entry per imported method to hold the vtable offset.
-Andy
>>> So we'd almost certainly want a client-side resolver function that handled
>>> the normal case. Is that what you mean when you say II1+II2? So the local
>>> resolver would be I2; II1; III2; IV2; V1, which leaves us with a three-instruction
>>> call sequence, which I think is equivalent to Objective-C, and that function
>>> would do this sequence:
>>>
>>> define @local_resolveMethodAddress(%object, %method_index)
>>> %raw_isa = load %object // 1 instruction
>>> %isa_mask = load @swift_isaMask // 3: 2 to materialize address from GOT (not necessarily within ±1MB), 1 to load from it
>>> %isa = and %raw_isa, %isa_mask // 1
>>> %cache_table = @local.A.cache_table // 2: not necessarily within ±1MB
>>> %cache = add %cache_table, %method_index * 8 // 1
>>> tailcall @A.resolveMethod(%isa, %method_index, %cache) // 1
>>>
>>> John.
>>
>> Yes, exactly, except we haven’t even done any client-side vtable optimization yet.
>>
>> To me the point of the local cache is to avoid calling @A.resolveMethod in the common case. So we need another load-compare-and-branch, which makes the local helper 12-13 instructions. Then you have the vtable load itself, so that’s 13-14 instructions. You would be saving on dynamic instructions but paying with 4 extra static instructions per class.
>>
>> It would be lame if we couldn't force @local.A.cache_table to be within ±1MB of the helper.
>
> You should assume that code and data are far apart from each other. The linker will optimize two-instruction far loads to a nop and a near load if they are in fact close together, but in full-size apps that is uncommon and in the dyld shared cache it never happens. (The shared cache deliberately separates all code from all data in order to save VM map entries.)
>
>
> --
> Greg Parker gparker at apple.com <mailto:gparker at apple.com> Runtime Wrangler