[swift-dev] Reconsidering the global uniqueness of type metadata and protocol conformance instances
John McCall
rjmccall at apple.com
Mon Jul 31 17:04:50 CDT 2017
> On Jul 31, 2017, at 12:13 PM, Joe Groff <jgroff at apple.com> wrote:
>
>
>> On Jul 30, 2017, at 7:55 PM, John McCall via swift-dev <swift-dev at swift.org> wrote:
>>
>>> On Jul 30, 2017, at 9:08 PM, Slava Pestov <spestov at apple.com> wrote:
>>>> On Jul 30, 2017, at 5:47 PM, John McCall <rjmccall at apple.com> wrote:
>>>>> On Jul 29, 2017, at 7:35 PM, Slava Pestov <spestov at apple.com> wrote:
>>>>>> On Jul 29, 2017, at 12:53 PM, John McCall via swift-dev <swift-dev at swift.org> wrote:
>>>>>>> On Jul 29, 2017, at 12:48 AM, Andrew Trick <atrick at apple.com> wrote:
>>>>>>>> On Jul 28, 2017, at 8:13 PM, John McCall <rjmccall at apple.com> wrote:
>>>>>>>>> On Jul 28, 2017, at 11:11 PM, John McCall via swift-dev <swift-dev at swift.org> wrote:
>>>>>>>>>> On Jul 28, 2017, at 10:38 PM, Andrew Trick <atrick at apple.com> wrote:
>>>>>>>>>>> On Jul 28, 2017, at 3:15 PM, John McCall <rjmccall at apple.com> wrote:
>>>>>>>>>>>> On Jul 28, 2017, at 6:02 PM, Andrew Trick via swift-dev <swift-dev at swift.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 28, 2017, at 2:20 PM, Joe Groff via swift-dev <swift-dev at swift.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The Swift runtime currently maintains globally unique pointer identities for type metadata and protocol conformances. This makes checking type equivalence a trivial pointer equality comparison, but most operations on generic values do not really care about exact type identity and only need to invoke value or protocol witness methods or consult other data in the type metadata structure. I think it's worth reevaluating whether having globally unique type metadata objects is the correct design choice. Maintaining global uniqueness of metadata instances carries a number of costs. Any code that wants type metadata for an instance of a generic type, even a fully concrete one, must make a potentially expensive runtime call to get the canonical metadata instance. This also greatly complicates our ability to emit specializations of type metadata, value witness tables, or protocol witness tables for concrete instances of generic types, since specializations would need to be registered with the runtime as canonical metadata objects, and it would be difficult to do this lazily and still reliably favor specializations over more generic witnesses. The lack of witness table specializations leaves an obnoxious performance cliff for instances of generic types that end up inside existential containers or cross into unspecialized code. The runtime also obligates binaries to provide the canonical metadata for all of their public types, along with all the dependent value witnesses, class methods, and protocol witness tables, meaning a type abstraction can never be completely "zero-cost" across modules.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the other hand, if type metadata did not need to be unique, then the compiler would be free to emit specialized type metadata and protocol witness tables for fully concrete instances of generic value types without consulting the runtime. This would let us avoid runtime calls to fetch metadata in specialized code, and would make it much easier for us to implement witness specialization. It would also give us the ability to potentially extend the "inlinable" concept to public fragile types, making it a client's responsibility to emit metadata for the type when needed and keeping the type from affecting its home module's ABI. This could significantly reduce the size and ABI surface area of the standard library, since the standard library contains a lot of generic lightweight adapter types for collections and other abstractions that are intended to be optimized away in most use cases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are of course benefits to globally unique metadata objects that we would lose if we gave up uniqueness. Operations that do check type identity, such as comparison, hashing, and dynamic casting, would have to perform more expensive checks, and nonunique metadata objects would need to carry additional information to enable those checks. It is likely that class objects would have to remain globally unique, if for no other reason than that the Objective-C runtime requires it on Apple platforms. Having multiple equivalent copies of type metadata has the potential to increase the working set of an app in some situations, although it's likely that redundant compiler-emitted copies of value type metadata would at least be able to live in constant pages mapped from disk instead of getting dynamically instantiated by the runtime like everything is today. There could also be subtle source-breaking behavior for code that bitcasts metatype values to integers or pointers and expects bit-level equality to indicate type equality. It seems unlikely to me that giving up uniqueness would buy us any simplification to the runtime, since the runtime would still need to be able to instantiate metadata for unspecialized code, and we would still want to unique runtime-instantiated metadata objects as an optimization.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Overall, my intuition is that the tradeoffs come out in favor of nonunique metadata objects, but what do you all think? Is there anything I'm missing?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Joe
>>>>>>>>>>>>
>>>>>>>>>>>> In a premature proposal two years ago, we agreed to ditch unique protocol conformances but install the canonical address as the first entry in each specialized table.
>>>>>>>>>>>
>>>>>>>>>>> This would be a reference to (unique) global data about the conformance, not a reference to some canonical version of the protocol witness table. We do not rely on having a canonical protocol witness table. The only reason we unique them (when we do need to instantiate) is because we don't want to track their lifetimes.
>>>>>>>>>>>
>>>>>>>>>>>> That would mitigate the disadvantages that you pointed to. But, we would also lose the ability to emit specialized metadata/conformances in constant pages. How do you feel about that tradeoff?
>>>>>>>>>>>
>>>>>>>>>>> Note that, per above, it's only specialized constant type metadata that we would lose.
>>>>>>>>>>>
>>>>>>>>>>> I continue to feel that having to do structural equality tests on type metadata would be a huge loss.
>>>>>>>>>>>
>>>>>>>>>>> John.
>>>>>>>>>>
>>>>>>>>>> My question was really: are we going to runtime-initialize the specialized metadata and specialized witness tables in order to install the unique identifier, rather than requiring a runtime call whenever we need the unique ID? I think the answer is “yes”: we want to install the ID at initialization time for fast type comparison, hashing, and casting.
>>>>>>>>>
>>>>>>>>> Sorry, by "(unique) global data about the conformance" I meant that we would emit a global conformance descriptor in constant data for the conformance declaration. There would be one of these, no matter how many times it was instantiated; it would therefore uniquely identify a possibly generic conformance the same way that a nominal type descriptor uniquely identifies a possibly generic type. The reference to it would just be an ordinary symbol reference.
>>>>>>>>
>>>>>>>> Naturally, eagerly emitting one of those has the same advantages and disadvantages as eagerly emitting type metadata and everything else, and can be solved in the same way.
>>>>>>>>
>>>>>>>> John.
>>>>>>>
>>>>>>> Sure, for witness tables each constant specialized conformance can refer to a unique constant nominal conformance, resolved at link-time.
>>>>>>>
>>>>>>> Whereas we expect specialized type metadata to always need some runtime initialization because we want to unique some canonical entity for each instantiation and possibly compress VWTs.
>>>>>>
>>>>>> Oh, I missed that you were talking about both, sorry. If we wanted to emit specialized type metadata, I think it would have to be an explicit goal that they could be emitted without any sort of dynamic initialization, which implies that they're non-unique.
>>>>>
>>>>> I was wondering about that. I’m still having trouble filling in the details, but it seems that if non-unique type metadata never ‘escapes’ from a function, we could stack-allocate ‘structural’ metadata, for example if you have
>>>>>
>>>>> func foo<T>(_: T) {}
>>>>>
>>>>> func bar<T, Y>(x: T, y: Y) {
>>>>>   foo((x, y))
>>>>> }
>>>>>
>>>>> You would be able to compile bar() without any runtime calls at all, building the tuple type metadata ‘from scratch’ on the stack and passing it to foo(). Perhaps generic nominal types could also be constructed non-uniquely without a runtime call.
>>>>
>>>> Okay. What would be required to prove that type metadata never escapes from a function?
>>>
>>> Well, we could say that metadata is uniqued before being reified into a value (T.self) or when constructing an existential, etc. Other than that, I think the only thing we do with metadata is pass it to other functions?
>>
>> Hmm, yes, I guess we could make sure that everything that uses metadata in any way that might escape just uniques it at that point.
>>
>> Your tuple example is interesting because it would actually be quite elaborate to construct on the fly every time we needed it, since we'd have to perform type layout dynamically and form a complete value witness table. I hope you're not anticipating inlining that into every construction site?
>
> Being able to reclaim memory for dynamically-generated type metadata in at least some situations feels compelling to me, since our current design always "leaks" metadata memory and could probably be induced to pathologically waste memory if an attacker put their mind to it.

Supposing that there is such an attack, it seems unlikely to me that it couldn't be made to involve a class or an existential or some other thing that forced heap-allocation.

> It's conceivable that we could provide entry points for dynamically generating temporary metadata into caller-provided stack space, or if LLVM theoretically had alloca-into-caller, have runtime entry points that do the stack allocation and temporary metadata instantiation. The analysis to balance the tradeoff between regenerating a metadata record multiple times vs. creating and caching it once is probably nontrivial for the compiler to figure out ahead of time, though.

Profile-guided, maybe?

But I think I've made my point that this would be a huge research project that I can't imagine us finding the time for in the next year. Forward declarations are already going to represent a huge revision to the metadata system, and one that is substantially more urgent to solve for ABI stability.

John.
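
For readers following the thread, the tradeoff between pointer-identity checks and structural equality tests on metadata can be sketched roughly as below. This is only a toy model, not the Swift runtime: `TypeDescriptor`, `Metadata`, `MetadataCache`, and both comparison functions are hypothetical names invented for illustration. It assumes a non-unique metadata record carries a reference to one unique per-declaration descriptor plus its generic arguments, and that uniquing happens only at an explicit "escape" point.

```swift
// Hypothetical model; none of these names exist in the Swift runtime.

/// Stands in for a nominal type descriptor: one record per declaration.
final class TypeDescriptor {
    let name: String
    init(name: String) { self.name = name }
}

/// Stands in for a metadata record that may exist in several copies.
final class Metadata {
    let descriptor: TypeDescriptor
    let genericArguments: [Metadata]
    init(_ descriptor: TypeDescriptor, _ genericArguments: [Metadata] = []) {
        self.descriptor = descriptor
        self.genericArguments = genericArguments
    }
}

/// The check that suffices when metadata is globally unique:
/// a single pointer comparison.
func sameTypeByIdentity(_ a: Metadata, _ b: Metadata) -> Bool {
    return a === b
}

/// The check needed once copies may exist: compare the unique
/// descriptors, then recurse into the generic arguments.
func sameTypeStructurally(_ a: Metadata, _ b: Metadata) -> Bool {
    if a === b { return true }  // fast path for identical records
    guard a.descriptor === b.descriptor,
          a.genericArguments.count == b.genericArguments.count else {
        return false
    }
    return zip(a.genericArguments, b.genericArguments).allSatisfy { pair in
        sameTypeStructurally(pair.0, pair.1)
    }
}

/// Toy uniquing table (linear scan, for brevity): canonicalizes a
/// record at the point where it would escape, e.g. when reified.
final class MetadataCache {
    private var canonical: [Metadata] = []
    func unique(_ candidate: Metadata) -> Metadata {
        if let existing = canonical.first(where: { sameTypeStructurally($0, candidate) }) {
            return existing
        }
        canonical.append(candidate)
        return candidate
    }
}

let intDescriptor = TypeDescriptor(name: "Int")
let arrayDescriptor = TypeDescriptor(name: "Array")
let intMetadata = Metadata(intDescriptor)

// Two independently emitted copies of "Array<Int>" metadata.
let copyA = Metadata(arrayDescriptor, [intMetadata])
let copyB = Metadata(arrayDescriptor, [Metadata(intDescriptor)])

let cache = MetadataCache()
let canonicalA = cache.unique(copyA)
let canonicalB = cache.unique(copyB)
```

In this model the two copies differ by identity but compare equal structurally, and passing both through the cache yields one canonical record, after which the cheap identity check is valid again.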