[swift-dev] [Discussion] New refcount representation

Thu Mar 17 11:29:08 CDT 2016

I think we may also want to measure how a cmpxchg solution performs against a lock; add solution under contention.

Objects in a read-only state (be it cow’ed value types or not) might be shared between threads with the expectation that they are fast, i.e. the argument that "if retain/release are contended, something else is wrong" does not apply.

A load/cmpxchg sequence might send two memory coherence messages (one requesting shared/exclusive state for the load, one requesting modified state for the cmpxchg), and mispredicted branches (pipeline flushes) on state change under contention might make it perform worse than a lock; add (one coherence message for modified state, no mispredicted branch), depending on how things are implemented under the hood. There is always an opportunity for another level of coherence speculation ….
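
To make the comparison concrete, here is a minimal C++ sketch of the two strategies and a tiny harness that hammers one shared counter from several threads. This is not the code in RefCount.h; the names retain_fetch_add, retain_cas and hammer are hypothetical, and I am assuming relaxed ordering is enough for a bare increment. On x86_64 the first variant is typically a single lock-prefixed add, while the second is a load followed by a lock cmpxchg in a retry loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Strategy A (hypothetical name): one unconditional atomic add.
inline void retain_fetch_add(std::atomic<std::uint64_t>& rc) {
    rc.fetch_add(1, std::memory_order_relaxed);
}

// Strategy B (hypothetical name): load, compute the new value, then
// compare-and-swap. On failure `old` is refreshed and the loop retries;
// this retry branch is what can mispredict under contention.
inline void retain_cas(std::atomic<std::uint64_t>& rc) {
    std::uint64_t old = rc.load(std::memory_order_relaxed);
    while (!rc.compare_exchange_weak(old, old + 1,
                                     std::memory_order_relaxed)) {
        // nothing to do: compare_exchange_weak already reloaded `old`
    }
}

// Run `retain` `iters` times on each of `threads` threads against a single
// shared counter and return the final count. Wrap a call in a timer to
// compare the two strategies under contention.
template <typename Retain>
std::uint64_t hammer(Retain retain, unsigned threads, std::uint64_t iters) {
    assert(threads > 0);
    std::atomic<std::uint64_t> rc{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&rc, retain, iters] {
            for (std::uint64_t i = 0; i < iters; ++i) retain(rc);
        });
    }
    for (auto& th : pool) th.join();
    return rc.load();
}
```

Timing hammer(retain_fetch_add, n, iters) against hammer(retain_cas, n, iters) for increasing n would show whether the extra coherence traffic and retries matter in practice. A real retain does more than a bare increment, so this can only bound the effect, not predict it.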

Other benefits we get might outweigh any such cost, though.
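
For reference, this is how I picture the representation proposed in the quoted message: one pointer-size word whose most significant bit selects between an inline count and a pointer to a side allocation. The sketch below is a guess, not the proposed implementation. RefWord, SideTable, kSideBit and the shift-by-one pointer encoding are all hypothetical, and it is deliberately non-atomic to keep the two cases visible.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical side allocation holding the count plus other per-object data.
struct SideTable {
    std::uint64_t strongCount;
};

// Most significant bit of a pointer-size word (assumed to be the tag bit).
constexpr std::uintptr_t kSideBit =
    std::uintptr_t(1) << (sizeof(std::uintptr_t) * 8 - 1);

// Hypothetical pointer-size field stored after the isa field.
struct RefWord {
    std::uintptr_t bits;

    bool usesSideTable() const { return (bits & kSideBit) != 0; }

    // Assumed encoding: the side pointer is stored shifted right by one
    // (it is at least 2-byte aligned), which frees the MSB for the tag.
    SideTable* side() const {
        return reinterpret_cast<SideTable*>((bits & ~kSideBit) << 1);
    }
};

inline std::uint64_t strongCount(const RefWord& w) {
    return w.usesSideTable() ? w.side()->strongCount : w.bits;
}

// Non-atomic retain: the MSB check is the extra work on the fast path.
inline void retain(RefWord& w) {
    if (w.usesSideTable()) {
        ++w.side()->strongCount;          // slow path: count lives out of line
    } else {
        assert(w.bits + 1 < kSideBit);    // inline count must not reach the tag
        ++w.bits;                         // fast path: count lives inline
    }
}

// Move an inline count into caller-provided side storage.
inline void moveToSideTable(RefWord& w, SideTable* s) {
    assert(!w.usesSideTable());
    s->strongCount = w.bits;
    w.bits = (reinterpret_cast<std::uintptr_t>(s) >> 1) | kSideBit;
}
```

An atomic version has to make the same MSB decision on a value it has just loaded, which is why the load/cmpxchg versus lock; add question comes up at all.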

> On Mar 16, 2016, at 11:29 PM, Greg Parker via swift-dev <swift-dev at swift.org> wrote:
> 
>> 
>> On Mar 15, 2016, at 11:59 PM, Greg Parker via swift-dev <swift-dev at swift.org> wrote:
>> 
>> I am considering a new representation for Swift refcounts and other per-object data. This is an outline of the scheme. Comments and suggestions welcome.
>> 
>> Today, each object stores 64 bits of refcounts and flags after the isa field.
>> 
>> In this new system, each object would store a pointer-size field after the isa field. This field would have two cases: it could store refcounts and flags, or it could store a pointer to a side allocation that would store refcounts and flags and additional per-object data.
>> 
>> Advantages:
>> * Saves 4 bytes per object on 32-bit for most objects.
>> * Improves refcount overflow and underflow detection.
>> * Might allow an inlineable retain/release fast path in the future.
>> * Allows a new weak reference implementation that doesn't need to keep entire dead objects alive.
>> * Allows inexpensive per-object storage for future features like associated references or class extensions with instance variables.
>> 
>> Disadvantages:
>> * Basic RR operations might be slower on x86_64. This needs to be measured. ARM architectures are probably unchanged.
> 
> I wrote a performance mockup of the fast path. It simply checks the MSB in the appropriate places in RefCount.h but does not actually implement any side allocation. I ran it on some RR-heavy benchmarks (QuickSort, InsertionSort, HeapSort, Array2D) on x86_64 and arm64.
> 
> arm64 is in fact approximately unchanged. Any difference either way is much less than 1%.
> 
> x86_64 is measurably slower:
>   1% QuickSort
>   2% InsertionSort
>   4% Array2D
>   5% HeapSort
> 
> 
> -- 
> Greg Parker     gparker at apple.com     Runtime Wrangler