[swift-dev] [Discussion] New refcount representation

Joe Groff jgroff at apple.com
Wed Mar 16 11:25:35 CDT 2016


This sounds awesome. Should we still consider using a non-pointer isa representation on 64-bit platforms? 16 bytes per object header is still kinda big. If we laid out the np-isa bits so that the "side allocation" bit were the MSB, with the strong refcount immediately below it, we could pull the same trick of letting the strong refcount overflow into the side allocation bit. The drawbacks would be that we'd need to go back to a global sidetable to reference the overflow allocation, that on Apple platforms we'd need to follow whatever non-pointer-isa layout the Objective-C runtime uses, and that the exact layout of the bits would have to remain runtime-private and thus not inlineable. If we're running on the assumption that side allocations are rarely needed, then paying for the lock in the rare case might be worth a potentially substantial memory savings.

On platforms where we don't need ObjC interop, maybe we could avoid the need for the sidetable with non-pointer-isa by cloning the heap object's vtable into the side allocation and changing the masked isa pointer to refer to the side allocation. That would make the side allocation larger, and would make operations that extract the type metadata from a class for generics a bit more expensive, but would let vtable lookups remain fast.

-Joe

> On Mar 15, 2016, at 11:59 PM, Greg Parker via swift-dev <swift-dev at swift.org> wrote:
> 
> I am considering a new representation for Swift refcounts and other per-object data. This is an outline of the scheme. Comments and suggestions welcome.
> 
> Today, each object stores 64 bits of refcounts and flags after the isa field.
> 
> In this new system, each object would store a pointer-size field after the isa field. This field would have two cases: it could store refcounts and flags, or it could store a pointer to a side allocation that would store refcounts and flags and additional per-object data.
> 
> Advantages:
> * Saves 4 bytes per object on 32-bit for most objects.
> * Improves refcount overflow and underflow detection.
> * Might allow an inlineable retain/release fast path in the future.
> * Allows a new weak reference implementation that doesn't need to keep entire dead objects alive.
> * Allows inexpensive per-object storage for future features like associated references or class extensions with instance variables.
> 
> Disadvantages:
> * Basic RR operations might be slower on x86_64. This needs to be measured. ARM architectures are probably unchanged.
> 
> ----
> 
> The MSB bit would distinguish between the fastest-path in-object retain/release and everything else. Objects that use some other RR path would have that bit set. This would include objects whose refcount is stored in the side allocation and objects whose refcount does not change because they are allocated on the stack or in read-only memory.
> 
> The MSB bit also becomes set if you increment or decrement a retain count too far. That means we can implement the RR fast path with a single conditional branch after the increment or decrement:
> 
> retain:
>    intptr_t oldRC = obj->rc
>    newRC = oldRC + RC_ONE    // sets MSB on overflow; MSB already set for other special cases
>    if (newRC >= 0) {
>        CAS(obj->rc = oldRC => newRC)
>    } else {
>        call slow path
>        // out-of-object refcount     (MSB bits 0b10x)
>        // or refcount has overflowed (MSB bits 0b111)
>        // or refcount is constant    (MSB bits 0b110)
>    }
> 
> release:
>    intptr_t oldRC = obj->rc
>    newRC = oldRC - RC_ONE    // sets MSB on underflow; MSB already set for other special cases
>    if (newRC >= 0) {
>        CAS(obj->rc = oldRC => newRC)
>    } else {
>        call slow path
>        // dealloc                     (MSB bits 0b111)
>        // or out-of-object refcount   (MSB bits 0b10x)
>        // or refcount has underflowed (MSB bits 0b111 and deallocating bit already set)
>        // or refcount is constant     (MSB bits 0b110)
>    }
> 
> There are some fussy bit representation details here to make sure that a pre-existing MSB=1 does not become 0 after an increment or decrement. 
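
For readers who want something to compile, here is a minimal C11 sketch of the fast paths above. The layout is an assumption made for illustration (strong count starting at bit 33, bit 63 as the slow-path flag), and HeapObject, retain_fast, and release_fast are hypothetical names, not runtime API. To sidestep the fussy cases it tests the MSB of (old | new), which is still one conditional branch but is this sketch's own choice, not something the proposal specifies. The functions return false where the pseudocode would call the slow path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define RC_ONE      (UINT64_C(1) << 33)   /* one strong reference */
#define RC_SLOW_BIT (UINT64_C(1) << 63)   /* MSB: take the slow path */

typedef struct {
    void *isa;              /* stand-in for the isa field */
    _Atomic uint64_t rc;    /* pointer-size refcount word after isa */
} HeapObject;

/* Returns true if the fast path performed the retain, false if the
   caller must fall back to a (hypothetical) slow path. */
static bool retain_fast(HeapObject *obj) {
    assert(obj != NULL);
    uint64_t oldRC = atomic_load_explicit(&obj->rc, memory_order_relaxed);
    for (;;) {
        uint64_t newRC = oldRC + RC_ONE;       /* a carry sets the MSB on overflow */
        if ((oldRC | newRC) & RC_SLOW_BIT)     /* MSB set before or after the add */
            return false;
        if (atomic_compare_exchange_weak_explicit(
                &obj->rc, &oldRC, newRC,
                memory_order_relaxed, memory_order_relaxed))
            return true;
        /* CAS failed: oldRC now holds the current value, so retry. */
    }
}

/* Same shape for release; a borrow sets the MSB on underflow. */
static bool release_fast(HeapObject *obj) {
    assert(obj != NULL);
    uint64_t oldRC = atomic_load_explicit(&obj->rc, memory_order_relaxed);
    for (;;) {
        uint64_t newRC = oldRC - RC_ONE;
        if ((oldRC | newRC) & RC_SLOW_BIT)
            return false;
        if (atomic_compare_exchange_weak_explicit(
                &obj->rc, &oldRC, newRC,
                memory_order_release, memory_order_relaxed))
            return true;
    }
}
```

In this sketch a stored count of zero means one outstanding reference, so the release that underflows is the one that would reach dealloc on the slow path.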
> 
> (In the more distant future this fast path could be inlineable while preserving ABI flexibility: if worst comes to worst we can set the MSB all the time and force inliners to fall back to the slow path runtime function. We don't want to do this yet though.)
> 
> The side allocation could be used for:
> * New weak reference implementation that doesn't need to keep entire dead objects alive.
> * Associated references or class extensions with instance variables
> * Full-size strong refcount and unowned refcount on 32-bit architectures
> * Future concurrency data or debugging instrumentation data
> 
> The Objective-C runtime uses a side table for similar purposes. It has the disadvantage that retrieving an object's side allocation requires use of a global hash table, which is slow and requires locking. This scheme would be faster and contention-free.
> 
> Installing a side allocation on an object would be a one-way operation for thread-safety reasons. For example, an object might be given a side allocation when it is first weakly referenced, but the object would not go back to in-object refcounts if the weak reference went away. Most objects would not need a side allocation.
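
One way the one-way installation could look, as a rough C11 sketch under simplifying assumptions: the MSB here means only "this word holds a side-allocation pointer" (the proposal also uses MSB patterns for constant and overflowed refcounts), heap pointers are assumed to have their top bit clear, and SideAlloc, HeapObject, and get_or_install_side are hypothetical names. The inline counts are copied into the side allocation and the word is swapped with a CAS, so a thread that loses the race simply adopts the winner's allocation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define RC_SIDE_BIT (UINT64_C(1) << 63)  /* assumed: MSB marks a side-allocation pointer */

typedef struct {
    uint64_t migrated_word;   /* counts and flags copied from the object */
} SideAlloc;

typedef struct {
    void *isa;
    _Atomic uint64_t rc;      /* inline counts, or RC_SIDE_BIT | pointer */
} HeapObject;

static uint64_t encode_side(SideAlloc *s) {
    uint64_t bits = (uint64_t)(uintptr_t)s;
    assert((bits & RC_SIDE_BIT) == 0);   /* assumption: top pointer bit is clear */
    return bits | RC_SIDE_BIT;
}

static SideAlloc *decode_side(uint64_t word) {
    return (SideAlloc *)(uintptr_t)(word & ~RC_SIDE_BIT);
}

/* Returns the object's side allocation, installing one if needed.
   The word never goes back to inline counts afterwards. */
static SideAlloc *get_or_install_side(HeapObject *obj) {
    uint64_t old = atomic_load_explicit(&obj->rc, memory_order_acquire);
    SideAlloc *fresh = NULL;
    for (;;) {
        if (old & RC_SIDE_BIT) {      /* already installed, possibly by another thread */
            free(fresh);              /* free(NULL) is a no-op */
            return decode_side(old);
        }
        if (fresh == NULL) {
            fresh = malloc(sizeof *fresh);
            if (fresh == NULL) return NULL;
        }
        fresh->migrated_word = old;   /* carry the current counts across */
        if (atomic_compare_exchange_weak_explicit(
                &obj->rc, &old, encode_side(fresh),
                memory_order_acq_rel, memory_order_acquire))
            return fresh;
        /* CAS failed: old was reloaded (counts changed or someone else won). */
    }
}
```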
> 
> ----
> 
> Weak references could be implemented using the side allocation. A weak variable would point to the object's side allocation. The side allocation would store a pointer to the object and a strong refcount and a weak refcount. (This weak refcount would be distinct from the unowned refcount.)  The weak refcount would be incremented once for every weak variable holding this object. 
> 
> The advantage of using a side allocation for weak references is that the storage for a weakly-referenced object could be freed synchronously when deinit completes. Only the small side allocation would remain, backing the weak variables until they are cleared on their next access. This is a memory improvement over today's scheme, which keeps the object's entire storage alive for a potentially long time.
> 
> The hierarchy:
>  Strong refcount goes to zero: deinit
>  Unowned refcount goes to zero: free the object
>  Weak refcount goes to zero: free the side allocation
> 
> When a weakly-referenced object is destroyed, it would free its own storage but leave the side allocation alive until all of the weak references go away. 
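
The hierarchy can be modelled in a few lines of C. This is a single-threaded toy, not the proposed implementation: it assumes (my assumption, not stated above) that the strong references jointly hold one unowned reference and the unowned references jointly hold one weak reference, so each level going to zero releases the next. Lifetime and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t strong, unowned, weak;
    bool deinited;        /* deinit has run */
    bool object_freed;    /* object storage released */
    bool side_freed;      /* side allocation released */
} Lifetime;

static void lifetime_init(Lifetime *l) {
    /* One strong reference; the +1 on unowned and weak are the
       references held on behalf of the level above (assumed). */
    *l = (Lifetime){ .strong = 1, .unowned = 1, .weak = 1 };
}

static void weak_retain(Lifetime *l) { l->weak++; }

static void weak_release(Lifetime *l) {
    assert(l->weak > 0);
    if (--l->weak == 0) l->side_freed = true;      /* free the side allocation */
}

static void unowned_release(Lifetime *l) {
    assert(l->unowned > 0);
    if (--l->unowned == 0) {
        l->object_freed = true;                    /* free the object */
        weak_release(l);
    }
}

static void strong_release(Lifetime *l) {
    assert(l->strong > 0);
    if (--l->strong == 0) {
        l->deinited = true;                        /* run deinit */
        unowned_release(l);
    }
}
```

With one weak variable outstanding, the last strong release runs deinit and frees the object, while the side allocation survives until that weak variable lets go.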
> 
> When a weak variable is read, it would go to the side allocation first and atomically increment the strong refcount if the deallocating bit were not set. Then it would return the object pointer stored in the side allocation. If the deallocating bit were set, it would atomically decrement the weak refcount and free the side allocation if it reaches zero. (There is another race here that probably requires separate side bits for object-is-deallocating and object-is-deallocated.)
> 
> When an old value is erased from a weak variable, it would atomically decrement the weak refcount in the side allocation and free the side allocation if it reaches zero.
> 
> When a new value is stored to a weak variable, it would install a side allocation if necessary, then check the deallocating bit in the side allocation. If the object is not deallocating it would atomically increment the weak refcount.
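
The read, erase, and store operations might look roughly like this in C11. Everything here is a sketch under assumptions: the deallocating flag is packed into the low bit of the strong-count word so "increment unless deallocating" is a single CAS, the weak variable itself is not accessed concurrently, the caller of weak_store already has a side allocation in hand (installation is not shown), and the deallocating/deallocated race mentioned above is not addressed. SideAlloc, WeakVar, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define DEALLOCATING UINT64_C(1)   /* assumed: low bit of the strong word */
#define STRONG_ONE   UINT64_C(2)   /* one strong reference */

typedef struct {
    void *object;              /* pointer back to the object */
    _Atomic uint64_t strong;   /* (count << 1) | deallocating */
    _Atomic uint32_t weak;     /* one per weak variable, plus the object's own */
} SideAlloc;

typedef struct { SideAlloc *side; } WeakVar;

/* Drop one weak reference; the last one frees the side allocation. */
static void weak_drop(SideAlloc *s) {
    if (atomic_fetch_sub_explicit(&s->weak, 1, memory_order_acq_rel) == 1)
        free(s);
}

/* Read: returns the object with +1 strong, or NULL if it is going away. */
static void *weak_load(WeakVar *v) {
    SideAlloc *s = v->side;
    if (s == NULL) return NULL;
    uint64_t old = atomic_load_explicit(&s->strong, memory_order_relaxed);
    while (!(old & DEALLOCATING)) {
        if (atomic_compare_exchange_weak_explicit(
                &s->strong, &old, old + STRONG_ONE,
                memory_order_acquire, memory_order_relaxed))
            return s->object;
    }
    v->side = NULL;            /* clear the variable on this access */
    weak_drop(s);
    return NULL;
}

/* Store: take a weak reference on the new value (if live), then erase the old. */
static void weak_store(WeakVar *v, SideAlloc *new_side) {
    SideAlloc *old_side = v->side;
    if (new_side != NULL &&
        !(atomic_load_explicit(&new_side->strong, memory_order_acquire) & DEALLOCATING)) {
        atomic_fetch_add_explicit(&new_side->weak, 1, memory_order_relaxed);
        v->side = new_side;
    } else {
        v->side = NULL;
    }
    if (old_side != NULL) weak_drop(old_side);
}
```

Taking the new weak reference before dropping the old one keeps the side allocation alive when the same value is stored twice.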
> 
> ----
> 
> RR fast paths in untested x86_64 assembly (AT&T syntax, destination on the right):
> 
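> retain_fast:
>   // object in %rdi
>   movabs $0x200000000, %rcx   // RC_ONE is too wide for a 32-bit immediate
>   mov   8(%rdi), %rax
> 1: mov   %rax, %rdx
>   add   %rcx, %rdx
>   js    retain_slow            // MSB set: leave the fast path
>   lock cmpxchg %rdx, 8(%rdi)  // on failure, %rax holds the current value
>   jne   1b
>   ret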
> 
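> release_fast:
>   // object in %rdi
>   movabs $0x200000000, %rcx   // RC_ONE is too wide for a 32-bit immediate
>   mov   8(%rdi), %rax
> 1: mov   %rax, %rdx
>   sub   %rcx, %rdx
>   js    release_slow           // MSB set: leave the fast path
>   lock cmpxchg %rdx, 8(%rdi)  // on failure, %rax holds the current value
>   jne   1b
>   ret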
> 
> 
> RR fast paths in untested arm64 assembly
> 
> retain_fast:
>   // object in x0
>   add   x1, x0, #8
> 1: ldxr  x2, [x1]
>   mov   x3, #0x200000000
>   adds  x2, x2, x3
>   b.mi  retain_slow
>   stxr  w4, x2, [x1]
>   cbnz  w4, 1b             // stxr writes 0 on success; retry if it failed
>   ret
> 
> release_fast:
>   // object in x0
>   add   x1, x0, #8
> 1: ldxr  x2, [x1]
>   mov   x3, #0x200000000
>   subs  x2, x2, x3
>   b.mi  release_slow
>   stlxr w4, x2, [x1]
>   cbnz  w4, 1b             // stlxr writes 0 on success; retry if it failed
>   ret
> 
> 
> -- 
> Greg Parker     gparker at apple.com     Runtime Wrangler
> 
> 
> _______________________________________________
> swift-dev mailing list
> swift-dev at swift.org
> https://lists.swift.org/mailman/listinfo/swift-dev


