[swift-dev] Continuation-based versus nested-function-based materializeForSet

Tue Nov 1 13:00:07 CDT 2016

I like the idea; it makes more sense to me than our current model (which feels more like a plain callback than a continuation to me). Some things that occurred to me when reading this:

- This seems like it'll be much simpler to check for invalid concurrent access to the same location (inout violations) in debug builds. (I don't think it's actually any more or less possible, but it is simpler.)

- This will look funny but work fine for multiple inout parameters: foo(&x, &y)

- OTOH, this does break up basic blocks in the caller context, making LLVM analysis less effective. Since the materializeForSet calls were already either inlined or opaque, though, we're probably fine there.

- Does this help us with the nested dictionary CoW problem? `foo["bar"]["baz"] += 1`

Jordan

> On Oct 31, 2016, at 12:22, Joe Groff via swift-dev <swift-dev at swift.org> wrote:
> 
> We currently abstract over mutable property accesses using what I’ll call a continuation-based model–the materializeForSet accessor is called before an inout access, and returns a continuation callback that must be called when the inout access is completed. I think a nested-function-based model, if designed correctly, has the potential to be more efficient and easier to implement.
> 
> As background for anyone intimately familiar with the implementation, Swift’s language model for mutable properties requires a reconciliation step after a mutation occurs. For example, in an inout access like this:
> 
> foo(&x.y.z)
> the y or z property may be computed, requiring a getter call before the call to foo and a setter call afterward, or either property may have willSet or didSet observers that fire after foo finishes. If the base x is a class, protocol, or generic value, the implementation of y can`t be statically known, so we need an abstract interface for mutating the property, projecting out the value at the start of the access and committing the mutated value back at the end. Ideally, the interface should accommodate efficient access to both stored and computed properties, avoiding unnecessary copying of stored properties to avoid inducing CoW copies or, in the near future, violating the constraints of move-only types or the borrow model. There are two broad approaches to doing this:
> 
> Continuation-based access
> 
> The accessor that begins the formal access can return a continuation closure to call when the access is completed. This is the approach Swift currently uses with its materializeForSet accessor pattern. A property under this model compiles to something like this C code:
> 
> struct X {
>   // getter for Y
>   Y (*getY)(const X *self);
>   // setter for Y
>   void (*setY)(X *self, Y newValue);
>   // materializeForSet for Y
>   struct MaterializeForSetReturn {
>     // The projected field
>     Y *y;
>     // The function that finishes the access, or null
>     void (*finish)(X *self, void *buffer);
>   }; 
>   struct MaterializeForSetReturn (*materalizeForSettingY)(X *self, void *buffer);
> };
> 
> void callFooOnXDotYDotZ(X *x) {
>   // To compile:
>   //   foo(&x.y.z)
>   // 
>   // - call the materializeForSet accessor to begin the access to y, giving it a buffer
>   //   to store a computed value or capture state to pass to the end of the access
>   char alignas(Y) bufferY[sizeof(Y) + 3*sizeof(void*)];
>   auto materializedY = x->materializeForSettingY(x, bufferY);
>   // - call materializeForSet to begin access to z
>   char alignas(Z) bufferZ[sizeof(Z) + 3*sizeof(void*)];
>   autom materializedZ = materializedY.y->materializeForSettingZ(materializedY.y, bufferZ);
>   // - perform the access
>   foo(materializedZ.z);
>   // - finish the accesses, if we were given a finishing callback
>   if (materializedZ.finish)
>     materializedZ.finish(materializedY.y, bufferZ);
>   if (materializedY.finish)
>     materializedY.finish(x, bufferY);
> }
> A stored property can be exposed through this interface by trivially returning the projected address of the subobject, with no continuation:
> 
> struct MaterializeForSetReturn
> materializeForSettingStoredY(X *self, void *buffer) {
>   return (struct MaterializeForSetReturn){&self->y, NULL};
> }
> whereas a computed property can call the getter, store the computed value into the buffer, and return a continuation function that calls the setter:
> 
> struct MaterializeForSetReturn
> materializeForSettingComputedY(X *self, void *buffer) {
>   Y oldValue = getY(self);
>   memcpy(buffer, &oldValue, sizeof(Y));
>   return (struct MaterializeForSetReturn){(Y *)buffer, finishSettingComputedY};
> }
> void finishSettingComputedY(X *self, void *buffer) {
>   Y newValue;
>   memcpy(&newValue, buffer, sizeof(Y));
>   setY(self, newValue);
> }
> A benefit of this approach is that it maintains the integrity of the source function in the compiled code. On the other hand, in order for the property implementation to transfer state from the start to the completion of the access, the caller must preallocate some stack space; if it’s too much, the space is wasted, and if it’s not enough, the property implementation must use malloc to get more space for itself. (It’s theoretically possible to avoid this with some clever stack pointer accounting.) There’s also a code size impact on the caller, which needs to bracket the inout access with a call on each side. The underlying CPU has to execute three or five branches per segment of the access path (calling and returning from materializeForSet, testing for null, and potentially calling and returning from the continuation if there is one).
> 
> Nested functions
> 
> Alternatively, we can implement the formal access pattern as a series of nested functions. A property under this model compiles to something like this C code:
> 
> struct X {
>   // getter for Y
>   Y (*getY)(const X *self);
>   // setter for Y
>   void (*setY)(X *self, Y newValue);
>   // mutator for Y
>   void *(*mutateY)(X *self, void *context, void *(*innerMutation)(Y *y, void *context));
> };
> 
> void callFooOnXDotYDotZ(X *x) {
>   // To compile:
>   //   foo(&x.y)
>   // - call the mutate accessor for y, with the inner access as a
>   //   function whose pointer gets passed in
>   x->mutateY(x, NULL, callFooOnXDotY_inner_1);
> }
> void callFooOnXDotYDotZ_inner_1(Y *y, void *context) {
>   // - call the mutate accessor for z
>   y->mutateZ(y, context, callFooOnXDotY_inner_2);
> }
> void callFooOnXDotYDotZ_inner_2(Z *z, void *context) {
>   foo(z);
> }
> A stored property can implement mutate by projecting the physical address of the subobject, tail-calling the subsequent inner function:
> 
> void *mutateStoredY(X *self, void *context, void *(*innerMutation)(Y *y, void *context)) {
>   /*tail*/ return innerMutation(&self->y, context);
> }
> and a computed property can bracket the inner mutation in getter and setter calls:
> 
> void *mutateComputedY(X *self, void *context, void *(*innerMutation)(Y *y, void *context)) {
>   Y value = getY(self);
>   void *result = innerMutation(&value, context);
>   setY(self, value);
>   return result;
> }
> This makes things easier for the property implementation, which can easily save as much state as it needs to on the stack instead of relying on a preallocated buffer handed down from the caller. There’s one fewer branch necessary per segment of the access path compared to the continuation-based model, either three if the accessor has no reconciliation necessary and can tail-call the inner function, or four if it does a normal call. On the other hand, the caller function has to be broken up into many smaller subfunctions, potentially one for every segment of an access path.
> 
> Letting nested functions walk the access path
> 
> We can reduce the number of nested functions that are needed for a nested access path by designing a convention that allows each mutator to directly call the mutator for the next segment of access. For example, the caller could construct the entire access path as an array of pointers to mutation functions:
> 
> typedef void *(*MutatorFunction)(void *base, MutatorFunction *accessPath, void *context);
> void callFooOnXDotYDotZDotW(X *x) {
>   // foo(&x.y.z.w)
>   MutatorFunction accessPath[] = {
>     (MutatorFunction)mutateZ,
>     (MutatorFunction)mutateW,
>     (MutatorFunction)callFooOnXDotYDotZDotW_inner
>   };
>   mutateY(x, accessPath, NULL);
> }
> Each property accessor function then loads the next step of the access out of the path, and advances the pointer forward for the nested access:
> 
> void *mutateY(X *self, MutatorFunction *accessPath, void *context) {
>   // Project out y
>   Y *y = self->y;
>   // Call the next step of the access path, passing the remainder of the access path
>   return (*accessPath)(y, accessPath + 1, context);
> }
> And the inner function at the end of the access path ignores the access path parameter and performs the access:
> 
> void *callFooOnXDotYDotZDotW_inner(W *w, MutatorFunction *_ignored, void *context) {
>   foo(w);
> }
> In this model, only one inner function needs to be broken out of the caller for the innermost inout access, at least for simple property projections. There are only two branches needed per segment, or one for segments that can tail-call the inner access. Parameterized segments such as subscripts can be supported by storing the parameters in the access path buffer as well, and having the subscript accessor advance past the parameters when it consumes them. For example:
> 
> void callFooOnXDotYSubIDotZ(X *x, intptr_t i) {
>   // foo(&x.y[i].z)
>   uintptr_t accessPath[] = {
>     (uintptr_t)mutateYSubscript,
>     (uintptr_t)i,
>     (uintptr_t)mutateZ,
>     (uintptr_t)callFooOnXDotYSubIDotZ_inner
>   };
>   
>   mutateY(x, accessPath, NULL);
> }
> 
> void *mutateYSubscript(Y *y, uintptr_t *accessPath, void *context) {
>   // Get the subscript index out of the access path
>   intptr_t i = (intptr_t)*accessPath++;
>   // Get the next step of the access out of the access path
>   MutatorFunction next = (MutatorFunction)*accessPath++;
>   
>   return next(&y->elements[i], accessPath, context);
> }
> On the other hand, this does push the inner access functions and subscript arguments onto the stack, which is a small amount of overhead. Furthermore, the caller function has to push any state it needs to pass from outside the access to inside on the stack as well, if it isn’t able to cram it otherwise into a single context pointer argument.
> 
> Machine-level calling convention tricks
> 
> To be able to feed a return value from the inner access back to the outer caller, nested mutators would need to preserve all of the return registers potentially used by the inner access on the way out of the access. To minimize the impact on the outer caller, the convention for mutators could also specify that they need to preserve the contents of callee-save registers when they call down to nested accesses. This should allow as much state to be kept in registers between the outer caller and inner access as if they were naturally in the same function, with at most a function call in between.
> 
> With these sorts of optimizations, I think the nested function approach has the potential to be competitive with, or even more efficient than, our current continuation-based materializeForSet design.
> 
> -Joe
> _______________________________________________
> swift-dev mailing list
> swift-dev at swift.org
> https://lists.swift.org/mailman/listinfo/swift-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20161101/ad544b5f/attachment.html>