[swift-corelibs-dev] libdispatch roadmap and api addition proposal

Pierre Habouzit phabouzit at apple.com
Mon Dec 7 16:10:09 CST 2015


> On Dec 7, 2015, at 12:30 PM, Kevin Ballard via swift-corelibs-dev <swift-corelibs-dev at swift.org> wrote:
> 
> On Mon, Dec 7, 2015, at 04:55 AM, Joakim Hassila via swift-corelibs-dev wrote:
> Secondly, we have internally extended the public libdispatch API with one
> more flavor of dispatching, let’s call it ‘dispatch_async_inline’ - the
> semantics being: perform the work synchronously if doing so would not
> block the calling thread; if it would block, instead perform the work as
> a normal dispatch_async.
>  
> Would such a change be considered to be integrated, or should we keep our
> internal diffs indefinitely? Just to understand if it is worth the effort
> with a nicely packaged pull request or not...
>  
> The rationale for the API is that we are quite latency sensitive and want
> to use inline processing up until the point where we can’t keep up with
> the available work, at which point we would switch to asynchronous
> processing seamlessly (we have multiple producers). This means that the
> thread calling this API can be stolen for a significant amount of time
> (emptying the queue it was assigned to), but when the system is under
> ‘light’ load, we don’t need to incur the wakeup penalty for a completely
> asynchronous dispatch.
>  
> I actually have an outstanding radar asking for this exact functionality. My proposal called it `dispatch_try_sync()`; it didn't call dispatch_async() automatically, but simply returned a boolean value telling you whether it ran the code. My use-case wasn't actually that I wanted to run the code async, but that I needed to do two operations on a realtime thread in either order, one of which needed to be on a queue, so I wanted to do something like
>  
> BOOL done = dispatch_try_sync(queue, ^{ ... });
> do_other_work();
> if (!done) {
>     dispatch_sync(queue, ^{ ... });
> }
>  
> My radar is still open (rdar://problem/16436943 <rdar://problem/16436943>), but it got a response as follows:
>  
> I think the best way to "emulate" this is to use a DATA_OR source, and not semaphores or other things like that.
>  
> Most of the issues that I've seen with trylock() tend to be uses that look like this:
>  
> again:
>   if (trylock()) {
>     do {
>       clear_marker();
>       do_job();
>     } while (has_marker());
>     unlock();
>   } else if (!has_marker()) {
>     set_marker();
>     goto again;
>   }
>  
> and all unlockers check for the marker and perform the said job before unlocking, basically.
>  
> The thing is, most people use that pattern incorrectly and don't loop properly, which makes those coalescing checks racy; that's what dispatch DATA_OR sources are for.
>  
> Many other uses can also be replaced with a dispatch_async()
>  
> and it's very clear that the reporter can do exactly what he wants with a DATA_OR source. We should have a way to make sources act as barriers (which I have a patch for), or else we only provide half the required primitives.
>  
> I don't see a compelling use case that can't be solved elegantly with data sources today.
>  
> Using a DISPATCH_SOURCE_DATA_OR with a latch is a good alternative to what you are doing.
>  
> We are continuing to work on this issue, and will follow up with you again.

Hi Joakim, Kevin,

[ Full disclosure: I wrote that reply in rdar://problem/16436943 <rdar://problem/16436943>, and your use case was slightly different IIRC, but you’re right that it’s a close enough problem. ]

Dispatch internally has a notion of something that does almost that, called _dispatch_barrier_trysync_f[1]. However, it is used internally to serialize state changes on sources and queues, such as setting the target queue or event handlers.

The problem is that this call bypasses the target queue hierarchy in its fast path, which, while correct when changing the state of a given source or queue, is generally the wrong thing to do. Consider this code, assuming a public dispatch_barrier_trysync():


    dispatch_queue_t outer = dispatch_queue_create("outer", NULL);
    dispatch_queue_t inner = dispatch_queue_create("inner", NULL);
    dispatch_set_target_queue(outer, inner);

    dispatch_async(inner, ^{
        // write global state protected by inner
    });
    dispatch_barrier_trysync(outer, ^{
        // write global state protected by inner
    });


If it works like the internal version we have today, the code above has a data race, which we’ll all agree is bad.
Alternatively, we could make an API version that always goes through async when the queue you call trysync on is not targeted at a global root queue. But that is weird, because the performance characteristics would then depend entirely on the target queue hierarchy, and once layering and frameworks come into play, that is a bad characteristic for a good API.
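For concreteness, the try-sync semantics under discussion can be sketched with a plain pthreads trylock. All names below (fake_queue_t, fake_try_sync) are invented for illustration, and this single-mutex model deliberately ignores the target-queue-hierarchy problems described here; it is not how libdispatch implements anything:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical sketch of try-sync semantics, modeled with a plain
 * pthreads mutex standing in for a dispatch queue. */
typedef struct {
    pthread_mutex_t lock;   /* stands in for queue exclusivity */
} fake_queue_t;

/* Run work inline if the "queue" is uncontended; otherwise report
 * failure so the caller can fall back to dispatch_sync/async. */
static bool fake_try_sync(fake_queue_t *q, void (*work)(void *), void *ctx)
{
    if (pthread_mutex_trylock(&q->lock) != 0)
        return false;       /* contended: caller must dispatch instead */
    work(ctx);              /* uncontended: run inline, no thread wakeup */
    pthread_mutex_unlock(&q->lock);
    return true;
}

/* Sample work item: increment the counter behind ctx. */
static void bump(void *ctx) { ++*(int *)ctx; }
```

Note that a real implementation facing a deep hierarchy would have to trylock every queue up to the root and unwind all of them on failure, which is exactly the difficulty discussed below.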

Or we don’t give up right away when the hierarchy is deep, but then dispatch_trysync would need to be able to unwind all the locks it took, and you run into ordering issues: the block that couldn’t run synchronously may end up enqueued after another one, breaking the FIFO ordering of queues. Respecting that ordering, which is a desired property of our API, and getting an efficient implementation are somewhat at odds.

The other argument against trysync designed that way is that during testing it would almost always take the uncontended code path, leading developers not to realize that they should have taken copies of variables and the like (this is less of a problem on Darwin with Objective-C and ARC); trysync running on the calling thread will hide that. But once the queue starts being contended in production, it will bite you hard, with memory corruption everywhere.
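The capture hazard can be illustrated in plain C (all names here are hypothetical, invented for this sketch): whatever a deferred work item touches must be owned by the work item itself, which is exactly the discipline an almost-always-inline trysync lets developers forget:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of the capture pitfall: a work item that
 * may run after its caller returns must own a copy of its data rather
 * than borrow the caller's stack. */
typedef struct {
    char *payload;          /* heap-owned copy, safe to use later */
} deferred_ctx_t;

/* Build a context that is safe on the deferred (async) path: the
 * payload is copied, not borrowed, so it survives the caller's frame. */
static deferred_ctx_t *make_deferred_ctx(const char *stack_payload)
{
    deferred_ctx_t *ctx = malloc(sizeof *ctx);
    size_t n = strlen(stack_payload) + 1;
    ctx->payload = malloc(n);
    memcpy(ctx->payload, stack_payload, n);   /* copy, don't borrow */
    return ctx;
}

static void free_deferred_ctx(deferred_ctx_t *ctx)
{
    free(ctx->payload);
    free(ctx);
}
```

An inline path can safely borrow the caller's stack, but the moment the same block can also be deferred, only the copying version is correct.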

Technically, what you’re after is this: bringing up a new thread is very costly, so you’d rather use the thread that is asyncing the request, since it will soon give up control. Waking up a queue isn’t that expensive; the overhead of dispatch_sync() in terms of memory barriers and locking is more or less comparable. What’s expensive is creating a thread to satisfy the enqueue.

In my opinion, to get the win you’re after, what you really want is an async() variant where, if it has to wake up the target queue hierarchy all the way to the root, the system applies more resistance before bringing up a new thread to satisfy that request. Fortunately, the overcommit property of queues could be used by a thread pool to decide when to apply that resistance. Various parts of the thread pool handling (especially without kernel workqueue support) could get some love to provide exactly these benefits without changing the API.
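One way to picture that resistance, purely as a hypothetical sketch (the thresholds, struct, and function names below are invented, not libdispatch's actual policy): gate thread creation on the current backlog and on the overcommit property, so a short burst is absorbed by existing workers instead of paying thread-creation cost:

```c
#include <stdbool.h>

/* Invented state for a toy thread pool; not libdispatch internals. */
typedef struct {
    int pending;            /* items waiting in the root queue */
    int active_threads;     /* workers currently running */
    bool overcommit;        /* queue demands a thread no matter what */
} pool_state_t;

/* Return true only when spawning a thread is actually worth the cost. */
static bool should_spawn_thread(const pool_state_t *s, int max_threads)
{
    if (s->overcommit)
        return s->active_threads < max_threads;
    if (s->active_threads == 0)
        return true;        /* nobody else will ever drain the backlog */
    /* Resist: let an existing worker that is about to go idle pick the
     * work up, unless the backlog has clearly outgrown the workers. */
    return s->pending > 2 * s->active_threads &&
           s->active_threads < max_threads;
}
```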


[1] https://github.com/apple/swift-corelibs-libdispatch/blob/394d9a1c8be525cde8d9dd9fb8cef8308089b9c5/src/queue.c#L3089

-Pierre


