<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class="">
-Pierre
</div>
<br class=""><div><blockquote type="cite" class=""><div class="">On Dec 8, 2015, at 7:34 AM, Joakim Hassila via swift-corelibs-dev <<a href="mailto:swift-corelibs-dev@swift.org" class="">swift-corelibs-dev@swift.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Pierre,
<div class=""><br class="">
</div>
<div class="">Thanks for the good explanation, will try to respond inline below:</div>
<div class=""><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 7 dec. 2015, at 23:10, Pierre Habouzit <<a href="mailto:phabouzit@apple.com" class="">phabouzit@apple.com</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">Hi Joakim, Kevin,</div>
<div class=""><br class="">
</div>
<div class="">[ Full disclosure, I made that reply in <a href="rdar://problem/16436943" class="">rdar://problem/16436943</a> and your use case was slightly different IIRC but you’re right it’s a close enough problem ]</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">Dispatch internally has a notion of something that does almost that, called _dispatch_barrier_trysync_f[1]. However, it is used internally to serialize state changes on sources and queues such as setting the target queue or event handlers.</div>
<div class=""><br class="">
</div>
<div class="">The problem is that this call bypasses the target queue hierarchy in its fastpath, which while it’s correct when changing the state of a given source or queue, is generally the wrong thing to do. Let’s consider this code assuming the dispatch_barrier_trysync()</div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(193, 193, 193); background-color: rgb(0, 0, 0);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e4e4e4; background-color: #121212" class=""><br class="">
</span></div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(193, 193, 193); background-color: rgb(0, 0, 0);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e4e4e4; background-color: #121212" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #2ee621" class="">dispatch_queue_t</span><span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">
outer = </span>dispatch_queue_create<span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">(</span><span style="font-variant-ligatures: no-common-ligatures; color: #fb3b1d" class="">"outer"</span><span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">,
</span><span style="font-variant-ligatures: no-common-ligatures; color: #fb3b1d" class="">NULL</span><span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">);</span></div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(193, 193, 193); background-color: rgb(0, 0, 0);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e4e4e4; background-color: #121212" class="">
</span><span style="font-variant-ligatures: no-common-ligatures; color: #2ee621" class="">dispatch_queue_t</span><span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class=""> inner =
</span>dispatch_queue_create<span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">(</span><span style="font-variant-ligatures: no-common-ligatures; color: #fb3b1d" class="">"inner"</span><span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">,
</span><span style="font-variant-ligatures: no-common-ligatures; color: #fb3b1d" class="">NULL</span><span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">);</span></div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(193, 193, 193); background-color: rgb(0, 0, 0);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e4e4e4; background-color: #121212" class="">
</span>dispatch_set_target_queue<span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">(outer, inner);</span></div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(245, 245, 245); background-color: rgb(0, 0, 0); min-height: 11px;" class="">
<br class="">
</div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(193, 193, 193); background-color: rgb(0, 0, 0);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e4e4e4; background-color: #121212" class="">
</span>dispatch_async<span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">(inner, ^{</span></div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(215, 215, 255); background-color: rgb(18, 18, 18);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e4e4e4" class="">
</span>// write global state protected by inner</div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(228, 228, 228); background-color: rgb(18, 18, 18);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9; background-color: #000000" class="">
});</span></div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(193, 193, 193); background-color: rgb(0, 0, 0);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e4e4e4; background-color: #121212" class="">
</span>dispatch_barrier_trysync<span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9" class="">(outer, ^{</span></div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(215, 215, 255); background-color: rgb(18, 18, 18);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e4e4e4" class="">
</span>// write global state protected by inner</div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(228, 228, 228); background-color: rgb(18, 18, 18);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9; background-color: #000000" class="">
});</span></div>
</div>
<div style="margin: 0px; font-size: 10px; line-height: normal; font-family: Menlo; color: rgb(228, 228, 228); background-color: rgb(18, 18, 18);" class="">
<span style="font-variant-ligatures: no-common-ligatures; color: #e9e9e9; background-color: #000000" class=""><br class="">
</span></div>
<div class=""><br class="">
</div>
<div class="">Then if it works like the internal version we have today, the code above has a data race, which we’ll all agree is bad.</div>
<div class="">Or we do an API version that when the queue you do the trysync on is not targetted at a global root queue always go through async, and that is weird, because the performance characteristics would completely depend on the target queue hierarchy,
which when layering and frameworks start to be at play, is a bad characteristic for a good API.</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Yes, we could currently assume that we only targeted a root queue for our use case, so our implementation has this limitation (so it is not a valid general solution as you say). </div>
<div class=""><br class="">
</div>
<div class="">It would perhaps be a bit strange to have different performance characteristics depending on the target queue hierarchy as you say, but there are already some performance differences in actual behavior if using e.g. an overcommit queue vs a non, so perhaps
another option would be to have this as an optional queue attribute instead of an additional generic API (queue attribute ‘steal calling thread for inline processing of requests if the queue was empty when dispatching’) …?</div></div></div></div></div></blockquote><div><br class=""></div><div>My point is, adding API to dispatch is not something we do lightly. I’m not keen on an interface that only works for base queues. In Mac OS and iOS code, where dispatchy code is pervasive, queue hierarchies more than two queues deep are very common.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">
<div class="">Or we don’t give up right away when the hierarchy is deep, but then that means that dispatch_trysync would need to be able to unwind all the locks it took, and then you have ordering issues because enqueuing that block that couldn’t run synchronously
may end up being after another one and break the FIFO ordering of queues. Respecting this which is a desired property of our API and getting an efficient implementation are somehow at odds.</div>
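<div class="">To make the unwinding concrete, here is a toy sketch, not libdispatch code. It assumes each queue is guarded by a single busy flag and points at its target; struct toy_queue, toy_try_lock_hierarchy() and toy_unlock_hierarchy() are hypothetical names invented for illustration. It only shows the back-out step and does nothing about the FIFO ordering problem described above.</div>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a queue: one busy flag plus a link to its target. */
struct toy_queue {
    atomic_flag busy;          /* set while some thread owns the queue */
    struct toy_queue *target;  /* NULL marks the root of the hierarchy */
};

/* Try to own q and every queue it targets, bottom-up.
 * On failure, release whatever was already taken and report false. */
static bool toy_try_lock_hierarchy(struct toy_queue *q)
{
    assert(q != NULL);
    for (struct toy_queue *cur = q; cur != NULL; cur = cur->target) {
        if (atomic_flag_test_and_set(&cur->busy)) {
            /* cur is owned by someone else: unwind the queues below it. */
            for (struct toy_queue *u = q; u != cur; u = u->target)
                atomic_flag_clear(&u->busy);
            return false;
        }
    }
    return true;
}

/* Release q and every queue it targets. */
static void toy_unlock_hierarchy(struct toy_queue *q)
{
    for (; q != NULL; q = q->target)
        atomic_flag_clear(&q->busy);
}
```

<div class="">Even in this simplified form, a caller that fails halfway up has to fall back to enqueuing, and by then other work may already have been enqueued ahead of it.</div>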
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Yes, agree it is a desirable property of the API to retain the ordering.</div>
<br class="">
<blockquote type="cite" class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">
<div class="">
<div class="">The other argument against trysync that way, is that during testing trysync would almost always go through the non contended codepath, and lead developers to not realize that they should have taken copies of variables and the like (this is less
of a problem on Darwin with obj-c and ARC), but trysync running on the same thread will hide that. except that once it starts being contended in production, it’ll bite you hard with memory corruption everywhere.</div>
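<div class="">A minimal sketch of that hazard, written in the function-plus-context style of the _f interfaces. This is a single-threaded toy model and not libdispatch: struct toy_inline_queue, toy_trysync_f(), toy_drain() and toy_add() are all hypothetical names. The point is that the inline path can safely use the caller's storage, while the deferred path must own a copy.</div>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Work item in the style of the _f interfaces: a function plus a context. */
typedef void (*toy_work_fn)(void *ctx);

/* Hypothetical single-slot queue; it models the hazard, it is not libdispatch. */
struct toy_inline_queue {
    int busy;             /* nonzero while something else runs on the queue */
    toy_work_fn pending;  /* deferred work, or NULL */
    void *pending_ctx;    /* heap copy of the context, owned by the queue */
};

/* Returns 1 if fn ran inline, 0 if it was deferred, -1 if it could not be queued. */
static int toy_trysync_f(struct toy_inline_queue *q, toy_work_fn fn,
                         void *ctx, size_t ctx_size)
{
    assert(q != NULL && fn != NULL);
    if (!q->busy) {
        fn(ctx);  /* uncontended: the caller's storage is still alive */
        return 1;
    }
    if (q->pending != NULL)
        return -1;  /* the single slot is already taken */
    void *copy = malloc(ctx_size);
    if (copy == NULL)
        return -1;
    memcpy(copy, ctx, ctx_size);  /* contended: fn runs later, so it needs its own copy */
    q->pending = fn;
    q->pending_ctx = copy;
    return 0;
}

/* Marks the queue idle and runs the deferred item, if any, on its private copy. */
static void toy_drain(struct toy_inline_queue *q)
{
    q->busy = 0;
    if (q->pending != NULL) {
        toy_work_fn fn = q->pending;
        void *ctx = q->pending_ctx;
        q->pending = NULL;
        q->pending_ctx = NULL;
        fn(ctx);
        free(ctx);
    }
}

/* Example work item: adds the int it is given to a running total. */
static int toy_total;
static void toy_add(void *ctx) { toy_total += *(int *)ctx; }
```

<div class="">If the memcpy were dropped and the caller's pointer stored instead, the uncontended case would behave identically in testing, and only the contended case would read storage that the caller may already have reused.</div>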
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Less of an issue for us as we depend on the _f interfaces throughout due to portability concerns, but fair point.</div>
<br class="">
<blockquote type="cite" class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">
<div class="">
<div class="">Technically what you’re after is that bringing up a new thread is very costly and that you’d rather use the one that’s asyncing the request because it will soon give up control. The wake up of a queue isn’t that expensive, in the sense that the
overhead of dispatch_sync() in terms of memory barriers and locking is more or less comparable. What’s expensive is creating a thread to satisfy this enqueue.</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Yes, in fact, bringing up a new thread is so costly that we keep a pool around in the libpwq implementation. Unfortunately we would often see double-digit microsecond latency incurred by this, which is unacceptable for us, so we had to (for some configurations/special
deployments) have a dedicated spin thread that will grab the next queue to work on (that cut down the latency by a factor of 10 or so) and the next thread woken from the thread pool would take over a spinner…</div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class="">
<div class=""><br class="">
</div>
<blockquote type="cite" class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">
<div class="">In my opinion, to get the win you’re after, you’d rather want an async() version that if it wakes up the target queue hierarchy up to the root then you want to have more resistance in bringing up a new thread to satisfy that request. Fortunately,
the overcommit property of queues could be used by a thread pool to decide to apply that resistance. There are various parts of the thread pool handling (especially without kernel work queues support) that could get some love to get these exact benefits without
changing the API.</div>
</div>
</div>
</blockquote>
<br class="">
</div>
<div class="">That would indeed be a very interesting idea, the problem is that the thread using ‘dispatch_barrier_trysync’ is not returning to the pthread_workqueue pool to grab the next dispatch queue for processing, but is instead going back to block on a syscall
(e.g. read() from a socket) - and even the latency to wake up a thread (as is commonly done now) with mutex/condition signaling is way too slow for the use case we have (thus the very ugly workaround with a spin thread for some deployments).</div>
<div class=""><br class="">
</div>
<div class="">Essentially, for these kind of operations we really want to avoid all context switches as long as we can keep up with the rate of inbound data, and in general such dynamics would be a nice property to have - if the thread performing the async call was
known to always return to the global pwq thread pool, it would be nicely solved by applying resistance as you suggest, the problem is what to do when it gets blocked and you thus get stuck.</div>
<div class=""><br class="">
</div>
Perhaps we have to live with the limited implementation we have for practical purposes, but I have the feeling that the behavior we are after would be useful for other use cases, perhaps the queue attribute suggested above could be another way of expressing
it without introducing new dispatch API. </div></div></div></blockquote><br class=""></div><div>I completely agree with you, but I think the way to address this is to make the thread pool smarter, not to have developers sprinkle their code with dispatch_barrier_trysync() wherever they feel like it. Using it properly requires a deep understanding of the implementation of dispatch in use, and that changes with each platform / version combination. That’s not really the kind of interface we want to build.</div><div><br class=""></div><div>“overcommit” is exactly the hint you’re after as far as the queue is concerned. It means “if I’m woken up, bring up a new thread provided it doesn’t blow up the system, no matter what”. So make your queue non-overcommit by targeting it manually at dispatch_get_global_queue(0, 0) (that one isn’t overcommit), and make the thread pool smarter. That’s the right way to go and the design-compatible way to do it.</div><div><br class=""></div><div>If your thread blocks in read(), then I would argue that it should use a READ dispatch source instead. That way, the source would get enqueued *after* your async and you can ping-pong. Doing blocking read()s is not dispatchy at all and will cause you all sorts of problems like this one, because re-async doesn’t work for you.</div><div><br class=""></div><div>-Pierre</div><br class=""></body></html>