[swift-evolution] [Concurrency] async/await + actors

Mon Sep 4 13:40:39 CDT 2017

> On Sep 4, 2017, at 10:36 AM, Chris Lattner via swift-evolution <swift-evolution at swift.org> wrote:
> 
> On Sep 3, 2017, at 12:44 PM, Pierre Habouzit <phabouzit at apple.com> wrote:
>>>>>> My currently not very well formed opinion on this subject is that GCD queues are just what you need with these possibilities:
>>>>>> - this Actor queue can be targeted to other queues by the developer when he means for these actor to be executed in an existing execution context / locking domain,
>>>>>> - we disallow Actors to be directly targeted to GCD global concurrent queues ever
>>>>>> - for the other ones we create a new abstraction with stronger and better guarantees (typically limiting the number of possible threads servicing actors to a low number, not greater than NCPU).
>>>>> 
>>>>> Is there a specific important use case for being able to target an actor to an existing queue?  Are you looking for advanced patterns where multiple actors (each providing disjoint mutable state) share an underlying queue? Would this be for performance reasons, for compatibility with existing code, or something else?
>>>> 
>>>> Mostly for interaction with current designs where being on a given bottom serial queue gives you the locking context for resources naturally attached to it.
>>> 
>>> Ok.  I don’t understand the use-case well enough to know how we should model this.  For example, is it important for an actor to be able to change its queue dynamically as it goes (something that sounds really scary to me) or can the “queue to use” be specified at actor initialization time?
>> 
>> I think I need to read more on actors, because the same way you're not an OS runtime expert, I'm not (or rather no longer, I started down that path a lifetime ago) a language expert at all, and I feel like I need to understand your world better to try to explain this part better to you.
> 
> No worries.  Actually, after thinking about it a bit, I don’t think that switching underlying queues at runtime is scary.
> 
> The important semantic invariant which must be maintained is that there is only one thread executing within an actor context at a time.  Switching around underlying queues (or even having multiple actors on the same queue) shouldn’t be a problem.
> 
> OTOH, you don’t want an actor “listening” to two unrelated queues, because there is nothing to synchronize between the queues, and you could have multiple actor methods invoked at the same time: you lose the protection of a single serial queue. 
> 
> The only concern I’d have with an actor switching queues at runtime is that you don’t want a race condition where an item on QueueA goes to the actor, then it switches to QueueB, then another item from QueueB runs while the actor is already doing something for QueueA.
> 
> 
>>>> I think what you said made sense.
>>> 
>>> Ok, I captured this in yet-another speculative section:
>>> https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency
>> Great. BTW I agree 100% with:
>> 
>> That said, this is definitely a power-user feature, and we should understand, build, and get experience using the basic system before considering adding something like this.
>> 
>> Private concurrent queues are not a success in dispatch and cause several issues: these queues are second-class citizens in GCD in terms of the features they support, and building something with concurrency *within* is hard. I would keep it as "that's where we'll go some day" but not attempt it until we've built the simpler (or rather less hard) purely serial case first.
> 
> Right, I agree this is not important for the short term.  To clarify though, I meant to indicate that these actors would be implemented completely independently of dispatch, not that they’d build on private concurrent queues.
> 
> 
>>>> Another problem I haven't touched either is kernel-issued events (inbound IPC from other processes, networking events, etc...). Dispatch for the longest time used an indirection through a manager thread for all such events, and that had two major issues:
>>>> 
>>>> - the thread hops it caused, causing networking workloads to utilize up to 15-20% more CPU time than an equivalent manually made pthread parked in kevent(), because networking pace even when busy idles back all the time as far as the CPU is concerned, so dispatch queues never stay hot, and the context switch is not only a scheduled context switch but also has the cost of a thread bring up
>>>> 
>>>> - if you deliver all possible events this way you also deliver events that cannot possibly make progress because the execution context that will handle them is already "locked" (as in busy running something else).
>>>> 
>>>> It took us several years to get to the point we presented at WWDC this year where we deliver events directly to the right dispatch queue. If you only have very anonymous execution contexts then all this machinery is wasted and unused. However, this machinery has been evaluated and saves full percents of CPU load system-wide. I'd hate for us to go back 5 years here.
>>> 
>>> I don’t have anything intelligent to say here, but it sounds like you understand the issues well :-)  I agree that giving up 5 years of progress is not appealing.
>> 
>> TBH our team has to explain into more depth how eventing works on Darwin, the same way we (or maybe it's just I, I don't want to disparage my colleagues here :P) need to understand Actors and what they mean better, I think the swift core team (and whoever works on concurrency) needs to be able to understand what I explained above.
> 
> Makes sense.  In Swift 6 or 7 or whenever actors are a plausible release goal, a lot of smart people will need to come together to scrutinize all the details.  Iteration and improvement over the course of a release cycle is also sensible.
> 
> The point of the document is to provide a long term vision of where things could go, primarily to unblock progress on async/await in the short term.  For a very long time now, the objection to doing anything with async/await has been that people don’t feel that they know how and whether async/await would fit into the long term model for Swift concurrency.  Indeed, the discussions around getting “context” into the async/await design are a concrete example of how considering the long term direction is important to help shape the immediate steps.
> 
> That said, we’re still a ways off from actually implementing an actor model, so we have some time to sort it out.
> 
> 
>> (2) is what scares me: the kernel has stuff to deliver, and kernels don't like to hold on data on behalf of userspace forever, because this burns wired memory. This means that this doesn't quite play nice with a global anonymous pool that can be starved at any time. Especially if we're talking XPC Connections in a daemon, where super high priority clients such as the frontmost app (or even worse, SpringBoard) can ask you questions that you'd better answer as fast as you can.
>> 
>> The solution we recommend to developers (1st and 3rd parties) is to root all the XPC connections for your clients at the same bottom queue, one that represents your "communication" subsystem. You can imagine it being the "Incoming request multiplexer" Actor or something. This one is in the 2nd category and is known to the kernel, so that the kernel can instantiate the Actor itself without asking permission from userspace, directly make the execution context, and rely on the scheduler to get the priorities right.
> 
> Thanks for the explanation, I understand a lot better what you’re talking about now.  To me, this sounds like the concern of a user space framework author (e.g. the folks writing Kitura), not the users of the frameworks.  As such, I have no problem with making it “more difficult” to set up the right #2 abstractions.

I strongly disagree with this for several reasons.

Some Frameworks have a strong need for a serial context, some don't

It completely depends on the framework. If your framework is, say, a networking subsystem, which has long been very asynchronous by nature, then yes, having the framework set up a #2 kind of guy inside it and have callbacks from/to this isolated context is just fine (and incidentally what your networking stack does).

However, for some frameworks it makes very little sense to do this: they're better served using the "location" provided by their client and having some internal synchronization (locks) for the shared state they have. Too much framework code today creates its own #2 queue (if not queue*s*) all the time out of fear of being "blocked" by the client, but this leads to terrible performance.

[ disclaimer: I don't know whether Security.framework works this way or not, this is a hypothetical ]

For example, if you're using Security.framework stuff (which requires some state, such as your current ephemeral security keys and whatnot), using a private context instead of the caller's is really terribly bad because it causes tons of context switches: such a framework should really *not* use a context of its own, but a traditional lock to protect global state. The reason is that the global state is really just a few keys and mutable contexts, while the bulk of the work is the CPU time to (de)cipher, which you really want to parallelize as much as you can; the shared state is not reason enough to hop.

It is tempting to say that we could still use a private queue to hop through to get the shared state and back to the caller. That would be great if the caller tail-called into the async to the Security framework, allowing the runtime to do a lightweight switch to the other queue and then back. The problem is that real-life code never does that: it will rarely tail-call into the async (though with Swift async/await it would), but more importantly there's other stuff on the caller's context, so the OS will want to continue executing that, and then you will inevitably ask for a thread to drain that Security.framework async.

In our experience, the runtime can never optimize this Security async pattern to the point of not using an extra thread for the Security work.
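
A minimal sketch of the lock-based alternative described above, in Swift. Everything in it is hypothetical: `KeyStore`, `Tally`, and `parallelRoundTrips` are made-up names, the XOR is a toy stand-in for real cipher work, and none of this reflects how Security.framework is actually implemented. The lock is held only long enough to copy the shared key, and the CPU-heavy part runs on whatever context the caller is already on.

```swift
import Foundation

// Hypothetical library type; NOT how Security.framework is actually built.
final class KeyStore {
    private let lock = NSLock()
    private var ephemeralKey: UInt8 = 0x5A   // the small piece of shared state

    // Hold the lock only long enough to copy the key out.
    private func currentKey() -> UInt8 {
        lock.lock()
        defer { lock.unlock() }
        return ephemeralKey
    }

    // The expensive work runs on the caller's own context, outside the lock,
    // so independent callers proceed in parallel without a thread hop.
    func encipher(_ data: [UInt8]) -> [UInt8] {
        let key = currentKey()
        return data.map { $0 ^ key }   // toy XOR standing in for real cipher work
    }
}

// Lock-protected tally used only to collect results from parallel callers.
final class Tally {
    private let lock = NSLock()
    private var count = 0
    func add() { lock.lock(); count += 1; lock.unlock() }
    var value: Int { lock.lock(); defer { lock.unlock() }; return count }
}

// Run `callers` concurrent round trips and report how many succeeded.
func parallelRoundTrips(callers: Int) -> Int {
    let store = KeyStore()
    let message = Array("hello".utf8)
    let tally = Tally()
    DispatchQueue.concurrentPerform(iterations: callers) { _ in
        if store.encipher(store.encipher(message)) == message { tally.add() }
    }
    return tally.value
}
```

The private-queue variant would instead wrap `encipher` in an async onto the library's own serial queue, which serializes all callers and is where the extra thread request comes from.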

Top level contexts are a fundamental part of App (process) design

It is actually way better for the app developer to decide what the subsystems of the app are, and create well-known #2 contexts for these. In our WWDC Talk we took the hypothetical example of News.app, which fetches stuff from RSS feeds, has a database to know what to fetch and what you read, the UI thread, and some networking parts to interact with the internet.

Such an app should upfront create 3 "#2" guys:
- the main thread for UI interactions (this one is made for you obviously)
- the networking handling context
- the database handling context

The flow of most of the app is: UI triggers action, which asks the database subsystem (brain) what to do, which possibly issues networking requests.
When a networking request is finished and the assets have been reassembled on the network handling queue, it passes them back to the database/brain, which decides how to redraw the UI and issues the update command back to the UI context.

At the OS layer we believe strongly that these 3 places should be made upfront and have strong identities. And it's not an advanced need, it should be made easy. The advanced need is to have lots of these, and have subsystems that share state that use several of these contexts.
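
As a rough illustration of the flow above, here is a sketch with the top-level contexts modelled as named serial dispatch queues. The queue labels, `Contexts`, `refresh`, `refreshAndWait`, and the example URL are all invented for this sketch; a real app would use `DispatchQueue.main` for the UI context, which is replaced by a stand-in serial queue here so the snippet can run from a plain script.

```swift
import Foundation

// Hypothetical top-level contexts, created upfront with stable identities.
enum Contexts {
    static let networking = DispatchQueue(label: "news.networking")
    static let database = DispatchQueue(label: "news.database")
}

// UI -> database ("brain") -> networking -> database -> UI.
func refresh(feed: String, ui: DispatchQueue, updateUI: @escaping (String) -> Void) {
    Contexts.database.async {
        // The brain decides what needs to be fetched.
        let url = "https://example.invalid/\(feed)"
        Contexts.networking.async {
            // Pretend the assets were fetched and reassembled here.
            let article = "article from \(url)"
            Contexts.database.async {
                // Record the result, then hand the redraw back to the UI context.
                ui.async { updateUI(article) }
            }
        }
    }
}

// Blocking helper that exists only so the sketch can be exercised from a script.
func refreshAndWait(feed: String) -> String {
    let ui = DispatchQueue(label: "news.ui-stand-in")
    let done = DispatchSemaphore(value: 0)
    var shown = ""
    refresh(feed: feed, ui: ui) { text in
        shown = text
        done.signal()
    }
    done.wait()
    return shown
}
```

Each hop is a plain fire-and-forget async between two well-known contexts; no step waits on another.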

For everything else, I agree this hypothetical News.app can use an anonymous pool or reuse any of the top-level contexts it created, until it creates a scalability problem, in which case by [stress] testing the app, you can figure out which new subsystem needs to emerge. For example, maybe in a later version News.app wants beautiful articles and needs to precompute a bunch of things at the time the article is fetched, and that starts to take enough CPU that doing this on the networking context doesn't scale anymore. Then you just create a new top-level "Article Massaging" context, and migrate some of the workload there.

Why this manual partitioning?

It is our experience that the runtime cannot figure these partitions out by itself. And it's not only us: like I said earlier, Go can't either.

The runtime can't possibly know about locking domains, what your code may or may not hit (I mean it's equivalent to the halting problem so of course we can't guess it), or just data affinity, which on asymmetric platforms can have a significant impact on your speed (NUMA machines, some big.LITTLE stuff, ...).

The default anonymous pool is fine for best effort work, no doubt we need to make it good, but it will never beat carefully partitioned subsystems.

>> we need to embrace it and explain to people that everywhere in a traditional POSIX world they would have used a real pthread_create()d thread to perform the work of a given subsystem, they create one such category #2 bottom queue that represents this thread (and you make this subsystem an Actor), 
> 
> Makes sense.  This sounds like a great opportunity for actors to push the world even farther towards sensible designs, rather than cargo culting the old threads+channels model.

It is, and this is exactly why I focus on your proposal a lot: I see a ton of value in it that goes way beyond the expressiveness of the language.

>>>  Also, I think we should strongly encourage pure async “fire and forget” actor methods anyway - IOW, we should encourage push, not pull
>> 
>> 
>> I almost agree. We should strongly encourage the `pure async "account for, fire and forget" actor methods`. The `account for` is really backpressure, where you actually don't fire if the remote queue is full and instead rely on some kind of reactive pattern to pull from you. (but I know you wrote that on your proposal and you're aware of it).
> 
> Yep, I was trying to get across the developer mindset of “push, not pull” when it comes to decomposing problems and setting up the actor graph.
> 
> I think that - done right - the remote queue API can be done in a way where it looks like you’re writing naturally “push” code, but that the API takes care of making the right thing happen.
> 
>>> - since they provide much stronger guarantees in general.
>> 
>> It depends which guarantees you're talking about. I don't think this statement is true. Async work has good and strong properties when you write code in the "normal" priority ranges, what we refer to as "in the QoS world" on Darwin (from background up to UI work).
> 
> "stronger guarantees” is probably not the right way to express this.  I’m talking about things like “if you don’t wait, it is much harder to create deadlocks”.  Many problems are event-driven or streaming, which are naturally push.  I can’t explain why I think this, but it seems to me that push designs encourage more functional approaches, but pull designs tend to be more imperative/stateful.  The latter feels familiar, but encourages the classical bugs we’re all used to :-)
> 
>> However, there are tons of snowflakes on any platform that can't be in that world:
>> - media rendering (video/audio)
>> - HID (touch, gesture recognition, keyboard, mouses, trackpads, ...)
>> - some use cases of networking (bluetooth is a very good example, you hate when your audio drops with your bluetooth headset don't you?)
>> - ...
>> 
>> And these use cases are many, and run in otherwise regular processes all the time.
> 
> I think there is some misunderstanding here.  I’m not saying that sync is bad, I’m only talking about the default abstraction and design patterns that people should reach for first.
> 
> The general design I’m shooting for here is to provide default abstractions that work 80%+ of the time, allowing developers to have a natural first step to reach for when they build their code.  However, any single abstraction will have limitations and problems in some use cases, and some of those snowflakes are SO important (media is a great example) that it isn’t acceptable to take any hit. This is why I think it is just as important to have an escape hatch.  The biggest escape hatches we’ve talked about are multithreaded actors, but folks could also simply “not use actors” if they aren’t solving problems for them.
> 
> Swift aims to be pragmatic, not dogmatic.  If you’re implementing a media decoder, write the thing in assembly if you want.  My feelings won’t be hurt :-)

My concern was not about how they write their code; for all I care, they could use any language. It's how they interact with the Swift world that I'm worried about.

Assuming these subsystems exist already and are implemented, it is our experience that it is completely impractical to ask these subsystems never to interact with the rest of the world except through very gated interfaces. Eventually they need to use some kind of common/shared infrastructure, whether it's logging, some security/DRM decoding thing that needs to delegate to the SEP or some daemon, etc., and some of these generic OS layers would likely with time use Swift Actors.

Since await is an asynchronous wait (IOW, as my C-addicted brain translates it, equivalent to dispatch_group_notify(group, queue, ^{ tell me when what I'm 'waiting' on is done please })), that doesn't fly.
Those subsystems need to block synchronously with wait (no a) on a given Actor.
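
To make the distinction concrete, here is a small sketch that models an actor as a class guarding its state with a private serial dispatch queue. This is an assumption made purely for illustration: `CounterActor`, `read(on:_:)`, and `readBlocking()` are hypothetical names, and nothing here is the API from the proposal. `read(on:_:)` plays the role of the asynchronous wait (the reply arrives later, the caller never blocks), while `readBlocking()` is the synchronous wait (no a), where the calling thread blocks until the actor's queue has run the work.

```swift
import Foundation

// Hypothetical stand-in for an actor: one serial queue guarding mutable state.
final class CounterActor {
    private let queue = DispatchQueue(label: "example.counter-actor")
    private var value = 0

    // Fire and forget: enqueue the mutation and return immediately.
    func increment() {
        queue.async { self.value += 1 }
    }

    // Asynchronous wait: the reply is delivered later on `replyQueue`;
    // the caller's thread is never blocked.
    func read(on replyQueue: DispatchQueue, _ reply: @escaping (Int) -> Void) {
        queue.async {
            let snapshot = self.value
            replyQueue.async { reply(snapshot) }
        }
    }

    // Synchronous wait: the calling thread blocks until the serial queue
    // has drained everything enqueued before this call and run the closure.
    func readBlocking() -> Int {
        queue.sync { value }
    }
}

let counter = CounterActor()
counter.increment()
counter.increment()
// The queue is serial and FIFO, so both increments run before this read.
let blockingValue = counter.readBlocking()
```

In the blocking form the waiter is visible to the runtime as a thread parked on that specific queue, which is the property the high-priority subsystems above depend on.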

> 
>> However it is a fact of life that these subsystems have to interact with generic subsystems sometimes, and that means they need to be able to synchronously wait on an actor, so that this actor's priority is elevated. And you can't wave this off, there are tons of legitimate reasons for very-high-priority subsystems to have to interact and wait on regular priority work.
> 
> I understand completely, which is why synchronous waiting is part of the model.  Despite what I say above, I really don’t want people to avoid actors or write their code in assembly.  :-)
> 
> My point about pull model is that it seems like the right *default* for people to reach for, not that it should be the only or exclusive mechanic proposed.  This is one reason that I think it is important to introduce async/await before actors - so we have the right mechanic to build this waiting on top of.
> 
>> I 100% agree with you that if *everything* was asynchronous and written this way, our lives would be great. I don't however think it's possible on real life operating system to write all your code this way. And this is exactly where things start to be *very* messy.
> 
> +1, again, this pragmatism is exactly why the proposal describes actor methods returning values, even though it is not part of the standard actor calculus that academia discusses:
> https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#extending-the-model-through-await
> 
> If you have some suggestion for how I can clarify the writing to address the apparent confusion here, please let me know and I’ll be happy to fix it.

Unless I have dramatically misunderstood what await is, I don't see where your write-up addresses synchronous waiting anywhere yet.
Or is it that await turns into a synchronous wait if the function you're awaiting from is not an actor function? That would seem confusing to me.

-Pierre
