[swift-evolution] [Concurrency] async/await + actors

Mon Sep 4 12:36:37 CDT 2017

On Sep 3, 2017, at 12:44 PM, Pierre Habouzit <phabouzit at apple.com> wrote:
>>>>> My currently not very well formed opinion on this subject is that GCD queues are just what you need with these possibilities:
>>>>> - this Actor queue can be targeted to other queues by the developer when he means for these actor to be executed in an existing execution context / locking domain,
>>>>> - we disallow Actors to be directly targeted to GCD global concurrent queues ever
>>>>> - for the other ones we create a new abstraction with stronger and better guarantees (typically limiting the number of possible threads servicing actors to a low number, not greater than NCPU).
>>>> 
>>>> Is there a specific important use case for being able to target an actor to an existing queue?  Are you looking for advanced patterns where multiple actors (each providing disjoint mutable state) share an underlying queue? Would this be for performance reasons, for compatibility with existing code, or something else?
>>> 
>>> Mostly for interaction with current designs where being on a given bottom serial queue gives you the locking context for resources naturally attached to it.
>> 
>> Ok.  I don’t understand the use-case well enough to know how we should model this.  For example, is it important for an actor to be able to change its queue dynamically as it goes (something that sounds really scary to me) or can the “queue to use” be specified at actor initialization time?
> 
> I think I need to read more on actors, because the same way you're not an OS runtime expert, I'm not (or rather no longer, I started down that path a lifetime ago) a language expert at all, and I feel like I need to understand your world better to try to explain this part better to you.

No worries.  Actually, after thinking about it a bit, I don’t think that switching underlying queues at runtime is scary.

The important semantic invariant which must be maintained is that there is only one thread executing within an actor context at a time.  Switching around underlying queues (or even having multiple actors on the same queue) shouldn’t be a problem.

OTOH, you don’t want an actor “listening” to two unrelated queues, because there is nothing to synchronize between the queues, and you could have multiple actor methods invoked at the same time: you lose the protection of a single serial queue. 

The only concern I’d have with an actor switching queues at runtime is that you don’t want a race condition where an item on QueueA goes to the actor, then it switches to QueueB, then another item from QueueB runs while the actor is already doing something for QueueA.

>>> I think what you said made sense.
>> 
>> Ok, I captured this in yet-another speculative section:
>> https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency <https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency>
> Great. BTW I agree 100% with:
> 
> That said, this is definitely a power-user feature, and we should understand, build, and get experience using the basic system before considering adding something like this.
> 
> Private concurrent queues are not a success in dispatch and cause several issues, these queues are second class citizens in GCD in terms of feature they support, and building something with concurrency *within* is hard. I would keep it as "that's where we'll go some day" but not try to attempt it until we've build the simpler (or rather less hard) purely serial case first.

Right, I agree this is not important for the short term.  To clarify though, I meant to indicate that these actors would be implemented completely independently of dispatch, not that they’d build on private concurrent queues.

>>> Another problem I haven't touched either is kernel-issued events (inbound IPC from other processes, networking events, etc...). Dispatch for the longest time used an indirection through a manager thread for all such events, and that had two major issues:
>>> 
>>> - the thread hops it caused, causing networking workloads to utilize up to 15-20% more CPU time than an equivalent manually made pthread parked in kevent(), because networking pace even when busy idles back all the time as far as the CPU is concerned, so dispatch queues never stay hot, and the context switch is not only a scheduled context switch but also has the cost of a thread bring up
>>> 
>>> - if you deliver all possible events this way you also deliver events that cannot possibly make progress because the execution context that will handle them is already "locked" (as in busy running something else.
>>> 
>>> It took us several years to get to the point we presented at WWDC this year where we deliver events directly to the right dispatch queue. If you only have very anonymous execution contexts then all this machinery is wasted and unused. However, this machinery has been evaluated and saves full percents of CPU load system-wide. I'd hate for us to go back 5 years here.
>> 
>> I don’t have anything intelligent to say here, but it sounds like you understand the issues well :-)  I agree that giving up 5 years of progress is not appealing.
> 
> TBH our team has to explain into more depth how eventing works on Darwin, the same way we (or maybe it's just I, I don't want to disparage my colleagues here :P) need to understand Actors and what they mean better, I think the swift core team (and whoever works on concurrency) needs to be able to understand what I explained above.

Makes sense.  In Swift 6 or 7 or whenever actors are a plausible release goal, a lot of smart people will need to come together to scrutinize all the details.  Iteration and improvement over the course of a release cycle is also sensible.

The point of the document is to provide a long term vision of where things could go, primarily to unblock progress on async/await in the short term.  For a very long time now, the objection to doing anything with async/await has been that people don’t feel that they know how and whether async/await would fit into the long term model for Swift concurrency.  Indeed, the discussions around getting “context” into the async/await design are a concrete example of how considering the long term direction is important to help shape the immediate steps.

That said, we’re still a ways off from actually implementing an actor model, so we have some time to sort it out.

> (2) is what scares me: the kernel has stuff to deliver, and kernels don't like to hold on data on behalf of userspace forever, because this burns wired memory. This means that this doesn't quite play nice with a global anonymous pool that can be starved at any time. Especially if we're talking XPC Connections in a daemon, where super high priority clients such as the frontmost app (or even worse, SpringBoard) can ask you questions that you'd better answer as fast as you can.
> 
> The solution we recommend to solve this to developers (1st and 3rd parties), is for all your XPC connections for your clients are rooted at the same bottom queue that represents your "communication" subsystem, so you can imagine it be the "Incoming request multiplexer" Actor or something, this one is in the 2nd category and is known to the kernel so that the kernel can instantiate the Actor itself without asking permission to userspace and directly make the execution context, and rely on the scheduler to get the priorities right.

Thanks for the explanation, I understand a lot better what you’re talking about now.  To me, this sounds like the concern of a user space framework author (e.g. the folks writing Kitura), not the users of the frameworks.  As such, I have no problem with making it “more difficult” to set up the right #2 abstractions.

> we need to embrace it and explain to people that everywhere in a traditional POSIX world they would have used a real pthread_create()d thread to perform the work of a given subsystem, they create one such category #2 bottom queue that represents this thread (and you make this subsystem an Actor), 

Makes sense.  This sounds like a great opportunity for actors to push the world even farther towards sensible designs, rather than cargo culting the old threads+channels model.

>>  Also, I think we should strongly encourage pure async “fire and forget” actor methods anyway - IOW, we should encourage push, not pull
> 
> 
> I almost agree. We should strongly encourage the `pure async "account for, fire and forget" actor methods`. The `account for` is really backpressure, where you actually don't fire if the remote queue is full and instead rely on some kind of reactive pattern to pull from you. (but I know you wrote that on your proposal and you're aware of it).

Yep, I was trying to get across the developer mindset of “push, not pull” when it comes to decomposing problems and setting up the actor graph.

I think that - done right - the remote queue API can be done in a way where it looks like you’re writing naturally “push” code, but that the API takes care of making the right thing happen.

>> - since they provide much stronger guarantees in general.
> 
> It depends which guarantees you're talking about. I don't think this statement is true. Async work has good and strong properties when you write code in the "normal" priority ranges, what we refer as to "in the QoS world" on Darwin (from background up to UI work).

"stronger guarantees” is probably not the right way to express this.  I’m talking about things like “if you don’t wait, it is much harder to create deadlocks”.  Many problems are event-driven or streaming, which are naturally push.  I can’t explain why I think this, but it seems to me that push designs encourage more functional approaches, but pull designs tend to be more imperative/stateful.  The later feels familiar, but encourages the classical bugs we’re all used to :-)

> However, there are tons of snowflakes on any platform that can't be in that world:
> - media rendering (video/audio)
> - HID (touch, gesture recognition, keyboard, mouses, trackpads, ...)
> - some use cases of networking (bluetooth is a very good example, you hate when your audio drops with your bluetooth headset don't you?)
> - ...
> 
> And these use cases are many, and run in otherwise regular processes all the time.

I think there is some misunderstanding here.  I’m not saying that sync is bad, I’m only talking about the default abstraction and design patterns that people should reach for first.

The general design I’m shooting for here is to provide a default abstractions that work 80%+ of the time, allowing developers to have a natural first step to reach for when they build their code.  However, any single abstraction will have limitations and problems in some use cases, and some of those snowflakes are SO important (media is a great example) that it isn’t acceptable to take any hit. This is why I think it is just as important to have an escape hatch.  The biggest escape hatches we’ve talked about are multithreaded actors, but folks could also simply “not use actors” if they aren’t solving problems for them.

Swift aims to be pragmatic, not dogmatic.  If you’re implementing a media decoder, write the thing in assembly if you want.  My feelings won’t be hurt :-)

> However it is a fact of life that these subsystems, have to interact with generic subsystems sometimes, and that mean they need to be able to synchronously wait on an actor, so that this actor's priority is elevated. And you can't waive this off, there are tons of legitimate reasons for very-high priorities subsystems to have to interact and wait on regular priority work.

I understand completely, which is why synchronous waiting is part of the model.  Despite what I say above, I really don’t want people to avoid actors or write their code in assembly.  :-)

My point about pull model is that it seems like the right *default* for people to reach for, not that it should be the only or exclusive mechanic proposed.  This is one reason that I think it is important to introduce async/await before actors - so we have the right mechanic to build this waiting on top of.

> I 100% agree with you that if *everything* was asynchronous and written this way, our lives would be great. I don't however think it's possible on real life operating system to write all your code this way. And this is exactly where things start to be *very* messy.

+1, again, this pragmatism is exactly why the proposal describes actor methods returning values, even though it is not part of the standard actor calculus that academia discusses:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#extending-the-model-through-await

If you have some suggestion for how I can clarify the writing to address the apparent confusion here, please let me know and I’ll be happy to fix it.

-Chris

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170904/b9b7478d/attachment.html>