[swift-evolution] [Concurrency] async/await + actors
pierre at habouzit.net
Thu Aug 31 21:24:24 CDT 2017
I couldn't find the initial mail and am quite late to the party of commenters, but there are parts I don't understand or have questions about.
The one problem I anticipate with GCD is that it doesn't scale well enough: server developers in particular will want to instantiate hundreds of thousands of actors in their application, at least one for every incoming network connection. The programming model is substantially harmed when you have to be afraid of creating too many actors: you have to start aggregating logically distinct stuff together to reduce # queues, which leads to complexity and loses some of the advantages of data isolation.
What do you mean by this? Queues are serial/exclusive execution contexts, and if you're not modeling actors as serial queues, then these two concepts are just disjoint. The former (queues) represent where the code runs physically, give you some level of scheduling and possibly prioritization, and are the entity known to the kernel, so that when you need synchronization between two execution contexts (because despite your best intentions there is global mutable state on the system that Swift uses all the time, whether through frameworks, malloc, or simply any syscall), it can resolve priority inversions and do smart things to schedule these contexts.
Actors are the way you present the various tasks/operations/activities that you schedule. Execution contexts are a way for the developer to explain which things are related in a consistent system, and they give access to state which is local to the context (whether it's TSD for threads, queue-specific data, or any similar mechanism): data that is not shared horizontally (across several concurrent execution contexts) but vertically (across the whole hierarchy of actors/work items/... that you schedule on these contexts), which hence requires no locks and is "good" for the system.
GCD is trying to be a very efficient way to communicate and message between the execution contexts that you know about and that represent your software architecture in your product/app/server/.... Using queues for anything else will indeed scale poorly.
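To make the scaling concern concrete, here is a minimal sketch (all names are mine, not proposed API) of how many lightweight actor-like objects can be multiplexed onto one real serial context via GCD target queues, so the kernel only ever sees a handful of execution contexts no matter how many actors exist:

```swift
import Dispatch

// Sketch only: many lightweight "actors" sharing one real serial context
// via a GCD target queue. All names here are illustrative.
final class MiniActor {
    // Each actor owns a serial queue, but every queue targets one shared
    // serial "silo", so only the silo is a real execution context.
    private let queue: DispatchQueue
    private var counter = 0  // state protected by `queue`

    init(label: String, silo: DispatchQueue) {
        queue = DispatchQueue(label: label, target: silo)
    }

    func increment(then completion: @escaping (Int) -> Void) {
        queue.async {
            self.counter += 1
            completion(self.counter)
        }
    }
}

let silo = DispatchQueue(label: "com.example.silo")  // one real context
let actors = (0..<1000).map { MiniActor(label: "actor.\($0)", silo: silo) }
```

The target-queue hierarchy keeps the per-actor exclusion guarantee while collapsing the scheduling problem down to one context the kernel can reason about.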
IMO, Swift as a runtime should define what an execution context is, and be relatively oblivious of which context it is exactly, as long as it presents a few common capabilities:
- possibility to schedule work (async)
- have a name
- be an exclusion context
- is an entity the kernel can reason about (if you want to be serious about any integration on a real operating system, with priority inheritance and complex issues like this, it is the OS's responsibility to handle them, not the language's)
In that sense, whether your execution context is:
- a dispatch serial queue
- a CFRunloop
- a libev/libevent/... event loop
- your own hand rolled event loop
Then this is fine: this is something where Swift could, at the very least, enqueue its own "schedule Swift closures on this context", and for the contexts that have native integration, do smarter things (I'd expect runloops or libdispatch to be such better-integrated citizens, given that they're part of the same umbrella ;p). If you layer the runtime this way, then I don't see how GCD can be a hindrance; it's just one of several execution contexts that can host actors.
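A minimal sketch of what such a runtime-level contract could look like, assuming a hypothetical `ExecutionContext` protocol (none of these names are proposed API). A serial dispatch queue conforms trivially; a runloop or a hand-rolled event loop could conform the same way:

```swift
import Dispatch

// Hypothetical sketch: the minimal surface the Swift runtime could require
// from any execution context. The protocol and its names are mine.
protocol ExecutionContext: AnyObject {
    /// A name, for debugging and kernel-facing identity.
    var name: String { get }
    /// Schedule work asynchronously. Submitted items must never run
    /// concurrently with one another (the exclusion guarantee).
    func schedule(_ work: @escaping () -> Void)
}

// A serial dispatch queue satisfies the contract directly; a CFRunloop or
// libev loop wrapper would implement `schedule` with its own wakeup call.
extension DispatchQueue: ExecutionContext {
    var name: String { return label }
    func schedule(_ work: @escaping () -> Void) { async(execute: work) }
}
```

With this layering, actors only need a reference to some `ExecutionContext`; whether it is backed by libdispatch, a runloop, or a custom event loop is invisible to them.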
While mentioning this: I've seen many people complain that dispatch_get_current_queue() is deprecated. It is so for tons of valid reasons; it's too sharp an API for developers to use directly. But as part of integrating with the Swift runtime, a "please give me a reference to the current execution context" is trivially implementable when we know what the Swift runtime will do with it and that it has a reasonable use.
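For reference, the supported replacement today is queue-specific data, which answers "which context am I on?" without handing out a raw queue reference; the key and labels below are illustrative:

```swift
import Dispatch

// Sketch of the supported alternative to dispatch_get_current_queue():
// tag a queue with queue-specific data, then read it back from inside a
// block running on that queue (or on a queue targeting it).
let contextKey = DispatchSpecificKey<String>()

let worker = DispatchQueue(label: "com.example.worker")
worker.setSpecific(key: contextKey, value: "com.example.worker")

worker.sync {
    // Answered safely: the class method consults the current queue
    // (and its target hierarchy) for the tagged value.
    let current = DispatchQueue.getSpecific(key: contextKey)
    assert(current == "com.example.worker")
}
```

A Swift runtime could use exactly this mechanism internally to hand back an opaque "current execution context" handle without exposing the sharp edges of the old API.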
Design sketch for interprocess and distributed compute
One of these principles is the concept of progressive disclosure of complexity <https://en.wikipedia.org/wiki/Progressive_disclosure>: a Swift developer shouldn't have to worry about IPC or distributed compute if they don't care about it.
While I agree with the sentiment, I don't think that anything useful can be done without "distributed" computation. I like the loadResourceFromTheWeb example, as we have something like this on our platform: the NSURLSession APIs, or the CloudKit API surface, which are about fetching some resource from a server (a URL or CloudKit database records). However, they don't have a single result; they have:
- progress notification callbacks
- broken down notifications for the results (e.g. headers first and body second, or per-record for CloudKit operations)
- various levels of error reporting.
I expect most developers will have to use such constructs, and for these, having a single async pivot in your code that essentially fully serializes your state machine on getting a full result from the previous step is lacking. Similarly, for the three categories I listed above, it's very likely that you want these notifications to be seen as various consequences of the initiator of the download, and they are typically sent to very specific execution contexts:
- progress usually goes to the main thread / UI thread because it's about reporting stuff to the user
- notifications go to some validation logic that will assemble frames and reconstruct the whole payload, which is likely some utility context on the side, until the full download is done; then you want to resume handling the result of the operation on the context of whatever subsystem was interested in this download.
Delivering all these notifications on the context of the initiator would be quite inefficient: there are clearly two very different contexts in my example above, and having to hop through one to reach the other would be really terrible for the operating system. I also don't understand, to be completely honest, how such operations would be modeled in the async/await world.
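To illustrate, here is a sketch (all names invented, not any proposed API) of a download whose progress, assembly, and completion notifications each land directly on their own execution context, rather than hopping through the initiator:

```swift
import Dispatch
import Foundation

// Hypothetical sketch: a download delivering side notifications to three
// distinct contexts. Real code would be fed by a URLSession delegate or a
// socket read loop; here we simulate the incoming chunks.
final class Download {
    private let progressQueue: DispatchQueue    // e.g. main queue, for the UI
    private let assemblyQueue: DispatchQueue    // utility context assembling frames
    private let completionQueue: DispatchQueue  // the interested subsystem's context

    var onProgress: ((Double) -> Void)?         // invoked on progressQueue
    private var assembled = Data()              // only touched on assemblyQueue

    init(progress: DispatchQueue, assembly: DispatchQueue, completion: DispatchQueue) {
        progressQueue = progress
        assemblyQueue = assembly
        completionQueue = completion
    }

    func receive(chunks: [Data], done: @escaping (Data) -> Void) {
        var received = 0
        for chunk in chunks {
            assemblyQueue.async {
                self.assembled.append(chunk)
                received += 1                    // serialized by assemblyQueue
                let fraction = Double(received) / Double(chunks.count)
                self.progressQueue.async { self.onProgress?(fraction) }
                if received == chunks.count {
                    let payload = self.assembled
                    self.completionQueue.async { done(payload) }
                }
            }
        }
    }
}
```

Each notification is enqueued directly on the context that cares about it; nothing forces a round-trip through the context that started the download.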
As a former framework developer, I liked to be able to reason about contexts and silos of execution, and now as a runtime developer (as in operating system runtime, not language ;p) I like people who think that way best, because this is how an operating system works. Any organization that is too remote from these silos has huge impedance mismatches and executes very poorly without a lot of manual fiddling with your language runtime's knobs: the JVM is really infamous for this, but from where I stand I've seen Go follow a similar trend, where there are knobs that dramatically affect the performance of the code you run, and where no single setup is good for everyone. I actually strongly believe that it's impossible for the runtime to figure these things out, and that it's best to create an environment that helps you think this way and organize your software architecture this way.
In other terms, in all this proposal, if Actors are people the developer can play with, we need to allow the developer to create housing too.
> On Aug 19, 2017, at 4:06 PM, Chris Lattner via swift-evolution <swift-evolution at swift.org> wrote:
> On Aug 19, 2017, at 2:02 AM, Susan Cheng <susan.doggie at gmail.com> wrote:
>> Hi chris,
>> does an actor guarantee to always process its messages one by one?
>> so, can it be assumed that multiple threads never try to modify the state at the same time?
> Yep, that’s the idea.
>> P.S. i have implemented similar idea before:
> Cool. That’s one of the other interesting things about the actor model. We can prototype and build it as a completely library feature to get experience with the runtime model, then move to language support (providing the additional safety) when things seem to work well in practice.