[swift-evolution] [Concurrency] async/await + actors

Fri Aug 18 10:13:08 CDT 2017

Hi Chris & swift-evo,

(Given the already lengthy thread I tried to separate my points and keep them reasonably short to allow people to skip points they don't care about. I'm very happy to expand on the points.)

Thanks very much for writing up your thoughts/proposal, I've been waiting to see the official kick-off for the concurrency discussions :).

I) Let's start with the async/await proposal. Personally I think this is the right direction for Swift given the reality that we need to interface with incredibly large existing code-bases and APIs. Further thoughts:

- ❓ GCD: dispatching onto calling queue, how?
GCD doesn't actually allow you to dispatch back to the original queue, so I find it unclear how you'd achieve that. IMHO the main reason is that conceptually at a given time you can be on more than one queue (nested q.sync{}/target queues). So which is 'the' current queue?

- ⊥ first class coroutine model => async & throws should be orthogonal
given that the proposal pitches to be the beginning of a first class coroutine model (which I think is great), I think `async` and `throws` do need to be two orthogonal concepts. I wouldn't want automatically throwing generators in the future ;). Also I think we shouldn't throw spanner in the works of people who do like to use Result<E, T> types to hold the errors or values. I'd be fine with async(nothrow) or something though.
- what do we do with functions that invoke their closure multiple times? Like DispatchIO.read/write.

II) the actor model part

- 👯 Erlang runtime and the actor model go hand in hand 
I really like the Erlang actor model but I don't think it can be separated from Erlang's runtime. The runtime offers green threads (which allow an actor to block without blocking an OS thread) and prevents you from sharing memory (which makes it possible to kill any actor at any point and still have a reliable system). I don't see these two things happening in Swift. To a lesser extend these issues are also present in Scala/Akka, the mitigate some of the problems by having Akka Streams. Akka Streams are important to establish back-pressure if you have faster producers than consumers. Note that we often can't control the producer, they might be on the other side of a network connection. So it's often very important to not read the available bytes to communicate to the kernel that we can't consumes bytes that fast. If we're networking with TCP the kernel can then use the TCP flow-control to signal to the other side that they better slow down (or else packets will be dropped and then need to be resent later).

- 💥 regarding fatal failure in actors
in the server world we need to be able to accept hundreds of thousands (millions) of connections at the same time. There are quite a few cases where these connections are long-lived and paused for most of the the time. So I don't really see the value in introducing a 'reliable' actor model where the system stops accepting new connections if one actor fatalError'd and then 'just' finishes up serving the existing connections. So I believe there are only two possible routes: 1) treat it like C/C++ and make sure your code doesn't fatalError or the whole process blows up (what we have right now) 2) treat it like Erlang and let things die. IMHO Erlang wouldn't be successful if actors couldn't just die or couldn't be linked. Linking propagates failures to all linked processes. A common thing to do is to 1) spawn a new actor 2) link yourself to the newly spawned actor 3) send a message to that actor and at some point eventually await a reply message sent by the actor spawned earlier. As you mentioned in the writeup it is a problem if the actor doesn't actually reply which is why in Erlang you'd link them. The effect is that if the actor we spawned dies, any linked actor will die too which will the propagate the error to an appropriate place. That allows the programmer to control where an error should propagate too. I realise I'm doing a poor job in explaining what is best explained by documentation around Erlang: supervision [1] and the relationship between what Erlang calls a process (read 'actor') and errors [2].

- ♨️ OS threads and actors
as you mention, the actor model only really works if you can spawn lots of them, so it's very important to be able to run hundreds of thousands of them on a number of OS threads pretty much equal to your number of cores. That's only straightforward if there are no (OS thread) blocking operations or at least it's really obvious what blocks and what doesn't. And that's not the case in Swift today and with GCD you really feel that pain. GCD does spawn threads for you and has a rather arbitrary limit of 64 OS threads (by default on macOS). That is too many for a very scalable server application but too few to just tolerate blocking APIs.

[1]: http://erlang.org/documentation/doc-4.9.1/doc/design_principles/sup_princ.html
[2]: http://learnyousomeerlang.com/errors-and-processes

-- Johannes

> On 17 Aug 2017, at 11:25 pm, Chris Lattner via swift-evolution <swift-evolution at swift.org> wrote:
> 
>> 
>> On Aug 17, 2017, at 3:24 PM, Chris Lattner <clattner at nondot.org> wrote:
>> 
>> Hi all,
>> 
>> As Ted mentioned in his email, it is great to finally kick off discussions for what concurrency should look like in Swift.  This will surely be an epic multi-year journey, but it is more important to find the right design than to get there fast.
>> 
>> I’ve been advocating for a specific model involving async/await and actors for many years now.  Handwaving only goes so far, so some folks asked me to write them down to make the discussion more helpful and concrete.  While I hope these ideas help push the discussion on concurrency forward, this isn’t in any way meant to cut off other directions: in fact I hope it helps give proponents of other designs a model to follow: a discussion giving extensive rationale, combined with the long term story arc to show that the features fit together.
>> 
>> Anyway, here is the document, I hope it is useful, and I’d love to hear comments and suggestions for improvement:
>> https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782
> 
> Oh, also, one relatively short term piece of this model is a proposal for adding an async/await model to Swift (in the form of general coroutine support).  Joe Groff and I wrote up a proposal for this, here:
> https://gist.github.com/lattner/429b9070918248274f25b714dcfc7619
> 
> and I have a PR with the first half of the implementation here:
> https://github.com/apple/swift/pull/11501
> 
> The piece that is missing is code generation support.
> 
> -Chris
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution