[swift-evolution] [Concurrency] async/await + actors

Fri Aug 18 15:01:33 CDT 2017

> On Aug 18, 2017, at 8:13 AM, Johannes Weiß <johannesweiss at apple.com> wrote:
> 
> Hi Chris & swift-evo,
> 
> (Given the already lengthy thread I tried to separate my points and keep them reasonably short to allow people to skip points they don't care about. I'm very happy to expand on the points.)
> 
> Thanks very much for writing up your thoughts/proposal, I've been waiting to see the official kick-off for the concurrency discussions :).
> 
> I) Let's start with the async/await proposal. Personally I think this is the right direction for Swift given the reality that we need to interface with incredibly large existing code-bases and APIs. Further thoughts:

I think these points have been covered upthread, I’ll focus on the remaining topics:

> II) the actor model part
> 
> - 👯 Erlang runtime and the actor model go hand in hand 
> I really like the Erlang actor model but I don't think it can be separated from Erlang's runtime. The runtime offers green threads (which allow an actor to block without blocking an OS thread) and prevents you from sharing memory (which makes it possible to kill any actor at any point and still have a reliable system). I don't see these two things happening in Swift. To a lesser extend these issues are also present in Scala/Akka, the mitigate some of the problems by having Akka Streams. Akka Streams are important to establish back-pressure if you have faster producers than consumers. Note that we often can't control the producer, they might be on the other side of a network connection. So it's often very important to not read the available bytes to communicate to the kernel that we can't consumes bytes that fast. If we're networking with TCP the kernel can then use the TCP flow-control to signal to the other side that they better slow down (or else packets will be dropped and then need to be resent later).

Makes sense.   The design I outline does talk about how to get to effective elimination of shared mutable state, so the reliability piece should work.  As per greenthreads, I’m intentionally not getting into the runtime design, because there are lots of options and I’m not the best person to drive that discussion.

> - 💥 regarding fatal failure in actors
> in the server world we need to be able to accept hundreds of thousands (millions) of connections at the same time. There are quite a few cases where these connections are long-lived and paused for most of the the time. So I don't really see the value in introducing a 'reliable' actor model where the system stops accepting new connections if one actor fatalError'd and then 'just' finishes up serving the existing connections.

This was one example of how to handle errors, not meant to be the only possibility.

> So I believe there are only two possible routes: 1) treat it like C/C++ and make sure your code doesn't fatalError or the whole process blows up (what we have right now) 2) treat it like Erlang and let things die. IMHO Erlang wouldn't be successful if actors couldn't just die or couldn't be linked. Linking propagates failures to all linked processes. A common thing to do is to 1) spawn a new actor 2) link yourself to the newly spawned actor 3) send a message to that actor and at some point eventually await a reply message sent by the actor spawned earlier. As you mentioned in the writeup it is a problem if the actor doesn't actually reply which is why in Erlang you'd link them. The effect is that if the actor we spawned dies, any linked actor will die too which will the propagate the error to an appropriate place. That allows the programmer to control where an error should propagate too. I realise I'm doing a poor job in explaining what is best explained by documentation around Erlang: supervision [1] and the relationship between what Erlang calls a process (read 'actor') and errors [2].

Agreed.  I’m a fan of the “just let them die” approach, along with providing notification out when it happens (so something out of band can handle it).  That said, there is the problem of cleanup of the actor.  If failure is frequent, then you will want to be able to restart the process at some point in time.

> - ♨️ OS threads and actors
> as you mention, the actor model only really works if you can spawn lots of them, so it's very important to be able to run hundreds of thousands of them on a number of OS threads pretty much equal to your number of cores. That's only straightforward if there are no (OS thread) blocking operations or at least it's really obvious what blocks and what doesn't. And that's not the case in Swift today and with GCD you really feel that pain. GCD does spawn threads for you and has a rather arbitrary limit of 64 OS threads (by default on macOS). That is too many for a very scalable server application but too few to just tolerate blocking APIs.

Agreed.

-Chris