[swift-server-dev] HTTP API v0.1.0

Johannes Weiß johannesweiss at apple.com
Thu Nov 2 15:23:57 CDT 2017


Hi Helge,

> On 2 Nov 2017, at 11:38 am, Helge Heß via swift-server-dev <swift-server-dev at swift.org> wrote:
> 
> 
> 
>> On 2. Nov 2017, at 18:31, Johannes Weiß <johannesweiss at apple.com> wrote:
>>> I think if we really want async, we need a few API adjustments to make async efficient enough. E.g. maybe pass queues around (probably not a straight DispatchQueue if we don’t want to tie it to GCD, but a context which ensures synchronization - that would be efficient for sync too).
>> 
>> do you have suggestions how that could look?
> 
> Not really. I guess it would be sufficient if the handler gets it, like so:
> 
>  func echo(request: .., response: .., queue: …)
> 
> Though I was also wondering whether there should be a more general `WOContext` (ha) object which carries more details. Like a logging function to use, or other global (or HTTP transaction local) information.
> 
> But maybe that belongs into a higher level (and can be captured to the handler function).
> 
> What I would like to avoid is to make `queue:` a `queue: DispatchQueue`, but rather something like a simple
> 
>  protocol SyncContext { func sync(_ cb: () -> ()) }
> 
>  extension DispatchQueue { 
>    func sync(_ cb: () -> ()) { async(execute: cb) }
>  }
> 
> Synchronous servers would immediately callback to the caller.

interesting. In our internal implementation we have an abstraction whose API is really similar to DispatchIO, with two implementations of it: one synchronous and one backed by DispatchIO. And at some point I also had one backed by DispatchSources.

And on these I do in fact have sync/async/notify (for DispatchGroup) methods. So basically the HTTPServer is generic over the IO mechanism it uses, and the IO mechanism has sync/async/notify methods that do the 'right' thing depending on whether it's a sync or an async implementation.
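roughly, the shape is something like this (all names invented here for illustration, this is not our actual internal API):

  import Dispatch

  // an IO abstraction offering sync/async/notify, with a synchronous and a
  // queue-backed implementation; the server is generic over it
  protocol IOMechanism {
    func sync<T>(_ work: () throws -> T) rethrows -> T
    func async(_ work: @escaping () -> Void)
    func notify(group: DispatchGroup, _ work: @escaping () -> Void)
  }

  // synchronous implementation: everything runs inline on the caller
  struct SynchronousIO: IOMechanism {
    func sync<T>(_ work: () throws -> T) rethrows -> T { return try work() }
    func async(_ work: @escaping () -> Void) { work() }
    func notify(group: DispatchGroup, _ work: @escaping () -> Void) {
      group.wait()
      work()
    }
  }

  // asynchronous implementation: everything is funnelled through a queue
  struct QueueBackedIO: IOMechanism {
    let queue: DispatchQueue
    func sync<T>(_ work: () throws -> T) rethrows -> T { return try queue.sync(execute: work) }
    func async(_ work: @escaping () -> Void) { queue.async(execute: work) }
    func notify(group: DispatchGroup, _ work: @escaping () -> Void) {
      group.notify(queue: queue, execute: work)
    }
  }

  // the HTTP server is then generic over the IO mechanism it uses
  struct HTTPServer<IO: IOMechanism> {
    let io: IO
  }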


>> In our internal implementation I have bits of that but never got to the point to actually profiling stuff and I didn't go all the way.
> 
> Channels vs source and then doing manual read/write? Well, my basic assumption on this is that even if channels are slower today, they should be made as fast. Conceptually that should work.

the only problem with the DispatchIO channels (you mean https://developer.apple.com/documentation/dispatch/dispatchio, right?) is that they don't support back pressure directly. I created a layer on top of the IO which adds that with a _gross_ hack. With DispatchSources one can implement it quite straightforwardly. What I should've done is define the APIs I want to use in the aforementioned IO abstraction layer (supporting back pressure there) and then implement them where needed.
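just to illustrate the DispatchSource approach, a much simplified sketch (invented names, nothing like the real layer): suspend the read source while too much unconsumed data is buffered, resume it once the consumer has caught up.

  #if os(Linux)
  import Glibc
  #else
  import Darwin
  #endif
  import Dispatch
  import Foundation

  final class BackPressuredReader {
    private let source: DispatchSourceRead
    private let queue: DispatchQueue
    private let highWatermark = 64 * 1024
    private var buffered = 0
    private var isSuspended = false

    init(fd: Int32, queue: DispatchQueue, onRead: @escaping (Data) -> Void) {
      self.queue = queue
      self.source = DispatchSource.makeReadSource(fileDescriptor: fd, queue: queue)
      source.setEventHandler { [weak self] in
        guard let self = self else { return }
        var bytes = [UInt8](repeating: 0, count: 8192)
        let n = read(fd, &bytes, bytes.count)
        guard n > 0 else { self.source.cancel(); return }
        self.buffered += n
        onRead(Data(bytes[0..<n]))
        if self.buffered >= self.highWatermark && !self.isSuspended {
          self.isSuspended = true
          self.source.suspend() // stop reading: this is the back pressure
        }
      }
      source.resume()
    }

    // the consumer calls this once it has processed `count` bytes
    func didConsume(_ count: Int) {
      queue.async {
        self.buffered -= count
        if self.isSuspended && self.buffered < self.highWatermark {
          self.isSuspended = false
          self.source.resume() // consumer caught up, start reading again
        }
      }
    }
  }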


> I don’t remember what uv does, I think they are more like sources, but I’m not sure.

yes, DispatchSources are just an eventing mechanism really. Libuv and friends are quite similar there.


> As mentioned, dispatch source has the little advantage (or not? I’m not convinced it is a good idea) that you can pass in those arbitrary buffer based objects. (And retain that original object).

yes, DispatchSources are way more versatile, I just went for DispatchIO because I was lazy ;). With DispatchSources you do the read(v)/write(v)/... yourself, so there's a lot you can do that DispatchIO doesn't let you do.


> The real *!?< is that Swift objects do not allow buffer access (as described in my previous mail). That results in a LOT of copying (IMO to a degree I consider it almost fair to say that Swift is simply not applicable for highperf implementations).
> 
> I didn’t do _any_ profiling yet. What I wanted to write up is a more realistic test for the implementation scalability. Ie something simple like
> 
>  func scaleTestHandler(res) {
>    res.write(200)
>    setTimeout(0.150 * random-something) { // simulate database/other call
>      res.write(“happy?”)
>      res.end()
>    }
>  }
> 
> Then use ab to test it. Threaded/sync setups should fail that quickly while async ones presumably will just expose the scalability limits of GCD :->
> 
> 
>>> My current async imp has a lot of dispatch overhead because the callbacks can be called from essentially any queue in the current API (affecting every read & write). There are a few models that can be used for doing async:
>> 
>> yes, we have that too
> 
> For me pipelining adds extra sync overhead. (you can pass writes directly to channel.write(), but if you need to spool because the previous request is not done, that takes another queue dispatch …)
> 
> 
>>> b) A complete ‘queue free’ model. I’m doing this in my current approach. It is kinda lock free, but has a lot of async dispatching. The base performance overhead is/should-be pretty high, but scalability is kinda like to optimal (theoretically making use of as many CPUs as possible).
>> 
>> there's indeed quite a few probably better models but I always thought of that as part of the 'networking/streams' track of the server APIs work group. We have a few ideas here, will follow up with that as soon as we can.
> 
> In my original Noze imp each stream also had (could have) its own queue, but I really thought that all the queuing will absolutely kill the performance. Don’t know.
> 
> 
>> For libdispatch I believe the following model should work very well:
>> 
>> - create a few number of 'base' queues, probably equal to the number of CPUs stored in an array 'baseQueues'
>> - for every request create a new DispatchQueue(label: "...", target: baseQueues[requestNo % baseQueues.count])   (where requestNo is a global atomic (oops) integer of the overall requests)
> 
> Sounds good. Note that the `accept`s also should run concurrently (I just put them on a concurrent queue).
> 
>> the base queues will end up on different (kernel) threads and the request queues will be round-robin scheduled onto the base queues. That way we make sure we don't randomly spawn new threads which isn't good.
> 
> I’m not quite sure what 'randomly spawn new threads’ means. To be honest I expect GCD to do the work you describe above. That is, assign new queues to the optimal number of hosting threads.

it tries, but it really can't do it well, and on Linux it's pretty terrible. The problem is that you need application knowledge to decide whether it's better to spawn a new thread or not. GCD does (on macOS, not Linux) have an upper thread limit, but by default it's 64 / 512 depending on your setup:

$ sysctl kern.wq_max_threads kern.wq_max_constrained_threads
kern.wq_max_threads: 512
kern.wq_max_constrained_threads: 64

but let's assume you have 4 cores: for many high-performance networking needs it'd be useful if GCD never spawned more than 4 threads (or whatever the best value is for your workload). However, that depends on the application. If your application might sometimes block a thread (sure, not great, but real world etc.), then it would be good if GCD spawned a few more threads. And that's exactly what it does: it tries to spawn new threads when it thinks that would be good for your app, but entirely without actual knowledge of what's good for you.

With the base queues you can provide that knowledge to GCD and it will do the right thing. You can totally have 100k queues if they all target a very small number of base queues. You really shouldn't have 100k base queues, especially on Linux.
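in code the model looks roughly like this (a sketch; the atomic request counter is faked with a serial queue to keep it short):

  import Dispatch
  import Foundation

  // a handful of base queues (roughly one per CPU), and one queue per request
  // that targets one of them round-robin
  let baseQueues = (0..<ProcessInfo.processInfo.activeProcessorCount).map {
    DispatchQueue(label: "base-queue-\($0)")
  }

  let counterQueue = DispatchQueue(label: "request-counter")
  var requestNo = 0

  func makeRequestQueue() -> DispatchQueue {
    let n = counterQueue.sync { () -> Int in
      requestNo += 1
      return requestNo
    }
    // the per-request queue targets a base queue, so GCD never needs more
    // threads than there are base queues (as long as nobody blocks them)
    return DispatchQueue(label: "request-\(n)", target: baseQueues[n % baseQueues.count])
  }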


> The `baseQueues` things (while OK to implement) sounds entirely wrong to me. GCD is responsible for doing such stuff? But maybe I’m missing something.
> (the `target` thing is a synchronisation stacking construct, not a threading one, IMO).

I _think_ (guesswork) the idea a long while ago was that GCD would just magically find the best number of threads to spawn. Unfortunately, without knowledge of your application and what you want to optimise for, that's hard. Adding blocking system calls and potentially very long-running loops into the mix makes it even harder. There's no silver bullet to that story, so you'll need to tell GCD a little more about your queue hierarchies; then it can do a great job (on a good implementation, i.e. Darwin).

In Erlang, Haskell & Go, what you want actually happens. But all those languages use green threads, which keep blocking system calls from stalling the runtime's scheduler threads, and long loops can be broken up with yields. C/ObjC/Swift are different.


>> That model obviously only works iff the application code is either non-blocking or dispatches itself off the request queue if it needs to do blocking work. Needless to say we should aim for non-blocking but the reality of today's code in Swift doesn't entirely look like that ;)
> 
> With this I don’t see an issue. If the higher level framework wants to work synchronously, it can dispatch its middleware functions and such to a worker queue.
> We have back pressure, so the system should be able to deal with congestion.

👍


> But maybe this is just another hint that maybe there should be both options/implementations, a sync and a async one. Each has their application and merrits.
> 
> (And hey, Noze.io stuff is completely non-blocking, what are you talking about? ;-) )
> 
> 
>>> Not sure how well this goes in Linux. Are DispatchQueue’s also cheap on Linux or does the current implementation create a new thread for each?
>> 
>> the queues themselves are cheap but the Linux implementation AFAIK behaves quite weirdly if it needs to spawn threads. IIRC there's one global thread which every 100ms evaluates if the existing threads are all blocked and if they are, it'll spawn a new thread. Haven't checked the code in a while, maybe someone knows better.
> 
> OK.
> 
>> That's obviously not a great GCD implementation, the real GCD on macOS has kernel support to make that work much better. The same sadly applies to the eventing mechanism (DispatchSources) which are much more efficient and reduce thread hopping a lot on macOS. 
> 
> Oh well.
> 
>> But even on Linux I think not having many 'base' queues (which are queues that do not target other queues) should really give the best performance. Needless to say one has to be very careful not to ever block one of these base queues.
> 
> Sure.
> 
> 
>>> c) Something like a), but with multiple worker queues. Kinda like the Node resolution, but w/o the different processes. This needs an API change, all the callbacks need get passed ‘their’ main queue (because it is not a global anymore).
>> 
>> Sorry, should've read the whole email before writing above. That sounds pretty much like what I wrote above, right? If you agree that sounds like the best model on GCD to me.
> 
> Yes. But unlike a) and b), this requires that the handler gets the queue it is running on, so that it can do:
> 
>   func handler(req, res, queue httpQueue:…) {
>     bgQueue.async {
>       // very very very expensive work, like doing an Animoji
>       // done with it,
>       httpQueue.async(doneCallback)
>     }
>   }
> 
> If you get the point.

yes, agreed.
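in Swift that pattern would look roughly like this (HTTPRequest/HTTPResponse are just minimal hypothetical stand-ins to make the shape concrete):

  import Dispatch

  struct HTTPRequest { let path: String }

  final class HTTPResponse {
    func write(_ text: String) { /* write to the connection */ }
    func end() { /* finish the response */ }
  }

  let bgQueue = DispatchQueue(label: "worker", attributes: .concurrent)

  func handler(request: HTTPRequest, response: HTTPResponse, queue httpQueue: DispatchQueue) {
    bgQueue.async {
      // very expensive work happens here, off the HTTP queue
      let body = "expensive result for \(request.path)"
      httpQueue.async {
        // back on 'our' queue before touching the response again
        response.write(body)
        response.end()
      }
    }
  }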


-- Johannes


> 
> hh
> 
> 
> _______________________________________________
> swift-server-dev mailing list
> swift-server-dev at swift.org
> https://lists.swift.org/mailman/listinfo/swift-server-dev
