[swift-server-dev] HTTP API v0.1.0

Thu Nov 2 12:31:39 CDT 2017

Hi Helge,

> On 1 Nov 2017, at 6:13 pm, Helge Heß via swift-server-dev <swift-server-dev at swift.org> wrote:
> 
> On Nov 2, 2017, at 1:21 AM, Johannes Weiß <johannesweiss at apple.com> wrote:
>> that's really cool, thanks for putting that together and sharing it! And that sounds great about the GCD implementation, by the way based on DispatchSources or DispatchIO?
> 
> Channels. (source for listener of course) I mean, it doesn’t really matter that much. With the current API sources may be more efficient as you may be able to avoid copying to DispatchData objects.
> 
> 
> I think if we really want async, we need a few API adjustments to make async efficient enough. E.g. maybe pass queues around (probably not a straight DispatchQueue if we don’t want to tie it to GCD, but a context which ensures synchronization - that would be efficient for sync too).

do you have suggestions for how that could look? In our internal implementation I have bits of that, but I never got to the point of actually profiling it and I didn't go all the way.

> My current async impl has a lot of dispatch overhead because the callbacks can be called from essentially any queue in the current API (affecting every read & write). There are a few models that can be used for doing async:

yes, we have that too

> a) the single ‘worker queue’ model done by Node.js / Noze.io. In Swift this is not as bad as on Node.js because you can dispatch work on a different queue and then back to the single ‘main queue’ which ensures the IO stack synchronization (like shown in here http://noze.io/noze4nonnode/). When calling done callbacks, you dispatch back to a single queue.
> 
> This can congest because of the global main queue. In Node they deal w/ this by forking servers. Which is kinda fine and lame at the same time.
> 
> 
> b) A complete ‘queue free’ model. I’m doing this in my current approach. It is kinda lock free, but has a lot of async dispatching. The base performance overhead is/should-be pretty high, but scalability is kinda close to optimal (theoretically making use of as many CPUs as possible).

there are indeed quite a few, probably better, models, but I always thought of that as part of the 'networking/streams' track of the server APIs work group. We have a few ideas here and will follow up on that as soon as we can.

For libdispatch I believe the following model should work very well:

- create a small number of 'base' queues, probably equal to the number of CPUs, stored in an array 'baseQueues'
- for every request create a new DispatchQueue(label: "...", target: baseQueues[requestNo % baseQueues.count])   (where requestNo is a global atomic (oops) integer of the overall requests)

the base queues will end up on different (kernel) threads and the request queues will be round-robin scheduled onto the base queues. That way we make sure we don't randomly spawn new threads which isn't good.
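A rough sketch of the two steps above, not taken from any real implementation: the names `RequestCounter` and `makeRequestQueue` are made up for illustration, the lock-based counter merely stands in for the 'global atomic integer', and using `activeProcessorCount` as the number of base queues is an assumption.

```swift
import Dispatch
import Foundation

// One serial 'base' queue per CPU. Assumption: activeProcessorCount is a
// sensible stand-in for "the number of CPUs".
let baseQueues: [DispatchQueue] =
    (0..<max(1, ProcessInfo.processInfo.activeProcessorCount)).map {
        DispatchQueue(label: "base-\($0)")
    }

// Hypothetical replacement for the global atomic request counter,
// guarded by a lock instead of a real atomic.
final class RequestCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    // Returns the current request number and then increments it.
    func next() -> Int {
        lock.lock()
        defer { lock.unlock() }
        let current = value
        value += 1
        return current
    }
}

let requestCounter = RequestCounter()

// Creates a per-request queue that targets one of the base queues,
// chosen round-robin by request number.
func makeRequestQueue() -> DispatchQueue {
    let requestNo = requestCounter.next()
    return DispatchQueue(label: "request-\(requestNo)",
                         target: baseQueues[requestNo % baseQueues.count])
}
```

Work submitted to a request queue ends up executing on its base queue, so the number of queues that can independently ask for a thread stays bounded by `baseQueues.count`.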

That model obviously only works iff the application code is either non-blocking or dispatches itself off the request queue if it needs to do blocking work. Needless to say we should aim for non-blocking but the reality of today's code in Swift doesn't entirely look like that ;)
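To illustrate "dispatches itself off the request queue", here is a minimal sketch. The helper `runBlocking` is hypothetical, and using a global concurrent queue for the blocking work is just one possible choice.

```swift
import Dispatch

// Runs `work` on a queue where blocking is acceptable, then delivers the
// result back on the request queue so the request's state is only ever
// touched from that queue.
func runBlocking<T: Sendable>(
    on requestQueue: DispatchQueue,
    blockingQueue: DispatchQueue = .global(),
    _ work: @escaping @Sendable () -> T,
    completion: @escaping @Sendable (T) -> Void
) {
    blockingQueue.async {
        let result = work()          // may block; not on a base queue
        requestQueue.async {
            completion(result)       // back on the request queue
        }
    }
}
```

The request queue (and therefore its base queue) stays free while the blocking call runs, at the cost of two extra dispatches.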

> Not sure how well this goes in Linux. Are DispatchQueue’s also cheap on Linux or does the current implementation create a new thread for each?

the queues themselves are cheap but the Linux implementation AFAIK behaves quite weirdly if it needs to spawn threads. IIRC there's one global thread which every 100ms evaluates if the existing threads are all blocked and if they are, it'll spawn a new thread. Haven't checked the code in a while, maybe someone knows better.

That's obviously not a great GCD implementation; the real GCD on macOS has kernel support to make that work much better. The same sadly applies to the eventing mechanism (DispatchSources), which is much more efficient and reduces thread hopping a lot on macOS.

But even on Linux I think not having many 'base' queues (which are queues that do not target other queues) should really give the best performance. Needless to say one has to be very careful not to ever block one of these base queues.

> c) Something like a), but with multiple worker queues. Kinda like the Node resolution, but w/o the different processes. This needs an API change, all the callbacks need to get passed ‘their’ main queue (because it is not a global anymore).

Sorry, should've read the whole email before writing the above. That sounds pretty much like what I wrote above, right? If so, I agree: that sounds like the best model on GCD to me.

> I don’t know. a) is the easiest and maybe good enough. Scalability should be way better than the threaded sync setup.
> My more complex b) version can be easily changed to a).
> 
> 
> While working on this I’m kinda wondering whether we should indeed have multiple implementations that can be used for the specific purposes. E.g. for performance benchmarks (which people seem to like) you would want to have the sync implementation for best raw throughput. For a scalability benchmark a) is probably best for regular setups, and maybe b) for 256 CPU servers.
> c) is a sweet spot, but complicates the API (though I think UIKit etc also have the model of passing queues alongside delegates/callbacks).

agreed. For best performance we probably need to do that. In the internal implementation you get your request queue with the request.

> Oh, and we have those:
> 
>  func doSomething(onDone cb : ()->())
> 
> This would be way better
> 
>  func doSomething(onDone cb : (()->())?)
> 
> for the case where cb is nil (no empty block needs to be synchronized).
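A small sketch of the optional-callback variant quoted above. The `ioQueue` name and the default value of `nil` are assumptions made for illustration; the point is only that a `nil` callback lets the implementation skip the dispatch entirely.

```swift
import Dispatch

// Hypothetical queue on which completion callbacks are delivered.
let ioQueue = DispatchQueue(label: "io")

func doSomething(onDone cb: (@Sendable () -> Void)? = nil) {
    // ... perform the actual work here ...

    // With an optional callback there is nothing to synchronize when the
    // caller passed nil, so no empty block is dispatched.
    guard let cb = cb else { return }
    ioQueue.async { cb() }
}
```

With a non-optional `() -> ()` parameter the implementation cannot tell an empty closure from a real one, so it has to dispatch it either way.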
> 
> 
> Maybe I manage to finish it up on the weekend, not sure yet.

👍

-- Johannes

> 
> hh
> 