[swift-server-dev] HTTP API v0.1.0

Thu Nov 2 13:37:54 CDT 2017

> On 2. Nov 2017, at 18:31, Johannes Weiß <johannesweiss at apple.com> wrote:
>> I think if we really want async, we need a few API adjustments to make async efficient enough. E.g. maybe pass queues around (probably not a straight DispatchQueue if we don’t want to tie it to GCD, but a context which ensures synchronization - that would be efficient for sync too).
> 
> do you have suggestions how that could look?

Not really. I guess it would be sufficient if the handler gets it, like so:

  func echo(request: .., response: .., queue: …)

Though I was also wondering whether there should be a more general `WOContext` (ha) object which carries more details. Like a logging function to use, or other global (or HTTP transaction local) information.

But maybe that belongs in a higher level (and can be captured by the handler function).

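What I would like to avoid is making `queue:` a `queue: DispatchQueue`. I'd rather have something like a simple

  protocol SyncContext { func sync(_ cb: @escaping () -> ()) }

  // The conformance has to be declared, and the closure must be
  // @escaping before it can be handed to async(execute:).
  extension DispatchQueue: SyncContext {
    func sync(_ cb: @escaping () -> ()) { async(execute: cb) }
  }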

Synchronous servers would immediately call back to the caller.
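To make the sync/async distinction concrete, here is a minimal sketch. It assumes two hypothetical wrapper types (`QueueContext`, `ImmediateContext`) and a hypothetical `echo` handler signature; none of these are part of the proposed HTTP API. A wrapper struct is used instead of extending DispatchQueue only to keep the example clear of the existing `sync(execute:)` overloads.

```swift
import Dispatch

// Hypothetical protocol, mirroring the idea above: a context that
// guarantees the callback runs in a synchronized place.
protocol SyncContext {
    func sync(_ cb: @escaping () -> Void)
}

// Async servers: hop onto the wrapped queue.
struct QueueContext: SyncContext {
    let queue: DispatchQueue
    func sync(_ cb: @escaping () -> Void) { queue.async(execute: cb) }
}

// Synchronous servers: call back immediately on the caller's thread.
struct ImmediateContext: SyncContext {
    func sync(_ cb: @escaping () -> Void) { cb() }
}

// Hypothetical handler: it only knows about the context, not about GCD.
func echo(body: String, context: SyncContext, done: @escaping (String) -> Void) {
    context.sync { done(body) }
}
```

The handler code is the same in both cases; only the context passed in decides whether a queue dispatch happens.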

> In our internal implementation I have bits of that but never got to the point to actually profiling stuff and I didn't go all the way.

Channels vs source and then doing manual read/write? Well, my basic assumption on this is that even if channels are slower today, they should be made as fast. Conceptually that should work.
I don’t remember what uv does, I think they are more like sources, but I’m not sure.

As mentioned, dispatch source has the little advantage (or not? I’m not convinced it is a good idea) that you can pass in those arbitrary buffer based objects. (And retain that original object).
The real *!?< is that Swift objects do not allow buffer access (as described in my previous mail). That results in a LOT of copying (IMO to a degree where it is almost fair to say that Swift is simply not applicable for high-performance implementations).

I didn’t do _any_ profiling yet. What I wanted to write up is a more realistic test for the implementation's scalability, i.e. something simple like

  func scaleTestHandler(res) {
    res.write(200)
    setTimeout(0.150 * random-something) { // simulate database/other call
      res.write("happy?")
      res.end()
    }
  }

Then use `ab` (ApacheBench) to test it. Threaded/sync setups should fail that quickly, while async ones presumably will just expose the scalability limits of GCD :->
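A rough Swift rendering of that pseudocode might look like the following. `Response` is a hypothetical stand-in for the real response writer, and the `queue` and `done` parameters are assumptions added so the sketch can run on its own; the random delay of up to 150 ms is one possible reading of "random-something".

```swift
import Dispatch

// Hypothetical stand-in for the response object; not the actual API.
final class Response {
    private(set) var status = 0
    private(set) var body: [String] = []
    private(set) var isEnded = false

    func writeStatus(_ code: Int) { status = code }
    func write(_ chunk: String)   { body.append(chunk) }
    func end()                    { isEnded = true }
}

// Writes the status right away, then finishes the response after a
// random delay that simulates a database or other backend call.
func scaleTestHandler(_ res: Response,
                      on queue: DispatchQueue,
                      done: @escaping () -> Void) {
    res.writeStatus(200)
    let delayMs = Int.random(in: 0...150)
    queue.asyncAfter(deadline: .now() + .milliseconds(delayMs)) {
        res.write("happy?")
        res.end()
        done() // assumption: lets a caller observe completion
    }
}
```

No thread is parked while the timer is pending, which is the property the load test is meant to expose.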

>> My current async imp has a lot of dispatch overhead because the callbacks can be called from essentially any queue in the current API (affecting every read & write). There are a few models that can be used for doing async:
> 
> yes, we have that too

For me pipelining adds extra sync overhead. (you can pass writes directly to channel.write(), but if you need to spool because the previous request is not done, that takes another queue dispatch …)

>> b) A complete ‘queue free’ model. I’m doing this in my current approach. It is kinda lock free, but has a lot of async dispatching. The base performance overhead is/should-be pretty high, but scalability is kinda like to optimal (theoretically making use of as many CPUs as possible).
> 
> there's indeed quite a few probably better models but I always thought of that as part of the 'networking/streams' track of the server APIs work group. We have a few ideas here, will follow up with that as soon as we can.

In my original Noze imp each stream also had (could have) its own queue, but I really thought that all the queuing will absolutely kill the performance. Don’t know.

> For libdispatch I believe the following model should work very well:
> 
> - create a few number of 'base' queues, probably equal to the number of CPUs stored in an array 'baseQueues'
> - for every request create a new DispatchQueue(label: "...", target: baseQueues[requestNo % baseQueues.count])   (where requestNo is a global atomic (oops) integer of the overall requests)

Sounds good. Note that the `accept`s also should run concurrently (I just put them on a concurrent queue).
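For reference, the quoted base-queue model could be sketched roughly as below. The class name `RequestQueueFactory`, the queue labels, and the use of `NSLock` in place of a real atomic counter are all assumptions made for illustration.

```swift
import Dispatch
import Foundation

// Hypothetical helper: hands out per-request serial queues that
// target a fixed set of base queues in round-robin order.
final class RequestQueueFactory {
    private let baseQueues: [DispatchQueue]
    private let lock = NSLock()   // stands in for the "global atomic" counter
    private var requestNo = 0

    init(baseQueueCount: Int = ProcessInfo.processInfo.activeProcessorCount) {
        baseQueues = (0..<max(1, baseQueueCount)).map {
            DispatchQueue(label: "base-\($0)")
        }
    }

    // Returns the new request queue and the index of its base queue.
    func makeRequestQueue() -> (index: Int, queue: DispatchQueue) {
        lock.lock()
        let no = requestNo
        requestNo += 1
        lock.unlock()

        let index = no % baseQueues.count
        let queue = DispatchQueue(label: "request-\(no)",
                                  target: baseQueues[index])
        return (index, queue)
    }
}
```

Each request queue stays serial on its own, while all work eventually executes on one of the base queues.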

> the base queues will end up on different (kernel) threads and the request queues will be round-robin scheduled onto the base queues. That way we make sure we don't randomly spawn new threads which isn't good.

I’m not quite sure what ‘randomly spawn new threads’ means. To be honest I expect GCD to do the work you describe above. That is, assign new queues to the optimal number of hosting threads.
The `baseQueues` thing (while OK to implement) sounds entirely wrong to me. Isn't GCD responsible for doing such stuff? But maybe I’m missing something.
(the `target` thing is a synchronisation stacking construct, not a threading one, IMO).

> That model obviously only works iff the application code is either non-blocking or dispatches itself off the request queue if it needs to do blocking work. Needless to say we should aim for non-blocking but the reality of today's code in Swift doesn't entirely look like that ;)

With this I don’t see an issue. If the higher level framework wants to work synchronously, it can dispatch its middleware functions and such to a worker queue.
We have back pressure, so the system should be able to deal with congestion.

But maybe this is just another hint that there should be both options/implementations, a sync and an async one. Each has its applications and merits.

(And hey, Noze.io stuff is completely non-blocking, what are you talking about? ;-) )

>> Not sure how well this goes in Linux. Are DispatchQueue’s also cheap on Linux or does the current implementation create a new thread for each?
> 
> the queues themselves are cheap but the Linux implementation AFAIK behaves quite weirdly if it needs to spawn threads. IIRC there's one global thread which every 100ms evaluates if the existing threads are all blocked and if they are, it'll spawn a new thread. Haven't checked the code in a while, maybe someone knows better.

OK.

> That's obviously not a great GCD implementation, the real GCD on macOS has kernel support to make that work much better. The same sadly applies to the eventing mechanism (DispatchSources) which are much more efficient and reduce thread hopping a lot on macOS. 

Oh well.

> But even on Linux I think not having many 'base' queues (which are queues that do not target other queues) should really give the best performance. Needless to say one has to be very careful not to ever block one of these base queues.

Sure.

>> c) Something like a), but with multiple worker queues. Kinda like the Node resolution, but w/o the different processes. This needs an API change, all the callbacks need get passed ‘their’ main queue (because it is not a global anymore).
> 
> Sorry, should've read the whole email before writing above. That sounds pretty much like what I wrote above, right? If you agree that sounds like the best model on GCD to me.

Yes. But unlike a) and b), this requires that the handler gets the queue it is running on, so that it can do:

   func handler(req, res, queue httpQueue:…) {
     bgQueue.async {
       // very very very expensive work, like doing an Animoji
       // done with it,
       httpQueue.async(doneCallback)
     }
   }

If you get the point.
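Spelled out as runnable Swift, the hop could look roughly like this. The `input` and `done` parameters and the `expensiveWork` function are hypothetical placeholders standing in for the request, response and the actual work.

```swift
import Dispatch

// Shared queue for blocking or CPU-heavy work (label is arbitrary).
let bgQueue = DispatchQueue(label: "background-work", attributes: .concurrent)

// Placeholder for the very expensive work.
func expensiveWork(_ input: Int) -> Int {
    return input * input
}

// The handler is told which queue it runs on, so it can leave that
// queue for the heavy part and hop back to it when done.
func handler(input: Int,
             queue httpQueue: DispatchQueue,
             done: @escaping (Int) -> Void) {
    bgQueue.async {
        let result = expensiveWork(input)
        // Back on 'our' HTTP queue: safe to touch per-request state.
        httpQueue.async { done(result) }
    }
}
```

Without the `queue` argument the handler would have no way to know where to send the completion in model c), since there is no single global main queue anymore.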

hh