[swift-server-dev] Prototype of the discussed HTTP API Spec

Wed May 31 08:20:19 CDT 2017

Hi Michael,

> On 30 May 2017, at 9:03 pm, Michael Chiu <hatsuneyuji at icloud.com> wrote:
> 
> Hi Johannes,
> 
> By synchronous/asynchronous I am referring to the nature of the API, something like
> 
> ```sync
> func read(fd:, to buffer:) throws 
> ```
> ```async
> func read(fd:, handler: (buffer)->R) throws -> R 
> ```

Yes, we do agree here :)

>> Not really, what you do when you use kqueue/(e)poll/select is that only said calls are blocking and you set your file descriptors to non-blocking.
> 
> Although kqueue is blocking, it really only blocks when there’s nothing to do. So semantics-wise, the thread will never block as long as there’s work to do.

That is generally true. Likewise, read() only blocks if there's nothing to read. Unfortunately, you don't know when the other end of the network connection will write something. It might be immediately, or it might be minutes, hours or days later.

>> if you use kqueue/(e)poll/select you get inversion of control, ie. you block only one (or a few) threads in kqueue/epoll/select and then invoke code to handle the event (which can be 'file descriptor XYZ can be read/written').
>> 
>> So the echo server becomes more like this (this is extreme pseudo code and ignores most of the real challenges)
>> 
>> func echoServer(socket: Socket) {
>>    socket.whenReadable { bytes in
>>        socket.whenWritable {
>>            socket.write(bytes)
>>        }
>>    }
>> }
>> 
> 
> That’s not quite true, the gold of kqueue and event notification apis is that you can pick when you want to read from the socket (in fact you don’t necessarily have to set then non-block), which is in fact more like coroutine.
> 
> This also say that there’s read available, they know exactly which thread will execute on and the sequence of execution, so no external mechanism required to synchronize resources.
> 
> while im_feeling_lucky {
>   if (feeling_good) {
>     kevent(…)
>       for ev in events_i_can_do {
>         if (happy(ev)) {
>           read(ev,….)
>         }
>       }
>     }
> }

That is exactly inversion of control. You'll notice that you can't just read when you feel like it; you can only read (and write) when kevent tells you to. If you want, do extend your example to make it an echo server, i.e. write the bytes that you read. You'll notice that you can't just write what you would with a straightforward synchronous API:

    var bytes = ...
    let readLength = read(..., &bytes, maxLen)
    write(..., bytes, readLength)

you will need to save the bytes and write them when kevent _tells you_ to write them. That is inversion of control.
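
To make that concrete, here is a minimal sketch of such an echo handler. It does not call kevent() at all; the `Event` enum, the `EchoState` type and the hard-coded event list are hypothetical stand-ins for what the kernel would report, so treat it as an illustration rather than a working server:

```swift
// Hypothetical readiness events, standing in for what kevent()/epoll would report.
enum Event {
    case readable(fd: Int32, incoming: [UInt8])  // bytes the kernel has buffered for us
    case writable(fd: Int32)
}

struct EchoState {
    // Bytes that were read but could not be echoed back yet, per descriptor.
    var pending: [Int32: [UInt8]] = [:]
    // What was "written" to each descriptor, recorded so the sketch is observable.
    var written: [Int32: [UInt8]] = [:]

    mutating func handle(_ event: Event) {
        switch event {
        case .readable(let fd, let incoming):
            // We may read now, but we may not write yet: save the bytes.
            pending[fd, default: []] += incoming
        case .writable(let fd):
            // Only now are we told that we may write what we saved earlier.
            guard let bytes = pending.removeValue(forKey: fd) else { return }
            written[fd, default: []] += bytes
        }
    }
}

// The event source decides the order, not the handler.
let events: [Event] = [
    .readable(fd: 3, incoming: [104, 105]),
    .readable(fd: 4, incoming: [33]),
    .writable(fd: 3),
    .writable(fd: 4),
]
var state = EchoState()
for event in events { state.handle(event) }
```

The `pending` dictionary is the part a synchronous read-then-write never needs: it exists only because the write has to wait for its own event.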

I appreciate that inversion of control doesn't always mean that you will have completion/continuation callbacks, but that is what's typically used in languages that support them, as it makes the programming model easier. In other words: eventing mechanisms (implying inversion of control) and asynchronous APIs with completion callbacks can be expressed in terms of one another in a quite straightforward way. DispatchSources, for example, are just a wrapper over kqueue/epoll.
Typically, programmers prefer these asynchronous APIs over using the eventing mechanism directly in languages that support it and hence that's what's used in Dispatch(IO/Sources), Node.js and many others.
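
As a rough sketch of how a callback API can sit on top of an eventing mechanism: the `MiniEventLoop` class below is entirely hypothetical, and `fire(readableFD:bytes:)` stands in for what one kevent()/epoll_wait() call would report. It is not how Dispatch is implemented, just the general shape:

```swift
// A hypothetical, single-threaded event loop that turns readiness events
// into completion callbacks.
final class MiniEventLoop {
    private var readHandlers: [Int32: ([UInt8]) -> Void] = [:]

    // Callback-style API: "call me when this descriptor has data".
    func whenReadable(_ fd: Int32, _ handler: @escaping ([UInt8]) -> Void) {
        readHandlers[fd] = handler
    }

    // Eventing side: the loop learned that `fd` is readable and dispatches
    // to whatever handler was registered for it.
    func fire(readableFD fd: Int32, bytes: [UInt8]) {
        // One-shot registration: remove the handler before invoking it.
        guard let handler = readHandlers.removeValue(forKey: fd) else { return }
        handler(bytes)
    }
}

let loop = MiniEventLoop()
var received: [UInt8] = []
loop.whenReadable(7) { bytes in received = bytes }
loop.fire(readableFD: 7, bytes: [1, 2, 3])   // the handler runs here
loop.fire(readableFD: 7, bytes: [9])         // no handler registered any more: ignored
```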

I guess we're going around in circles discussing whether inversion of control and asynchronous APIs with completion callbacks are the same. They aren't, but they're easy enough to convert to one another that they're sometimes mentioned together.

I'd guess that most programmers prefer an asynchronous API with callbacks (akin to Node.js/DispatchIO) to using the eventing mechanism directly, and I was therefore assuming you wanted to build that from kevent() (which is what it's often used for). Nevertheless, kevent() won't make your programming model any nicer than asynchronous APIs, and as I mentioned before, you can build one from the other in a quite straightforward way. What we don't get from either is ordinary synchronous APIs that don't block kernel threads, and that happens to be what most people would eventually prefer. Hence libdill/mill/venice and Zewo :).

> In fact in kqueue you can even temporarily disable events, or simply not call kevent when the server is under pressure. 
> 
> With a Synchronous IO API, the user have 100% control on when read/write occur, on which thread it occur, how many partial bytes to read and synchronize shared resources without lock etc.
> 
> If we only provide a asynchronous API, some existing server-side framework, say Perfect, very hard to integrate with the official one.

I just looked into PerfectNet, which I believe is Perfect's implementation of IO. Just as I was describing, they use epoll/kqueue to build an eventing mechanism with an asynchronous API with callbacks. It looks reasonably similar to DispatchIO and DispatchSources. I don't know why PerfectNet isn't built on top of them; that would make the code simpler and more portable.

Here's a snippet of code from the PerfectNet tests:

--- SNIP ---
            try server.accept(timeoutSeconds: NetEvent.noTimeout) {
                (inn: NetTCP?) -> () in
                guard let n = inn else {
                    XCTAssertNotNil(inn)
                    return
                }
                let b = [UInt8(1)]
                do {
                    n.write(bytes: b) {
                        sent in
                        XCTAssertTrue(sent == 1)
                        n.readBytesFully(count: 1, timeoutSeconds: 5.0) {
                            read in
                            XCTAssert(read != nil)
                            XCTAssert(read?.count == 1)
                        }
                        serverExpectation.fulfill()
                    }
                }
            }
--- SNAP ---

You'll recognise the asynchronous read/write APIs.

>> If you want a more realistic example check out DispatchSources and DispatchIO from Dispatch. The important bit is that if you use these eventing solutions, you'll get inversion of control and that's commonly referred to as an asynchronous API as you can't do anything in the current thread but have to do it when the eventing library tells you to. Cf. also the HTTP API that I was initially proposing and a HTTP echo server for that:
> 
> There’s timeout option on both api, and usually these api only block when there’s nothing to do.
> You don’t have to handle those event if you don’t want to as well, you can always not to set EPOLLET and not handle the event, which is totally fine, and handle it next time you call kevent.
> 
>> --- SNIP ---
>> serve { (req, res) in
>>  if req.target == "/echo" {
>>      guard req.httpVersion == (1, 1) else {
>>          /* HTTP/1.0 doesn't support chunked encoding */
>>          res.writeResponse(HTTPResponse(version: req.version,
>>                                         status: .httpVersionNotSupported,
>>                                         transferEncoding: .identity(contentLength: 0)))
>>          res.done()
>>          return .discardBody
>>      }
>>      res.writeResponse(HTTPResponse(version: req.version,
>>                                     status: .ok,
>>                                     transferEncoding: .chunked,
>>                                     headers: SomeConcreteHTTPHeaders([("X-foo", "bar")])))
>>      return .processBody { (chunk, stop) in
>>          switch chunk {
>>              case .chunk(let data, let finishedProcessing):
>>                  res.writeBody(data: data) { _ in
>>                      finishedProcessing()
>>                  }
>>              case .end:
>>                  res.done()
>>              default:
>>                  stop = true /* don't call us anymore */
>>                  res.abort()
>>          }
>>      }
>>  } else { ... }
>> }
>> --- SNAP ---
>> 
>> You'll see that we return a closure to process the individual chunks of an HTTP body (`return .processBody { ... }`) and register a write of the response when that closure got invoked by the eventing library.
>> 
> 
> Parsing is another story, tho the synchronous api can be something like
> 
> parser.processBody =  { (chunk, stop) in
>          switch chunk {
>              case .chunk(let data, let finishedProcessing):
>                  res.writeBody(data: data) { _ in
>                      finishedProcessing()
>                  }
>              case .end:
>                  res.done()
>              default:
>                  stop = true /* don't call us anymore */
>                  res.abort()
>          }
>      }
> 
> typealias Done = Bool
> extension Parser {
> func feed(data: AnyCollection<UnsafeBufferPointer>) -> Done
> }
> 
>> can you expand on this? What you get with kqueue/epoll is an asynchronous API so I reckon there's just some misunderstanding in terminology.
> 
> As mentioned as above, one can choose which thread, when to read, where is the buffer and how to synchronous resources.

I'm aware of that but you'll suffer from the inversion of control. I'm pretty sure you'll end up with an event loop that calls kevent/epoll all the time and once there's something to do, it'll call the handler registered for that file descriptor (which is what DispatchSources are).

> Assuming I’m somehow writing a server that somehow have some strange requirement,
> since the asynchronous approach is that the event library calling preset handler whenever payload arrives, which with a mix with kqueue I can choose not to handle any (by simply not to call kevent) and call (kevent) from my code when I’m ready.

You can do the same in the asynchronous API with back-pressure. I'll quote again from the echo server:

--- SNIP ---
     return .processBody { (chunk, stop) in
         switch chunk {
             case .chunk(let data, let finishedProcessing):
                 res.writeBody(data: data) { _ in
                     finishedProcessing()
                 }
             case .end:
                 res.done()
             default:
                 stop = true /* don't call us anymore */
                 res.abort()
         }
     }
--- SNAP ---

The call to finishedProcessing() means that you're ready for more data to be read, i.e. the callback passed to processBody will only be called again after finishedProcessing() was called inside it. That makes sure that the bytes have been written (res.writeBody(...) { ... }) successfully before we accept more.
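
To illustrate just the back-pressure part in isolation, here is a small sketch. `ChunkSource` is a hypothetical stand-in for the HTTP server's body reader, not the implementation of the proposed API; the only thing it demonstrates is that the next chunk is held back until finishedProcessing() has been called for the previous one:

```swift
// Hypothetical chunk source that delivers the next chunk only after the
// consumer has called `finishedProcessing` for the previous one.
final class ChunkSource {
    private var chunks: [[UInt8]]
    private let onChunk: ([UInt8], @escaping () -> Void) -> Void
    private(set) var inFlight = false

    init(chunks: [[UInt8]],
         onChunk: @escaping ([UInt8], @escaping () -> Void) -> Void) {
        self.chunks = chunks
        self.onChunk = onChunk
    }

    func deliverNext() {
        // Back-pressure: refuse to deliver while a chunk is still being processed.
        guard !inFlight, !chunks.isEmpty else { return }
        inFlight = true
        let chunk = chunks.removeFirst()
        onChunk(chunk) { [weak self] in
            self?.inFlight = false
            self?.deliverNext()
        }
    }
}

var log: [String] = []
var resume: (() -> Void)? = nil
let source = ChunkSource(chunks: [[1], [2]]) { chunk, finishedProcessing in
    log.append("got \(chunk)")
    resume = finishedProcessing      // pretend the write is still outstanding
}
source.deliverNext()                 // delivers the first chunk
source.deliverNext()                 // ignored: the first chunk is not finished yet
let countBeforeResume = log.count
let done = resume
resume = nil
done?()                              // the "write" completes, so the second chunk arrives
```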

> It also let me handle the req one by one without locking resources for synchronization and I can even allocate a single buffer in the stack for all connections.

You can achieve the same with the asynchronous API if you can control on which thread/queue the event handler gets invoked. See DispatchIO, where you specify the queue on which you want to be notified. Our implementation of the proposed API always calls the body handler on the same serial dispatch queue, so you can use all your data structures in a straightforward way without locking.

Regarding the single buffer on the stack: Even that is possible if you write your own eventing library. You don't need to invoke the asynchronous callbacks on a different thread. You can just invoke them on the event loop thread and use a global buffer.
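
A rough sketch of that idea, with made-up names throughout (`SharedBufferLoop`, `serviceReadable`) and with the `data` parameter standing in for what read() would copy into the buffer. One buffer is reused for every connection, and the callback runs on the calling thread, so no locks are involved:

```swift
// Hypothetical single-threaded loop: one buffer is reused for every connection
// and handlers are invoked synchronously on the loop's own thread.
struct SharedBufferLoop {
    private var buffer = [UInt8](repeating: 0, count: 8)
    private(set) var totals: [Int32: Int] = [:]

    mutating func serviceReadable(fd: Int32,
                                  data: [UInt8],
                                  handler: (ArraySlice<UInt8>) -> Void) {
        // "Read" into the shared buffer, truncating to its capacity.
        let n = min(data.count, buffer.count)
        buffer.replaceSubrange(0..<n, with: data[0..<n])
        totals[fd, default: 0] += n
        // Invoke the callback right here, on the same thread.
        handler(buffer[0..<n])
    }
}

var sharedLoop = SharedBufferLoop()
var bytesSeen = 0
let inputs: [(Int32, [UInt8])] = [(3, [1, 2, 3]), (4, [9])]
for (fd, data) in inputs {
    sharedLoop.serviceReadable(fd: fd, data: data) { slice in
        bytesSeen += slice.count     // plain mutation, no locking needed
    }
}
```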

> None of these can easily done if only asynchronous API is provided. (People can always fall back to C API, but it doesn’t help server side swift much).

We're only talking about the API and not the implementation. You can absolutely implement the API I proposed with only one thread with one large shared buffer for all connections if you want. Running everything on one thread with an async-only API is pretty much exactly what Node.js has as its only implementation and you might find that the APIs look remarkably similar [1].

There's nothing stopping you from implementing the proposed API in one thread that services many requests with as much sharing of buffers as you like. Should there be something concrete missing that is making your life hard, please flag it as soon as possible and we can add it. The only implementations I have done of this API are a multithreaded one on top of DispatchIO and a fully synchronous one with one thread per request. That said, the DispatchIO implementation should be configurable to use only one thread (using a serial dispatch queue instead of a concurrent one for all the event callbacks).

[1]: https://nodejs.org/api/http.html

--
  Johannes