[swift-server-dev] Prototype of the discussed HTTP API Spec

Paulo Faria paulo at zewo.io
Thu Jun 1 12:02:32 CDT 2017


Johannes,

> But we're trying to design a HTTP API that is implementable with
reasonable performance and that I believe should be done by only offering
async APIs.

We can definitely do both. We provide a base of concrete types and
protocols, like Open Swift does, to allow other frameworks to build upon it
*AND* ship a default official implementation using libdispatch, which is
Apple's official library.
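
To make that concrete, here is a rough sketch of what I mean (all names
here are hypothetical and nothing in it is a proposal): a small protocol
surface that frameworks can implement however they like, paired with a
libdispatch-backed default:

    import Dispatch
    import Foundation

    // Hypothetical protocol the spec could define; frameworks would provide
    // their own conforming types (synchronous, DispatchIO-based, ...).
    protocol HTTPBodyWriter {
        func writeBodyChunk(_ data: Data, completion: @escaping (Error?) -> Void)
    }

    // A possible default on top of libdispatch: writes are serialised on a
    // queue and the completion handler fires once the injected transport
    // write has been attempted.
    final class DispatchBodyWriter: HTTPBodyWriter {
        private let queue = DispatchQueue(label: "http.body.writer")
        private let transportWrite: (Data) throws -> Void

        init(transportWrite: @escaping (Data) throws -> Void) {
            self.transportWrite = transportWrite
        }

        func writeBodyChunk(_ data: Data, completion: @escaping (Error?) -> Void) {
            queue.async {
                do { try self.transportWrite(data); completion(nil) }
                catch { completion(error) }
            }
        }
    }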

On 1 June 2017 at 13:53, Johannes Weiss <johannesweiss at apple.com> wrote:

> Hi Michael,
>
> > On 1 Jun 2017, at 5:08 pm, Michael Chiu <hatsuneyuji at icloud.com> wrote:
> >
> > Hi Johannes
> >
> >>>
> >>> I think I need to clarify something: I’m OK with an asynchronous API
> that executes synchronously. For example, if the API is something like
> [[ a { b() } ; c() ]] and it executes as [[ a(); b(); c() ]], it is totally
> fine, since it’s just a synchronous API with syntactic sugar.
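>
> >>> As a minimal, hypothetical sketch of what I mean: an API that is shaped
> like it takes a callback can still run everything inline:
>
>   func a(_ body: () -> Void) { body() }  // callback-shaped, but runs inline
>   func b() { print("b") }
>   func c() { print("c") }
>
>   a { b() }  // executes exactly as a(); b(); c()
>   c()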
> >>
> >> We actually have a synchronous implementation of the proposed API next
> to the DispatchIO one that we normally use. The synchronous one uses
> blocking system calls and only services one request per thread. It's handy
> for unit testing and for specialised use-cases. The synchronous
> implementation only uses the following syscalls: open, close, read and
> write, that's it, so nothing fancy.
> >
> > I think even exposing these APIs to users would be good. No need for fancy
> support; just including them would be good enough.
> >
> >>
> >> I think I need to clarify something: I’m OK with an asynchronous API
> that executes synchronously. For example, if the API is something like
> [[ a { b() } ; c() ]] and it executes as [[ a(); b(); c() ]], it is totally
> fine, since it’s just a synchronous API
> >>
> >> I.e. you use write as a blocking system call because the file descriptor
> isn't set to be non-blocking.
> >>
> >> Just as a side note: you won't be able to repro this issue by replacing
> the macOS `telnet` with the macOS `nc` (netcat), as netcat will only read
> more input to send to the socket after it was able to write the previous
> data. I.e. the implementation of the standard macOS `nc` happens to make
> your implementation appear non-blocking. But the macOS-provided telnet
> seems to do the right thing. You can use pjbnc
> (http://www.chiark.greenend.org.uk/~peterb/linux/pjbnc/) if you prefer,
> which also doesn't have the same bug as `nc`.
> >
> > As I said, both snippets of code are just sketches, only for proof of
> concept. But I did miss the kevent write one, that’s for sure.
> >
> >
> >>
> >>>> I'd guess that most programmers prefer an asynchronous API with
> callback (akin to Node.js/DispatchIO) to using the eventing mechanism
> directly and I was therefore assuming you wanted to build that from
> kevent() (which is what they're often used for). Nevertheless, kevent()
> won't make your programming model any nicer than asynchronous APIs and as I
> mentioned before you can build one from the other in a quite
> straightforward way. What we don't get from that is ordinary synchronous
> APIs that don't block kernel threads and that happens to be what most
> people would prefer eventually. Hence libdill/mill/venice and Zewo :).
> >>>
> >>> Johannes, I totally agree with you: an asynchronous API is more
> intuitive. But since we are providing a low-level API for people like Zewo,
> Perfect, and Kitura, it is not right for us to assume their model of
> programming.
> >>>
> >>> For libdill/mill/venice, even with green threads they will block when
> there’s nothing to do,
> >>
> >> If you read in libdill/mill/venice, it will switch the user-level
> thread to _not_ block a kernel thread. That's the difference and that's
> what we can't achieve with Swift today (without using UB).
> >
> > I’m quite confused on this one, since a green thread, if that’s what we
> were referring to, cannot enter the kernel (it can, but when it does, what
> happens is that the kernel thread associated with it enters the kernel).
> > So you can’t switch to another user-level thread to avoid blocking a
> kernel thread.
> > AFAIK all major OSes (Linux, FreeBSD, Solaris…) adopted the 1:1 threading
> model instead of n:m; not sure about Darwin, but I think it applies to
> Darwin as well according to an old WWDC video (I could be wrong). Hence any
> user threads (except for green threads) are in fact kernel threads. Since
> kevent and epoll are designed to block when they should, I don’t think
> anyone could avoid blocking something.
>
> Yes, Linux, macOS, FreeBSD and so on offer only a 1:1 threading model from
> the OS (Windows I think has Fibers). But libmill/dill/venice implement
> something akin to user-level threads themselves, you don't need kernel
> support for that at all (to schedule them cooperatively). Check out the man
> pages of setjmp and longjmp. Or check out Go (goroutines), Lua
> (coroutines), ... These are all basically cooperatively scheduled threads.
>
> In other words: With setjmp() you can make a snapshot of the current
> environment and with longjmp() you can replace the current environment with
> one that you previously saved. That's like cooperatively switching
> something like a user-level thread/coroutine/green thread.
>
> Run for example the code in this stackoverflow example:
> https://stackoverflow.com/a/14685524
>
> This document explains it pretty well too:
> http://libdill.org/structured-concurrency.html
>
>
> >>> In fact, all the examples you listed above use event APIs internally.
> Hence I don’t think whether an API will block a kernel thread is a good
> argument here.
> >>
> >> kernel threads are a finite resource and most modern networking APIs
> try hard to only spawn a finite number of kernel threads way smaller than
> the number of connections handled concurrently. If you use Dispatch as your
> concurrency mechanism, your thread pool will have a maximum size of 64
> threads by default on Darwin. (Sure you can spawn more using (NS)Thread
> from Foundation or pthreads or so)
> >
> > Yes, kernel threads are finite resources, especially in the 1:1 model, but
> I’m not sure how that is relevant. My concern about not including a
> synchronous API is that it makes it impossible for people to write
> synchronous code with server-side Swift tools, blocking or not, which they
> might want to. I’m not saying sync is better, I’m just saying we could give
> them a chance.
>
> No one's taking anything away from you. Everything you have today will
> still be available. But I believe the APIs a web app uses (which is what
> we're designing here) should, in today's Swift world, be asynchronous.
>
> Of course, to implement the asynchronous API, synchronous system calls will
> be used (e.g. kevent/epoll). But the user-facing API that is currently
> proposed is async-only in order for it to be implementable in a performant
> way. If we were to put synchronous functions in the user-facing API, then
> we'd struggle to implement them in a performant way.
>
> Imagine the function to write a HTTP body chunk looked like this:
>
>   func writeBodyChunk(_ data: Data) throws -> Void
>
> then the user can expect it to return only when the data has been written
> successfully, and that the connection was dropped if it throws.
> But the implementation now has a problem: what to do if we can't write the
> bytes immediately? The only option we have is to block this very thread and
> wait until we have written the bytes. Then we can return and let the user
> know if the write worked or not.
>
> Comparing this to
>
>   func writeBodyChunk(_ data: Data, completion: (Error?) -> Void) -> Void
>
> we can now register the attempt to write the data and move on with the
> calling thread. When the data has been written we invoke the completion
> handler and everything's good.
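>
> As a rough usage sketch (assuming that completion-based signature on a
> hypothetical `response` object, with a hypothetical `produceNextChunk()`),
> the caller just hands over the chunk and keeps going; the calling thread is
> never blocked while the bytes are flushed:
>
>   response.writeBodyChunk(chunk) { error in
>       if let error = error {
>           // the connection is gone; stop producing further chunks
>           print("write failed: \(error)")
>           return
>       }
>       produceNextChunk()
>   }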
>
>
>
> >>> And even with such a totally non-blocking programming model, it will be
> expensive, since the kernel is constantly scheduling a do-nothing thread.
> ((If the I/O thread of a server-side application needs to do something
> constantly despite there being no user and no connection, that sounds like
> a ghost story to me.))
> >>
> >> What is the do-nothing thread? The IO thread will only be scheduled if
> there's something to do and then normally the processing starts on that
> very thread. In systems like Netty they try very hard to reduce hopping
> between different threads and to only spawn a few. Node.js is the extreme
> which handles everything on one thread. It will be able to do thousands of
> connections with only one thread.
> >>
> >
> > The kernel has no idea whether a thread has anything to do unless it
> sleeps/enters the kernel; unless a thread fits these requirements, it will
> always be scheduled by the kernel.
>
> But that's exactly what epoll/kevent do. They enter the kernel and tell
> the kernel what user space needs next.
>
> The thread is now scheduled only if an event that kevent/epoll are waiting
> for turns up. And when kevent/epoll then return, most of the time the user
> space handler is submitted as a callback.
>
>
> > I’m saying, if there exists a real non-blocking programming model,
> defined as “never call any ‘wait’ system calls”, then any IO thread of
> that model must constantly poll the kernel; hence such a thread
> _cannot_be_scheduled_on_demand_, since the thread itself has no idea whether
> it has anything to do. The only ways for an IO thread to know it has
> something to do are:
> >
> > 1) An external listener calls a blocking event API and pokes the IO thread
> on demand
> > 2) The IO thread constantly polls the kernel
> > 3) An external listener polls the kernel constantly and pokes the IO
> thread when ready.
> >
> > 2 and 3 are the do-nothing threads I’m referring to: they are running,
> polling, wasting kernel resources, but not actually being productive (when
> there’s no connection).
>
> Nothing is in a tight polling loop. We run handlers as long as we can and
> when all handlers have run, kevent/epoll is entered again, done. Often you
> just spawn as many threads as you have CPUs and all is good. These threads
> are mostly sitting in kevent/epoll and as soon as some file descriptor
> becomes readable/writable the respective handler is invoked.
>
> That's what DispatchSources do, what Node.js does, what Netty does, ...
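>
> As a small illustration (just a sketch, not part of the proposed API), this
> is roughly what sitting in the eventing mechanism looks like with a
> DispatchSource: the handler only runs when `fd` (assumed to be a connected,
> non-blocking socket) becomes readable, and `handle(bytes:)` is a
> hypothetical per-connection handler:
>
>   import Dispatch
>   import Foundation
>
>   let ioQueue = DispatchQueue(label: "io")
>   let source = DispatchSource.makeReadSource(fileDescriptor: fd, queue: ioQueue)
>   source.setEventHandler {
>       var buffer = [UInt8](repeating: 0, count: 4096)
>       let n = read(fd, &buffer, buffer.count)
>       if n > 0 {
>           handle(bytes: buffer[0..<n])   // hypothetical handler
>       } else if n == 0 {
>           source.cancel()                // peer closed the connection
>       }                                  // n < 0 with EAGAIN: wait for next event
>   }
>   source.setCancelHandler { close(fd) }
>   source.resume()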
>
>
> >> You should definitely check the return value of write(), it's very
> important. Even if positive you need to handle the case that it's less
> bytes than you wanted it to write. And if negative, the bytes are _lost_
> which happens all the time with the current implementation.
> >>
> >> Anyway, to fix the EAGAIN you'll need to ask kevent() when you can
> write next.
> >
> > It was supposed to be a proof-of-concept sketch. As mentioned in the
> comments of the code, it was assumed to serve one single client. Now I’ve
> improved it so it handles multiple clients while remaining synchronous and
> non-blocking. EAGAIN is the only “error” that will be raised, if you
> consider it an error, but for me it’s part of non-blocking IO.
>
> Well, it means the write hasn't happened. You will need to do the write
> again and that is normally done when kevent/epoll tell you to. And that's
> what I mean by inversion of control.
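>
> A rough sketch of that inversion of control (hypothetical names; `fd` is a
> non-blocking socket, `ioQueue` a serial DispatchQueue, and `pendingBytes` a
> [UInt8] buffer holding whatever earlier write() calls could not flush):
>
>   let writeSource = DispatchSource.makeWriteSource(fileDescriptor: fd, queue: ioQueue)
>   writeSource.setEventHandler {
>       // The socket is writable again: try to flush the leftover bytes.
>       let n = pendingBytes.withUnsafeBytes { write(fd, $0.baseAddress, $0.count) }
>       if n > 0 {
>           pendingBytes.removeFirst(n)   // keep whatever is still unwritten
>       }
>       // n < 0 with errno == EAGAIN simply means: try again on the next event.
>   }
>   writeSource.resume()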
>
>
> >> Foundation/Cocoa is, I guess, the Swift standard library, and they
> abandon synchronous & blocking APIs completely. I don't think we should
> create something different (without it being better) from what people are
> used to.
> >>
> >> Again, there are two options for IO at the moment:
> >> 1) synchronous & blocking kernel threads
> >> 2) asynchronous/inversion of control & not blocking kernel threads
> >>
> >> Even though I would love a synchronous programming model, I'd choose
> option (2) because the drawbacks of (1) are just too big. The designers of
> Foundation/Cocoa/Netty/Node.js/many more have made the same decision. Not
> saying all other options aren't useful but I'd like the API to be
> implementable with high-performance and not requiring the implementors to
> block a kernel thread per connection.
> >
> > To be honest, I would choose 2 as well. But we are not in a
> choose-one-of-the-two situation. The main difference between us and
> Netty/Node.js is that people use them to write a server, whereas what we are
> writing is something people use to write something like Netty and Node.js.
> So it is reasonable to think there’s demand for a lower-level, synchronous
> API, despite the possible “drawbacks” they might encounter.
>
> This is the HTTP group, so people will only write web servers with it; the
> API that was proposed is definitely not meant to implement anything Netty-
> or Node-like. It's to implement web apps in Swift.
>
> There is however also a Networking/Transport group which will be more
> low-level than this (I assume), and there we do need to consider the
> lower-level APIs. And those will contain blocking system calls, namely
> kevent/epoll (if it isn't based on top of DispatchSources/DispatchIO, which
> do the eventing out of the box and are obviously also implemented with
> kevent/epoll).
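>
> For illustration only (a sketch, assuming `fd` is an open socket, `ioQueue`
> a DispatchQueue, and `process(_:)` a hypothetical per-chunk handler),
> DispatchIO hands you that eventing out of the box:
>
>   let channel = DispatchIO(type: .stream, fileDescriptor: fd,
>                            queue: ioQueue) { _ in close(fd) }
>   channel.read(offset: 0, length: Int.max, queue: ioQueue) { done, data, error in
>       if let data = data, !data.isEmpty {
>           process(data)            // hand each chunk to the app as it arrives
>       }
>       if done { channel.close() }  // EOF or error: tear the channel down
>   }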
>
>
> > Maybe we have some misunderstanding here. I’m not asking for a synchronous
> API that happens to be able to handle a vector of sockets in a single call
> without blocking anything; I’m asking for a synchronous API that can just do
> one simple thing: read/write in a synchronous way, whether it would block or
> not. If it would block, just let the caller know by throwing an exception;
> that way, the API call itself will not block anything.
>
> There may well be a misunderstanding here. No one wants to take all
> synchronous APIs away from you. They are available in Swift today and will
> remain there tomorrow.
>
> But we're trying to design a HTTP API that is implementable with
> reasonable performance and that I believe should be done by only offering
> async APIs.
>
>
> Thanks,
>   Johannes
>