[swift-server-dev] Prototype of the discussed HTTP API Spec

Johannes Weiss johannesweiss at apple.com
Thu Jun 1 06:10:27 CDT 2017

Hi Michael,

> On 1 Jun 2017, at 1:00 am, Michael Chiu <hatsuneyuji at icloud.com> wrote:
> I think I need to clarify something: I’m ok with an asynchronous api that executes synchronously, for example if the api is something like [[ a. {  b() } ; c() ]], executes as [[ a(); b(); c() ]], it is totally fine since it’s just a synchronous api with syntactic sugar.

We actually have a synchronous implementation of the proposed API next to the DispatchIO one that we normally use. The synchronous one uses blocking system calls and only services one request per thread. It's handy for unit testing and for specialised use-cases. The synchronous implementation only uses the following syscalls: open, close, read and write. That's it, so nothing fancy.
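
To make "one request per thread, only read and write" concrete, here is a minimal sketch in C. It is not the implementation mentioned above; `echo_blocking` is a hypothetical helper name, and the sketch only assumes two already-open descriptors. Both calls block the calling kernel thread until they can make progress.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd using only
 * blocking read() and write(). Returns the number of bytes copied once
 * read() reports end-of-file, or -1 on error. */
long echo_blocking(int in_fd, int out_fd) {
    char buf[1024];
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        /* Blocks this thread until data arrives or the peer closes. */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0) return total;
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

The code is straightforward precisely because the thread is allowed to block; the price is one kernel thread per connection.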

>>>> Not really, what you do when you use kqueue/(e)poll/select is that only said calls are blocking and you set your file descriptors to non-blocking.
>>> Although kqueue is blocking, it really only blocks when there’s nothing to do. So semantic-wise, the thread will never block as long as there’s work to do.
>> That is generally true. Also read only blocks if there's nothing to read. However you unfortunately don't know when the other end of the network connection will write something. It might be immediate, it might be minutes, hours or days.
>  I’m not quite sure how this is relevant.
>> That is exactly inversion of control. You'll notice that you can't just read when you feel like it, you can only read (and write) when kevent tells you to. If you want, do extend your example to make it an echo server, ie. write the bytes that you read. You'll notice that you can't just do what a straightforward synchronous API would look like:
>>    var bytes = ...
>>    let readLength = read(..., &bytes, maxLen)
>>    write(..., bytes, readLength)
>> you will need to save the bytes and write them when kevent _tells you_ to write them. That is inversion of control.
> You don’t have to, especially for write, you can read from/write to a socket anytime you want, kevent or not. If you don’t want things to block, use the MSG_DONTWAIT flag.
> A working example is here: https://gist.github.com/michael-yuji/f06ac328c3cb052fbe7aaa325486fcf1#file-main-c-L57 . It does contain some kevent logic but no, you don’t have to write when kevent tells you to.
> I’d love to discuss more on kqueue but it’s a bit off topic here.

Well, in your example code, you don't set the file descriptor to non-blocking, so write will just block. Sure, you're free to do that but then we're back at square one, a synchronous and (kernel thread) blocking API. In other words, if you just delete all the kevent stuff, your program will be simpler and have the same effect. If you block write() you might just as well block read() too, nothing lost here and much more straightforward.

If you were to put a `fcntl(new, F_SETFL, O_NONBLOCK);` after the `accept` and you were to check what write() returns, you'd see that it would start returning -1 with errno EAGAIN/EWOULDBLOCK, which means that the socket is not ready to be written to. To know when it's ready to be written to, you will have to ask kevent when you'll be able to write next. That's what I meant when I said kevent will tell you when to write, you can't just write whenever you want. (Technically you can but it'll either block or not write your bytes).
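
A small, portable way to see the EAGAIN behaviour without any network peer is sketched below. The helper names `set_nonblocking` and `fill_until_eagain` are made up for this illustration. The sketch assumes any descriptor with a bounded kernel buffer (a pipe or a socket) whose other end nobody is reading from.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode. O_NONBLOCK is a file status flag, so it
 * is set with F_SETFL (after fetching the current flags with F_GETFL). */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical demo: keep writing to a non-blocking fd that nobody reads
 * from. Returns the number of bytes the kernel accepted before write()
 * failed with EAGAIN/EWOULDBLOCK, or -1 on any other error. */
long fill_until_eagain(int fd) {
    static const char chunk[4096] = {0};
    long accepted = 0;

    /* On a blocking descriptor this loop would simply hang in write(). */
    assert(fcntl(fd, F_GETFL, 0) & O_NONBLOCK);
    for (;;) {
        ssize_t w = write(fd, chunk, sizeof chunk);
        if (w > 0) { accepted += w; continue; }
        if (w < 0 && errno == EINTR) continue;
        if (w < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return accepted;
        return -1;
    }
}
```

Calling `set_nonblocking()` on the write end of a `pipe()` and then `fill_until_eagain()` returns once the kernel buffer is full; how many bytes that is depends on the system.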

If you don't believe me, do apply this little patch to your program

--- SNIP ---
--- main_orig.c	2017-06-01 11:00:47.000000000 +0100
+++ main.c	2017-06-01 11:18:07.000000000 +0100
@@ -7,6 +7,7 @@
 #include <sys/socket.h>
 #include <sys/types.h>
 #include <sys/event.h>
+#include <time.h>
 #define PORT 8080
 #define min(a, b) ((a > b) ? b : a)
@@ -56,7 +57,12 @@
             /* actual code starts here */
             int readLength = read(ev.ident, buf, min(1024, ev.data));
+            time_t t_start = time(NULL);
             write(ev.ident, buf, min(1024, ev.data));
+            time_t t_end = time(NULL);
+            if (t_end - t_start > 2) {
+                fprintf(stderr, "Blocked for %lds :(.\n", t_end - t_start);
+            }
             bzero(buf, 1024);
@@ -88,4 +94,4 @@
     EV_SET(changelist, fd, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, 0);
     kevent(kq, changelist, 1, NULL, 0, NULL);
     return fd;
\ No newline at end of file
--- SNAP ---

which will now print if write blocked the calling thread for more than 2s.

Then connect to it with a client that writes fast and reads very slowly, and you'll see your program printing.

One way of emulating this on macOS is this

  yes "$(python -c 'print "x"*100000')" | telnet localhost 8080 | while read line; do sleep 2.5; echo received one line...; done

what this does is to continuously print lines with 100,000 'x' characters and send them to your server. But it'll only read one line of them every 2.5 seconds. That means we have a fast writer and a slow reader. If you now run your program with the patch from above, it will after half a minute or so print lines like:

--- SNIP ---
Blocked for 15s :(.
Blocked for 5s :(.
Blocked for 16s :(.
Blocked for 5s :(.
Blocked for 8s :(.
Blocked for 7s :(.
Blocked for 6s :(.
Blocked for 15s :(.
Blocked for 5s :(.
Blocked for 16s :(.
Blocked for 5s :(.
Blocked for 13s :(.
Blocked for 7s :(.
Blocked for 13s :(.
Blocked for 8s :(.
--- SNAP ---

ie. you use write as a blocking system call because the file descriptor isn't set to be non-blocking.

Just as a side note: You won't be able to repro this issue by replacing the macOS `telnet` with the macOS `nc` (netcat) as netcat will only read more to the socket after it was able to write it. Ie. the implementation of standard macOS `nc` happens to make your implementation appear non-blocking. But the macOS provided telnet seems to do the right thing. If you prefer, you can use pjbnc (http://www.chiark.greenend.org.uk/~peterb/linux/pjbnc/), which also doesn't have the same bug as `nc`.

>> I'd guess that most programmers prefer an asynchronous API with callback (akin to Node.js/DispatchIO) to using the eventing mechanism directly and I was therefore assuming you wanted to build that from kevent() (which is what they're often used for). Nevertheless, kevent() won't make your programming model any nicer than asynchronous APIs and as I mentioned before you can build one from the other in a quite straightforward way. What we don't get from that is ordinary synchronous APIs that don't block kernel threads and that happens to be what most people would prefer eventually. Hence libdill/mill/venice and Zewo :).
> Johannes, I totally agree with you. An asynchronous API is more intuitive and I agree with that. But since we are providing a low level API for people like Zewo, Perfect, and Kitura, it is not right for us to assume their model of programming.
> For libdill/mill/venice, even with green threads they will block when there’s nothing to do,

If you read in libdill/mill/venice, it will switch the user-level thread to _not_ block a kernel thread. That's the difference and that's what we can't achieve with Swift today (without using UB).

> in fact all the examples you listed above use event APIs internally. Hence I don’t think whether an api will block a kernel thread is a good argument here.

Kernel threads are a finite resource and most modern networking APIs try hard to only spawn a finite number of kernel threads, way smaller than the number of connections handled concurrently. If you use Dispatch as your concurrency mechanism, your thread pool will have a maximum size of 64 threads by default on Darwin. (Sure, you can spawn more using (NS)Thread from Foundation or pthreads or so.)

> And even with such a totally non-blocking programming model it will be expensive since the kernel is constantly scheduling a do-nothing-thread. (( if the io thread of a server-side application needs to do something constantly despite there being no user and no connection it sounds like a ghost story to me )).

what is the do-nothing-thread? The IO thread will only be scheduled if there's something to do and then normally the processing starts on that very thread. In systems like Netty they try very hard to reduce hopping between different threads and to only spawn a few. Node.js is the extreme which handles everything on one thread. It will be able to do thousands of connections with only one thread.

> Btw, I just made a working echo server with no event/coroutine api and it’s truly synchronous and non-blocking in C++ just for fun (no it’s not good code).
> https://gist.github.com/michael-yuji/94eda2f8cf3910aa3ff670112728f641

This "working echo server" is indeed synchronous and non-blocking but it just drops all the bytes on the floor that it can't send, don't think that counts as working...

If you don't believe me, try running this against the echo server (do remove the print to stdout in the echo server):

yes "$(python -c 'print "x"*100000')" | cat -n | telnet localhost 8080 | while read line; do echo $line; sleep 2.5; done | sed 's/x//g'

that'll print the same very long lines as above but prepend line numbers, send them through the echo server and read them off very slowly. The final `sed` then strips all "x" characters, so only the line numbers remain. We should now expect 1, 2, 3, 4, 5 and so on if we don't lose any lines.

The real output for me though:

--- SNIP ---
$ yes "$(python -c 'print "x"*100000')" | cat -n | telnet localhost 8080 | while read line; do echo $line; sleep 2.5; done | sed 's/x//g'
telnet: connect to address ::1: Connection refused
Trying ::1...
Connected to localhost.
Escape character is '^]'.




--- SNAP ---

ie. we lost thousands of lines in the middle. If you want, also run this dtrace script whilst you're running your echo server:

sudo dtrace -n 'syscall::write:return / pid==$target && arg1 == -1 / { printf("%s returned %d with errno %d", probefunc, arg1, errno); }' -p $(pgrep echo)

it'll print every time that write() returned an error for a program called "echo". That only works if there's only one program called "echo". What we see is a massive stream of

--- SNIP ---
  4    156                     write:return write returned -1 with errno 9
  4    156                     write:return write returned -1 with errno 9
  4    156                     write:return write returned -1 with errno 9
  4    156                     write:return write returned -1 with errno 9
  4    156                     write:return write returned -1 with errno 35
  4    156                     write:return write returned -1 with errno 9
  4    156                     write:return write returned -1 with errno 9
  4    156                     write:return write returned -1 with errno 9
  4    156                     write:return write returned -1 with errno 9
  4    156                     write:return write returned -1 with errno 9
--- SNAP ---

errno 35 is EAGAIN/EWOULDBLOCK, ie. you're dropping bytes. The errno 9 is EBADF which is an illegal file descriptor. I'm sure you can fix the bug with the EBADFs but to fix the EAGAINs you'll need to invert control using kevent().

You should definitely check the return value of write(), it's very important. Even if it's positive, you need to handle the case that it's fewer bytes than you wanted it to write. And if it's negative, the bytes are _lost_, which happens all the time with the current implementation.

Anyway, to fix the EAGAIN you'll need to ask kevent() when you can write next.
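
To sketch what "save the bytes and write them when you're told to" could look like: the code below keeps unsent bytes in a per-connection buffer instead of dropping them. `struct pending`, `flush_pending` and `send_or_queue` are hypothetical names for this illustration, not part of your gist or of the proposed API, and the descriptor is assumed to be non-blocking. The event loop (kevent with EVFILT_WRITE, or poll/epoll elsewhere) would call `flush_pending` whenever it reports the descriptor as writable.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-connection state: bytes write() has not accepted yet. */
struct pending {
    char  *data;
    size_t len;
};

/* Try to write out everything that is pending on a non-blocking fd.
 * Returns 1 if nothing is left, 0 if bytes remain and the caller has to
 * wait until the event loop reports fd as writable, -1 on a real error. */
int flush_pending(int fd, struct pending *p) {
    /* This sketch only makes sense on a non-blocking descriptor. */
    assert(fcntl(fd, F_GETFL, 0) & O_NONBLOCK);
    while (p->len > 0) {
        ssize_t w = write(fd, p->data, p->len);
        if (w < 0) {
            if (errno == EINTR) continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) return 0;
            return -1;
        }
        if (w == 0) return 0;
        /* Short write: keep only the part that was not accepted. */
        memmove(p->data, p->data + w, p->len - (size_t)w);
        p->len -= (size_t)w;
    }
    free(p->data);
    p->data = NULL;
    return 1;
}

/* Append len bytes behind whatever is already pending (to preserve
 * ordering), then try to flush. Same return values as flush_pending(). */
int send_or_queue(int fd, struct pending *p, const char *buf, size_t len) {
    if (len > 0) {
        char *grown = realloc(p->data, p->len + len);
        if (grown == NULL) return -1;
        p->data = grown;
        memcpy(p->data + p->len, buf, len);
        p->len += len;
    }
    return flush_pending(fd, p);
}
```

A return value of 0 is the point where control inverts: the handler can't finish the write itself, it has to register interest in writability and return to the event loop.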


> ---
> The main difference between sync and async APIs is that a sync API does only one thing, reading/writing to an IO, while an async API does two things, reading/writing and scheduling.
> It is true that we can expose a protocol to programmers and let them integrate their own scheduling model, but that will be dead painful to do and will introduce unnecessary overhead compared with just giving them a synchronous API that they can call from their code.
> It’s like ordering food over the phone from a restaurant: you can either pick up your order (synchronous) or have it delivered to you (asynchronous). Yes, most people will prefer delivery, but a pickup option is always more flexible, even though you have to wait in the restaurant if the food is not ready. Now if we only expose the asynchronous model, what happens is that the user's main code tells the restaurant where to deliver and what to do when different things happen, which can be a very complex piece of logic.
> On the other hand, with a synchronous (pickup) option available, we can do something like ask Bob to pick up the food in situation A, and give Alice the food if B happens. Now with some event notification system, say the restaurant will give you a phone call when the food is ready, it is much easier to integrate with your everyday schedule, without having a delivery guy constantly driving around a shared highway and hence creating traffic (thread scheduling).   [[[ you can probably tell I just had a really bad experience with a food delivery guy at this moment ]]]
> Generally speaking, we as the server-side work group should always give the end user a bit more room and freedom from the API. If they make a blocking call, they probably know what they’re doing.

now you lost me

>> I'm aware of that but you'll suffer from the inversion of control. I'm pretty sure you'll end up with an event loop that calls kevent/epoll all the time and once there's something to do, it'll call the handler registered for that file descriptor (which is what DispatchSources are).
> Yes and at least I can do it.
>> you can do the same in the asynchronous API with back-pressure. I'll quote again from the echo server
>> --- SNIP ---
>>     return .processBody { (chunk, stop) in
>>         switch chunk {
>>             case .chunk(let data, let finishedProcessing):
>>                 res.writeBody(data: data) { _ in
>>                     finishedProcessing()
>>                 }
>>             case .end:
>>                 res.done()
>>             default:
>>                 stop = true /* don't call us anymore */
>>                 res.abort()
>>         }
>>     }
>> --- SNAP ---
>> the call to finishedProcessing() means that you're ready for more data to be read, ie. the callback to processBody will only be called again after finishedProcessing() was called inside it. That makes sure that the bytes have been written (res.writeBody(...) { ... }) successfully before we accept more.
> As I mentioned the parsing is another story, and your solution also applies to the synchronous version I provided (by no means do I think it’s good, it's just a proof of concept). I’ll quote it again.
> parser.processBody =  { (chunk, stop) in
>          switch chunk {
> =====ommit
>          }
>      }
> extension Parser {
> 	func feed(data: AnyCollection<UnsafeBufferPointer>) -> Done
> }
>> We're only talking about the API and not the implementation. You can absolutely implement the API I proposed with only one thread with one large shared buffer for all connections if you want. Running everything on one thread with an async-only API is pretty much exactly what Node.js has as its only implementation and you might find that the APIs look remarkably similar [1].
>> There's nothing stopping you from implementing the proposed API in one thread that services many requests with as much sharing of buffers as you like. Should there be something concrete missing that is making your life hard, please flag it as soon as possible and we can add it. The only implementations I have done of this API are one multithreaded one on top of DispatchIO and one fully synchronous one with one thread per request. Said that, the DispatchIO implementation should be configurable to only use one thread (using a serial dispatch queue instead of a concurrent one for all the event callbacks).
>> [1]: https://nodejs.org/api/http.html
> That only says asynchronous is good and useful and it indeed is, but not a reason to abandon synchronous api. 

Foundation/Cocoa is, I guess, effectively the Swift standard library, and it abandons synchronous & blocking APIs completely. I don't think we should create something different from what people are used to (without it being better).

Again, there are two options for IO at the moment:
1) synchronous & blocking kernel threads
2) asynchronous/inversion of control & not blocking kernel threads

Even though I would love a synchronous programming model, I'd choose option (2) because the drawbacks of (1) are just too big. The designers of Foundation/Cocoa/Netty/Node.js/many more have made the same decision. Not saying all other options aren't useful but I'd like the API to be implementable with high performance and without requiring the implementors to block a kernel thread per connection.

