[swift-server-dev] Thoughts on the current HTTP API

Fri Oct 13 16:03:39 CDT 2017

On 13. Oct 2017, at 16:34, Johannes Weiß <johannesweiss at apple.com> wrote:
>> On 13. Oct 2017, at 09:34, Johannes Weiß via swift-server-dev <swift-server-dev at swift.org> wrote:
>>> I just had a brief look at the HTTP APIs [1]. Some thoughts (I realise some are different to what I initially proposed 😉):
>>> 
>>> - why do we store `original` here? [2]
>>> 
>>>  public struct HTTPHeaders {
>>>      var original: [(name: Name, value: String)]?
>>>      var storage: [Name: [String]] {
>>>          didSet { original = nil }
>>>      }
>> 
>> I guess I don’t care too much about this, but I can see two reasons:
>> a) For proxy like applications, for debugging and other stuff,
>>  it is nice to preserve the original
>> b) A dictionary-of-arrays has a higher overhead than a plain array for
>>  the HTTP header level of complexity.
>> 
>> Probably makes sense to have just one of the two,
>> and I would suggest the first.
> 
> that's true. However we'll pay twice the ARC overhead (ie. passing `HTTPHeaders` around will be two retains (for the underlying storages of the array/dict)).

… well, if I read the code you quoted above right, only one of the two is being used, which also answers your other question:

> Why is original `Optional` btw?

You posted it:

    var storage: [Name: [String]] { didSet { original = nil } }

I guess all this doesn’t make much sense to me. I would just store a single array.

>>> - why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument
>> I think the idea is that you use this method for both, writeResponseHead and for 100-continue writes.
>> (as mentioned before I’m very much against coupling the two, those should stay distinct methods IMO)
> makes sense! Should we have two methods with clear names and doc?

My opinion is yes, but the code ended up with ‘no'. ¯\_(ツ)_/¯

>>> - `func writeBody(_ data: UnsafeHTTPResponseBody, ...) ` [5]: where did this come from?
>> 
>> This was suggested by someone on GitHub I think. The basic idea is to avoid copying. Say you have a template and you want to get to an IO buffer w/ minimal copying. You wouldn’t want to copy the buffer of the template objects to a Data, only to pass that over to the API.
>> 
>> My opinion is that this really does not belong here. Also: it is poorly implemented. If this is done for performance, it absolutely has to support iovec’s, not just a plain array.
>> Personally I would just go w/ DispatchData, it is made for this and can preserve discontinuous buffers (iovecs …). (What DispatchData is lacking is getting an iovec out, but maybe we could get such an API?).
> 
> agreed, we have DispatchData and the collection (both methods). Sometimes you can implement a collection without a buffer and produce the actual bytes on the fly (like String.utf8 with UTF8View).

The important point is that the thing you need in the end is always a buffer, or better iovec. ‘Producing bytes on the fly’ is what you usually want to avoid as much as possible, IMO.
Instead you want direct and stable access to storage buffers backing the objects. (Which Swift unfortunately does not provide in the stdlib.)

> But there are cases when you already actually have a straight up memory buffer. DispatchData then serves those needs.

This is not the primary intention. I think the intention is that the conforming objects produce their buffer representation directly into the target buffer (which will be ‘send’).
Instead of your `Producing bytes on the fly`, a `produce a C buffer on the fly`. (I guess mmap’ing a file could be one example).

Don’t know, I personally think it is overkill and I’m good w/ DispatchData. With the exception that I’d like to have API to get an iovec out of a DispatchData stack (instead of just the enumeration method we have now).

BTW: Something which is missing for even a simple web server is some support for `sendfile` to serve static resources.

>>> which would get rid of the existential and thus make it faster. But I think this should really
>>> 
>>> `func writeBody<Bytes: Collection>(_ data: Bytes, ...) where Collection.Element == UInt8`
>> 
>> I think it may be a good choice for this effort. A problem w/ that is getting back to the buffer, which is what you need for I/O. Collection has no methods for that, right? (withUnsafeBuffer etc is higher level, right?).
> 
> indeed. The collection doesn't necessarily exist in (contiguous) memory (yet).

Of course not, but it should have API to produce a contiguous C buffer. If a particular collection *is* backed by contiguous memory, right away, otherwise by generating it.

Or as I said above, even better a `var unsafeBuffer : UnsafeBufferPointer<Element>?`, which would be nil if the collection doesn’t have a stable buffer, and otherwise provide direct access to the buffer (which then could be passed into DispatchData w/ an associated destructor).

But all this is not relevant for this effort :-) W/ the current Swift setup it can’t provide the desired zero-copy behaviours we had with ObjC. ¯\_(ツ)_/¯

So what would

  `func writeBody<Bytes: Collection>(_ data: Bytes, …)`

do? Presumably a malloc(), memcpy(from: data). IMO not great and sub-standard in 2017, but can’t be done much better in Swift v0.4, I guess. (Especially if protocol calls imply a malloc, how crazy is that :palmface:)

>> I’m not entirely sure why you think about buffering here, this seems out of scope and the responsibility of a streaming library on top.
...
> But if the socket buffers are empty, if you hand it to the async library, it'll issue
> 
>  write("GET / ...")
>  write("Hello World!")
> 
> in two syscalls. It can be preferable to have
> 
>  writev(iovec { "GET / ... " }, iovec { "Hello World “})

Ah, OK, yes. You want to bundle small bodies with the HTTP header. I agree, that can make some sense.

But introducing explicit buffering for just that case, don’t know. Maybe. I guess +0.5.

> Does that make some sense?

Yes.

>> I guess keeping them in one is OK. PoC is just a step until the ‘real’ implementation is ready.
>> (and it is living in separate source dir IIRC)
> 
> right, what's the value of PoC in the actual repo then?

Have a working demo implementation? To test the API?
Why is it synchronous? I suppose because it is based on the Kitura stuff? ¯\_(ツ)_/¯

> not happy with Swift's String performance? Or using Strings in places where a byte array could be used?

We discussed that at length before. I have a set of reasons for that.

Yes, I think throwing a Unicode beast at something which is by definition Latin-1 is just wrong (for performance oriented stuff). Just think what that means wrt to comparing “Content-Length” w/ “content-length” … Even if there are short paths, availability of them would need to be checked first, etc.

What I would do? I have two possible designs in mind, one is zero-copying the input buffers and peek directly into the them for lookup etc. The other one is directly parsing the headers into a HTTP/2 style reuse-table and assign integer ids (at the http_parser.c level). Likely also parsing the values of knows headers (kinda like `enum HTTPHeader { case contentLength(Int), userAgent(String), …`).

Again, I don’t think it matters for this effort. Having API compatibility w/ Swift stdlib/Foundation is more important than actual efficiency. (… a malloc for protocol func calls??? - just too hard to believe ;-)

hh