[swift-server-dev] Thoughts on the current HTTP API

Johannes Weiß johannesweiss at apple.com
Fri Oct 13 09:34:39 CDT 2017


Hi,

> On 13 Oct 2017, at 11:39 am, Helge Heß via swift-server-dev <swift-server-dev at swift.org> wrote:
> 
> Hey,
> 
> On 13. Oct 2017, at 09:34, Johannes Weiß via swift-server-dev <swift-server-dev at swift.org> wrote:
>> I just had a brief look at the HTTP APIs [1]. Some thoughts (I realise some are different to what I initially proposed 😉):
>> 
>> - why do we store `original` here? [2]
>> 
>>   public struct HTTPHeaders {
>>       var original: [(name: Name, value: String)]?
>>       var storage: [Name: [String]] {
>>           didSet { original = nil }
>>       }
> 
> I guess I don’t care too much about this, but I can see two reasons:
> a) For proxy like applications, for debugging and other stuff,
>   it is nice to preserve the original
> b) A dictionary-of-arrays has a higher overhead than a plain array for
>   the HTTP header level of complexity.
> 
> Probably makes sense to have just one of the two,
> and I would suggest the first.

that's true. However we'll pay twice the ARC overhead (ie. passing `HTTPHeaders` around will be two retains (for the underlying storages of the array/dict)). Of course we could box the storage internally and implement CoW manually this way

public struct HTTPHeaders {
  private var storage = _HTTPHeaderStorage /* with CoW implemented */
  private class _HTTPHeaderStorage {
    var original: ...
    var storage: ...
  }

  ... mutating func ... {
      if !isKnownUniquelyReferenced(&self.storage) {
          self.storage = self.storage.copy()
      }
      /* perform modification */
  }
}

but actually, ignore me, we can always retrofit this if it turns out to be a perf issue. Lets go with original and storage.

Why is original `Optional` btw?


> 
> 
>> - I don't think we should name the method to write the HTTP response head `writeHeader`, that's too confusing [3], it should be `writeResponseHead` maybe?
> 
> +1
> 
> 
>> - why do we have a `struct HTTPResponse` [4] and then we don't use it to write the response? I think the `writeResponseHead` (currently named `writeHeader`) should take a `HTTPResponseHead` at its sole argument
> 
> I think the idea is that you use this method for both, writeResponseHead and for 100-continue writes.
> (as mentioned before I’m very much against coupling the two, those should stay distinct methods IMO)

makes sense! Should we have two methods with clear names and doc?


>> - I think we should rename `HTTPRequest` and `HTTPResponse` to `HTTPRequestHead` and `HTTPResponseHead` as they're not full requests/responses but just the head (body is not included)
> 
> +1 (as this was my proposal from the beginning [the discussion around this should not be restarted, it is in the archives])
> 
> 
> 
>> - `func writeBody(_ data: UnsafeHTTPResponseBody, ...) ` [5]: where did this come from?
> 
> This was suggested by someone on GitHub I think. The basic idea is to avoid copying. Say you have a template and you want to get to an IO buffer w/ minimal copying. You wouldn’t want to copy the buffer of the template objects to a Data, only to pass that over to the API.
> 
> My opinion is that this really does not belong here. Also: it is poorly implemented. If this is done for performance, it absolutely has to support iovec’s, not just a plain array.
> Personally I would just go w/ DispatchData, it is made for this and can preserve discontinuous buffers (iovecs …). (What DispatchData is lacking is getting an iovec out, but maybe we could get such an API?).

agreed, we have DispatchData and the collection (both methods). Sometimes you can implement a collection without a buffer and produce the actual bytes on the fly (like String.utf8 with UTF8View). But there are cases when you already actually have a straight up memory buffer. DispatchData then serves those needs.


>> `UnsafeHTTPResponseBody` is an existential (a protocol-typed variable). That means with this type signature, you'll do an allocation for every call, that's pretty bad. A better option would be
>>  `func writeBody<Body: UnsafeHTTPResponseBody>(_ data: Body, …)`
> 
> That sounds good to me.
> 
> 
>> which would get rid of the existential and thus make it faster. But I think this should really
>> 
>>  `func writeBody<Bytes: Collection>(_ data: Bytes, ...) where Collection.Element == UInt8`
> 
> I think it may be a good choice for this effort. A problem w/ that is getting back to the buffer, which is what you need for I/O. Collection has no methods for that, right? (withUnsafeBuffer etc is higher level, right?).

indeed. The collection doesn't necessarily exist in (contiguous) memory (yet).


> And if you do this, you would probably do a `if let data as? DispatchData {}` in the implementation, right?
> 
> 
>> - we should decide and document the flushing behaviour. Probably it's good enough if we flush after each `write{ReponseHead, Trailer, Body}(...)` call. Some applications would get more performance if there'd be an explicit `flush()` call but then we need to make that a requirement which would make the API even less appealing to use directly. What do people think?
> 
> I’m not entirely sure why you think about buffering here, this seems out of scope and the responsibility of a streaming library on top.
> In an async implementation I would assume that every write is immediately handed off to the async I/O library (e.g. DispatchChannel.write) and no HTTP-level buffering is involved (the async-IO would be responsible if it would want to do anything extra).
> Maybe you could elaborate a little more how you see a `flush` being used.

well, let's say you write an HTTP server that produces its response on the fly as a whole:

GET / HTTP/1.1
Some: headers
Content-Length: 12

Hello World!

Ideally, you'd want that to be a single write(v). But

- writeResponseHead(...) produces

GET / HTTP/1.1
Some: headers
Content-Length: 12

- write Body produces

Hello World!

But if the socket buffers are empty, if you hand it to the async library, it'll issue

  write("GET / ...")
  write("Hello World!")

in two syscalls. It can be preferable to have

  writev(iovec { "GET / ... " }, iovec { "Hello World "})

but it can only really do that if you issue it as two commands to the async library:

  asyncWrite("GET / ...")
  asyncWrite("Hello World!")
  flush()

In other words: I think the async library has basically two choices (the socket is writable):

  1) try handing the kernel the data as soon as it arrives one by one
  2) buffer the data until it gets told to flush and then write everything that's buffered in one writev()

I don't see any performant solution that's neither (1) nor (2) at the moment.

Without an explicit flush() call, the only thing we can implement is (1) which is probably fine.

Does that make some sense?


>> - regarding the PoC implementation: we really need an asynchronous PoC implementation and see if our APIs make sense. I don't quite see the value of the synchronous PoC. I'd recommend to do that on top of `DispatchIO` (without backpressure quite straightforward) or `DispatchSources` (bit more work).
> 
> I agree, and there is plenty of FOSS code available which does an async socket, I don’t even see the use in doing an own from-scratch PoC socket library.
> One issue w/ async libs is, that they are harder to integrate w/ TLS libs. But there are solutions for this (I think we found an x-platform one for Noze.io).
> 
> For async sockets I have two things to offer, and people can happily steal code from it for this effort, or just look how that works:
> a) https://github.com/AlwaysRightInstitute/SwiftSockets
> b) Noze.io: https://github.com/NozeIO/Noze.io/blob/master/Sources/fs/GCDChannelBase.swift
> 
> The first uses non-blocking GCD I/O, but it doesn’t use GCD channels (just DispatchSources, the I/O is then done using regular Posix stuff).
> 
> The second uses Dispatch channels. Which I think is what you mean by `DispatchIO`. I don’t see issues w/ back-pressures here. It calls a closure when it is done, and that can be passed up.
> 
> In general I kinda question the viability of GCD for real server workloads, but IMO it should be fine and the choice for this effort.
> 
> 
>> - given Swift 4's new String APIs, we should not depend on `Foundation` I believe. There's no need of going through `Data` anymore as `String` now has a new constructor: `String(decoding: bytes, as: UTF8.self)`:
>> 
>>   let bytes = "some string”.utf8
> 
> Nice, sounds good to me.
> 
> 
>> - I think it'd make sense to clearly split (two different top-level directories) what's the API we propose and what's the PoC implementation. Maybe even two different repositories?
> 
> I guess keeping them in one is OK. PoC is just a step until the ‘real’ implementation is ready.
> (and it is living in separate source dir IIRC)

right, what's the value of PoC in the actual repo then?


>> PS: Please let me know if you want me to expand on the overhead of existential types in APIs. It seems like that's often missed in Swift and often it probably doesn't matter all that much. But in APIs that are meant for performant stuff it does.
> 
> Well, I don’t think the latter is really a goal. This is using Strings and stuff. But still a very good advise!

not happy with Swift's String performance? Or using Strings in places where a byte array could be used?

-- Johannes


> 
> 
> hh
> 
>> [1]: https://github.com/swift-server/http/tree/develop/Sources/HTTP
>> [2]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPHeaders.swift#L11
>> [3]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L28
>> [4]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L13
>> [5]: https://github.com/swift-server/http/blob/develop/Sources/HTTP/HTTPResponse.swift#L39
> 
> 
> _______________________________________________
> swift-server-dev mailing list
> swift-server-dev at swift.org
> https://lists.swift.org/mailman/listinfo/swift-server-dev



More information about the swift-server-dev mailing list