[swift-evolution] Zero-copy String buffer access

Mon Nov 13 12:59:40 CST 2017

> On Nov 3, 2017, at 10:39 AM, Cory Benfield via swift-evolution <swift-evolution at swift.org> wrote:
> 
> One of Swift’s major advantages as a language is the ease of bridging from Swift code to C. This ease makes it possible to utilise the vast body of existing code to bootstrap projects, rather than reinventing the world in Swift every time we have a problem. 
> 
> The String type in Swift has some affordances for this use-case. The withCString method, the utf8CString property, and the cString(using:) method are all very effective at handling the most common case: a NULL-terminated string suitable for passing into most libc functions. However, using any of these affordances will always incur a memory copy, as Swift needs not just to ensure that the bytes making up the String are in contiguous memory, but also to append a NULL byte to those strings for C safety.
> 
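For reference, a minimal sketch of two of the affordances named above. The helper name `cLength(of:)` and the sample string are assumptions made for illustration; `withCString`, `utf8CString`, and `strlen` are the existing APIs.

```swift
import Foundation

/// Returns the byte length that C's strlen reports for `s`.
/// (Hypothetical helper, named here only for illustration.)
func cLength(of s: String) -> Int {
    // withCString hands the closure a NUL-terminated buffer.
    // The pointer is only valid inside the closure.
    return s.withCString { strlen($0) }
}

let greeting = "héllo"

// strlen counts UTF-8 bytes up to, but not including, the NUL.
let viaStrlen = cLength(of: greeting)

// utf8CString is a ContiguousArray<CChar> that includes the trailing NUL.
let terminated = greeting.utf8CString
```

Note that neither call exposes the length alongside the pointer: `withCString` needs a separate `strlen`, which is the inconvenience raised later in the thread.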

This is something we’re actively working on. It’s a stretch goal for 4.1, but certainly no promises. In full generality, it’s not always possible, as we support bridged NSStrings with non-contiguous backing storage, but we should ensure that all native Swift strings are always contiguous and nul-terminated (hey, it’s just a byte or two). Then we can discuss APIs that provide zero-copy ways to get the pointer.

> This is a bit frustrating when working with C libraries that accept strings in the form of pointer + length, and so do not require NULL-termination, such as libicu. In these cases we are always required to incur the overhead of a memory copy, even in situations when the underlying String representation is contiguous, all in the name of appending a NULL byte we don’t actually need. Worse, the pointers provided by those methods are not BufferPointers, so they don’t carry their length around with them, requiring that another function call be used to determine the length of the buffer they point to.
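
To make the pointer + length case concrete, here is a rough sketch of one way to get an unterminated buffer today: copy the UTF-8 view into an Array and borrow its storage. `countASCIIBytes` is a hypothetical stand-in for a C function with a (pointer, length) signature, not a real libicu call, and `asciiByteCount(in:)` is likewise an illustrative name.

```swift
/// Hypothetical stand-in for a C API that takes (pointer, length)
/// and does not expect NUL termination. Not a real libicu function.
func countASCIIBytes(_ bytes: UnsafePointer<UInt8>, _ length: Int) -> Int {
    var n = 0
    for i in 0..<length where bytes[i] < 0x80 {
        n += 1
    }
    return n
}

/// Passes the UTF-8 bytes of `s` to the pointer + length function.
func asciiByteCount(in s: String) -> Int {
    // Array(s.utf8) is an explicit copy of the bytes, with no NUL appended.
    let copy = Array(s.utf8)
    return copy.withUnsafeBufferPointer { buf in
        // baseAddress may be nil for an empty buffer.
        guard let base = buf.baseAddress else { return 0 }
        // The buffer pointer carries its own length, so no strlen is needed.
        return countASCIIBytes(base, buf.count)
    }
}
```

This gets the length for free and avoids the NUL byte, but it still pays for a copy even when the string's storage is already contiguous, which is the cost the message above is objecting to.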

This is also something we’re actively working on. There’s a branch where we have an “UnsafeString” (name may change), which is just a pointer, a length, and some flags. This is useful for internal usage (implementing existing String APIs in a more performant fashion). Once that’s in, whether and how to surface this construct as API is a discussion swift-evolution needs to have.
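
Purely as an illustration of what "pointer, length, and some flags" could look like, here is a hypothetical sketch. Every name in it (`UnsafeStringSketch`, its fields, and both flag cases) is invented for this example; the message does not say which flags the branch actually uses or how the type is laid out.

```swift
/// Hypothetical sketch of a pointer + length + flags view over string bytes.
/// Names and flag choices are assumptions, not the design on the branch.
struct UnsafeStringSketch {
    struct Flags: OptionSet {
        let rawValue: UInt8
        // Illustrative flags only.
        static let isASCII = Flags(rawValue: 1 << 0)
        static let isNULTerminated = Flags(rawValue: 1 << 1)
    }

    let start: UnsafePointer<UInt8>
    let count: Int
    let flags: Flags

    /// The same storage viewed as a buffer pointer that carries its length.
    var bytes: UnsafeBufferPointer<UInt8> {
        return UnsafeBufferPointer(start: start, count: count)
    }
}
```

Like any unsafe pointer wrapper, a value of this kind is only valid while the storage it points at is kept alive by someone else.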

> 
> It would be convenient to have one or more additional functions that allow us to get access to a contiguous representation of bytes making up the string without appending a NULL byte, as a BufferPointer. The guarantees of these functions would be:
> 
> 1. If the underlying string is stored in contiguous memory; AND
> 2. It is stored in the encoding the user has requested; THEN
> 3. An UnsafeBufferPointer will be returned that points to the underlying storage, without NULL-termination; OTHERWISE
> 4. A new contiguous buffer will be allocated and the string will be copied into it, with no NULL-termination.
> 
> Of course, I’ve used the word “return” here, but in practice all of these functions would be best used as with* style functions that accept trailing non-escaping closures.
> 
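The four guarantees above can be sketched as a with*-style extension. This is a minimal sketch, not a standard-library API: the name `withUTF8BufferSketch` is hypothetical, it handles UTF-8 only rather than a caller-requested encoding, and it assumes a Swift 5-era toolchain where `Sequence.withContiguousStorageIfAvailable` exists. Whether the fast path is actually taken depends on the toolchain and on how the string is stored.

```swift
extension String {
    /// Hypothetical name. Calls `body` with the string's UTF-8 bytes,
    /// without NUL termination.
    func withUTF8BufferSketch<R>(
        _ body: (UnsafeBufferPointer<UInt8>) throws -> R
    ) rethrows -> R {
        // Guarantees 1-3: if contiguous UTF-8 storage is available,
        // lend it to the closure without copying.
        if let result = try utf8.withContiguousStorageIfAvailable(body) {
            return result
        }
        // Guarantee 4: otherwise copy into a new contiguous buffer,
        // still with no NUL byte appended.
        return try Array(utf8).withUnsafeBufferPointer(body)
    }
}

// Usage: the closure sees pointer and length together.
let byteTotal = "héllo".withUTF8BufferSketch { $0.count }
```

Because the buffer is only lent for the duration of the non-escaping closure, the caller cannot tell (and does not need to know) which of the two paths was taken.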

Yup! At the very least, a ‘withUnsafeString’ should be a reasonably orthogonal API to propose (once the aforementioned infrastructure is in place).

> The advantage of these functions is that they avoid unnecessary copying of memory in circumstances when the internal String representation was already suitable for passing to the C library. In the case of libraries like libicu, this halves the number of memory accesses in common cases (e.g. passing a UTF-8 string), which can provide substantial improvements to both performance and memory usage on hot code paths.
> 
> Does this seem like it’s of interest to anyone else?
> 
> Cory