[swift-evolution] Zero-copy String buffer access
cbenfield at apple.com
Fri Nov 3 12:39:38 CDT 2017
One of Swift’s major advantages as a language is the ease of bridging from Swift code to C. This ease makes it possible to utilise the vast body of existing code to bootstrap projects, rather than reinventing the world in Swift every time we have a problem.
The String type in Swift has some affordances for this use case. The withCString method, the utf8CString property, and the cString(using:) function all handle the most common case well: a NULL-terminated string suitable for passing into most libc functions. However, using any of these affordances always incurs a memory copy, because Swift needs not only to ensure that the bytes making up the String are in contiguous memory, but also to append a NULL byte to them for C safety.
This is a bit frustrating when working with C libraries that accept strings in the form of pointer + length, and so do not require NULL-termination, such as libicu. In these cases we are always required to incur the overhead of a memory copy, even when the underlying String representation is contiguous, all in the name of appending a NULL byte we don’t actually need. Worse, the pointers provided by those methods are not BufferPointers, so they don’t carry their length around with them, and another function call (such as strlen) is needed to determine the length of the string.
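To make the status quo concrete, here is a minimal sketch of calling a pointer + length API through withCString. The sumBytes function is a hypothetical stand-in for a C library entry point (it is not a real libicu call); the point is only that the length has to be recovered separately from the pointer.

```swift
import Foundation  // provides strlen on both Darwin and Linux

// Hypothetical stand-in for a C function that takes pointer + length;
// it simply sums the bytes it is given.
func sumBytes(_ pointer: UnsafePointer<CChar>, _ length: Int) -> Int {
    var total = 0
    for index in 0..<length {
        // CChar is signed on some platforms and unsigned on others,
        // so convert through UInt8 in a way that works for both.
        total += Int(UInt8(truncatingIfNeeded: pointer[index]))
    }
    return total
}

let text = "zero-copy"

let total = text.withCString { pointer -> Int in
    // The closure only receives a bare pointer, so the length has to be
    // recovered with a second pass over the bytes.
    let length = strlen(pointer)
    return sumBytes(pointer, length)
}
```

The NULL terminator that withCString guarantees is never used by sumBytes, yet the call still pays for producing it and for the extra strlen pass.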
It would be convenient to have one or more additional functions that allow us to get access to a contiguous representation of bytes making up the string without appending a NULL byte, as a BufferPointer. The guarantees of these functions would be:
1. If the underlying string is stored in contiguous memory; AND
2. It is stored in the encoding the user has requested; THEN
3. An UnsafeBufferPointer will be returned that points to the underlying storage, without NULL-termination; OTHERWISE
4. A new contiguous buffer will be allocated and the string will be copied into it, with no NULL-termination.
Of course, I’ve used the word “return” here, but in practice all of these functions would be best used as with* style functions that accept trailing non-escaping closures.
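As a rough sketch of what such a with* style function could look like for the UTF-8 case: the name withUTF8Buffer is hypothetical, and the fast path assumes a later toolchain in which String.UTF8View offers withContiguousStorageIfAvailable, which was not available when this was written. It is an illustration of the guarantees listed above, not a proposed implementation.

```swift
extension String {
    /// Hypothetical helper: calls `body` with the string's UTF-8 bytes,
    /// without NULL-termination, avoiding a copy where possible.
    func withUTF8Buffer<R>(
        _ body: (UnsafeBufferPointer<UInt8>) throws -> R
    ) rethrows -> R {
        // Fast path: the string is already stored as contiguous UTF-8,
        // so hand out a pointer to the underlying storage.
        if let result = try utf8.withContiguousStorageIfAvailable(body) {
            return result
        }
        // Slow path: copy the bytes into a new contiguous buffer.
        // No NULL byte is appended.
        return try ContiguousArray(utf8).withUnsafeBufferPointer(body)
    }
}

// The buffer carries its own length, so no strlen call is needed.
let byteCount = "zero-copy".withUTF8Buffer { buffer in buffer.count }
```

Because the closure is non-escaping, the pointer cannot outlive the call, which is what makes it safe to expose the underlying storage directly on the fast path.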
The advantage of these functions is that they avoid unnecessary copying of memory when the internal String representation is already suitable for passing to the C library. In the case of libraries like libicu, this halves the number of memory accesses in common cases (e.g. passing a UTF-8 string), which can provide substantial improvements to both performance and memory usage on hot code paths.
Does this seem like it’s of interest to anyone else?