[swift-users] Optimal String conversion to/from UTF-8 w/o null-termination?
Jens Alfke
jens at mooseyard.com
Fri Nov 3 14:41:53 CDT 2017
I’m working with a C API that represents strings as UTF-8 data tagged with a length but **without a trailing NUL byte**. In other words, its string type is basically a tuple {const char*, size_t}. I need to convert this representation to and from Swift 4 strings.
This needs to be efficient, as these calls will occur in some areas of my project that are known to be performance-critical. (Equivalent conversions in my Obj-C code have already shown up as hot-spots and been carefully optimized.)
For String-to-UTF-8, I’m using String.withCString():
_ = str.withCString { bytes in c_function(bytes, strlen(bytes)) }
An alternative is
let bytes = [UInt8](str.utf8)
c_function(bytes, bytes.count)
Any idea which of these is faster? The former has to call strlen, but I suspect the latter incurs more heap allocation, since it copies the bytes into a new array.
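For comparison, here is a minimal sketch of the two String-to-UTF-8 approaches side by side. `cFunction` is a hypothetical Swift stand-in for the real C entry point (assumed here to accept a raw byte pointer plus a length), and `sendViaCString` and `sendViaArray` are illustrative names only. The sketch shows the call shapes, not which one is faster; that would still need measuring.

```swift
import Foundation  // for strlen

// Hypothetical stand-in for the real C function. It is assumed to take a
// pointer to UTF-8 bytes plus a byte count; here it just copies the bytes out
// so the two call styles can be compared.
func cFunction(_ bytes: UnsafeRawPointer, _ length: Int) -> [UInt8] {
    return Array(UnsafeRawBufferPointer(start: bytes, count: length))
}

// Option 1: borrow a NUL-terminated buffer, then measure it with strlen.
// strlen stops at the first NUL byte, so a string containing U+0000
// would be cut short on this path.
func sendViaCString(_ str: String) -> [UInt8] {
    return str.withCString { cstr in cFunction(cstr, strlen(cstr)) }
}

// Option 2: copy the UTF-8 view into an array (one allocation plus a copy),
// then pass the array, which Swift bridges to a pointer for the call.
func sendViaArray(_ str: String) -> [UInt8] {
    let bytes = [UInt8](str.utf8)
    return cFunction(bytes, bytes.count)
}
```

Both functions should hand the same bytes to the callee for any string without an embedded NUL; the difference is only in how the buffer is obtained and how its length is found.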
For UTF-8-to-String I use this, where `stringPointer` is an UnsafeRawPointer and stringLen is an Int:
let data = Data(bytes: stringPointer, count: stringLen)
return String(data: data, encoding: String.Encoding.utf8)
I’m unhappy about this because it incurs both heap allocation and copying the string bytes. But Data doesn’t seem to have the “noCopy” options that NSData does. Any way to pass the bytes directly to String without an intermediate copy?
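One way to skip the intermediate Data might be `String(decoding:as:)`, which I believe is available in Swift 4 and accepts any collection of UTF-8 code units. The sketch below is an assumption-laden illustration, not a measured answer: `makeString` is a hypothetical helper name, and the pointer is assumed to address `length` readable bytes. The bytes are still copied once into the String's own storage, and invalid UTF-8 is repaired with U+FFFD rather than reported as nil, which differs from `String(data:encoding:)`.

```swift
// Hypothetical helper: decode `length` bytes of UTF-8 starting at `pointer`
// without building a Data first.
func makeString(_ pointer: UnsafeRawPointer, _ length: Int) -> String {
    // View the raw memory as UInt8 code units; no copy happens here.
    let codeUnits = UnsafeBufferPointer(
        start: pointer.assumingMemoryBound(to: UInt8.self),
        count: length)
    // The single copy happens here, into the String's storage.
    // Invalid sequences become U+FFFD instead of producing nil.
    return String(decoding: codeUnits, as: UTF8.self)
}
```

A caller that must reject malformed input instead of repairing it would need a separate validation step, which this sketch does not attempt.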
—Jens
PS: I’m aware this is an FAQ, but I’ve already put in time searching. Most of the hits are obsolete because the damn String API keeps changing, or else they assume NUL-terminated C strings; and the remainder don’t consider performance.