[swift-evolution] Faster/lower-level external String initialization

Zach Waldowski zach at waldowski.me
Fri Jan 8 14:21:43 CST 2016


Going back and forth from Strings to their byte representations is an
important part of solving many problems, including object
serialization, binary file formats, wire/network interfaces, and
cryptography.

In developing such a parser, a coworker did the yeoman's work of
benchmarking Swift's Unicode types. He swore up and down that
String.Type.fromCString(_:) [0] was the fastest way he found. I,
stubborn and noobish as I am, was skeptical that a better way couldn't
be wrought from Swift's UnicodeCodecTypes.

After reading through the stdlib source and doing my own testing, I can
say this is no old wives' tale. fromCString [1] is essentially the only
public user of String.Type._fromCodeUnitSequence(_:input:), which fills
exactly this role: efficient and safe initialization by buffer copy.

Of course, fromCString isn't a silver bullet; it has to have a null
sentinel, requiring a copy of the origin buffer if one needs to be added
(as is the case with formats that specify the length up front, or
unstructured payloads that use unescaped double quotes as the
terminator). It also prevents the string itself from containing the
null character.
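
To make those two limitations concrete, here is a minimal sketch. It
assumes a current Swift toolchain, where String(cString:) plays the role
that fromCString played in Swift 2; the helper name and the payload
bytes are made up for illustration.

```swift
// Hypothetical helper: build a String from a length-delimited UTF-8 buffer
// by going through a C-string initializer.
func stringViaCString(_ payload: [UInt8]) -> String {
    // The payload carries no terminator, so it is copied to append one.
    let terminated = payload + [0]
    return terminated.withUnsafeBufferPointer { buffer in
        // String(cString:) scans for the first NUL (a strlen-style pass)
        // and copies the bytes that precede it.
        String(cString: buffer.baseAddress!)
    }
}

let plain: [UInt8] = [0x68, 0x65, 0x6C, 0x6C, 0x6F]   // "hello"
let embedded: [UInt8] = [0x61, 0x00, 0x62]            // "a", NUL, "b"

let fromPlain = stringViaCString(plain)
let fromEmbedded = stringViaCString(embedded)

print(fromPlain)                 // hello
print(fromEmbedded.utf8.count)   // 1: everything after the embedded NUL is lost
```

The extra array allocation is the copy mentioned above, and the second
result shows why a payload containing the null character cannot survive
the round trip.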

I'd like to see _fromCodeUnitSequence [2] become public API as (just
spitballing here) String.init?<Collection, Codec>(codeUnits:encoding:).
If that can't happen, an alternative to fromCString that doesn't use
strlen would be nice, and we can just eat the performance hit on other
code unit sequences.
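
One possible shape for such an initializer, sketched as an extension so
it can be tried outside the standard library. This is not the stdlib's
_fromCodeUnitSequence implementation: it assumes a current Swift
toolchain (UnicodeCodec rather than Swift 2's UnicodeCodecType), decodes
one scalar at a time, and makes no claim about matching fromCString's
speed. Only the codeUnits:encoding: labels come from the suggestion
above.

```swift
extension String {
    // Hypothetical initializer: decode a collection of code units with the
    // given codec, returning nil on the first malformed sequence.
    init?<C: Collection, Codec: UnicodeCodec>(codeUnits: C, encoding: Codec.Type)
        where C.Element == Codec.CodeUnit
    {
        var decoder = Codec()
        var iterator = codeUnits.makeIterator()
        var scalars = String.UnicodeScalarView()

        decoding: while true {
            switch decoder.decode(&iterator) {
            case .scalarValue(let scalar):
                scalars.append(scalar)
            case .emptyInput:
                break decoding       // consumed every code unit
            case .error:
                return nil           // invalid input for this codec
            }
        }
        self = String(scalars)
    }
}

// No terminator is needed, and an embedded NUL is kept.
let lengthDelimited: [UInt8] = [0x61, 0x00, 0x62]
let decoded = String(codeUnits: lengthDelimited, encoding: UTF8.self)
let rejected = String(codeUnits: [0xFF] as [UInt8], encoding: UTF8.self)
```

Because the length comes from the collection itself, there is no strlen
pass and no need to copy the buffer just to append a sentinel.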

I can't really think of a reason why it's not exposed yet, so I'm led to
believe I'm just missing something major, and not that a reason doesn't
exist. ;-)

There's also a discussion to be had about whether new API is needed at
all. Try as I might, I can't seem to get the
reserveCapacity/append(UnicodeScalar) workflow to have anything close to
the same speed. [3] Profiling indicates that I keep hitting
_StringBuffer.grow. I don't know if that means the buffer isn't uniquely
referenced, or it's a bug, or what, but it's consistently slower than
creating an Array of the bytes and performing fromCString on it. Similar
story with crossing the NSString bridge, which is even stranger. [4]
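
For readers without the gist at hand, the reserve-then-append workflow
looks roughly like this. It is a sketch in current Swift, where the
append goes through the unicodeScalars view; the function name is
hypothetical, and it only shows the shape of the workflow, not the
measurements described above.

```swift
// Hypothetical helper: decode UTF-8 bytes one scalar at a time and append
// each scalar to a String whose capacity was reserved up front.
func stringByAppendingScalars(_ bytes: [UInt8]) -> String? {
    var result = String()
    result.reserveCapacity(bytes.count)   // capacity is in UTF-8 code units

    var decoder = UTF8()
    var iterator = bytes.makeIterator()
    while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            result.unicodeScalars.append(scalar)
        case .emptyInput:
            return result                 // all bytes consumed
        case .error:
            return nil                    // malformed UTF-8
        }
    }
}

let appended = stringByAppendingScalars([0x68, 0x69])   // bytes of "hi"
```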

Anyway, I wanted to stir up discussion and see whether I'm way off base
or whether this can be turned into a proposal.

[0]: https://gist.github.com/zwaldowski/5f1a1011ea368e1c833e#file-fromcstring-swift
[1]: https://github.com/apple/swift/blob/master/stdlib/public/core/CString.swift#L18-L31
[2]: https://github.com/apple/swift/blob/master/stdlib/public/core/String.swift#L134-L150
[3]: https://gist.github.com/zwaldowski/5f1a1011ea368e1c833e#file-unicodescalar-swift
[4]: https://gist.github.com/zwaldowski/5f1a1011ea368e1c833e#file-nsstring-swift

Cheers,
Zachary Waldowski
zach at waldowski.me

