[swift-evolution] Faster/lower-level external String initialization

Charles Kissinger crk at akkyra.com
Fri Jan 8 16:51:42 CST 2016


> I'd like to see _fromCodeUnitSequence [2] become public API 

I am very much in favor of this. I have had *exactly* the same experience.

String.reserveCapacity() seems to act like a no-op for some reason, so append() is incredibly slow, and fromCString() often necessitates a copy to an intermediate buffer because of the null-byte requirement.

This has been one of the weakest areas of Swift performance for me.
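
To make the comparison concrete, here is a minimal sketch of the three approaches discussed in this thread. It assumes a current Swift toolchain (Swift 5 or later) rather than the Swift 2 API under discussion, so String(cString:) stands in for fromCString() and String(decoding:as:) stands in for the kind of length-based initializer being requested. The function names are made up for illustration, and the sketch makes no timing claims.

```swift
// Assumption: Swift 5+ standard library. Function names are hypothetical.

/// C-string route: a length-delimited buffer has to be copied so that a
/// null terminator can be appended before decoding.
func stringViaCString(_ bytes: [UInt8]) -> String {
    var terminated = bytes            // the intermediate copy
    terminated.append(0)              // the null sentinel
    return terminated.withUnsafeBufferPointer { buffer in
        // baseAddress is non-nil because the buffer holds at least the 0.
        String(cString: buffer.baseAddress!)
    }
}

/// Length-based route: decode the code units directly, with no sentinel.
/// Invalid UTF-8 is repaired with U+FFFD rather than rejected.
func stringFromCodeUnits(_ bytes: [UInt8]) -> String {
    return String(decoding: bytes, as: UTF8.self)
}

/// reserveCapacity/append route: decode one scalar at a time and append it.
/// Returns nil when the input is not valid UTF-8.
func stringByAppendingScalars(_ bytes: [UInt8]) -> String? {
    var result = String()
    result.reserveCapacity(bytes.count)
    var decoder = UTF8()
    var iterator = bytes.makeIterator()
    while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            result.unicodeScalars.append(scalar)
        case .emptyInput:
            return result
        case .error:
            return nil
        }
    }
}
```

Note that the C-string route stops at the first zero byte, so a payload containing an embedded null is silently truncated, while the other two routes keep it.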

-CK

> On Jan 8, 2016, at 12:21 PM, Zach Waldowski via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Going back and forth from Strings to their byte representations is an
> important part of solving many problems, including object
> serialization, binary file formats, wire/network interfaces, and
> cryptography.
> 
> In developing such a parser, a coworker did the yeoman's work of
> benchmarking Swift's Unicode types. He swore up and down that
> String.Type.fromCString(_:) [0] was the fastest way he found. I,
> stubborn and noobish as I am, was skeptical that a better way couldn't
> be wrought from Swift's UnicodeCodecTypes.
> 
> After reading through stdlib source and doing my own testing, this is no
> wives' tale. fromCString [1] is essentially the only public user of
> String.Type._fromCodeUnitSequence(_:input:), which serves the exact role
> of both efficient and safe initialization-by-buffer-copy.
> 
> Of course, fromCString isn't a silver bullet; it has to have a null
> sentinel, requiring a copy of the origin buffer if one needs to be added
> (as is the case with formats that specify the length up front, or
> unstructured payloads that use unescaped double quotes as the
> terminator). It also prevents the string itself from containing the null
> character.
> 
> I'd like to see _fromCodeUnitSequence [2] become public API as (just
> spitballing here) String.init?<Collection, Codec>(codeUnits:encoding:).
> If that can't happen, an alternative to fromCString that doesn't use
> strlen would be nice, and we can just eat the performance hit on other
> code unit sequences.
> 
> I can't really think of a reason why it's not exposed yet, so I'm led to
> believe I'm just missing something major, and not that a reason doesn't
> exist. ;-)
> 
> There's also discussion to be had about whether new API is needed at
> all. Try as I might, I can't seem to get the
> reserveCapacity/append(UnicodeScalar) workflow to have anything close to
> the same speed. [3] Profiling indicates that I keep hitting
> _StringBuffer.grow. I don't know if that means the buffer isn't uniquely
> referenced, or it's a bug, or what, but it's consistently slower than
> creating an Array of the bytes and performing fromCString on it. Similar
> story with crossing the NSString bridge, which is even stranger. [4]
> 
> Anyway, I wanted to stir up discussion, see if I'm way off base and/or
> whether this can be turned into a proposal.
> 
> [0]: https://gist.github.com/zwaldowski/5f1a1011ea368e1c833e#file-fromcstring-swift
> [1]: https://github.com/apple/swift/blob/master/stdlib/public/core/CString.swift#L18-L31
> [2]: https://github.com/apple/swift/blob/master/stdlib/public/core/String.swift#L134-L150
> [3]: https://gist.github.com/zwaldowski/5f1a1011ea368e1c833e#file-unicodescalar-swift
> [4]: https://gist.github.com/zwaldowski/5f1a1011ea368e1c833e#file-nsstring-swift
> 
> Cheers,
> Zachary Waldowski
> zach at waldowski.me


