[swift-dev] State of String: ABI & Performance

Chris Lattner clattner at nondot.org
Thu Jan 11 23:46:12 CST 2018


On Jan 11, 2018, at 12:32 PM, Michael Ilseman <milseman at apple.com> wrote:
>>> We may also want a compact 5-bit encoding for formatted numbers, such as 64-bit memory addresses in hex, `Int.max` in base-10, and `Double` in base-10, which would require 18, 19, and 24 characters respectively. 120 bits with a 5-bit encoding would fit all of these. This would speed up the creation and interpolation of many strings containing numbers.
>> 
>> I think it is important to consider that having more special cases and different representations slows down nearly *every* operation on String because they have to check and detangle all of the possible representations.
> 
> nit: Multiple small representations would slow down nearly every operation on *small* strings. That is, we would initially branch on an isSmall check, and then small strings would pay the cost of further inspection. This could also slow down String construction, depending on how aggressively we try to pack/transcode and whether we also have specialized inits.

Right.  It is actually a bit more complicated than that, though.  Consider the Swift 4 generation of strings and arrays: because of lazy bridging, each primitive operation includes the dynamic branching sequence.  Beyond the dynamic performance impact of this, we end up with a really unfortunate set of tradeoffs to choose between:

1. We can leave the operations outlined in the standard library.  Particularly for Array, which benefits from being specialized on its element type, this is a huge perf loss.

2. We can inline the operation, but we end up inlining both sides of the operation, causing massive code bloat.

3. We can partially inline just the presumed fastpath, and leave the slow path out of line.

The problem with this when you bring it to String is that the “fastpath” would naturally be the small string case.  If you make *it* have many levels of branching underneath it, not only do you slow down the fast case, but you also cause intense code bloat if you want to inline core operations into user code.
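To make the code-bloat concern concrete, here is a rough sketch of what I mean.  This is a completely hypothetical two-word layout with made-up discriminator bits and constants, nothing like the real implementation; it just shows how the inlined “fast path” fans out again when the small case has multiple forms:

  // Hypothetical layout for illustration only: one bit says "is small", and a
  // couple more bits pick which small form we have.
  enum SmallForm { case utf8, utf16, packedDigits }

  struct TwoWordString {
    var low: UInt64
    var high: UInt64

    var isSmall: Bool {
      return high & 0x8000_0000_0000_0000 != 0
    }

    var smallForm: SmallForm {
      switch (high >> 61) & 0b11 {
      case 0: return .utf8
      case 1: return .utf16
      default: return .packedDigits
      }
    }

    // The kind of entry point we want to partially inline into user code.
    var utf8Count: Int {
      guard isSmall else {
        return 0  // stand-in for the out-of-line large-string path
      }
      // With a single small form this is straight-line code.  With several
      // small forms, every inlined caller carries this extra fan-out too.
      switch smallForm {
      case .utf8:         return Int((high >> 56) & 0xF)  // hypothetical count nibble
      case .utf16:        return 0  // would transcode or estimate
      case .packedDigits: return 0  // would decode the 5-bit packing
      }
    }
  }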


>>  Given the ability to hold 15 digits of ASCII, I don’t see why it would be worthwhile to worry about a 5-bit representation for digits.  String should not be an Any!

I still don’t understand why this would even be considered.  Do you have any more rationale to explain it?

>>> Final details are still in exploration. If the construction and interpretation of small bit patterns can remain behind a resilience barrier, new forms could be added in the future. However, the performance impact of this would need to be carefully evaluated.
>> 
>> I’d love to see performance numbers on this.  Have you considered a design where you have exactly one small string representation: a sequence of 15 UTF-8 bytes?  This holds your 15 bytes of ASCII, probably more non-ASCII characters on average than 7 UTF-16 code units, and only needs one discriminator branch on the entry point of hot functions that want to touch the bytes.  If you have a lot of algorithms that are sped up by knowing they have ASCII, you could go with two bits:  “isSmall” and “isASCII”, where isASCII is set in both the small string and normal string cases.
>> 
> 
> We have not yet done the investigation, so all details could change. This is my (perhaps flawed!) reasoning as to why, in general, it may be useful to have multiple small forms. Again, this will change as we learn more.
> 
> Strings are created and copied (i.e. a value-copy/retain) more often than they are read (and strings are read far more often than they are modified).

If you believe it is highly biased towards this, then this strongly argues for a single word representation...

> The “high-order bit” for performance is whether a string is small-or-not, as avoiding an allocation and, just as importantly, managing the memory with ARC is skipped. Beyond that, we’re debating smaller performance deltas, which are still important. This reasoning also influences where we choose to stick our resilience barriers, which can be relaxed in the future.

You’re making an argument about aggregate use cases on average, but it is unwise to ignore the common performance-critical cases that do actually want to look at the bytes.  Consider, for example, the HTTP header parsing challenges the Swift on Server workgroup faced, which led them to use a C implementation.

> One of the nice things about having a small representation for each of our backing storage kinds (ASCII/UTF-8 vs UTF-16) is that it would give us a very fast way of conservatively checking whether we can form a small string. E.g., if we’re forming a Substring over a portion of a UTF-16-backed string, we can check whether the range holds 7 or fewer code units to avoid the ARC. I don’t know if it would be worth the attempt to detect whether it would fit in 15 transcoded UTF-8 code units vs bumping the ref count. Avoiding the extra packing attempt may help branch mis-predicts, though I’m *way* out of my depth when it comes to reasoning about mis-predicts as they emerge in the wild. There is a “temporal locality of Unicode-ness” in string processing, though.
> 
> As for what strings would fit in a 7 code unit UTF-16 small form but not in 15 code units of UTF-8, the biggest cases are latter-BMP scalars, of which a small UTF-8 string could only store 5. This may not be a big deal.

Yes, I doubt that is a big deal.  The increased density for international strings (at the cost of unpacking if you really do want UTF-16 for something) seems clearly beneficial: you eliminate the ARC overhead and allocation, which quickly dwarf any unpacking overhead that might happen.
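To put rough numbers on that tradeoff, here is the kind of back-of-the-envelope check I mean, using the candidate capacities from this thread (15 UTF-8 code units vs 7 UTF-16 code units; the helper names and sample strings are made up for illustration):

  // Would a given string (or Substring slice) fit the candidate small forms?
  func fitsSmallUTF8<S: StringProtocol>(_ s: S) -> Bool {
    return s.utf8.count <= 15
  }
  func fitsSmallUTF16<S: StringProtocol>(_ s: S) -> Bool {
    return s.utf16.count <= 7
  }

  let samples = ["hello!", "héllo wörld", "日本語テキスト"]
  for s in samples {
    // ASCII fits either form; accented Latin favors the UTF-8 form; the
    // latter-BMP case (e.g. CJK) is where only the UTF-16 form wins.
    print(s, s.utf8.count, s.utf16.count, fitsSmallUTF8(s), fitsSmallUTF16(s))
  }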

> As to a 5-bit encoding, again, all details pending more experimentation. Its importance may be diminished by more efficient, low-level String construction interfaces. The difference between 15 and 24 characters could be big, considering that formatted addresses and Doubles fit in that range. We’ll see.

I still can’t imagine any workload that would care about this, and I cannot see why it would be worth punishing the normal cases for it.  Can you?


>> Finally, what tradeoffs do you see between a 1-word vs 2-word string?  Are we really destined to have 2-words?  That’s still much better than the 3 words we have now, but for some workloads it is a significant bloat.
> 
> <repeat disclaimer about final details being down to real data>. Some arguments in favor of 2-word, presented roughly in order of impact:

Understood.  I don’t have a strong opinion on 1 vs 2 words, either are dramatically better than 3 :-).  I’m glad you’re carefully evaluating the tradeoff.

> 1. This allows the String type to accommodate llvm::StringRef-style usages. This is pretty broad usage: “mmap a file and treat its contents as a String”, “store all my contents in an llvm::BumpPtr which outlives uses”, un-owned slices, etc. One word String would greatly limit this to only whole-string nul-terminated cases.

Yes, StringRef style algorithms are a big deal, as I mentioned in my previous email, but it is also unclear if this will really be a win when shoehorned into String.  The benefit of StringRef is that it is a completely trivial type (both in the SIL sense but also in the implementation sense) and all the primitive ops get inlined.  Given the “all things to all people” design of String, I’m very much afraid that trying to shoehorn this into the String currency type will fail to provide significant wins and thus lead to having a separate StringRef style type anyway.  Providing a StringRef style projection type that is trivial (in the Swift sense) that knows in its static type that it never owns memory seems like the best path.

By point of comparison, C++ has std::string (yes, sure, with lots of issues) but they still introduced StringRef nee std::string_view instead of wedging it in.
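To be concrete about what I mean by a trivial projection type, here is a strawman (name, shape, and API are all made up, not a proposal):

  // Strawman StringRef-style view: a trivial (POD) type that never owns its
  // memory, so copying it is just a pointer and a length, with no ARC traffic.
  struct UnsafeStringView {
    var start: UnsafePointer<UInt8>
    var count: Int

    // Primitive operations are tiny and inline cleanly; there is no
    // representation to discriminate and nothing to retain.
    subscript(i: Int) -> UInt8 {
      return start[i]
    }

    func dropFirst(_ n: Int) -> UnsafeStringView {
      let k = min(n, count)
      return UnsafeStringView(start: start + k, count: count - k)
    }

    // Crossing back into the currency type copies the bytes; that is where
    // the cost (and the safety) lives.
    func makeString() -> String {
      return String(decoding: UnsafeBufferPointer(start: start, count: count),
                    as: UTF8.self)
    }
  }

The key property is that the static type says it never owns the memory, so it stays trivial in the SIL sense and all of the primitive operations can be inlined without any representation checks.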

> 2. Two-word String fits more small strings. Exactly where along the diminishing-returns curve 7 vs 15 UTF-8 code units lie is dependent on the data set. One example is NSString, which (according to reasoning at https://www.mikeash.com/pyblog/friday-qa-2015-07-31-tagged-pointer-strings.html) considered it important enough to have 6- and 5-bit reduced ASCII character sets to squeeze strings of up to 11 characters into a word. 15 code unit small strings would be a super-set of tagged NSStrings, meaning we could bridge them eagerly in-line, while 7 code unit small strings would be a subset (and also a strong argument against eagerly bridging them).

Agreed, this is a big deal.

> If you have access to any interesting data sets and can report back some statistics, that would be immensely helpful!

Sadly, I don’t. I’m only an opinionated hobbyist in this domain, one who has coded a lot of string processing over the years and understands at least some of the tradeoffs.

> 3. More bits available to reserve for future-proofing, etc., though many of these could be stored in the header.
> 
> 4. The second word can cache useful information from large strings. `endIndex` is a very frequently requested computed property and it could be stored directly in-line rather than loaded from memory (though perhaps a load happens anyways in a subsequent read of the string). Alternatively, we could store the grapheme count or some other piece of information that we’d otherwise have to recompute. More experimentation needed here.

This seems weakly motivated: large strings can store their endIndex in the heap allocation.
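(To be explicit about what I mean, with completely made-up field names: the heap allocation backing a large string can carry this directly.)

  // Illustrative only, not the actual layout: the large-string heap object can
  // hold the count (i.e. endIndex) and any flags we would otherwise be tempted
  // to cache in a second inline word; the code units follow in the same
  // allocation.
  struct LargeStringHeader {
    var count: Int       // endIndex of the storage
    var capacity: Int
    var flags: UInt      // e.g. isASCII, isKnownNFC, ...
  }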

> 5. (vague and hand-wavy) Two-words fits into a nice groove that 3-words doesn’t: 2 words is a rule-of-thumb size for very small buffers. It’s a common heap alignment, stack alignment, vector-width, double-word-load width, etc. 1-word Strings may be under-utilizing available resources; that is, the second word will often be there for use anyway. The main case where this is not true, and where 1-word shines, is aggregates of String.

What do we expect the existential inline buffer size to wind up being?  We sized it to 3 words specifically to fit String and Array.  It would be great to shrink that to 2 or 1 words.
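As a quick way to eyeball where we are today (numbers are for a 64-bit platform and will change as the ABI changes):

  // String is currently the 3 words mentioned above, Array is a single
  // pointer, and the Any existential container is a 3-word inline buffer
  // plus a type metadata pointer.
  print(MemoryLayout<String>.size)  // 24
  print(MemoryLayout<[Int]>.size)   // 8
  print(MemoryLayout<Any>.size)     // 32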


>>> ### Unmanaged Strings
>>> 
>>> Such unmanaged strings can be used when the lifetime of the storage is known to outlive all uses. 
>> 
>> Just like StringRef!  +1, this concept can be a huge performance win… but can also be a huge source of UB if used wrong.. :-(
>> 
> 
> When it comes to naming, I’m more prone to argue for “UnsafeString” rather than “UnmanagedString”, but that’s a later debate for SE ;-)

+1

>>> Some kind of unmanaged string construct is an often-requested feature for highly performance-sensitive code. We haven’t thought too deeply about how best to surface this as API and it can be sliced and diced many ways. Some considerations:
>> 
>> Other questions/considerations:
>> - here and now, could we get the vast majority of this benefit by having the ABI pass string arguments as +0 and guarantee their lifetime across calls?  What is the tradeoff of that?
> 
> Michael Gottesman (+CC) is doing this sort of investigation right now, evaluating the tradeoffs of more wide-spread use of the +0 convention. My current understanding is that even with +0, many retain/release pairs cannot be eliminated without more knowledge of the callee, but I don’t know the reasons why. Hopefully he can elaborate.

The tradeoff seems to be that you’re extending the lifetime across a call, preventing it from being released earlier, and if you can’t get callee side guaranteed ownership (like the borrow checker provides) then you have to retain/release anyway.

>> - does this concept even matter if/when we can write an argument as “borrowed String”?  I suppose this is a bit more general in that it should be usable as properties and other things that the (initial) ownership design isn’t scoped to support since it doesn’t have lifetimes.
> 
> Right, it’s a little more general.
> 
>> - array has exactly the same sort of concern, why is this a string-specific problem?
> 
> I think it’s interesting as slices may emerge more often in practice on strings than arrays, e.g. for presenting parsing results. Also, applications that care about this level of performance may custom-allocate all their string data contiguously (e.g. llvm::BumpPtr). I believe that projects such as llbuild have done this, forcing themselves into an API schism between String and UnsafeBufferPointer<UInt8>, often duplicating functionality. In theory all of this could also apply to array, but it seems to happen more for strings.

I see String and Array as directly analogous in this space.  The *only* difference IMO is when you get to algorithms that interpret the bytes, because of frickin’ Unicode.

>> - how does this work with mutation?  Wouldn’t we really have to have something like MutableArrayRef, since it’s talking about the mutability of the elements, not something that var/let can support?
>> 
> 
> Good point. If this is more of a “UnsafeString”-like concept, there’s analogy with Unsafe[Mutable]BufferPointer.
> 
>> When I think about this, it seems like a “non-owning *slice*” of some sort.  If you went that direction, then you wouldn’t have to build it into the String currency type, just like it isn’t in LLVM.
>> 
> 
> Yes, though you would lose the ability to give them to APIs expecting a String when you know it’s safe to do so. E.g., when the contents are effectively immortal relative to the API call and effects. Careful use of “unsafe” or “unmanaged” in type names and argument labels needed here.

Yes, I think that your point about Unsafe is important, which is even more reason not to wedge it into String.  I really do understand the tradeoff of how you can’t pass a “Ref” to something that takes a String, but we already extensively debated that topic and accepted it w.r.t. Slice types.  This seems directly analogous.

>>> * Unification of String and Substring’s various Views.
>>> * Some form of opaque string, such as an existential, which can be used as an extension point.
>>> * More compact/performant indices.
>> 
>> What is the current theory on eager vs lazy bridging with Objective-C?  Can we get to an eager bridging model?  That alone would dramatically simplify and speed up every operation on a Swift string.
>> 
> 
> I don’t think that will be feasible in general, although perhaps it could happen for most common occurrences.

Bridging is either “always eager” or “possibly lazy”; there is nothing useful in between.  Lazy bridging means that every operation, down to lowly ASCII parsing, needs to check for the slow path.  I understand that this may not be possible to eliminate on Apple platforms, but it would be great not to add the existential generalization, since that would affect other platforms that don’t have the legacy API concern.

>> 
>>   func hexToInt(_ c : UInt8) -> UInt8 {
>>     switch c {
>>     case charToByte("0")...charToByte("9"): return c - charToByte("0")
>>     case charToByte("a")...charToByte("f"): return c - charToByte("a") + 10
>>     case charToByte("A")...charToByte("F"): return c - charToByte("A") + 10
>>     default: fatalError("invalid hexadecimal character")
>>     }
>>   }
>> 
> 
> Yeah, trying to do ASCII value ranges is currently pretty rough. Here’s what I did recently when I wanted something similar, using unicode scalar literal ranges:

We resisted adding single-quoted strings for characters early on because we felt it was a marginal case, and because we might want to use single quotes for additional quoting forms (as in scripting languages like Perl, which allow either single or double quotes) or for multi-line strings.  However, now that multiline strings are settled and it is clear that we’re not going to add multiple redundant ways to quote strings, I think it is worth considering whether we should introduce ExpressibleByCharacterLiteral types, tie them to single quotes, and make UInt8 one of them (along with Character, of course).
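In the meantime, the helper used in the snippet above is easy enough to write against Unicode.Scalar; this exact function isn’t in the standard library, it’s just the shape I’d reach for today:

  // Not a stdlib function, just a sketch: with this, the hexToInt switch above
  // compiles today, because "0", "a", etc. are taken as Unicode.Scalar
  // literals in that position.
  func charToByte(_ scalar: Unicode.Scalar) -> UInt8 {
    precondition(scalar.isASCII, "only ASCII scalars map to a single byte")
    return UInt8(scalar.value)
  }

Something like ExpressibleByCharacterLiteral tied to single quotes would let those helper calls disappear entirely.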

-Chris



