[swift-dev] String comparison improvements

Michael Ilseman milseman at apple.com
Wed Jan 17 17:36:15 CST 2018



> On Jan 17, 2018, at 2:36 PM, Itai Ferber <iferber at apple.com> wrote:
> 
> Hi Lance (and Michael — I’ll keep the conversation in one thread),
> 
> Sounds reasonable. My one concern is that behavior around implicitly bridged strings is going to change, in a potentially problematic and "magic" way:
> 
> let s1: String = …
> let s2: String = …
> 
> let mySet = NSMutableSet()
> mySet.add(s1)
> mySet.add(s2)
> 
> // s1 != s2 by Swift ordering rules, but CFStringCompare(s1, s2, …) == kCFCompareEqualTo
> print(mySet.count) // => 1 ????
> or alternatively:
> 
> let s1: String = …
> let s2: String = …
> 
> let mySet = NSMutableSet()
> mySet.add(s1)
> 
> // s1 == s2 by Swift ordering rules, but CFStringCompare(s1, s2, …) != kCFCompareEqualTo
> print(mySet.contains(s2)) // => false ????
> I don’t think there’s much we can do about this, but this is something we’re going to have to watch out for — this will be a relatively rare and very subtle behavior change.
> 

My understanding is that CFStringCompare, when given kCFCompareNonliteral, follows canonical equivalence and thus equality would not be affected. However, ordering, i.e. `<`, could change.

> — Itai
> 
> On 17 Jan 2018, at 13:56, Lance Parker wrote:
> 
> Comments inline below
> 
>> On Jan 17, 2018, at 1:46 PM, Itai Ferber <iferber at apple.com <mailto:iferber at apple.com>> wrote:
>> 
>> Hi Lance,
>> 
>> I read Michael’s emails but I don’t remember at the moment — what is the new string comparison implementation going to be based on?
>> 
> The new approach uses the lexicographical ordering of NFC-normalized UTF-16 code units. For two known ASCII strings, we just use memcmp.
>> Also, how will this affect bridged strings? If I compare two NSStrings, I may get a different result than if I compare the same two strings as bridged through String, correct?
>> 
> If I understand correctly, you’re asking what will happen if you have two strings explicitly typed as NSString in swift and you compare them. I believe they’ll still use whatever NSString does for comparison today, so CFStringCompare. For Swift strings backed by a bridged NSString, this new comparison method will be used. 
> 
> It might make sense for explicit NSStrings in Swift to use the new method as well. What do you think?
>> — Itai
>> 
>> On 17 Jan 2018, at 13:19, Lance Parker via swift-dev wrote:
>> 
>> Hey Swift-Dev,
>> 
>> The swift standard library team  have been working on a new implementation for comparing Swift strings for Swift 5. Michael touched on the motivations in the State of String email but I’ll summarize here:
>> 
>> The Swift String comparison implementations on Apple platforms and Linux differ in results and performance. Apple platforms use CFStringCompare with no locale, while Linux uses ICU libraries. Unifying the algorithms that Swift strings use for comparison is reason alone for doing a new implementation. 
>> We've come up with some great common fast paths that speed up comparisons for a lot of common cases. Our microbenchmarks show up to a 6.8x increase in performance and there is still some low hanging fruit in our implementation that would bring further speedups. 
>> 
>> Bare in mind this is not intended to be a replacement for sorting strings that will be presented to users, for that developers should stick to NSLocalizedString APIs. 
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev at swift.org <mailto:swift-dev at swift.org>
>> https://lists.swift.org/mailman/listinfo/swift-dev <https://lists.swift.org/mailman/listinfo/swift-dev>
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20180117/b6b6b86c/attachment.html>


More information about the swift-dev mailing list