[swift-dev] Quick pitch: Change Linux’s string comparison to match Darwin’s

David Jones djones6dev at gmail.com
Mon Jul 31 10:58:17 CDT 2017


Kitura does not mandate any particular distribution, but 14.04 is one of
the distributions that we expect to support for some time.

I did investigate what's required to build a Swift snapshot with a newer
version of ICU on Linux.  I was pleased to discover that of the support for
doing so is already there, however the Foundation build needs tweaking to
point to the newly built version.  I opened a couple of PRs to fix it, but
then someone pointed out an existing (unmerged) PR that fixes it in a
simpler way:
https://github.com/apple/swift-corelibs-foundation/pull/722

With #722, I can build a new snapshot on Ubuntu 14.04 or 16.04 using ICU
57.1 from http://site.icu-project.org/download/57#TOC-ICU4C-Download
I'm using the buildbot_linux preset, so I added libicu=true and
install-libicu to the preset in utils/build-presets.ini

With this build, Kitura's 14.04 performance is much closer to the memcmp
version (leaving around 8% gap, similar to 16.04, which is what I'd expect):

Language (Runtime)        Framework (Middleware)          Max [sec]
Min [sec]       Ave [sec]
------------------------- ------------------------- ---------------
--------------- ---------------
swift                     kitura_40                        8.131616
7.634630        7.959684
swift                     kitura_40_memcmp                 5.072511
4.784608        4.963862
swift                     kitura_40_icu                    5.523411
5.297158        5.367100

On 31 July 2017 at 16:57, David Jones <djones6dev at gmail.com> wrote:

> Kitura does not mandate any particular distribution, but 14.04 is one of
> the distributions that we expect to support for some time.
>
> I did investigate what's required to build a Swift snapshot with a newer
> version of ICU on Linux.  I was pleased to discover that of the support for
> doing so is already there, however the Foundation build needs tweaking to
> point to the newly built version.  I opened a couple of PRs to fix it, but
> then someone pointed out an existing (unmerged) PR that fixes it in a
> simpler way:
> https://github.com/apple/swift-corelibs-foundation/pull/722
>
> With #722, I can build a new snapshot on Ubuntu 14.04 or 16.04 using ICU
> 57.1 from http://site.icu-project.org/download/57#TOC-ICU4C-Download
> I'm using the buildbot_linux preset, so I added libicu=true and
> install-libicu to the preset in utils/build-presets.ini
>
> With this build, Kitura's 14.04 performance is much closer to the memcmp
> version (leaving around 8% gap, similar to 16.04, which is what I'd expect):
>
> Language (Runtime)        Framework (Middleware)          Max [sec]
> Min [sec]       Ave [sec]
> ------------------------- ------------------------- ---------------
> --------------- ---------------
> swift                     kitura_40
> 8.131616        7.634630        7.959684
> swift                     kitura_40_memcmp
> 5.072511        4.784608        4.963862
> swift                     kitura_40_icu
> 5.523411        5.297158        5.367100
>
>
> On 27 July 2017 at 18:48, Michael Ilseman <milseman at apple.com> wrote:
>
>> The performance issues in Ubuntu 14.04 is not surprising. 14.04’s ICU
>> version is ancient, relatively speaking (e.g. it predates both Swift and
>> Unicode 7.0). Performance issues is not the only problem there, as Unicode
>> semantics such as grapheme breaking has evolved dramatically since then.
>> Out of curiosity, is there a reason why Kitura is stuck on 14.04? Can your
>> platform bundle a newer ICU so that your users can have things like modern
>> emoji support?
>>
>>
>>
>> On Jul 27, 2017, at 7:28 AM, David Jones via swift-dev <
>> swift-dev at swift.org> wrote:
>>
>> FWIW, here is an illustration of how much an optimization in this area
>> would be worth to Kitura, running on Ubuntu 14.04 (where ICU 52 is
>> particularly expensive when comparing ASCII strings [1]).
>> The workload I used here is https://github.com/tbrand/whic
>> h_is_the_fastest
>> I have compiled Kitura 1.7.6 with the latest 4.0 toolchain (07-26-a), and
>> again with the same toolchain modified to remove the #if in StringCompare's
>> ==.
>>
>> djones6 at needletail:~/which_is_the_fastest$ numactl
>> --physcpubind=0-3,16-19 --membind=0 bin/benchmarker kitura_40
>> kitura_40_memcmp
>> Language (Runtime)        Framework (Middleware)          Max [sec]
>> Min [sec]       Ave [sec]
>> ------------------------- ------------------------- ---------------
>> --------------- ---------------
>> swift                     kitura_40
>> 7.824673        7.682657        7.740933
>> swift                     kitura_40_memcmp
>> 5.163788        4.811082        4.955571
>>
>> The difference here is around 35%. Other Kitura workloads I've performed
>> this comparison on in the past (such as involving JSON serialization) have
>> showed a difference in the 15 - 20% region.
>> The difference is far smaller on Ubuntu 16.04 (around 8% for this
>> workload), due to improvements in the newer level of ICU:
>>
>> djones6 at gruffalo:~/which_is_the_fastest$ numactl --physcpubind=0-3,16-19
>> --membind=0 bin/benchmarker kitura_40 kitura_40_memcmp
>> Language (Runtime)        Framework (Middleware)          Max [sec]
>> Min [sec]       Ave [sec]
>> ------------------------- ------------------------- ---------------
>> --------------- ---------------
>> swift                     kitura_40
>> 4.691993        4.531465        4.580086
>> swift                     kitura_40_memcmp
>> 4.349387        4.015061        4.201105
>>
>> - David.
>> ---
>> David Jones, Swift at IBM
>>
>> [1] https://github.com/apple/swift/pull/7339
>>
>> On 26 July 2017 at 03:59, Michael Ilseman via swift-dev <
>> swift-dev at swift.org> wrote:
>>
>>> Unfortunately after some investigations and discussion, the situation
>>> seems to be more murky. This approach would break transitivity of String
>>> comparison on Linux, at least with any implementation of UCA using the
>>> normal collation weights. A < B, B < C should imply A < C. But, if both A
>>> and B are known-ASCII while C is UTF16, transitivity can be broken for any
>>> character that UCA yields a different sort order for (e.g. “#” vs “&”). On
>>> Darwin, the comparison implementation happens to preserve transitivity as
>>> the platform (in effect) relatively weights ASCII by code unit values.
>>>
>>> While I would like to get some performance improvements in time for
>>> Linux, I don’t think this approach is viable for Swift 4.0. Unless anyone
>>> has any ideas about another minimally invasive approach, my recommendation
>>> is to do the long-term solution (lexicographical order of normalized code
>>> units) immediately after Swift 4.0.
>>>
>>>
>>> On Jul 25, 2017, at 2:01 PM, Michael Ilseman via swift-dev <
>>> swift-dev at swift.org> wrote:
>>>
>>> On Darwin, known-ASCII strings are sorted according to the
>>> lexicographical ordering of their code units. All non-known-ASCII strings
>>> are otherwise ordered based on the UCA[1]. On Linux, however, even
>>> known-ASCII strings are ordered based on UCA. I propose to unify these by
>>> changing Linux’s string sort order to match Darwin’s in Swift 4.0.
>>> Background
>>>
>>> Swift’s default ordering for strings is appropriate for machine
>>> consumption (e.g. implementing sorted collections). It obeys Unicode
>>> canonical equivalence[2], that is strings compare the same modulo
>>> normalization. However, it is not meant to be sufficient for presenting a
>>> meaningful ordering to human consumers, as that requires incorporating
>>> reader-specific information (e.g. [3]).
>>>
>>> Known-ASCII strings are a trivial case for the described sort order
>>> semantics: pure ASCII is unaffected by normalization. Thus, lexicographical
>>> ordering of code units is a valid machine ordering for ASCII strings. On
>>> Darwin, this is used to order known-ASCII strings while Linux uses UCA even
>>> for known-ASCII strings.
>>>
>>> Long term, the plan is to switch String’s sort order to be the
>>> lexicographical ordering of normalized code units (or perhaps scalar
>>> values), as mentioned in the String Manifesto[4]. This is a more efficient
>>> ordering than that provided by UCA. However, this will not make it in time
>>> for Swift 4.0.
>>> Changes
>>>
>>> I propose to change Linux’s sort order for known-ASCII strings to be the
>>> same as it is on Darwin. This will be accomplished by dropping the relevant
>>> #if guards in StringCompare.swift. An example implementation can be
>>> found at [5].
>>>
>>> In addition to unifying sort order semantics across platforms, this will
>>> also deliver significant performance boosts to pure ASCII strings on Linux.
>>>
>>> [1] UTS #10: Unicode Collation Algorithm
>>> <http://unicode.org/reports/tr10/>
>>>
>>> [2] Canonical Equivalence in Applications
>>> <http://unicode.org/notes/tn5/>
>>>
>>> [3] UCA: Contextual Sensitivity
>>> <http://unicode.org/reports/tr10/#Contextual_Sensitivity>
>>>
>>> [4] String Manifesto: Comparing and Hashing Strings
>>> <https://github.com/apple/swift/blob/master/docs/StringManifesto.md#comparing-and-hashing-strings>
>>>
>>> [5] Unifying Linux/Darwin ASCII sort order semantics - github
>>> <https://github.com/milseman/swift/commit/5560e13198d5cc284f46bf190f59a2edf7ed747b>
>>> _______________________________________________
>>> swift-dev mailing list
>>> swift-dev at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>>
>>>
>>>
>>> _______________________________________________
>>> swift-dev mailing list
>>> swift-dev at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>>
>>>
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-dev
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20170731/45ffd6d2/attachment.html>


More information about the swift-dev mailing list