<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><p style="-webkit-print-color-adjust: exact; margin-right: 0px; margin-bottom: 15px; margin-left: 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255); margin-top: 0px !important;" class="">On Darwin, known-ASCII strings are sorted according to the lexicographical ordering of their code units, and all other strings are ordered according to the UCA[1]. On Linux, however, even known-ASCII strings are ordered according to the UCA. I propose to unify the two platforms by changing Linux’s string sort order to match Darwin’s in Swift 4.0.</p><h4 id="toc_0" style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 16px; font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class="">Background</h4><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">Swift’s default ordering for strings is appropriate for machine consumption (e.g. implementing sorted collections). It obeys Unicode canonical equivalence[2]; that is, canonically equivalent strings compare as equal regardless of how they are normalized. It is not, however, sufficient for presenting a meaningful ordering to human readers, which requires incorporating reader-specific information (e.g. [3]).</p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">Known-ASCII strings are a trivial case for these semantics: pure ASCII is unaffected by normalization.
Thus, lexicographical ordering of code units is a valid machine ordering for ASCII strings. Darwin already uses this ordering for known-ASCII strings, whereas Linux uses the UCA even for them.</p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">In the long term, the plan is to switch String’s sort order to the lexicographical ordering of normalized code units (or perhaps scalar values), as described in the String Manifesto[4]. This ordering is more efficient to compute than the UCA. However, this change will not be ready in time for Swift 4.0.</p><h4 id="toc_1" style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 16px; font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class="">Changes</h4><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">I propose to change Linux’s sort order for known-ASCII strings to match Darwin’s. This will be accomplished by dropping the relevant <code style="-webkit-print-color-adjust: exact; margin: 0px 2px; padding: 0px 5px; white-space: nowrap; border: 1px solid rgb(234, 234, 234); background-color: rgb(248, 248, 248); border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class="">#if</code> guards in StringCompare.swift.
An example implementation can be found at [5].</p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">In addition to unifying sort order semantics across platforms, this will also significantly improve comparison performance for pure-ASCII strings on Linux, since they will no longer be compared using the UCA.</p><h2 id="toc_2" style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 24px; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: rgb(204, 204, 204); font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class=""></h2><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">[1] <a href="http://unicode.org/reports/tr10/" style="-webkit-print-color-adjust: exact; color: rgb(65, 131, 196);" class="">UTS #10: Unicode Collation Algorithm</a></p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">[2] <a href="http://unicode.org/notes/tn5/" style="-webkit-print-color-adjust: exact; color: rgb(65, 131, 196);" class="">Canonical Equivalence in Applications</a></p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">[3] <a href="http://unicode.org/reports/tr10/#Contextual_Sensitivity" style="-webkit-print-color-adjust: exact; color: rgb(65, 131, 196);" class="">UCA: Contextual Sensitivity</a></p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">[4] <a 
href="https://github.com/apple/swift/blob/master/docs/StringManifesto.md#comparing-and-hashing-strings" style="-webkit-print-color-adjust: exact; color: rgb(65, 131, 196);" class="">String Manifesto: Comparing and Hashing Strings</a></p><p style="-webkit-print-color-adjust: exact; margin-top: 15px; margin-right: 0px; margin-left: 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255); margin-bottom: 0px !important;" class="">[5] <a href="https://github.com/milseman/swift/commit/5560e13198d5cc284f46bf190f59a2edf7ed747b" style="-webkit-print-color-adjust: exact; color: rgb(65, 131, 196);" class="">Unifying Linux/Darwin ASCII sort order semantics - github</a></p></body></html>