<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""></div><br class=""><div><blockquote type="cite" class=""><div class="">On Sep 22, 2016, at 6:11 PM, Xiaodi Wu <<a href="mailto:xiaodi.wu@gmail.com" class="">xiaodi.wu@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">On Thu, Sep 22, 2016 at 7:44 PM, Michael Gottesman<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:mgottesman@apple.com" target="_blank" class="">mgottesman@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><br class=""><div class=""><span class="gmail-"><blockquote type="cite" class=""><div class="">On Sep 22, 2016, at 5:09 PM, Xiaodi Wu <<a href="mailto:xiaodi.wu@gmail.com" target="_blank" class="">xiaodi.wu@gmail.com</a>> wrote:</div><br class=""><div class=""><div dir="ltr" class="">On Thu, Sep 22, 2016 at 6:54 PM, Michael Gottesman<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:mgottesman@apple.com" target="_blank" class="">mgottesman@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" 
style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><br class=""><div class=""><span class=""><blockquote type="cite" class=""><div class="">On Sep 22, 2016, at 4:19 PM, Xiaodi Wu <<a href="mailto:xiaodi.wu@gmail.com" target="_blank" class="">xiaodi.wu@gmail.com</a>> wrote:</div><br class=""><div class=""><div dir="ltr" class="">You mean values of type String?</div></div></blockquote><div class=""><br class=""></div></span><div class="">I was speaking solely of constant strings.</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class="">I would want those to be exactly what I say they are; NFC normalization is available, if I recall, as part of Foundation, but by no means should my String values be silently changed!</div></div></blockquote><div class=""><br class=""></div></span><div class="">Why.</div></div></div></blockquote><div class=""><br class=""></div><div class="">For one, I don't want to pay the computational cost of normalization at runtime unless necessary.</div></div></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">This would only happen with strings that are known to be constant at compile time (and as such the transformation would occur at compile time). 
There would be no runtime cost.</div></div></div></blockquote><div class=""><br class=""></div><div class="">Yes, for constant strings only there would be no runtime cost.</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><span class="gmail-"><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">For another, I expect to be able to round-trip user input.<span class="Apple-converted-space"> </span></div></div></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">String checks for canonical equivalence, IIRC.</div></div></div></blockquote><div class=""><br class=""></div><div class="">Sure, but I'm not talking about using comparison operators here. I mean that if we have `let str = "[some non-NFC string]"`, I should be able to write that out to a file with all the non-canonical glyphs intact.</div></div></div></div></div></blockquote><div><br class=""></div><div>I would argue that for most people that is not an interesting distinction. Naturally, there would be a way to escape such canonicalization to get the non-canonicalized String.</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><div class="">There are known issues with NFC that are acceptable for normalizing Swift identifiers but make it unsuitable for general use. 
For example, the normalized form of Greek ano teleia is middle dot, but these two glyphs are rendered differently in many fonts, and substituting a middle dot in place of the Greek punctuation mark is actually quite inadequate for Greek text (ano teleia is supposed to be around x-height; middle dot is not). Even for constant strings, it is essential that one can output ano teleia when it is specified rather than middle dot. However, Unicode normalization algorithms guarantee stability and will forever require swapping the former for the latter. I understand that other such problematic characters exist.</div></div></div></div></div></blockquote><div><br class=""></div><div>I would argue that that is a problem with the Unicode standard and with the fonts. This is not a problem for Swift to solve.</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><span class="gmail-"><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">Normalization is not lossless and cannot be reversed. 
Finally, if I want to use normalization form D (NFD), your proposal</div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">would make it impossible, because (IIUC) serial NFC + NFD normalization can produce different output than NFD normalization alone.</div></div></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Why would you want to do this/care about this? I.e. what is the use case?</div></div></div></blockquote><div class=""><br class=""></div><div class="">Use cases for NFD include searching, where you'd find substrings considered "compatible." For instance, the fi ligature is considered compatible with the letters f and i, but they are not equal. If you've ever successfully searched for a word like "finance" in a PDF document that's been typeset with ligatures, you've benefited from NFD. Roughly speaking (IIUC), the difference between searching NFC-normalized strings and NFD-normalized strings is analogous to the difference between a case-sensitive and a case-insensitive search. Therefore, given a string x, it's sometimes important to be able to obtain NFD(x). If every string x is now automatically NFC(x), then the best one can do is NFD(NFC(x)), which is not guaranteed equal to NFD(x) even with canonical comparison (i.e. NFC(NFD(NFC(x))) != NFC(NFD(x)) for all x).</div></div></div></div></div></blockquote><div><br class=""></div><div>There are issues here related to String design. For instance, one could make an argument that such searching is really only interesting for a "Text" use case which is different from a String use case. 
That being said, I don't want to argue about this here since we are hijacking this thread ; ).</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""> <br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><div class=""></div><div class="">As an aside, I am not formally proposing this. I am just discussing potential opportunities for optimization given that we would need (as a part of this proposal) to add knowledge of Unicode to the compiler, which would allow for compile-time transformations.</div></div></div></blockquote><div class=""><br class=""></div><div class="">I'd be interested to know what performance gains you're envisioning with such an optimization of constant strings at compile time.</div></div></div></div></div></blockquote><div><br class=""></div><div>I would have to measure such wins to say anything concrete. Algorithmically, one would be able to avoid normalization during common Unicode operations when the strings are known to be constant. Even though this may provide a runtime win, the major win from teaching the compiler about Unicode would be in terms of applying Unicode operations such as encoding/decoding to constant strings.</div><div><br class=""></div><div>That being said, this is not the proposal that is being discussed here or even being proposed here. [i.e. 
let's stop hijacking this thread ; )]</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><span class="gmail-"><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><span class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote">On Thu, Sep 22, 2016 at 6:10 PM, Michael Gottesman<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:mgottesman@apple.com" target="_blank" class="">mgottesman@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><span class=""><br class="">> On Sep 22, 2016, at 10:50 AM, Joe Groff via swift-evolution <<a href="mailto:swift-evolution@swift.org" target="_blank" class="">swift-evolution@swift.org</a>> wrote:<br class="">><br class="">><br class="">>> On Jul 26, 2016, at 12:26 PM, Xiaodi Wu via 
swift-evolution <<a href="mailto:swift-evolution@swift.org" target="_blank" class="">swift-evolution@swift.org</a>> wrote:<br class="">>><br class="">>> +1. Even if it's too late for Swift 3, though, I'd argue that it's highly unlikely to be code-breaking in practice. Any existing code that would get tripped up by this normalization is arguably broken already.<br class="">><br class="">> I'm inclined to agree. To be paranoid about perfect compatibility, we could conceivably allow existing code with differently-normalized identifiers with a warning based on Swift version, but it's probably not worth it. It'd be interesting to data-mine Github or the iOS Swift Playgrounds app and see if this breaks any Swift 3 code in practice.<br class=""><br class=""></span>As an additional interesting point here, we could in general normalize unicode strings. This could potentially reduce the size of unicode characters or allow us to constant propagate certain unicode algorithms in the optimizer.<br class=""><br class="">><br class="">> -Joe<br class=""><div class=""><div class="">> ______________________________<wbr class="">_________________<br class="">> swift-evolution mailing list<br class="">><span class="Apple-converted-space"> </span><a href="mailto:swift-evolution@swift.org" target="_blank" class="">swift-evolution@swift.org</a><br class="">><span class="Apple-converted-space"> </span><a href="https://lists.swift.org/mailman/listinfo/swift-evolution" rel="noreferrer" target="_blank" class="">https://lists.swift.org/mailma<wbr class="">n/listinfo/swift-evolution</a></div></div></blockquote></div></div></div></div></blockquote></span></div></div></blockquote></div></div></div></div></blockquote></span></div></div></blockquote></div></div></div></div></blockquote></div><br class=""></body></html>