<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Dec 18, 2015 at 1:47 PM, Paul Cantrell via swift-evolution <span dir="ltr">&lt;<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>I was quite surprised to learn that it’s possible to create Swift strings that contain things other than valid Unicode characters. Is it feasible to guarantee that this cannot happen?</div><div><br></div><div>String.init(bytes:encoding:) is failable, and does in fact validate that the given bytes are decodable with the given encoding in most circumstances:</div><div><br></div><div><div style="color:rgb(102,139,73);font-family:Menlo;font-size:10.5px;margin:0px;line-height:normal"><span style="color:rgb(0,0,0)"> </span>// Returns nil</div><div style="margin:0px;font-size:10.5px;line-height:normal;font-family:Menlo;color:rgb(88,126,168)"><span style="color:#000000"> </span>String<span style="color:#000000">(</span></div><div style="margin:0px;font-size:10.5px;line-height:normal;font-family:Menlo"> bytes: [<span style="color:#323e7d">0xD8</span>, <span style="color:#323e7d">0x00</span>] <span style="color:#323e7d">as</span> [<span style="color:#587ea8">UInt8</span>],</div><div style="margin:0px;font-size:10.5px;line-height:normal;font-family:Menlo"> encoding: <span style="color:#587ea8">NSUTF8StringEncoding</span>)</div></div><div><br></div><div>However, that initializer does <i>not</i> reject invalid surrogate characters in UTF-16:</div><div><br></div><div style="margin:0px;font-size:10.5px;line-height:normal;font-family:Menlo;color:rgb(102,139,73)"><span style="color:#000000"> </span>// Succeeds (wat?!)</div><div style="margin:0px;font-size:10.5px;line-height:normal;font-family:Menlo"> <span style="color:#323e7d">let</span> bogusStr = <span 
style="color:#587ea8">String</span>(</div><div style="margin:0px;font-size:10.5px;line-height:normal;font-family:Menlo"> bytes: [<span style="color:#323e7d">0xD8</span>, <span style="color:#323e7d">0x00</span>] <span style="color:#323e7d">as</span> [<span style="color:#587ea8">UInt8</span>],</div><div style="margin:0px;font-size:10.5px;line-height:normal;font-family:Menlo;color:rgb(88,126,168)"><span style="color:#000000"> encoding: </span>NSUTF16BigEndianStringEncoding<span style="color:#000000">)!</span></div></div></blockquote><div><br></div><div>This would be a useful guarantee, and I support adding it. The current behavior looks inconsistent to me. OTOH, the current behavior of String(bytes:encoding:) mirrors the behavior of the NSString method, so changing it would make the two diverge. But I think the extra guarantee is worth it.</div><div><br></div><div>Tony, what do you think?</div><div><br></div><div>Dmitri</div></div><div><br></div>-- <br><div class="gmail_signature">main(i,j){for(i=2;;i++){for(j=2;j&lt;i;j++){if(!(i%j)){j=0;break;}}if<br>(j){printf("%d\n",i);}}} /*Dmitri Gribenko &lt;<a href="mailto:gribozavr@gmail.com" target="_blank">gribozavr@gmail.com</a>&gt;*/</div>
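<div><br></div><div>P.S. For illustration, the extra UTF-16 check discussed above could look roughly like the following. This is only a minimal sketch under my own assumptions, not how String(bytes:encoding:) or the NSString method is implemented; isWellFormedUTF16 and utf16BigEndianUnits are hypothetical helper names that do not exist in the standard library or Foundation.</div><div><br></div><pre style="margin:0px;font-size:10.5px;line-height:normal;font-family:Menlo">
// Hypothetical helper (not a stdlib or Foundation API).
// Returns true only if every surrogate code unit is part of a
// high surrogate followed immediately by a low surrogate.
func isWellFormedUTF16(_ units: [UInt16]) -> Bool {
    var expectingLow = false
    for u in units {
        let isHigh = (0xD800...0xDBFF).contains(u)
        let isLow = (0xDC00...0xDFFF).contains(u)
        if expectingLow {
            // The previous unit was a high surrogate, so this one must be low.
            if !isLow { return false }
            expectingLow = false
        } else if isLow {
            // Low surrogate without a preceding high surrogate.
            return false
        } else if isHigh {
            expectingLow = true
        }
    }
    // A trailing high surrogate is also ill-formed.
    return !expectingLow
}

// Hypothetical helper: combine big-endian byte pairs into UTF-16 code units.
// Returns nil when the byte count is odd.
func utf16BigEndianUnits(_ bytes: [UInt8]) -> [UInt16]? {
    guard bytes.count % 2 == 0 else { return nil }
    return stride(from: 0, to: bytes.count, by: 2).map {
        UInt16(bytes[$0]) * 256 + UInt16(bytes[$0 + 1])
    }
}
</pre><div><br></div><div>With these, the bytes [0xD8, 0x00] from the example combine into the single code unit 0xD800, which is a lone high surrogate, so the check fails and a validating initializer could return nil at that point.</div>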
</div></div>