<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 19, 2015, at 7:59 PM, Dmitri Gribenko <<a href="mailto:gribozavr@gmail.com" class="">gribozavr@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_extra"><div class="gmail_quote">On Fri, Dec 18, 2015 at 1:47 PM, Paul Cantrell via swift-evolution<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:swift-evolution@swift.org" target="_blank" class="">swift-evolution@swift.org</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class="">I was quite surprised to learn that it’s possible to create Swift strings that contain things other than valid Unicode characters. 
Is it feasible to guarantee that this cannot happen?</div><div class=""><br class=""></div><div class="">String.init(bytes:encoding:) is failable, and does in fact validate that the given bytes are decodable with the given encoding in most circumstances:</div><div class=""><br class=""></div><div class=""><div style="color: rgb(102, 139, 73); font-family: Menlo; font-size: 10.5px; margin: 0px; line-height: normal;" class=""><span style="" class=""> </span>// Returns nil</div><div style="margin: 0px; font-size: 10.5px; line-height: normal; font-family: Menlo; color: rgb(88, 126, 168);" class=""><span style="" class=""> </span>String<span style="" class="">(</span></div><div style="margin: 0px; font-size: 10.5px; line-height: normal; font-family: Menlo;" class=""> <span class="Apple-converted-space"> </span>bytes: [<span style="color: rgb(50, 62, 125);" class="">0xD8</span>,<span class="Apple-converted-space"> </span><span style="color: rgb(50, 62, 125);" class="">0x00</span>]<span class="Apple-converted-space"> </span><span style="color: rgb(50, 62, 125);" class="">as</span><span class="Apple-converted-space"> </span>[<span style="color: rgb(88, 126, 168);" class="">UInt8</span>],</div><div style="margin: 0px; font-size: 10.5px; line-height: normal; font-family: Menlo;" class=""> <span class="Apple-converted-space"> </span>encoding:<span class="Apple-converted-space"> </span><span style="color: rgb(88, 126, 168);" class="">NSUTF8StringEncoding</span>)</div></div><div class=""><br class=""></div><div class="">However, that initializer does<span class="Apple-converted-space"> </span><i class="">not</i> reject invalid surrogate characters in UTF-16:</div><div class=""><br class=""></div><div style="margin: 0px; font-size: 10.5px; line-height: normal; font-family: Menlo; color: rgb(102, 139, 73);" class=""><span style="" class=""> </span>// Succeeds (wat?!)</div><div style="margin: 0px; font-size: 10.5px; line-height: normal; font-family: Menlo;" class=""> <span 
class="Apple-converted-space"> </span><span style="color: rgb(50, 62, 125);" class="">let</span><span class="Apple-converted-space"> </span>bogusStr =<span class="Apple-converted-space"> </span><span style="color: rgb(88, 126, 168);" class="">String</span>(</div><div style="margin: 0px; font-size: 10.5px; line-height: normal; font-family: Menlo;" class=""> <span class="Apple-converted-space"> </span>bytes: [<span style="color: rgb(50, 62, 125);" class="">0xD8</span>,<span class="Apple-converted-space"> </span><span style="color: rgb(50, 62, 125);" class="">0x00</span>]<span class="Apple-converted-space"> </span><span style="color: rgb(50, 62, 125);" class="">as</span><span class="Apple-converted-space"> </span>[<span style="color: rgb(88, 126, 168);" class="">UInt8</span>],</div><div style="margin: 0px; font-size: 10.5px; line-height: normal; font-family: Menlo; color: rgb(88, 126, 168);" class=""><span style="" class=""> <span class="Apple-converted-space"> </span>encoding:<span class="Apple-converted-space"> </span></span>NSUTF16BigEndianStringEncoding<span style="" class="">)!</span></div></div></blockquote><div class=""><br class=""></div><div class="">Adding this would be a useful guarantee, I support this. The current behavior looks inconsistent to me. OTOH, the current behavior of String(bytes:encoding:) mirrors the behavior of the NSString method, so this would create inconsistency. But I think the extra guarantee is worth it.</div><div class=""><br class=""></div><div class="">Tony, what do you think?</div><div class=""><br class=""></div></div></div></div></div></blockquote><div><br class=""></div><div>NSString deals with this issue more on the ‘get’ side. For example, CFStringGetBytes has a ‘lossByte’ for use in replacement when the requested encoding cannot represent something stored by the receiver string. 
Also, the abstract NSString interface can be extended to add additional encodings (which is why the string encoding values are not an enumeration).</div><div><br class=""></div><div>- Tony</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">Dmitri</div></div><div class=""><br class=""></div>--<span class="Apple-converted-space"> </span><br class=""><div class="gmail_signature">main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if<br class="">(j){printf("%d\n",i);}}} /*Dmitri Gribenko <<a href="mailto:gribozavr@gmail.com" target="_blank" class="">gribozavr@gmail.com</a>>*/</div></div></div></div></blockquote></div><div class=""><br class=""></div></body></html>