<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 6 Feb 2017, at 19:29, Ted F.A. van Gaalen via swift-evolution <<a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" class=""><div class=""><br class="Apple-interchange-newline">On 6 Feb 2017, at 19:10, David Waite <<a href="mailto:david@alkaline-solutions.com" class="">david@alkaline-solutions.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><blockquote type="cite" class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class="Apple-interchange-newline">On Feb 6, 2017, at 10:26 AM, Ted F.A. van Gaalen via swift-evolution <<a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a>> wrote:<br class=""><br class="">Hi Dave,<br class="">Oops! yes, you’re right!<br class="">I did read again more thoroughly about Unicode<span class="Apple-converted-space"> </span><br class="">and how Unicode is handled within Swift...<br class="">-should have done that before I write something- sorry. <br class=""><br class="">Nevertheless:<span class="Apple-converted-space"> </span><br class=""><br class="">How about this solution: (if I am not making other omissions in my thinking again)<span class="Apple-converted-space"> </span><br class="">-Store the string as a collection of fixed-width 32 bit UTF-32 characters anyway.<br class="">-however, if the Unicode character is a grapheme cluster (2..n Unicode characters),then<span class="Apple-converted-space"> </span><br class="">store a pointer to a hidden child string containing the actual grapheme cluster, like so:<br class=""><br class="">1: [UTF32, UTF32, UTF32, 1pointer, UTF32, UTF32, 1pointer, UTF32, UTF32]<br class=""> | |<br class="">2: [UTF32, UTF32] [UTF32, UTF32, UTF32, ...]<br class=""><br class="">whereby (1) is aString as seen by the programmer.<br class="">and (2) are hidden child strings, each containing a grapheme cluster.<span class="Apple-converted-space"> </span><br class=""></blockquote><br class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">The random access would require a uniform layout, so a pointer and scalar would need to be the same size. The above would work with a 32 bit platform with a tagged pointer, but would require a 64-bit slot for pointers on 64-bit systems like macOS and iOS.</span><br class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;"></div></blockquote>Yeah, I know that, but the “grapheme cluster pool” I am imagining </div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">could be allocated at a certain predefined base address, </div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">whereby the pointer I am referring to is just an offset from this base address. </div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">If so, an address space of 2^30 (1,073,741,824) 1 GB, will be available,</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">which is more than sufficient for just storing unique grapheme clusters.. </div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">(of course, not taking in account other allocations and app limitations) </div></div></blockquote><div><br class=""></div><div>When it comes to fast access what’s most important is cache locality. DRAM is like 200x slower than L2 cache. Looping through some contiguous 16-bit integers is always going to beat the pants out of derefencing pointers.</div><div class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""> <br class=""><blockquote type="cite" class=""><div class=""><span class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">Today when I need to do random access into a string, I convert it to an Array<Character>. Hardly efficient memory-wise, but efficient enough for random access.</span><br class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;"></div></blockquote>As a programmer. I just want to use String as-is but with direct subscripting like str[12..<34]</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">and, if possible also with open range like so: str[12…] </div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">implemented natively in Swift. </div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">Kind Regards</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">TedvG</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><a href="http://www.tedvg.com/" class="">www.tedvg.com</a></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><a href="http://www.ravelnotes.com/" class="">www.ravelnotes.com</a></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""> </div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" class=""><div class=""><span class="" style="font-family: Avenir-Medium; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">-DW</span></div></blockquote></div><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">_______________________________________________</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">swift-evolution mailing list</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><a href="mailto:swift-evolution@swift.org" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">swift-evolution@swift.org</a><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><a href="https://lists.swift.org/mailman/listinfo/swift-evolution" class="">https://lists.swift.org/mailman/listinfo/swift-evolution</a></div></blockquote><br class=""></div><div><br class=""></div><div><br class=""></div><div>It’s quite rare that you need to grab arbitrary parts of a String without knowing what is inside it. If you’re saying str[12..<34] - why 12, and why 34? Is 12 the length of some substring you know from earlier? In that case, you could find out how many CodeUnits it had, and use that information instead.</div><div><br class=""></div><div>The new model will give you some form of efficient “random” access; the catch is that it’s not totally random. Looking for the next character boundary is necessarily linear, so the trick for large strings (>16K) is to make sure you remember the CodeUnit offsets of important character boundaries.</div><div><br class=""></div><div>- Karl</div><br class=""></body></html>