[swift-evolution] Strings in Swift 4

Mon Feb 6 22:42:57 CST 2017

> On 6 Feb 2017, at 19:29, Ted F.A. van Gaalen via swift-evolution <swift-evolution at swift.org> wrote:
> 
>> 
>> On 6 Feb 2017, at 19:10, David Waite <david at alkaline-solutions.com> wrote:
>> 
>>> 
>>> On Feb 6, 2017, at 10:26 AM, Ted F.A. van Gaalen via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>> Hi Dave,
>>> Oops! yes, you’re right!
>>> I did read again more thoroughly about Unicode 
>>> and how Unicode is handled within Swift...
>>> -should have done that before I wrote something- sorry.
>>> 
>>> Nevertheless: 
>>> 
>>> How about this solution:  (if I am not making other omissions in my thinking again) 
>>> -Store the string as a collection of fixed-width 32 bit UTF-32 characters anyway.
>>> -however, if the Unicode character is a grapheme cluster (2..n Unicode characters), then 
>>> store a pointer to a hidden child string containing the actual grapheme cluster, like so:
>>> 
>>> 1: [UTF32, UTF32, UTF32, 1pointer,  UTF32, UTF32, 1pointer, UTF32, UTF32]
>>>                                                |                                          |
>>> 2:                               [UTF32, UTF32]                  [UTF32, UTF32, UTF32, ...]
>>> 
>>> whereby (1) is aString as seen by the programmer.
>>> and (2)  are hidden child strings, each containing a grapheme cluster. 
>> 
>> The random access would require a uniform layout, so a pointer and scalar would need to be the same size. The above would work with a 32 bit platform with a tagged pointer, but would require a 64-bit slot for pointers on 64-bit systems like macOS and iOS.
>> 
> Yeah, I know that,  but the “grapheme cluster pool” I am imagining 
> could be allocated at a certain predefined base address, 
> whereby the pointer I am referring to is just an offset from this base address. 
> If so, an address space of  2^30  (1,073,741,824) 1 GB, will be available,
> which is more than sufficient for just storing unique grapheme clusters..    
> (of course, not taking in account other allocations and app limitations) 

When it comes to fast access, what’s most important is cache locality. DRAM is something like 200x slower than L2 cache. Looping through some contiguous 16-bit integers is always going to beat the pants off dereferencing pointers.

>   
>> Today when I need to do random access into a string, I convert it to an Array<Character>. Hardly efficient memory-wise, but efficient enough for random access.
>> 
> As a programmer, I just want to use String as-is but with direct subscripting like str[12..<34]
> and, if possible, also with an open range like so: str[12...]
> implemented natively in Swift. 
> 
> Kind Regards
> TedvG
> www.tedvg.com <http://www.tedvg.com/>
> www.ravelnotes.com <http://www.ravelnotes.com/>
>  
>> -DW

It’s quite rare that you need to grab arbitrary parts of a String without knowing what is inside it. If you’re saying str[12..<34] - why 12, and why 34? Is 12 the length of some substring you know from earlier? In that case, you could find out how many CodeUnits it had, and use that information instead.
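
A minimal sketch of that idea, assuming the String API as it exists in later Swift releases (where all string views share String.Index) rather than anything settled in this thread. The sample strings and variable names are made up for illustration:

```swift
// Hypothetical example data: a known prefix followed by arbitrary text.
let prefix = "Name: "
let str = "Name: José 🇳🇱 and friends"

// Measure the known prefix once, in UTF-16 code units.
let prefixUnits = prefix.utf16.count

// Step over exactly that many code units in the UTF-16 view.
let units = str.utf16
let start = units.index(units.startIndex, offsetBy: prefixUnits)

// In this example `start` falls on a character boundary (the text after
// the prefix begins with "J"), so it can be used to slice the string.
let rest = String(str[start...])
print(rest)
```

The point is only that the offset comes from something you already know (the prefix), not from a magic number like 12.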

The new model will give you some form of efficient “random” access; the catch is that it’s not totally random. Looking for the next character boundary is necessarily linear, so the trick for large strings (>16K) is to make sure you remember the CodeUnit offsets of important character boundaries.
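
One way to picture "remembering the CodeUnit offsets" is a small side table of checkpoints. This is only a sketch under my own assumptions: BoundaryIndex, step and checkpoints are hypothetical names, the code uses the String API from later Swift releases, and whether the seek to a checkpoint is constant-time depends on how the string is actually stored:

```swift
// Hypothetical helper: remembers the UTF-16 offset of every `step`-th character.
struct BoundaryIndex {
    let step: Int
    let checkpoints: [Int]  // UTF-16 offsets of characters 0, step, 2*step, ...

    init(_ s: String, step: Int) {
        precondition(step > 0, "step must be positive")
        var offsets: [Int] = []
        var offset = 0
        // One linear pass over the characters, done once up front.
        for (n, ch) in s.enumerated() {
            if n % step == 0 { offsets.append(offset) }
            offset += String(ch).utf16.count
        }
        self.step = step
        self.checkpoints = offsets
    }

    // Jump to the nearest earlier checkpoint by code-unit offset, then walk
    // at most step - 1 character boundaries linearly from there.
    func index(ofCharacter n: Int, in s: String) -> String.Index {
        let units = s.utf16
        let base = units.index(units.startIndex, offsetBy: checkpoints[n / step])
        return s.index(base, offsetBy: n % step)
    }
}

let text = "héllo 🇳🇱 wörld"
let boundaries = BoundaryIndex(text, step: 4)
let i = boundaries.index(ofCharacter: 8, in: text)
print(text[i])
```

A smaller step means less linear walking per lookup but a bigger table; for a real document you would probably checkpoint things like line or paragraph starts instead of every Nth character.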

- Karl
