[swift-evolution] Strings in Swift 4

Ted F.A. van Gaalen tedvgiosdev at gmail.com
Wed Feb 22 17:40:03 CST 2017


Thank you, Michael.
I did that already in this extension (as written before):
 
extension String
{
    var count: Int
        {
        get
        {
            return self.characters.count
        }
    }

// Stored properties are not possible in extensions, so this does not compile:
// var ar = Array(self.characters)

    subscript (n: Int) -> String
    {
        return String(Array(self.characters)[n])
    }
    
    subscript (r: Range<Int>) -> String
    {
        return String(Array(self.characters)[r])
    }
    
    subscript (r: ClosedRange<Int>) -> String
    {
        return String(Array(self.characters)[r])
    }
}

But this is not very efficient, because for each subscript invocation
the Character array must be built again (if it is not cached within String).
I assume it must be rebuilt each time, because one cannot create new stored
properties in extensions (why not?), such as a Character array as in the comment above.


> On 22 Feb 2017, at 19:43, Michael Ilseman <milseman at apple.com> wrote:
> 
> Given that the behavior you desire is literally a few key strokes away (see below), it would be unfortunate to pessimize the internal representation of Strings for every application. This would destroy the applicability of the Swift standard library to entire areas of computing such as application development for mobile devices (Swift's current largest niche). The idea of abstraction is that you can provide a high-level view of things stored at a lower-level in accordance with sensible higher-level semantics and expectations. If you want random access, then you can eagerly project the characters (see below). This is consistent with the standard library’s preference for lazy sequences when providing an eager one would result in a large up-front cost that might be avoidable otherwise.
> 
mostly true.
> Here’s playground code that gives you what you’re requesting, by doing an eager projection (rather than a lazy one, which is the default):
> 
Your extension is more efficient than my subscript extension above,
because the Character array is drawn from the String once, instead of
the str.characters property being scanned again each time.
@Dave:
Is that the case, or is the character view cached, so that it
doesn’t matter much if the CharacterView is retrieved frequently?
> extension String {
>     var characterArray: [Character] {
>         return characters.map { $0 }
>     }
> }
> let str = "abcdefg\(UnicodeScalar(0x302)!)"
> let charArray = str.characterArray
> charArray[4] // results in "e"
> charArray[6] // results in “ĝ”

I would normally subclass String, but in Swift I can’t do this
because String is a struct, and inheritance of structs is not
possible in Swift.
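
Since neither subclassing nor a stored property in an extension is available, one possible workaround is a small wrapper struct that stores the Character array once. This is only a sketch: the name IndexedString is made up for illustration, and it assumes Swift 4 or later, where a String can be passed to Array(...) directly.

```swift
// Hypothetical wrapper type; the name `IndexedString` is not from this thread.
struct IndexedString {
    // Built once at initialization, so the subscripts below do not rebuild it.
    private let chars: [Character]

    init(_ string: String) {
        chars = Array(string)   // a single pass over the string
    }

    var count: Int { return chars.count }

    subscript(n: Int) -> String {
        return String(chars[n])
    }

    subscript(r: Range<Int>) -> String {
        return String(chars[r])
    }

    subscript(r: ClosedRange<Int>) -> String {
        return String(chars[r])
    }
}

let zoo = IndexedString("Koala 🐨, Snail 🐌")
print(zoo[6])       // 🐨
print(zoo[9..<16])  // Snail 🐌
```

The price is that the array lives as long as the wrapper does, which is the storage cost discussed elsewhere in this thread.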

@Dave:
Thanks for the explanation and the link (it’s been a long time
since I read about pointers; normally I try to avoid these things like the plague..)

A factor of 8? That's a big storage difference.. I am currently still diving into the Swift stdlib;
maybe I’ll get some bright ideas there, but don’t count on it :o)

However, for the String struct, I have another suggestion/solution/question if I may: 

If String’s CharacterView is not cached (or is it?) to prevent repeated regeneration,
but even if it is:

What about having a (lazy) Array<Character> property inside String,
which:
      is normally nil and only created when parts of a String are
      accessed/changed, e.g. by subscripting;
      becomes nil again when the String has changed;
      can also be disposed of (set to nil or emptied) upon request:
      str.disposeCharacterArray()
   or maybe:
      str.compactString()
      str.freeSpace()

Although it would then be available as a property like this:
      str.characterArray,
normally one would not access this character array directly,
but rather implicitly, by subscripting the String itself, like str[n...m].
In that case, if it does not already exist, this character array inside String
will be created, and it remains alive until the String disappears, changes, or
the string’s character array is explicitly disposed of
(e.g. useful when many strings are involved, to free storage).
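
To make the idea concrete, here is a rough sketch of such a cache, written as a wrapper struct because String itself cannot be changed from outside the standard library. The names CachedString and hasCharacterArray are invented for this example; disposeCharacterArray is the name proposed above. It assumes Swift 4 or later.

```swift
// Hypothetical sketch; `CachedString` and `hasCharacterArray` are illustrative names.
struct CachedString {
    // The underlying string; any change to it invalidates the cache.
    var string: String {
        didSet { cache = nil }
    }

    // nil until the first subscript access.
    private var cache: [Character]? = nil

    init(_ string: String) {
        self.string = string
    }

    // True while the character array is alive.
    var hasCharacterArray: Bool { return cache != nil }

    // Frees the character array on request.
    mutating func disposeCharacterArray() {
        cache = nil
    }

    // Builds the array on first use, then reuses it.
    private mutating func characters() -> [Character] {
        if let existing = cache { return existing }
        let fresh = Array(string)
        cache = fresh
        return fresh
    }

    // The getter is mutating because a struct has to update its own cache.
    subscript(r: ClosedRange<Int>) -> String {
        mutating get { return String(characters()[r]) }
    }
}

var s = CachedString("Hello, world")
print(s[0...4])              // Hello
print(s.hasCharacterArray)   // true
s.string += "!"
print(s.hasCharacterArray)   // false
```

Note the mutating getter: with value semantics, reading through the subscript has to write the cache, so this only works on a var. That is one of the design costs such a property inside String would have to deal with.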

In that way:
No unnecessary storage is allocated for Character arrays;
it is allocated only when the need arises.
There are no longer performance-based restrictions that keep the programmer
from subscripting strings directly. Hooray!

This applies not only to *getting* but also to *setting* substrings.
(The latter would of course require processing of the Character array
inside String, and updating the String’s own storage accordingly.)

Furthermore, one could base nearly all
string handling like substring, replace, search, etc.
directly on this character array without the
need to walk through the contiguous String storage
itself each time at runtime.

  
Flexible! So one can do all this and more:
     str[5] = "x"
     let s = str[5]
     str[3...5] = "HAL"
     str[range] = str[range].reversed()
     var s = str[10..<28]
     if str[p1..<p1 + length] == "Dakota" {…}
     notes[bar1..<bar1+6] = "EADGBE"
     etc.

   (Try to do this with the existing string handling functions..)
   One can also roll one's own string handling functions directly
   based on subscripting, possibly in one's own extensions.
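
A minimal sketch of how get and set subscripts of this kind could look when the storage is a Character array. The type name CharString is hypothetical, and this illustrates the idea on a separate type, not on String itself. It assumes Swift 4 or later.

```swift
// Hypothetical type; `CharString` is an illustrative name.
struct CharString: CustomStringConvertible {
    private var chars: [Character]

    init(_ string: String) { chars = Array(string) }

    var description: String { return String(chars) }

    subscript(n: Int) -> String {
        get { return String(chars[n]) }
        // The replacement may be longer or shorter than one character.
        set { chars.replaceSubrange(n...n, with: Array(newValue)) }
    }

    subscript(r: Range<Int>) -> String {
        get { return String(chars[r]) }
        set { chars.replaceSubrange(r, with: Array(newValue)) }
    }

    subscript(r: ClosedRange<Int>) -> String {
        get { return String(chars[r]) }
        set { chars.replaceSubrange(r, with: Array(newValue)) }
    }
}

var str = CharString("Open the pod bay doors, please")
str[0] = "o"
str[9...11] = "HAL"
print(str)          // open the HAL bay doors, please
print(str[9..<12])  // HAL
```

With these subscripts returning String, the reversed() case would need an explicit conversion, e.g. str[r] = String(str[r].reversed()), because reversed() on a String does not return a String.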

In that way we can forget the, imho (sorry, excusez-moi), awkward and tedious constructions like:
    str.substringWithRange(Range<String.Index>(start: str.startIndex, end: str.endIndex))
Horrible: too much typing, I can’t read these things and have to look them up each time..
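
For comparison, this is roughly what taking a substring by character offsets looks like with String.Index in Swift 4 and later. It is a sketch of the index-based style only; the variable names are arbitrary.

```swift
let text = "Koala 🐨, Snail 🐌"

// Offsets are counted in Characters; each call walks the string from the given index.
let start = text.index(text.startIndex, offsetBy: 9)
let end = text.index(start, offsetBy: 7)

// Slicing gives a Substring; String(...) copies it into a new String.
let sub = String(text[start..<end])
print(sub)   // Snail 🐌
```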
Kind Regards
TedvG
( I am Dutch and living in Germany (like being here but it doesn’t help my English much :o) )
www.tedvg.com <http://www.tedvg.com/>
www.ravelnotes.com <http://www.ravelnotes.com/>
 
    

> 
> Note that you get random access AND safety by operating at the Character level. If you operate at the Unicode scalar value level instead, you might be splitting canonical combining sequences accidentally.
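
A small sketch of the difference described in the note above, using the same string as in the playground code (g followed by a combining circumflex). It assumes Swift 4 or later; the variable names are arbitrary.

```swift
let accented = "abcdefg\u{302}"   // "g" followed by U+0302 COMBINING CIRCUMFLEX ACCENT

let characters = Array(accented)               // [Character]
let scalars = Array(accented.unicodeScalars)   // [Unicode.Scalar]

print(characters.count)   // 7
print(scalars.count)      // 8

// At the Character level the accent stays attached to its base letter.
print(characters[6])      // ĝ
// At the scalar level the same position holds only the base letter.
print(scalars[6])         // g
```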
> 
> 
>> On Feb 22, 2017, at 7:56 AM, Ted F.A. van Gaalen via swift-evolution <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
>> 
>> Hi Ben,
>> thank you, yes, I know all that by now. 
>> 
>> I have seen that one goes to great lengths to optimise, not only for storage but also for speed. But how far does this need to go? In any case, optimisation should not be used
>> as an argument for restricting a PL’s functionality, that is, for leaving out PL elements which are common and useful.
>> 
>> I wouldn’t worry so much about storage (unless one wants to load a complete book into memory…). In iOS, the average app is about 15-50 MB, and String data is mostly a fraction of that. In macOS or similar I’d think it is even less significant…
>> 
>> I wonder how much performance and memory consumption would differ from the current contiguous memory implementation if a String were just a plain row of (references to) Character (extended grapheme cluster) objects, Array<Character>, which would simplify the basic logic and (sub)string handling significantly, because then one has direct access to the String’s elements, using the reasonably fast access methods of a Swift Collection/Array. 
>> 
>> I have experimented with an alternative String struct based upon Array<Character>, and have seen how easy it was to implement the most popular string handling functions, as one can work with the Character array directly. 
>> 
>> Currently at deep-dive-depth in the standard lib sources, especially String & Co.
>> 
>> Kind Regards
>> TedvG
>> 
>> 
>>> On 21 Feb 2017, at 01:31, Ben Cohen <ben_cohen at apple.com <mailto:ben_cohen at apple.com>> wrote:
>>> 
>>> Hi Ted,
>>> 
>>> While Character is the Element type for String, it would be unsuitable for a String’s implementation to actually use Character for storage. Character is fairly large (currently 9 bytes), very little of which is used for most values. For unusual graphemes that require more storage, it allocates more memory on the heap. By contrast, String’s actual storage is a buffer of 1- or 2-byte elements, and all graphemes (what we expose as Characters) are held in that contiguous memory no matter how many code points they comprise. When you iterate over the string, the graphemes are unpacked into a Character on the fly. This gives you a user interface of a collection that superficially appears to resemble [Character], but this does not mean that this would be a workable implementation.
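
The storage difference can be estimated in a playground. This is only a rough sketch: it uses the UTF-8 view as a stand-in for the contiguous buffer (an assumption; the actual internal encoding depends on the Swift version), and the size of Character also differs between Swift versions, so no specific numbers are claimed for it here.

```swift
let sample = "Koala 🐨, Snail 🐌"
let asArray = Array(sample)   // [Character]

// Bytes needed for the text itself when encoded as UTF-8.
let utf8Bytes = sample.utf8.count

// Inline bytes an Array<Character> needs for the same text,
// not counting any extra heap storage for larger graphemes.
let arrayBytes = asArray.count * MemoryLayout<Character>.stride

print("UTF-8 bytes:", utf8Bytes)
print("[Character] inline bytes:", arrayBytes)
```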
>>> 
>>>> On Feb 20, 2017, at 12:59 PM, Ted F.A. van Gaalen <tedvgiosdev at gmail.com <mailto:tedvgiosdev at gmail.com>> wrote:
>>>> 
>>>> Hi Ben, Dave (you should not read this now, you’re on vacation :o)  & Others
>>>> 
>>>> As described in the Swift Standard Library API Reference:
>>>> 
>>>> The Character type represents a character made up of one or more Unicode scalar values, 
>>>> grouped by a Unicode boundary algorithm. Generally, a Character instance matches what 
>>>> the reader of a string will perceive as a single character. The number of visible characters is 
>>>> generally the most natural way to count the length of a string.
>>>> The smallest discrete unit we (app programmers) are mostly working with is this
>>>> perceived visible character, what else? 
>>>> 
>>>> If that is the case, my reasoning is that Strings (could / should?) be relatively simple, 
>>>> because most, if not all, complexity of Unicode is confined within the Character object and
>>>> completely hidden** from the average application programmer, who normally only needs
>>>> to work with Strings which contain these visible Characters, right? 
>>>> It then makes no difference at all “what is in” the Character (excellent implementation, btw) 
>>>> (Unicode, ASCII, EBCDIC, Elvish, KlingonIV, IntergalacticV.2, whatever),
>>>> because for the visual representation of whatever is in the Character we rely,
>>>> in sublime oblivion, on miraculous font processors hidden in the dark depths of the OS. 
>>>> 
>>>> Then, in this perspective, my question is: why is String not implemented as 
>>>> directly based upon an array [Character]? In that case one could refer to the Characters of the
>>>> String directly, for direct subscripting as well as other String functionality, in an efficient way. 
>>>> (I do have a scope of independent Swift here, that is, interaction with libraries should be 
>>>> solved by the compiler, so as not to be restricted by legacy ObjC etc.) 
>>>> 
>>>> **   (except if one needs to e.g. access individual elements and/or compose graphics directly?
>>>>       but for this purpose the Character’s properties are accessible) 
>>>> 
>>>> For the sake of convenience, based upon the above reasoning, I now “emulate” this in 
>>>> a string extension, thereby ignoring the rare cases in which a visible character could be based 
>>>> upon more than a single Character (extended grapheme cluster). If that were to occur, 
>>>> they should be merged into one extended grapheme cluster, a single Character that is. 
>>>> 
>>>> //: Playground - implement direct subscripting using a Character array
>>>> // Of course, if the String were defined as a directly accessible array of
>>>> // Characters, it would be more efficient than these extension functions.
>>>> extension String
>>>> {
>>>>     var count: Int
>>>>         {
>>>>         get
>>>>         {
>>>>             return self.characters.count
>>>>         }
>>>>     }
>>>> 
>>>>     subscript (n: Int) -> String
>>>>     {
>>>>         return String(Array(self.characters)[n])
>>>>     }
>>>>     
>>>>     subscript (r: Range<Int>) -> String
>>>>     {
>>>>         return String(Array(self.characters)[r])
>>>>     }
>>>>     
>>>>     subscript (r: ClosedRange<Int>) -> String
>>>>     {
>>>>         return String(Array(self.characters)[r])
>>>>     }
>>>> }
>>>> 
>>>> func test()
>>>> {
>>>>     let zoo = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
>>>>     print("zoo has \(zoo.count) characters (discrete extended graphemes):")
>>>>     for i in 0..<zoo.count
>>>>     {
>>>>         print(i,zoo[i],separator: "=", terminator:" ")
>>>>     }
>>>>     print("\n")
>>>>     print(zoo[0..<7])
>>>>     print(zoo[9..<16])
>>>>     print(zoo[18...26])
>>>>     print(zoo[29...39])
>>>>     print("images:" + zoo[6] + zoo[15] + zoo[26] + zoo[39])
>>>> }
>>>> 
>>>> test()
>>>> 
>>>> this works as intended  and generates the following output:  
>>>> 
>>>> zoo has 40 characters (discrete extended graphemes):
>>>> 0=K 1=o 2=a 3=l 4=a 5=  6=🐨 7=, 8=  9=S 10=n 11=a 12=i 13=l 14=  15=🐌 16=, 17=  
>>>> 18=P 19=e 20=n 21=g 22=u 23=i 24=n 25=  26=🐧 27=, 28=  29=D 30=r 31=o 32=m 
>>>> 33=e 34=d 35=a 36=r 37=y 38=  39=🐪 
>>>> 
>>>> Koala 🐨
>>>> Snail 🐌
>>>> Penguin 🐧
>>>> Dromedary 🐪
>>>> images:🐨🐌🐧🐪
>>>> 
>>>> I don’t know how (in)efficient this method is,
>>>> but in many cases this is not as important as it is in e.g. numerical computation.
>>>> 
>>>> I still fail to understand why direct subscripting of strings would be unnecessary,
>>>> and would like to see this built into Swift asap. 
>>>> 
>>>> Btw, I do share the concern as expressed by Rien regarding the increasing complexity of the language.
>>>> 
>>>> Kind Regards, 
>>>> 
>>>> TedvG
>>>> 
>>>> 
>>>>  
>>> 
>> 
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org <mailto:swift-evolution at swift.org>
>> https://lists.swift.org/mailman/listinfo/swift-evolution
> 
