[swift-evolution] InternalString class for easy String manipulation
Kenny Leung
kenny_leung at pobox.com
Wed Aug 17 11:40:01 CDT 2016
I understand that the most friendly approach may not be the most efficient, but that’s not what I’m pushing for. I’m pushing for "does the thing people would most commonly expect". Take a first-time programmer who reads any (human) language: that is what they would expect.
Why couldn’t String’s internal storage format be glyph-based? If I were, say, writing a text editor, it would certainly be the easiest and most efficient format to work in.
-Kenny
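One way to approximate glyph-based storage in current Swift (Swift 4 or later, where String is a Collection of Character) is to copy the text into an Array<Character>. Each element is one extended grapheme cluster, so Int subscripting is O(1) after a one-time O(n) conversion. This is a minimal sketch for something like a simple text editor; the `GlyphBuffer` type is hypothetical and not a standard-library API.

```swift
// Hypothetical glyph-indexed buffer (not a standard-library type).
// Each Character is one extended grapheme cluster ("user-perceived character").
struct GlyphBuffer {
    private(set) var glyphs: [Character]

    init(_ text: String) {
        // One-time O(n) pass that segments the string into grapheme clusters.
        glyphs = Array(text)
    }

    var count: Int { return glyphs.count }

    // O(1) random access by glyph position.
    subscript(i: Int) -> Character {
        get { return glyphs[i] }
        set { glyphs[i] = newValue }
    }

    // Insert text at a glyph position (O(n) in the buffer length, like any array insert).
    mutating func insert(_ text: String, at i: Int) {
        glyphs.insert(contentsOf: Array(text), at: i)
    }

    // Convert back to a String for display or storage.
    var string: String { return String(glyphs) }
}

var buffer = GlyphBuffer("Hi 🇺🇸!")
print(buffer.count)   // 5
print(buffer[3])      // 🇺🇸
buffer.insert("👋 ", at: 3)
print(buffer.string)  // Hi 👋 🇺🇸!
```

One caveat of this design: the array does not re-segment at edit boundaries, so inserting a lone combining mark leaves it as its own element even though String would merge it with the preceding letter.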
> On Aug 15, 2016, at 9:20 PM, Félix Cloutier <felixcca at yahoo.ca> wrote:
>
> The major problem with this approach is that visual glyphs themselves have one level of variable-length encoding, and they sit on top of another variable-length encoding used to represent the Unicode characters (Swift-native Strings are currently encoded as UTF-8). For instance, the visual glyph 🇺🇸 is the result of putting side by side the Unicode characters 🇺 and 🇸 ("REGIONAL INDICATOR SYMBOL LETTER U" and "REGIONAL INDICATOR SYMBOL LETTER S"), which are themselves encoded as UTF-8 using 4 bytes each. A design in which you can "just write" string[4544] hides the fact that indexing is a linear-time operation that needs to decode the UTF-8 bytes into Unicode characters and then recompose visual glyphs on top of that.
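The layering described above can be observed directly through String's views. A short sketch in current Swift (Swift 4 or later); the helper `character(at:in:)` is a hypothetical name used only for illustration.

```swift
let flag = "🇺🇸"

// Visual glyph level: one extended grapheme cluster.
print(flag.count)                 // 1
// Unicode scalar level: two REGIONAL INDICATOR SYMBOL LETTERs.
print(flag.unicodeScalars.count)  // 2
// Encoding level: each of those scalars takes 4 bytes in UTF-8.
print(flag.utf8.count)            // 8

for scalar in flag.unicodeScalars {
    print(String(scalar.value, radix: 16, uppercase: true))  // 1F1FA, then 1F1F8
}

// Hypothetical helper: reaching the n-th Character means walking from startIndex.
// index(_:offsetBy:) is O(n) here because String is a BidirectionalCollection,
// not a RandomAccessCollection.
func character(at offset: Int, in s: String) -> Character {
    let i = s.index(s.startIndex, offsetBy: offset)
    return s[i]
}
print(character(at: 2, in: "a🇺🇸b🇫🇷"))  // b
```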
>
> Generally speaking, I *think* I agree that human-geared "long strings", on which you probably won't need random access, and machine-geared smaller strings that encode a command could benefit from not being considered the same fundamental thing. However, I'm also afraid that this will end with more applications and websites that assume first names only contain 7-bit-clean characters in the A-Z range. (I live in the US, and I can attest that this is still very common.)
>
> You could also argue that better facilities for parsing strings would probably address this issue.
>
> Félix
>
>> On Aug 15, 2016, at 10:52:02 AM, Kenny Leung via swift-evolution <swift-evolution at swift.org> wrote:
>>
>> I agree with both points of view. I think we need to bring back subscripting on strings that does the thing people would most commonly expect.
>>
>> I would say that the subscript indexes should correspond to visual glyphs. This seems reasonable to me for most scripts, like Roman, Cyrillic, and Chinese. There is some doubt in my mind for things like subscripted Japanese or connected (ligatured?) languages like Arabic, Hindi, or Thai.
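Subscripting by visual glyph, as suggested here, can be sketched today as a String extension that counts Characters (extended grapheme clusters). This is an illustration only: the `glyphAt:` subscript is hypothetical, not a standard-library API, and every access is O(n), which is the hidden cost Félix points out. Grapheme segmentation follows the Unicode rules (UAX #29), so for scripts such as Hindi or Thai the result depends on those rules rather than on how a given reader might split the text.

```swift
extension String {
    // Hypothetical convenience: the n-th user-perceived character (grapheme cluster).
    // O(n): both the bounds check (count) and the index walk start from startIndex.
    subscript(glyphAt offset: Int) -> Character {
        precondition(offset >= 0 && offset < count, "glyph offset out of range")
        return self[index(startIndex, offsetBy: offset)]
    }
}

let words = "Привет, 世界! café"
print(words[glyphAt: 0])   // П
print(words[glyphAt: 8])   // 世

// "e" followed by U+0301 COMBINING ACUTE ACCENT is still a single glyph.
let decomposed = "cafe\u{301}"
print(decomposed.count)        // 4
print(decomposed[glyphAt: 3])  // é
```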
>>
>> -Kenny
>>
>>
>>> On Aug 15, 2016, at 10:42 AM, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
>>>
>>> On Sun, Aug 14, 2016 at 5:41 PM, Michael Savich via swift-evolution <swift-evolution at swift.org> wrote:
>>> Back in Swift 1.0, subscripting a String was easy: you could just use subscripting in a very Python-like way. But now, things are a bit more complicated. I recognize why we need syntax like str.startIndex.advancedBy(x), but it has its downsides. Namely, it makes things hard on beginners. If one of Swift's goals is to make it a great first language, this syntax fights that. Imagine having to explain Unicode and character size to an 8-year-old. This is doubly problematic because String manipulation is one of the first things new coders might want to do.
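For reference, `str.startIndex.advancedBy(x)` is the Swift 2 spelling; Swift 3 (SE-0065) moved index movement onto the collection, so the equivalent is `str.index(str.startIndex, offsetBy: x)`. A short sketch in current Swift, including the bounds-checked `index(_:offsetBy:limitedBy:)` variant, which returns nil instead of trapping. The `safeCharacter(at:)` helper is hypothetical.

```swift
extension String {
    // Hypothetical helper: the Character at `offset`, or nil if out of range.
    func safeCharacter(at offset: Int) -> Character? {
        guard offset >= 0,
              let j = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              j != endIndex else { return nil }
        return self[j]
    }
}

let greeting = "Hello, world!"

// Swift 2 spelling: greeting.startIndex.advancedBy(7)
// Swift 3 and later: the collection moves the index (still O(n) for String).
let i = greeting.index(greeting.startIndex, offsetBy: 7)
print(greeting[i])     // w
print(greeting[i...])  // world!

print(greeting.safeCharacter(at: 4) ?? "-")   // o
print(greeting.safeCharacter(at: 99) ?? "-")  // -
```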
>>>
>>> What about having an InternalString subclass that only supports one encoding, allowing it to be subscripted with Ints? The idea is that an InternalString is for Strings that are more or less hard coded into the app. Dictionary keys, enum raw values, that kind of stuff. This also has the added benefit of forcing the programmer to think about what the String is being used for. Is it user facing? Or is it just for internal use? And of course, it makes code dealing with String manipulation much more concise and readable.
>>>
>>> It follows that something like this would need to be entered as a literal to make it as easy as using String. One way would be to make all String literals InternalStrings, but that sounds far too drastic. Maybe appending an exclamation point like "this"! Or even just wrapping the whole thing in exclamation marks like !"this"! Of course, we could go old school and write it like @"this" …That last one is a joke.
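A rough sketch of the proposed type, under stated assumptions: `InternalString` is hypothetical (not a standard-library type), accepts only 7-bit ASCII, stores one byte per character, and therefore supports O(1) `Int` subscripting. Conforming to `ExpressibleByStringLiteral` lets a plain literal initialize it when the type is annotated, so no new `"this"!` syntax is needed; an unannotated literal still defaults to `String`, which is the tension Xiaodi describes in his reply. Since `String` is a struct in Swift, this is a separate type rather than a subclass. Requires Swift 5.

```swift
// Hypothetical ASCII-only string for identifiers, dictionary keys, and the like.
struct InternalString: ExpressibleByStringLiteral, Hashable, CustomStringConvertible {
    private let bytes: [UInt8]

    // Failable initializer: rejects anything outside 7-bit ASCII.
    init?(ascii s: String) {
        guard s.utf8.allSatisfy({ $0 < 0x80 }) else { return nil }
        bytes = Array(s.utf8)
    }

    // Literals are assumed to be ASCII; a non-ASCII literal traps at run time.
    init(stringLiteral value: String) {
        guard let s = InternalString(ascii: value) else {
            preconditionFailure("InternalString literal must be ASCII: \(value)")
        }
        self = s
    }

    private init(uncheckedBytes: [UInt8]) { bytes = uncheckedBytes }

    var count: Int { return bytes.count }

    // O(1): one byte per character, so an Int offset maps directly to storage.
    subscript(i: Int) -> Character {
        return Character(Unicode.Scalar(bytes[i]))
    }

    // O(1) slicing by an Int range (plus the copy of the selected bytes).
    subscript(r: Range<Int>) -> InternalString {
        return InternalString(uncheckedBytes: Array(bytes[r]))
    }

    var description: String { return String(decoding: bytes, as: UTF8.self) }
}

let key: InternalString = "userName"
print(key[0])                               // u
print(key[4..<8])                           // Name
print(InternalString(ascii: "你好") == nil)  // true
```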
>>>
>>> I'll be the first to admit I'm way in over my head here, so I'm very open to suggestions and criticism. Thanks!
>>>
>>> I can sympathize, but this is tricky.
>>>
>>> Fundamentally, if it's going to be a learning and teaching issue, then this "easy" string should be the default. That is to say, if I write `var a = "Hello, world!"`, then `a` should be inferred to be of type InternalString or EasyString, whatever you want to call it.
>>>
>>> But, we also want Swift to support Unicode by default, and we want that support to do things The Right Way(TM) by default. In other words, a user should not have to reach for a special type in order to handle arbitrary strings correctly, and I should be able to reassign `a = "你好"` and have things work as expected. So, we also can't have the "easy" string type be the default...
>>>
>>> I can't think of a way to square that circle.
>>>
>>>
>>>
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>