[swift-evolution] InternalString class for easy String manipulation

Shawn Erickson shawnce at gmail.com
Wed Aug 17 14:20:42 CDT 2016


As stated earlier, it is 2016. I think the baseline should be robust Unicode
support, and what we have in Swift is actually a fairly good way of dealing
with it, IMHO. I think folks new to development should have this as their
baseline as well... not that we shouldn't make it as easy to work with as
possible.

-Shawn

On Wed, Aug 17, 2016 at 12:15 PM Kenny Leung via swift-evolution <
swift-evolution at swift.org> wrote:

> It seems to me that UTF-8 is the best choice to encode strings in English
> and English-like character sets for storage, but it’s not clear that it is
> the most useful or performant internal representation for working with
> strings. In my opinion, conflating the preferred storage format and the
> best internal representation is not the proper thing to do. The right
> internal storage format should be chosen based on its own criteria. Even
> as an experienced programmer, I assert that the most useful indexing
> system is glyph-based.
>
> In Félix’s case, I would expect to have to ask for a mail-friendly
> representation of his name, just like you have to ask for a
> filesystem-friendly representation of a filename regardless of what the
> internal representation is. Just because you are using UTF-8 as the
> internal format, it does not mean that universal support is guaranteed.
>
> In response to this statement: “Optimizing developer experience for
> beginning developers is just going to lead to software that screws…”, the
> current system not only trips up beginning developers, but is also
> different from pretty much every programming language in my experience.
>
> -Kenny
>
>
> > On Aug 17, 2016, at 11:48 AM, Zach Waldowski via swift-evolution <
> > swift-evolution at swift.org> wrote:
> >
> > It's 2016; "the thing people would most commonly expect" is
> > impossible-to-screw-up Unicode support that's performant. Optimizing
> > developer experience for beginning developers is just going to lead to
> > software that screws up in situations the developer doesn't anticipate,
> > as Félix notes above.
> >
> > Zachary
> >
> > On Wed, Aug 17, 2016, at 09:40 AM, Kenny Leung via swift-evolution
> > wrote:
> >> I understand that the most friendly approach may not be the most
> >> efficient, but that’s not what I’m pushing for. I’m pushing for “does
> >> the thing people would most commonly expect”. Take a first-time
> >> programmer who reads any (human) language, and that is what they would
> >> expect.
> >>
> >> Why couldn’t String’s internal storage format be glyph-based? If I were,
> >> say, writing a text editor, it would certainly be the easiest and most
> >> efficient format to work in.
> >>
> >> -Kenny
> >>
> >>
> >>> On Aug 15, 2016, at 9:20 PM, Félix Cloutier <felixcca at yahoo.ca> wrote:
> >>>
> >>> The major problem with this approach is that visual glyphs themselves
> >>> have one level of variable-length encoding, and they sit on top of
> >>> another variable-length encoding used to represent the Unicode
> >>> characters (Swift-native Strings are currently encoded as UTF-8). For
> >>> instance, the visual glyph 🇺🇸 is the result of putting side by side the
> >>> Unicode characters 🇺 and 🇸 ("REGIONAL INDICATOR SYMBOL LETTER U" and
> >>> "REGIONAL INDICATOR SYMBOL LETTER S"), which are themselves encoded as
> >>> UTF-8 using 4 bytes each. A design in which you can "just write"
> >>> string[4544] hides the fact that indexing is a linear-time operation
> >>> that needs to recompose UTF-8 characters and then recompose visual
> >>> glyphs on top of that.
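
A minimal sketch of the layering described above, assuming Swift 4 or later
(where String exposes `count` directly; this API postdates the thread). It
only counts the same flag at each level and is not meant to describe String's
internal storage:

```swift
// One visible glyph, built from two regional indicator scalars.
let flag = "🇺🇸"

let glyphs = flag.count                  // grapheme clusters (Characters)
let scalars = flag.unicodeScalars.count  // Unicode scalars
let utf16Units = flag.utf16.count        // UTF-16 code units
let utf8Bytes = flag.utf8.count          // UTF-8 bytes

print(glyphs, scalars, utf16Units, utf8Bytes)  // prints "1 2 4 8"
```

An integer offset means something different in each of these views, which is
why a plain Int subscript has to pick one of them and walk the string to find
it.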
> >>>
> >>> Generally speaking, I *think* that I agree that human-geared "long
> >>> strings", on which you probably won't need random access, and
> >>> machine-geared smaller strings that encode a command could benefit from
> >>> not being considered the same fundamental thing. However, I'm also
> >>> afraid that this will end with more applications and websites that
> >>> think that first names only contain 7-bit-clean characters in the A-Z
> >>> range. (I live in the US and I can attest that this is still very
> >>> common.)
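
As a rough illustration of the name problem, assuming Swift 5 or later (for
`Unicode.Scalar.isASCII` used with `allSatisfy`), here is the same name
spelled with a precomposed é and with e plus a combining accent. The variable
names are made up for this sketch:

```swift
// "Félix" with a precomposed é (U+00E9).
let composed = "F\u{E9}lix"
// "Félix" with "e" followed by COMBINING ACUTE ACCENT (U+0301).
let decomposed = "Fe\u{301}lix"

// String comparison uses canonical equivalence, so these are equal.
let sameName = (composed == decomposed)

// Both have five user-visible characters...
let composedCount = composed.count
let decomposedCount = decomposed.count

// ...but different byte lengths, and neither is 7-bit clean.
let composedBytes = composed.utf8.count
let decomposedBytes = decomposed.utf8.count
let isSevenBitClean = composed.unicodeScalars.allSatisfy { $0.isASCII }

print(sameName, composedCount, decomposedCount)
print(composedBytes, decomposedBytes, isSevenBitClean)
```

Code that indexes by byte or assumes ASCII would treat these two spellings
differently, while the Character view treats them as the same text.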
> >>>
> >>> You could also make the point that better facilities to parse strings
> >>> would probably address this issue.
> >>>
> >>> Félix
> >>>
> >>>> On Aug 15, 2016, at 10:52:02, Kenny Leung via swift-evolution <
> >>>> swift-evolution at swift.org> wrote:
> >>>>
> >>>> I agree with both points of view. I think we need to bring back
> >>>> subscripting on strings that does the thing people would most commonly
> >>>> expect.
> >>>>
> >>>> I would say that the subscript indexes should correspond to a visual
> >>>> glyph. This seems reasonable to me for most character sets like Roman,
> >>>> Cyrillic, Chinese. There is some doubt in my mind for things like
> >>>> subscripted Japanese or connected (ligatured?) languages like Arabic,
> >>>> Hindi or Thai.
> >>>>
> >>>> -Kenny
> >>>>
> >>>>
> >>>>> On Aug 15, 2016, at 10:42 AM, Xiaodi Wu via swift-evolution <
> >>>>> swift-evolution at swift.org> wrote:
> >>>>>
> >>>>> On Sun, Aug 14, 2016 at 5:41 PM, Michael Savich via swift-evolution <
> >>>>> swift-evolution at swift.org> wrote:
> >>>>> Back in Swift 1.0, subscripting a String was easy: you could just
> >>>>> use subscripting in a very Python-like way. But now, things are a bit
> >>>>> more complicated. I recognize why we need syntax like
> >>>>> str.startIndex.advancedBy(x), but it has its downsides. Namely, it
> >>>>> makes things hard on beginners. If one of Swift's goals is to be a
> >>>>> great first language, this syntax fights that. Imagine having to
> >>>>> explain Unicode and character size to an 8-year-old. This is doubly
> >>>>> problematic because String manipulation is one of the first things new
> >>>>> coders might want to do.
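
For reference, a sketch of what offset-based access looks like with the
Swift 3 and later spelling `index(_:offsetBy:)` (the `advancedBy` form above
is the Swift 2 spelling). The `character(at:)` helper is hypothetical, not
part of the standard library, and each call is still a linear-time walk:

```swift
let greeting = "Hello, world!"

// Standard-library way: build a String.Index, then subscript with it.
let seventh = greeting.index(greeting.startIndex, offsetBy: 7)
let letter = greeting[seventh]  // "w"

extension String {
    // Hypothetical convenience helper: returns the Character at an
    // integer offset, or nil when the offset is out of range.
    // Walks the string from the start, so it is O(n) per call.
    func character(at offset: Int) -> Character? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i < endIndex else { return nil }
        return self[i]
    }
}

let afterFlag = "🇺🇸!".character(at: 1)  // the flag counts as one Character
let outOfRange = "abc".character(at: 3)  // nil rather than a crash
print(letter, afterFlag as Any, outOfRange as Any)
```

Returning an optional is one design choice among several; a helper like this
hides the cost of the walk, which is the trade-off the thread is arguing
about.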
> >>>>>
> >>>>> What about having an InternalString subclass that only supports one
> >>>>> encoding, allowing it to be subscripted with Ints? The idea is that an
> >>>>> InternalString is for Strings that are more or less hard-coded into
> >>>>> the app: Dictionary keys, enum raw values, that kind of stuff. This
> >>>>> also has the added benefit of forcing the programmer to think about
> >>>>> what the String is being used for. Is it user-facing? Or is it just
> >>>>> for internal use? And of course, it makes code dealing with String
> >>>>> manipulation much more concise and readable.
> >>>>>
> >>>>> It follows that something like this would need to be entered as a
> >>>>> literal to make it as easy as using String. One way would be to make
> >>>>> all String literals InternalStrings, but that sounds far too drastic.
> >>>>> Maybe appending an exclamation point, like "this"! Or even just
> >>>>> wrapping the whole thing in exclamation marks, like !"this"! Of course,
> >>>>> we could go old school and write it like @"this" …That last one is a
> >>>>> joke.
> >>>>>
> >>>>> I'll be the first to admit I'm in way over my head here, so I'm very
> >>>>> open to suggestions and criticism. Thanks!
> >>>>>
> >>>>> I can sympathize, but this is tricky.
> >>>>>
> >>>>> Fundamentally, if it's going to be a learning and teaching issue,
> >>>>> then this "easy" string should be the default. That is to say, if I
> >>>>> write `var a = "Hello, world!"`, then `a` should be inferred to be of
> >>>>> type InternalString or EasyString, whatever you want to call it.
> >>>>>
> >>>>> But we also want Swift to support Unicode by default, and we want
> >>>>> that support to do things The Right Way(TM) by default. In other
> >>>>> words, a user should not have to reach for a special type in order to
> >>>>> handle arbitrary strings correctly, and I should be able to reassign
> >>>>> `a = "你好"` and have things work as expected. So we also can't have the
> >>>>> "easy" string type be the default...
> >>>>>
> >>>>> I can't think of a way to square that circle.
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> swift-evolution mailing list
> >>>>> swift-evolution at swift.org
> >>>>> https://lists.swift.org/mailman/listinfo/swift-evolution