[swift-evolution] InternalString class for easy String manipulation

Shawn Erickson shawnce at gmail.com
Wed Aug 17 14:36:28 CDT 2016


I like a "view" based system when looking at a Unicode string. It lets you
pick the view of string - defining how it is indexed - based on your needs.
A view could be indexed by a human facing glyph, a particular Unicode
encoding style, a decompose style, etc.
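
As a rough illustration of what those views look like in practice, the sketch below measures the same two strings through four of the views String exposes. It assumes Swift 4 or later, where `count` is available directly on String (in 2016 the spelling was `characters.count`). The variable names are chosen only for this example.

```swift
// One user-visible flag, built from two regional-indicator scalars (U and S).
let flag = "\u{1F1FA}\u{1F1F8}"

let characterCount = flag.count                // human-facing glyphs (grapheme clusters)
let scalarCount = flag.unicodeScalars.count    // Unicode scalars
let utf16Count = flag.utf16.count              // UTF-16 code units
let utf8Count = flag.utf8.count                // UTF-8 code units (bytes)

print(characterCount, scalarCount, utf16Count, utf8Count)   // 1 2 4 8

// A decomposed "é": the letter e followed by a combining acute accent.
let decomposed = "e\u{301}"
let precomposed = "\u{E9}"

print(decomposed.count, decomposed.unicodeScalars.count)    // 1 2
print(decomposed == precomposed)                            // true
```

Each view answers "how long is this string?" differently, which is why the choice of view has to be explicit.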

I think that is powerful, useful, and exposes the real complexity in a
manageable and functional way.
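
To make that complexity concrete: reaching the n-th element of a view means walking from the start, which is the point Félix makes in the quoted thread below. The following is a minimal sketch, assuming Swift 4 or later. `character(atOffset:)` is a hypothetical helper written for this example, not a standard library API.

```swift
extension String {
    /// Hypothetical helper: returns the Character at a zero-based offset,
    /// or nil when the offset is out of range. The call walks the string
    /// from startIndex, so its cost grows linearly with the offset.
    func character(atOffset offset: Int) -> Character? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i < endIndex
        else { return nil }
        return self[i]
    }
}

// "Hi ", the US flag (two scalars, one glyph), then "!".
let greeting = "Hi \u{1F1FA}\u{1F1F8}!"

let fourth = greeting.character(atOffset: 3)     // the flag, as one Character
let fifth = greeting.character(atOffset: 4)      // "!"
let pastEnd = greeting.character(atOffset: 5)    // nil
```

The Int offset looks like array indexing, but every call repeats the walk, so a loop over offsets would be quadratic. Keeping a String.Index and advancing it avoids that.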

In many domains you would never need to care about indexing across a view
or even using a view to work with a string.
On Wed, Aug 17, 2016 at 12:27 PM Kenny Leung via swift-evolution <
swift-evolution at swift.org> wrote:

> >
> > On Aug 17, 2016, at 12:20 PM, Shawn Erickson <shawnce at gmail.com> wrote:
> >
> > As stated earlier it is 2016
>
> I don’t like the tone attached to this statement.
>
> > I think the baseline should be robust Unicode support
>
> I don’t understand how anything I have pushed for would compromise robust
> Unicode support.
>
> > and what we have in Swift is actually a fairly good way of dealing with
> > it IMHO. I think new-to-development folks should have this as their
> > baseline as well…
>
> > not that we shouldn't make it as easy to work with as possible.
>
> Regardless of internal representation, wouldn’t this be a glyph-based
> indexing system?
>
> -Kenny
>
>
> >
> > -Shawn
> >
> > On Wed, Aug 17, 2016 at 12:15 PM Kenny Leung via swift-evolution <
> swift-evolution at swift.org> wrote:
> > It seems to me that UTF-8 is the best choice to encode strings in
> English and English-like character sets for storage, but it’s not clear
> that it is the most useful or performant internal representation for
> working with strings. In my opinion, conflating the preferred storage
> format and the best internal representation is not the proper thing to do.
> Picking the right internal storage format should be evaluated based on its
> own criteria. Even as an experienced programmer, I assert that the most
> useful indexing system is glyph based.
> >
> > In Félix’s case, I would expect to have to ask for a mail-friendly
> representation of his name, just like you have to ask for a
> filesystem-friendly representation of a filename regardless of what the
> internal representation is. Just because you are using UTF-8 as the
> internal format, it does not mean that universal support is guaranteed.
> >
> > In response to this statement: “Optimizing developer experience for
> > beginning developers is just going to lead to software that screws…”, the
> > current system not only trips up beginning developers, but is also
> > different from pretty much every programming language in my experience.
> >
> > -Kenny
> >
> >
> > > On Aug 17, 2016, at 11:48 AM, Zach Waldowski via swift-evolution <
> swift-evolution at swift.org> wrote:
> > >
> > > It's 2016, "the thing people would most commonly expect" is
> > > impossible-to-screw-up Unicode support that's performant. Optimizing
> > > developer experience for beginning developers is just going to lead to
> > > software that screws up in situations the developer doesn't anticipate,
> > > as Félix notes above.
> > >
> > > Zachary
> > >
> > > On Wed, Aug 17, 2016, at 09:40 AM, Kenny Leung via swift-evolution
> > > wrote:
> > >> I understand that the most friendly approach may not be the most
> > >> efficient, but that’s not what I’m pushing for. I’m pushing for “does the
> > >> thing people would most commonly expect”. Take a first-time programmer
> > >> who reads any (human) language, and that is what they would expect.
> > >>
> > >> Why couldn’t String’s internal storage format be glyph-based? If I were,
> > >> say, writing a text editor, it would certainly be the easiest and most
> > >> efficient format to work in.
> > >>
> > >> -Kenny
> > >>
> > >>
> > >>> On Aug 15, 2016, at 9:20 PM, Félix Cloutier <felixcca at yahoo.ca>
> wrote:
> > >>>
> > >>> The major problem with this approach is that visual glyphs
> themselves have one level of variable-length encoding, and they sit on top
> of another variable-length encoding used to represent the Unicode
> characters (Swift-native Strings are currently encoded as UTF-8). For
> instance, the visual glyph 🇺🇸 is the result of putting side-by-side
> the Unicode characters 🇺 and 🇸 ("REGIONAL INDICATOR SYMBOL LETTER U" and
> "REGIONAL INDICATOR SYMBOL LETTER S"), which are themselves encoded as
> UTF-8 using 4 bytes each. A design in which you can "just write"
> string[4544] hides the fact that indexing is a linear-time operation that
> needs to recompose UTF-8 characters and then recompose visual glyphs on top
> of that.
> > >>>
> > >>> Generally speaking, I *think* that I agree that human-geared "long
> string" on which you probably won't need random access, and machine-geared
> smaller strings that encode a command, could benefit from not being
> considered the same fundamental thing. However, I'm also afraid that this
> will end with more applications and websites that think that first names
> only contain 7-bit-clean characters in the A-Z range. (I live in the US and
> I can attest that this is still very common.)
> > >>>
> > >>> You could make a point too that better facilities to parse strings
> would probably address this issue.
> > >>>
> > >>> Félix
> > >>>
> > >>>> Le 15 août 2016 à 10:52:02, Kenny Leung via swift-evolution <
> swift-evolution at swift.org> a écrit :
> > >>>>
> > >>>> I agree with both points of view. I think we need to bring back
> subscripting on strings which does the thing people would most commonly
> expect.
> > >>>>
> > >>>> I would say that the subscript indexes should correspond to a
> visual glyph. This seems reasonable to me for most character sets like
> Roman, Cyrillic, Chinese. There is some doubt in my mind for things like
> subscripted Japanese or connected (ligatured?) languages like Arabic, Hindi
> or Thai.
> > >>>>
> > >>>> -Kenny
> > >>>>
> > >>>>
> > >>>>> On Aug 15, 2016, at 10:42 AM, Xiaodi Wu via swift-evolution <
> swift-evolution at swift.org> wrote:
> > >>>>>
> > >>>>> On Sun, Aug 14, 2016 at 5:41 PM, Michael Savich via
> swift-evolution <swift-evolution at swift.org> wrote:
> > >>>>> Back in Swift 1.0, subscripting a String was easy: you could just
> use subscripting in a very Python-like way. But now, things are a bit more
> complicated. I recognize why we need syntax like
> str.startIndex.advancedBy(x) but it has its downsides. Namely, it makes
> things hard on beginners. If one of Swift's goals is to make it a great
> first language, this syntax fights that. Imagine having to explain Unicode
> and character size to an 8 year old. This is doubly problematic because
> String manipulation is one of the first things new coders might want to do.
> > >>>>>
> > >>>>> What about having an InternalString subclass that only supports
> one encoding, allowing it to be subscripted with Ints? The idea is that an
> InternalString is for Strings that are more or less hard coded into the
> app. Dictionary keys, enum raw values, that kind of stuff. This also has
> the added benefit of forcing the programmer to think about what the String
> is being used for. Is it user facing? Or is it just for internal use? And
> of course, it makes code dealing with String manipulation much more concise
> and readable.
> > >>>>>
> > >>>>> It follows that something like this would need to be entered as a
> literal to make it as easy as using String. One way would be to make all
> String literals InternalStrings, but that sounds far too drastic. Maybe
> appending an exclamation point like "this"! Or even just wrapping the whole
> thing in exclamation marks like !"this"! Of course, we could go old school
> and write it like @"this" …That last one is a joke.
> > >>>>>
> > >>>>> I'll be the first to admit I'm way in over my head here, so I'm
> very open to suggestions and criticism. Thanks!
> > >>>>>
> > >>>>> I can sympathize, but this is tricky.
> > >>>>>
> > >>>>> Fundamentally, if it's going to be a learning and teaching issue,
> then this "easy" string should be the default. That is to say, if I write
> `var a = "Hello, world!"`, then `a` should be inferred to be of type
> InternalString or EasyString, whatever you want to call it.
> > >>>>>
> > >>>>> But, we also want Swift to support Unicode by default, and we want
> that support to do things The Right Way(TM) by default. In other words, a
> user should not have to reach for a special type in order to handle
> arbitrary strings correctly, and I should be able to reassign `a = "你好"`
> and have things work as expected. So, we also can't have the "easy" string
> type be the default...
> > >>>>>
> > >>>>> I can't think of a way to square that circle.
> > >>>>>
> > >>>>>
> > >>>>> Sent from my iPad
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> swift-evolution mailing list
> > >>>>> swift-evolution at swift.org
> > >>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>
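
For reference, the InternalString idea quoted above can be approximated with a small wrapper type. This is only a sketch under stated assumptions: the type is restricted to ASCII so that one byte is one character, it is a standalone struct rather than a String subclass (String is a struct and cannot be subclassed), and its initializer and subscript are hypothetical, not a proposed API. It assumes Swift 5.1 or later.

```swift
/// Hypothetical ASCII-only string with constant-time Int subscripting.
struct InternalString {
    private let bytes: [UInt8]

    /// Returns nil if the text contains any non-ASCII byte.
    init?(_ text: String) {
        let utf8 = Array(text.utf8)
        guard utf8.allSatisfy({ $0 < 0x80 }) else { return nil }
        bytes = utf8
    }

    var count: Int { bytes.count }

    /// Traps on an out-of-range position, as Array does.
    subscript(position: Int) -> Character {
        Character(Unicode.Scalar(bytes[position]))
    }
}

let key = InternalString("user_id")     // ASCII only, so this succeeds
let rejected = InternalString("你好")    // non-ASCII, so this is nil
```

The second initializer call shows the tension Xiaodi describes: a type that makes Int subscripting cheap has to reject arbitrary text, so it cannot be what a plain string literal defaults to.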