[swift-evolution] Strings in Swift 4

Ted F.A. van Gaalen tedvgiosdev at gmail.com
Tue Feb 7 14:19:25 CST 2017


> On 7 Feb 2017, at 19:44, Dave Abrahams <dabrahams at apple.com> wrote:
> 
> 
> on Tue Feb 07 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com> wrote:
> 
>>> On 7 Feb 2017, at 05:42, Karl Wagner <razielim at gmail.com> wrote:
>>> 
>>>> 
>>>> On 6 Feb 2017, at 19:29, Ted F.A. van Gaalen via swift-evolution <swift-evolution at swift.org> wrote:
>>>> 
>>> When it comes to fast access what’s most important is cache
>>> locality. DRAM is like 200x slower than L2 cache. Looping through
>>> some contiguous 16-bit integers is always going to beat the pants
>>> out of dereferencing pointers.
>> 
>>> 
>> Hi Karl
>> That is of course hardware/processor dependent… and Swift runs on different target systems, doesn’t it?
> 
> Actually the basic calculus holds for any modern processor.
> 
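To make the point concrete, a minimal sketch of the kind of loop Karl describes: iterating a String’s UTF-16 view touches contiguous 16-bit code units, which is what keeps the cache happy (plain standard-library Swift, nothing else assumed):

    let text = "Hello, Swift!"
    var checksum: UInt32 = 0
    for unit in text.utf16 {       // contiguous 16-bit code units
        checksum &+= UInt32(unit)  // no per-element pointer chasing
    }
    print(checksum)
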
>>> It’s quite rare that you need to grab arbitrary parts of a String
>>> without knowing what is inside it. If you’re saying str[12..<34] -
>>> why 12, and why 34? Is 12 the length of some substring you know from
>>> earlier? In that case, you could find out how many CodeUnits it had,
>>> and use that information instead.
>>> For this example, I have used constants here, but normally these would be variables.
>>> 
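For a concrete picture of Karl’s suggestion, a sketch that derives the indices from the content itself instead of from integer constants (range(of:) comes from Foundation; the strings are invented for the example):

    import Foundation

    let line = "name=Ted;role=author"
    // Instead of line[5..<8], search for what we actually know:
    if let key = line.range(of: "name="),
       let sep = line.range(of: ";", range: key.upperBound..<line.endIndex) {
        print(line[key.upperBound..<sep.lowerBound])   // "Ted"
    }
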
>> 
>> I’d say it is not so rare; these things are often used for all kinds of string parsing, and there
>> are many examples to be found on the Internet.
>> TedvG
> 
> That proves nothing, though.  The fact that people are using integers to
> do this doesn't mean you need to use them, nor does it mean that you'll
> get the right results from doing so.  Typically, examples that use
> integer constants with strings are wrong for some large proportion of
> Unicode text.
> 
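The usual failure mode is easy to show: the integer arithmetic in those examples counts code units, and one user-perceived character need not be one code unit (counts below assume Swift 4’s String):

    let plain  = "cafe"
    let marked = "cafe\u{0301}"                   // "café", with a combining accent
    print(plain.count, marked.count)              // 4 4  (Characters)
    print(plain.utf16.count, marked.utf16.count)  // 4 5  (UTF-16 code units)

    let flag = "🇳🇱"
    print(flag.count, flag.utf16.count)           // 1 4
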
This is all a bit confusing.
From https://en.wiktionary.org/wiki/glyph, the definition of a glyph in our context:
(typography, computing) A visual representation of a letter, character, or symbol, in a specific font and style.

I now assume that:
      1. a “plain” Unicode character (code point?) can result in one glyph.
      2. a grapheme cluster always results in just a single glyph, true?
      3. The only things that I can see on screen or in print are glyphs (“carvings”, visual elements that stand on their own).
      4. In this context, a glyph is a humanly recognisable visual form of a character.
      5. On this level (the glyph, what I see as a user) it is neither relevant nor detectable
         how many Unicode scalars (code points?) or graphemes the glyph was built from,
         or even what kind of encoding it was based upon.

Is this correct? (especially 1 and 2)
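
For what it’s worth, this is how those layers look from Swift 4’s String, where Character models the grapheme cluster (the glyph itself only exists further down, in the font and the text renderer):

    let family = "👨‍👩‍👧"   // one user-perceived character, one glyph on screen
    print(family.count)                  // 1  Character (grapheme cluster)
    print(family.unicodeScalars.count)   // 5  scalars: 3 emoji + 2 zero-width joiners
    print(family.utf16.count)            // 8  UTF-16 code units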

Based on these assumptions, to me the definition of a character == glyph.
Therefore, my working model: I see a row of characters as a row of glyphs,
which are discrete, autonomous visual elements; ergo:
each element is individually addressable with integers (ordinals).

?
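
Such a model can be sketched as an extension; note that this subscript is hypothetical, not something the standard library offers, and every access has to walk the string from the start, so it is O(n):

    extension String {
        // Hypothetical ordinal access to user-perceived characters.
        subscript(ordinal n: Int) -> Character {
            return self[index(startIndex, offsetBy: n)]   // walks n steps
        }
    }

    let s = "a👨‍👩‍👧z"
    print(s[ordinal: 1])   // 👨‍👩‍👧 — one element, whatever its encoded size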

TedvG

> -- 
> -Dave
