[swift-evolution] Strings in Swift 4
Ted F.A. van Gaalen
tedvgiosdev at gmail.com
Tue Feb 7 14:19:25 CST 2017
> On 7 Feb 2017, at 19:44, Dave Abrahams <dabrahams at apple.com> wrote:
> on Tue Feb 07 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com> wrote:
>>> On 7 Feb 2017, at 05:42, Karl Wagner <razielim at gmail.com> wrote:
>>>> On 6 Feb 2017, at 19:29, Ted F.A. van Gaalen via swift-evolution <swift-evolution at swift.org
>> <mailto:swift-evolution at swift.org>> wrote:
>>> When it comes to fast access what’s most important is cache
>>> locality. DRAM is like 200x slower than L2 cache. Looping through
>>> some contiguous 16-bit integers is always going to beat the pants
>>> out of dereferencing pointers.
>> Hi Karl
>> That is of course hardware/processor dependent… and Swift runs on different target systems… isn’t it?
> Actually the basic calculus holds for any modern processor.
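A rough illustrative sketch (my own, not from the thread) of Karl's locality point: iterating a String's UTF-16 code units is a tight loop over contiguous 16-bit integers, while iterating Characters must additionally run grapheme-cluster segmentation on every step.

```swift
import Foundation

let text = String(repeating: "Hello, wörld! ", count: 100_000)

func measure(_ label: String, _ body: () -> Int) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = body()
    let ms = Double(DispatchTime.now().uptimeNanoseconds - start) / 1_000_000
    print("\(label): \(result) in \(ms) ms")
}

measure("UTF-16 code units") {
    var n = 0
    for unit in text.utf16 { n &+= Int(unit) }   // contiguous 16-bit walk
    return n
}
measure("Characters") {
    var n = 0
    for _ in text { n += 1 }                     // grapheme breaking per step
    return n
}
```

The exact ratio is hardware-dependent, as Ted notes below, but the code-unit loop is consistently cheaper because it touches contiguous memory and does no per-step segmentation work.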
>>> It’s quite rare that you need to grab arbitrary parts of a String
>>> without knowing what is inside it. If you’re saying str[12..<34] -
>>> why 12, and why 34? Is 12 the length of some substring you know from
>>> earlier? In that case, you could find out how many CodeUnits it had,
>>> and use that information instead.
>>> For this example, I have used constants here, but normally these would be variables.
>> I’d say it is not so rare, these things are often used for all kinds of string parsing, there are
>> examples to be found on the Internet.
> That proves nothing, though. The fact that people are using integers to
> do this doesn't mean you need to use them, nor does it mean that you'll
> get the right results from doing so. Typically examples that use
> integer constants with strings are wrong for some large proportion of
> unicode text.
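A sketch (my own example, not from the thread) of why hard-coded integer offsets go wrong for much Unicode text: "the first 4 characters" covers a different number of code units depending on what the string contains.

```swift
let ascii    = "cafe"
let accented = "café"        // "é" as the single scalar U+00E9
let emoji    = "👨‍👩‍👧‍👦abc"      // family emoji: 7 scalars joined by ZWJs

for s in [ascii, accented, emoji] {
    print(s.count, s.unicodeScalars.count, s.utf16.count, s.utf8.count)
}
// cafe:    4 characters,  4 scalars,  4 UTF-16 units,  4 UTF-8 bytes
// café:    4 characters,  4 scalars,  4 UTF-16 units,  5 UTF-8 bytes
// 👨‍👩‍👧‍👦abc: 4 characters, 10 scalars, 14 UTF-16 units, 24 UTF-8 bytes
```

An integer offset that is correct for the ASCII case silently slices the emoji case mid-cluster (or mid-surrogate-pair), which is exactly the class of bug Dave describes.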
This is all a bit confusing.
Definition of a glyph in our context:
(typography, computing) A visual representation of a letter, character, or symbol, in a specific font and style.
I now assume that:
1. a “plain” Unicode character (code point?) can result in one glyph.
2. a grapheme cluster always results in just a single glyph, true?
3. The only things that I can see on screen or in print are glyphs (“carvings”, visual elements that stand on their own).
4. In this context, a glyph is a humanly recognisable visual form of a character.
5. On this level (the glyph, what I can see as a user) it is neither relevant nor detectable
how many Unicode scalars (code points?) or grapheme clusters, or even what kind
of encoding, the glyph was based upon.
is this correct? (especially 1 and 2)
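A small sketch (my own example) bearing on assumption 2: Swift's Character is a grapheme cluster, so a cluster built from several scalars still behaves as one user-visible “glyph”.

```swift
let precomposed = "\u{00E9}"       // é as one scalar
let decomposed  = "e\u{0301}"      // e + combining acute accent: two scalars

print(precomposed == decomposed)           // true (canonical equivalence)
print(decomposed.count)                    // 1: one grapheme cluster
print(decomposed.unicodeScalars.count)     // 2: built from two scalars
```

So at the Character level the scalar count is indeed invisible, which supports point 5, though note that different fonts may still render one cluster with multiple glyphs or several clusters as a ligature.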
Based on these assumptions, to me, then, the definition of a character == glyph.
Therefore, my working model: I see a row of characters as a row of glyphs,
which are discrete, autonomous visual elements; ergo:
each element is individually addressable with integers (ordinals).
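One way to realise that working model today (an assumption on my part, not a proposal from the thread): pay the O(n) grapheme segmentation cost once, then get O(1) ordinal addressing afterwards.

```swift
let s = "Olé! 👨‍👩‍👧‍👦"
let glyphs = Array(s)        // [Character]: one element per grapheme cluster

print(glyphs[2])             // é
print(glyphs[5])             // 👨‍👩‍👧‍👦 — addressable as a single element
print(glyphs.count)          // 6
```

The trade-off is memory and an up-front pass; String's own Index type avoids both by making the segmentation cost explicit at each subscript instead.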