[swift-users] Splitting a string into "natural/visual character" components?

Jens Persson jens at bitcycle.com
Fri May 12 05:13:05 CDT 2017


Ah, thanks!

On Fri, May 12, 2017 at 11:45 AM, Martin R <martinr448 at gmail.com> wrote:

> The enumerateSubstrings method of (NS)String has a
> .byComposedCharacterSequences option which causes Emoji sequences like
> "👨‍👩‍👧‍👦" to be treated as a single unit:
>
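>     import Foundation
>
>     // Collects each composed character sequence of `s` as its own String.
>     func f(_ s: String) -> [String] {
>         var a: [String] = []
>         s.enumerateSubstrings(in: s.startIndex..<s.endIndex,
>                               options: .byComposedCharacterSequences) {
>             (c, _, _, _) in
>             // The substring is optional; append it only when present.
>             if let c = c { a.append(c) }
>         }
>         return a
>     }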
>
>     print(f("👨‍👩‍👧‍👦👷🏾‍♀️")) // ["👨‍👩‍👧‍👦", "👷🏾‍♀️"]
>
>
> As I understand it from https://oleb.net/blog/2016/12/emoji-4-0/, Emoji
> sequences are considered as a single grapheme cluster in Unicode 9, which
> means that you can simply do something like
>
>     Array("👨‍👩‍👧‍👦👷🏾‍♀️".characters)
>
> once Unicode 9 is adopted in Swift.
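
For readers on a newer toolchain, here is a minimal sketch of the same idea. It assumes a Swift version in which String is itself a collection of Character and grapheme breaking follows Unicode 9 or later (Swift 4 and up, as far as I know). The names graphemes and thirdExample are made up for illustration and do not come from the thread.

```swift
// Hypothetical helper: one String per Character (extended grapheme cluster).
// Assumes String conforms to Collection, which is not the case in Swift 3.
func graphemes(_ s: String) -> [String] {
    return s.map { String($0) }
}

// The family emoji followed by the construction worker emoji,
// written as escapes so that mail clients cannot garble it.
let thirdExample = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
    + "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"

print(graphemes("abc"))
// Expected to be 2 when grapheme breaking follows Unicode 9 or later.
print(graphemes(thirdExample).count)
```

On an older toolchain the count may be larger than 2, in which case the enumerateSubstrings version is the safer choice.
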
>
> Regards, Martin
>
>
> On 12. May 2017, at 10:43, Jens Persson via swift-users <
> swift-users at swift.org> wrote:
>
> I want a function f such that:
>
> f("abc") == ["a", "b", "c"]
>
> f("café") == ["c", "a", "f", "é"]
>
> f("👨‍👩‍👧‍👦👷🏾‍♀️") == ["👨‍👩‍👧‍👦", "👷🏾‍♀️"]
>
> I'm not sure if the last example renders correctly by mail for everyone
> but the input String contains these _two_ "natural/visual characters":
> (1) A family emoji
> (2) a construction worker (woman, with skin tone modifier) emoji.
> and the result is an Array of two strings (one for each emoji).
>
> The first two examples are easy, the third example is the tricky one.
>
> Is there a (practical) way to do this (in Swift 3)?
>
> /Jens
>
>
>
> PS
>
> It's OK if the function has to depend on eg a graphics context etc.
> (I tried writing a function so that it extracts the glyphs, using
> NSTextStorage, NSLayoutManager and the AppleColorEmoji font, but it says
> that "👨‍👩‍👧‍👦👷🏾‍♀️" contains 18(!) glyphs, whereas eg "café" contains
> 4 as expected.)
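
One way to see where a number like 18 could come from is to count the string in its different encodings. The sketch below only inspects the string's views. That the layout manager's glyph count tracks UTF-16 code units for this string is an assumption on my part, not something confirmed in this thread. The name thirdExample is made up for illustration.

```swift
// The same two emoji as in the third example, written as escapes.
let thirdExample = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
    + "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"

// 7 scalars for the family emoji plus 5 for the construction worker.
print(thirdExample.unicodeScalars.count) // 12

// Scalars above U+FFFF take two UTF-16 code units each: 11 + 7 in total.
print(thirdExample.utf16.count)          // 18
```

The UTF-16 count matches the 18 reported above, which at least suggests the glyph count was taken before the sequences were merged into single emoji glyphs.
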
>
> If the emojis of the third example don't look like they should in this
> mail, here is another way to write the exact same example using only simple
> text:
>
> let inputOfThirdExample = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
>     + "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"
>
> let result = f(inputOfThirdExample)
>
> let expectedResult = [
>     "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}",
>     "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"
> ]
>
> print(result.elementsEqual(expectedResult)) // Should print true