[swift-users] Splitting a string into "natural/visual character" components?

Martin R martinr448 at gmail.com
Fri May 12 04:45:30 CDT 2017


The enumerateSubstrings method of (NS)String has a .byComposedCharacterSequences option, which causes Emoji sequences like "👨‍👩‍👧‍👦" to be treated as a single unit:

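    import Foundation

    // Collects the substrings that Foundation reports as composed character sequences.
    func f(_ s: String) -> [String] {
        var a: [String] = []
        s.enumerateSubstrings(in: s.startIndex..<s.endIndex, options: .byComposedCharacterSequences) {
            (c, _, _, _) in
            // c is nil only when the .substringNotRequired option is passed
            if let c = c { a.append(c) }
        }
        return a
    }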

    print(f("👨‍👩‍👧‍👦👷🏾‍♀️")) // ["👨‍👩‍👧‍👦", "👷🏾‍♀️"]
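
To see what such a "single unit" actually contains, here is a small sketch using only the Swift standard library (the helper name `describeViews` is hypothetical). It prints how many Unicode scalars and UTF-16 code units each of the two emoji from the question consists of; the input is written with the escaped scalars from the original message so it does not depend on how the emoji render.

```swift
// Hypothetical helper: summarizes the scalar and UTF-16 views of a string.
func describeViews(_ s: String) -> String {
    // Hex code point of every Unicode scalar, e.g. "1F468", "200D", ...
    let scalars = s.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
    let hex = scalars.joined(separator: " ")
    return "\(s.unicodeScalars.count) scalars, \(s.utf16.count) UTF-16 units: \(hex)"
}

let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let worker = "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"

print(describeViews(family)) // 7 scalars, 11 UTF-16 units: 1F468 200D 1F469 200D 1F467 200D 1F466
print(describeViews(worker)) // 5 scalars, 7 UTF-16 units: 1F477 1F3FE 200D 2640 FE0F
```

Both emoji are sequences of scalars joined by U+200D ZERO WIDTH JOINER (the second one also carries the skin tone modifier U+1F3FE and the variation selector U+FE0F), so any approach that splits per scalar or per UTF-16 code unit tears them apart.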


As I understand it from https://oleb.net/blog/2016/12/emoji-4-0/, Emoji sequences are treated as a single grapheme cluster in Unicode 9, which means that you can simply do something like

    Array("👨‍👩‍👧‍👦👷🏾‍♀️".characters)

once Unicode 9 is adopted in Swift.

Regards, Martin
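
Swift 4 later adopted the Unicode 9 grapheme-breaking rules and made `String` a collection of `Character`s (extended grapheme clusters) again, so the `.characters` view and Foundation are no longer needed. A minimal sketch, assuming Swift 4 or later; the function name `splitIntoCharacters` is hypothetical, and the emoji are again written with the escapes from the question:

```swift
// Hypothetical helper name; relies on Swift 4+ grapheme-cluster segmentation.
func splitIntoCharacters(_ s: String) -> [String] {
    // Iterating a String yields Characters (extended grapheme clusters);
    // convert each one back to a String.
    return s.map { String($0) }
}

let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let worker = "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"

print(splitIntoCharacters("abc"))                               // ["a", "b", "c"]
print(splitIntoCharacters("caf\u{E9}"))                         // ["c", "a", "f", "é"]
print(splitIntoCharacters(family + worker) == [family, worker]) // true
```

A decomposed "cafe\u{301}" (plain "e" followed by a combining acute accent) also yields four components, the last one being "e\u{301}", because the accent belongs to the same grapheme cluster as the "e".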


> On 12. May 2017, at 10:43, Jens Persson via swift-users <swift-users at swift.org> wrote:
> 
> I want a function f such that:
> 
> f("abc") == ["a", "b", "c"]
> 
> f("café") == ["c", "a", "f", "é"]
> 
> f("👨‍👩‍👧‍👦👷🏾‍♀️") == ["👨‍👩‍👧‍👦", "👷🏾‍♀️"]
> 
> I'm not sure if the last example renders correctly by mail for everyone, but the input String contains these _two_ "natural/visual characters":
> (1) A family emoji
> (2) a construction worker (woman, with skin tone modifier) emoji.
> and the result is an Array of two strings (one for each emoji).
> 
> The first two examples are easy, the third example is the tricky one.
> 
> Is there a (practical) way to do this (in Swift 3)?
> 
> /Jens
> 
> 
> 
> PS
> 
> It's OK if the function has to depend on e.g. a graphics context etc.
> (I tried writing a function so that it extracts the glyphs, using NSTextStorage, NSLayoutManager and the AppleColorEmoji font, but it says that "👨‍👩‍👧‍👦👷🏾‍♀️" contains 18(!) glyphs, whereas e.g. "café" contains 4 as expected.)
> 
> If the emojis of the third example don't look like they should in this mail, here is another way to write the exact same example using only simple text:
> 
> let inputOfThirdExample = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"
> 
> let result = f(inputOfThirdExample)
> 
> let expectedResult = ["\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}", "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"]
> 
> print(result.elementsEqual(expectedResult)) // Should print true
