[swift-users] Splitting a string into "natural/visual character" components?
Martin R
martinr448 at gmail.com
Fri May 12 04:45:30 CDT 2017
The enumerateSubstrings method of (NS)String has a .byComposedCharacterSequences option which causes Emoji sequences like "👨‍👩‍👧‍👦" to be treated as a single unit:
import Foundation // enumerateSubstrings comes from Foundation

func f(_ s: String) -> [String] {
    var a: [String] = []
    // The closure is called once per composed character sequence.
    s.enumerateSubstrings(in: s.startIndex..<s.endIndex, options: .byComposedCharacterSequences) {
        (c, _, _, _) in
        if let c = c { a.append(c) }
    }
    return a
}
print(f("👨‍👩‍👧‍👦👷🏾‍♀️")) // ["👨‍👩‍👧‍👦", "👷🏾‍♀️"]
As I understand it from https://oleb.net/blog/2016/12/emoji-4-0/, Emoji sequences are considered a single grapheme cluster in Unicode 9, which means that you can simply do something like
Array("👨‍👩‍👧‍👦👷🏾‍♀️".characters)
once Unicode 9 is adopted in Swift.
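A minimal sketch of what that could look like, assuming a later Swift toolchain in which String is itself a collection of Characters and already applies the newer grapheme-breaking rules. The name graphemes is made up for this example, and the input is written with escapes so that it survives mail rendering:

```swift
// Sketch only: assumes a Swift version whose String follows the newer
// grapheme-breaking rules. `graphemes` is a hypothetical helper name.
func graphemes(_ s: String) -> [String] {
    // Iterating a String yields Characters (extended grapheme clusters).
    return s.map { String($0) }
}

// Same input as the third example, spelled out as Unicode scalars.
let input = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
          + "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"

print(graphemes("abc"))            // ["a", "b", "c"]
print(graphemes(input).count)      // 2 under the newer rules (assumption about the toolchain)
print(input.unicodeScalars.count)  // 12
print(input.utf16.count)           // 18
```

The last two lines only show how the same string looks in lower-level views. The UTF-16 count happens to equal the 18 glyphs reported in the quoted message, though I have not checked whether that is the actual cause.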
Regards, Martin
> On 12. May 2017, at 10:43, Jens Persson via swift-users <swift-users at swift.org> wrote:
>
> I want a function f such that:
>
> f("abc") == ["a", "b", "c"]
>
> f("café") == ["c", "a", "f", "é"]
>
> f("👨‍👩‍👧‍👦👷🏾‍♀️") == ["👨‍👩‍👧‍👦", "👷🏾‍♀️"]
>
> I'm not sure if the last example renders correctly by mail for everyone but the input String contains these _two_ "natural/visual characters":
> (1) A family emoji
> (2) a construction worker (woman, with skin tone modifier) emoji.
> and the result is an Array of two strings (one for each emoji).
>
> The first two examples are easy, the third example is the tricky one.
>
> Is there a (practical) way to do this (in Swift 3)?
>
> /Jens
>
>
>
> PS
>
> It's OK if the function has to depend on eg a graphics context etc.
> (I tried writing a function so that it extracts the glyphs, using NSTextStorage, NSLayoutManager and the AppleColorEmoji font, but it says that "👨‍👩‍👧‍👦👷🏾‍♀️" contains 18(!) glyphs, whereas eg "café" contains 4 as expected.)
>
> If the emojis of the third example don't look like they should in this mail, here is another way to write the exact same example using only simple text:
>
> let inputOfThirdExample = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"
>
> let result = f(inputOfThirdExample)
>
> let expectedResult = ["\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}", "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"]
>
> print(result.elementsEqual(expectedResult)) // Should print true