[swift-users] Splitting a string into "natural/visual character" components?
Jens Persson
jens at bitcycle.com
Fri May 12 05:13:05 CDT 2017
Ah, thanks!
On Fri, May 12, 2017 at 11:45 AM, Martin R <martinr448 at gmail.com> wrote:
> The enumerateSubstrings method of (NS)String has a
> .byComposedCharacterSequences option which causes Emoji sequences like
> "👨‍👩‍👧‍👦" to be treated as a single unit:
>
> import Foundation
>
> // Collects each composed character sequence of s as a separate String.
> func f(_ s: String) -> [String] {
>     var a: [String] = []
>     s.enumerateSubstrings(in: s.startIndex..<s.endIndex,
>                           options: .byComposedCharacterSequences) {
>         (c, _, _, _) in
>         if let c = c { a.append(c) }
>     }
>     return a
> }
>
> print(f("👨‍👩‍👧‍👦👷🏾‍♀️")) // ["👨‍👩‍👧‍👦", "👷🏾‍♀️"]
>
>
> As I understand it from https://oleb.net/blog/2016/12/emoji-4-0/, Emoji
> sequences are considered a single grapheme cluster in Unicode 9, which
> means that you can simply do something like
>
> Array("👨‍👩‍👧‍👦👷🏾‍♀️".characters)
>
> once Unicode 9 is adopted in Swift.
>
> Regards, Martin
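A minimal sketch of what Martin describes, assuming Swift 4 or later: there String is itself a Collection of Character, each Character is an extended grapheme cluster, and segmentation follows Unicode 9 (or newer) rules, so neither the .characters view nor Foundation is needed. The function name graphemes is hypothetical.

```swift
// Hypothetical helper: splits a string into its extended grapheme clusters.
// Assumes Swift 4 or later, where String is a Collection of Character.
func graphemes(_ s: String) -> [String] {
    return s.map { String($0) }
}

// The third example, spelled with escapes so it survives any mail client.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let worker = "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"

print(graphemes("abc"))                  // ["a", "b", "c"]
print(graphemes(family + worker).count)  // 2
// A decomposed "e" + COMBINING ACUTE ACCENT is still a single Character.
print(graphemes("cafe\u{301}").count)    // 4
```

Mapping each Character to a String keeps the [String] result type from the original question; Array(s) would give a [Character] directly.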
>
>
> On 12. May 2017, at 10:43, Jens Persson via swift-users <swift-users at swift.org> wrote:
>
> I want a function f such that:
>
> f("abc") == ["a", "b", "c"]
>
> f("café") == ["c", "a", "f", "é"]
>
> f("👨‍👩‍👧‍👦👷🏾‍♀️") == ["👨‍👩‍👧‍👦", "👷🏾‍♀️"]
>
> I'm not sure the last example renders correctly in everyone's mail client,
> but the input String contains these _two_ "natural/visual characters":
> (1) a family emoji
> (2) a construction worker (woman, with skin tone modifier) emoji,
> and the result is an Array of two strings (one for each emoji).
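To see why the third example is tricky, the following sketch lists the Unicode scalars each emoji is built from, using the same code points as the escaped version of the example later in the message. The helper name scalarHex is hypothetical; it relies only on String.unicodeScalars and String(_:radix:uppercase:) from the standard library.

```swift
// Hypothetical helper: the hex code point of every Unicode scalar in s.
func scalarHex(_ s: String) -> [String] {
    return s.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
}

// (1) family: MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
// (2) CONSTRUCTION WORKER, skin tone modifier (Fitzpatrick type 5), ZWJ,
//     FEMALE SIGN, VARIATION SELECTOR-16 (emoji presentation)
let worker = "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"

print(scalarHex(family)) // ["1F468", "200D", "1F469", "200D", "1F467", "200D", "1F466"]
print(scalarHex(worker)) // ["1F477", "1F3FE", "200D", "2640", "FE0F"]
```

Neither emoji is a single code point, so splitting on scalars (12 in total here) gives the wrong answer; only grapheme cluster segmentation groups them into two.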
>
> The first two examples are easy, the third example is the tricky one.
>
> Is there a (practical) way to do this (in Swift 3)?
>
> /Jens
>
>
>
> PS
>
> It's OK if the function has to depend on e.g. a graphics context etc.
> (I tried writing a function that extracts the glyphs, using
> NSTextStorage, NSLayoutManager and the AppleColorEmoji font, but it says
> that "👨‍👩‍👧‍👦👷🏾‍♀️" contains 18(!) glyphs, whereas e.g. "café" contains
> 4 as expected.)
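For comparison, this sketch measures the same two-emoji string in each of Swift's String views, assuming Swift 4 or later for s.count. The 18 happens to equal the UTF-16 code-unit count, which suggests, but does not prove, that the layout manager was reporting one glyph slot per UTF-16 unit of the underlying NSString rather than one glyph per visible emoji.

```swift
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let worker = "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"
let s = family + worker

print(s.count)                 // 2: extended grapheme clusters (Characters)
print(s.unicodeScalars.count)  // 12: Unicode scalars, 7 + 5
print(s.utf16.count)           // 18: UTF-16 code units, 11 + 7
print(s.utf8.count)            // 42: UTF-8 bytes, 25 + 17
print("caf\u{E9}".utf16.count) // 4 (with the precomposed U+00E9)
```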
>
> If the emojis of the third example don't look as they should in this
> mail, here is another way to write the exact same example using only plain
> text:
>
> let inputOfThirdExample = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"
>
> let result = f(inputOfThirdExample)
>
> let expectedResult = ["\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}",
>                       "\u{1F477}\u{1F3FE}\u{200D}\u{2640}\u{FE0F}"]
>
> print(result.elementsEqual(expectedResult)) // Should print true
>
>