[swift-evolution] Pitch: Renaming CharacterSet to UnicodeScalarSet
Jay Abbott
jay at abbott.me.uk
Wed Sep 28 23:59:06 CDT 2016
Yes, this is totally confusing. CharacterSet and Set<Character> are
completely different things with different semantics.
I don't know the history, but does CharacterSet exist simply to provide a
Swift equivalent of NSCharacterSet? That seems to be what it is, but since
Swift redefined characters in a better way, it should be removed or renamed
to avoid confusion. You shouldn't have to qualify what you mean by
'character' in a type name just because that type diverges from the
definition used in the rest of the language.
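To make the difference in semantics concrete, here is a minimal sketch using only the standard library. It assumes CharacterSet's membership behaviour can be approximated by a `Set<Unicode.Scalar>` (an illustration, not Foundation itself) and compares that with `Set<Character>` on two canonically equivalent spellings of "é":

```swift
// Two canonically equivalent spellings of "é".
let precomposed = "\u{E9}"     // LATIN SMALL LETTER E WITH ACUTE
let decomposed = "e\u{301}"    // "e" followed by COMBINING ACUTE ACCENT

// Set<Character> compares grapheme clusters using canonical equivalence,
// so the decomposed spelling is found in a set built from the precomposed one.
let characters: Set<Character> = Set(precomposed)
let characterMatch = decomposed.allSatisfy { characters.contains($0) }

// A set of Unicode scalars (roughly how CharacterSet behaves) compares
// individual code points, so the two spellings share nothing.
let scalars: Set<Unicode.Scalar> = Set(precomposed.unicodeScalars)
let scalarMatch = decomposed.unicodeScalars.allSatisfy { scalars.contains($0) }

print(characterMatch)  // true
print(scalarMatch)     // false
```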
On Thu, 29 Sep 2016 at 04:48 Xiaodi Wu via swift-evolution <
swift-evolution at swift.org> wrote:
> On Wed, Sep 28, 2016 at 10:34 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>
>> On Wed, Sep 28, 2016 at 10:23 PM, Charles Srstka via swift-evolution <
>> swift-evolution at swift.org> wrote:
>>
>>> On Sep 28, 2016, at 9:57 PM, Erica Sadun via swift-evolution <
>>> swift-evolution at swift.org> wrote:
>>>
>>>
>>> D'erp. I missed that. And that's an unambiguous answer.
>>>
>>> So let me move on to part B of the pitch: I think CharacterSets are
>>> broken.
>>>
>>> Xiaodi Wu: "isn't the problem you're presenting really an argument that
>>> the type should be fleshed out to handle characters (grapheme clusters)
>>> containing more than one Unicode scalar?"
>>>
>>>
>>> It seems that it already does handle such characters:
>>>
>>> (done in Objective-C so we can log the length of the range as a count of
>>> UTF-16 code units)
>>>
>>> #import <Foundation/Foundation.h>
>>>
>>> int main(int argc, char *argv[]) {
>>>     @autoreleasepool {
>>>         NSCharacterSet *bikeSet = [NSCharacterSet
>>>             characterSetWithCharactersInString:@"🚲"];
>>>         NSString *str = @"foo🚲bar";
>>>
>>>         NSRange range = [str rangeOfCharacterFromSet:bikeSet];
>>>
>>>         NSLog(@"location: %lu length: %lu", range.location, range.length);
>>>     }
>>> }
>>>
>>> - - - - - - -
>>>
>>> 2016-09-28 22:20:00.622471 test[15577:2433912] location: 3 length: 2
>>> Program ended with exit code: 0
>>>
>>> - - - - - - -
>>>
>>> As we can see, the character from the set is recognized as spanning
>>> two UTF-16 code units. There are a few bugs in the system, though; see the
>>> cocoa-dev thread “Where is my bicycle?” from about a year ago:
>>> http://prod.lists.apple.com/archives/cocoa-dev/2015/Apr/msg00074.html
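For comparison, a stdlib-only Swift sketch of what that log line measures. The helper `utf16RangeOfFirstScalar(in:from:)` is hypothetical (it is not Foundation's `rangeOfCharacter(from:)`); it assumes the set is treated as a set of Unicode scalars and reports the match as a UTF-16 offset and length, the way NSRange does:

```swift
// Hypothetical helper (not a Foundation API): finds the first Unicode scalar
// of `string` that is a member of `set` and returns its position as a
// UTF-16 offset and a UTF-16 length, mirroring NSRange's units.
func utf16RangeOfFirstScalar(in string: String,
                             from set: Set<Unicode.Scalar>) -> (location: Int, length: Int)? {
    var offset = 0
    for scalar in string.unicodeScalars {
        // UTF16.width(_:) is 1 for BMP scalars and 2 for supplementary ones.
        let width = UTF16.width(scalar)
        if set.contains(scalar) {
            return (offset, width)
        }
        offset += width
    }
    return nil
}

let bikeSet: Set<Unicode.Scalar> = Set("\u{1F6B2}".unicodeScalars)  // 🚲
if let range = utf16RangeOfFirstScalar(in: "foo\u{1F6B2}bar", from: bikeSet) {
    print("location: \(range.location) length: \(range.length)")
}
// prints "location: 3 length: 2"
```

Because the bicycle is a single scalar outside the Basic Multilingual Plane, a scalar-based lookup already reports a two-unit range; that result says nothing about multi-scalar characters.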
>>>
>>
>> The bike emoji may be two UTF-16 code units, but it is one Unicode scalar
>> (U+1F6B2). The Canadian flag emoji, by contrast, is two Unicode scalars
>> (U+1F1E8 U+1F1E6) but nonetheless one character.
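The three ways of counting can be checked directly with the standard library's string views. This is a sketch assuming Swift 4 or later, where `Character` follows Unicode extended grapheme cluster rules:

```swift
let bike = "\u{1F6B2}"              // 🚲, one scalar
let canada = "\u{1F1E8}\u{1F1E6}"   // 🇨🇦, regional indicators C and A

// Count the same text as UTF-16 code units, Unicode scalars and Characters.
for (name, text) in [("bike", bike), ("canada", canada)] {
    print(name,
          "UTF-16 units:", text.utf16.count,
          "scalars:", text.unicodeScalars.count,
          "characters:", text.count)
}
// bike UTF-16 units: 2 scalars: 1 characters: 1
// canada UTF-16 units: 4 scalars: 2 characters: 1
```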
>>
>
> To illustrate in code how CharacterSet doesn't actually handle characters
> made up of multiple Unicode scalars:
>
> ```
> import Foundation
>
> let str1 = "🇦🇩"
> // this actually crashes corelibs-foundation
> let first = CharacterSet(charactersIn: str1)
> let str2 = "🇦🇺"
> let second = CharacterSet(charactersIn: str2)
> let intersection = first.intersection(second)
> print(intersection.isEmpty)
> // actual output: false
> // obviously, if we were really dealing with characters, the intersection
> // should be empty
> ```
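For contrast, a stdlib-only sketch that performs the same intersection on `Set<Character>` (grapheme clusters) and on `Set<Unicode.Scalar>`. The scalar set stands in for CharacterSet's scalar-based behaviour; it is an approximation, not Foundation itself:

```swift
// The same two flags, written as escapes:
// 🇦🇩 = U+1F1E6 U+1F1E9, 🇦🇺 = U+1F1E6 U+1F1FA
let andorra = "\u{1F1E6}\u{1F1E9}"
let australia = "\u{1F1E6}\u{1F1FA}"

// Each string treated as a set of grapheme clusters (Characters):
// the two flags are different characters, so nothing is shared.
let firstCharacters = Set(andorra)
let secondCharacters = Set(australia)
print(firstCharacters.intersection(secondCharacters).isEmpty)  // true

// Each string treated as a set of Unicode scalars, which is what the
// CharacterSet example effectively does: both contain U+1F1E6.
let firstScalars = Set(andorra.unicodeScalars)
let secondScalars = Set(australia.unicodeScalars)
print(firstScalars.intersection(secondScalars).isEmpty)        // false
```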
>
>
>> Charles
>>>
>>>
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>
>>>