[swift-evolution] Pitch: Renaming CharacterSet to UnicodeScalarSet

David Sweeris davesweeris at mac.com
Thu Sep 29 07:49:09 CDT 2016


IIUC, Jay wasn't arguing for renaming CharacterSet, but replacing it with Swift's existing Set mechanism. If/when generics get to the point that we can say 'extension Set<Character> {...}', I think the transition could simply be putting 'typealias CharacterSet = Set<Character>' somewhere in the framework (although I don't know how Obj-C interop would be affected by such a change).
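
For what it's worth, here is a minimal sketch of how Set<Character> semantics would differ, assuming Swift 4 or later (where String is a collection of Character) and using only the standard library. GraphemeSet and init(charactersIn:) are hypothetical names for illustration, not existing API; I avoided reusing the name CharacterSet so the sketch does not collide with Foundation's type. It reuses the two flag strings from Xiaodi's example further down.

```swift
// Hypothetical alias, for illustration only; not a Foundation or stdlib name.
typealias GraphemeSet = Set<Character>

// Constrained extension; same-type constraints like this need Swift 3.1 or later.
extension Set where Element == Character {
    /// Hypothetical convenience initializer: builds a set from the
    /// grapheme clusters (Characters) of `string`.
    init(charactersIn string: String) {
        self.init(string)
    }
}

// Each flag is a single Character made of two regional-indicator scalars.
let andorra = GraphemeSet(charactersIn: "🇦🇩")
let australia = GraphemeSet(charactersIn: "🇦🇺")
let sharedCharacters = andorra.intersection(australia)
print(sharedCharacters.isEmpty) // prints "true"

// The same comparison at the Unicode scalar level, for contrast.
let andorraScalars = Set("🇦🇩".unicodeScalars)
let australiaScalars = Set("🇦🇺".unicodeScalars)
let sharedScalars = andorraScalars.intersection(australiaScalars)
print(sharedScalars.isEmpty) // prints "false"
```

With grapheme clusters as elements the two flags are distinct, so the intersection is empty. With Unicode scalars as elements, both sets contain U+1F1E6 (REGIONAL INDICATOR SYMBOL LETTER A), so the intersection is not.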

- Dave Sweeris

> On Sep 29, 2016, at 00:30, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
> 
> CharacterSet is a Foundation value type. It was a subject of the following proposal:
> 
> https://github.com/apple/swift-evolution/blob/master/proposals/0069-swift-mutability-for-foundation.md
> 
> We might be able to improve on the implementation, but I don't think re-arguing the name is an option.
> 
> 
>> On Wed, Sep 28, 2016 at 11:59 PM Jay Abbott <jay at abbott.me.uk> wrote:
>> 
>> Yes - this is totally confusing. CharacterSet and Set<Character> are completely different things with different semantics.
>> 
>> I don't know the history, but is CharacterSet simply to have a Swift equivalent of NSCharacterSet? That seems to be what it is, but since Swift redefined characters in a better way, this should be removed or called something else to avoid confusion. You shouldn't have to qualify what you mean by 'character' in a type name because it diverges from the definition in the rest of the language.
>> 
>>> On Thu, 29 Sep 2016 at 04:48 Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
>>>> On Wed, Sep 28, 2016 at 10:34 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>> 
>>>> On Wed, Sep 28, 2016 at 10:23 PM, Charles Srstka via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>> On Sep 28, 2016, at 9:57 PM, Erica Sadun via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>> 
>>>>>> D'erp. I missed that. And that's an unambiguous answer.
>>>>>> 
>>>>>> So let me move on to part B of the pitch: I think CharacterSets are broken.
>>>>>> 
>>>>>>> Xiaodi Wu: "isn't the problem you're presenting really an argument that the type should be fleshed out to handle characters (grapheme clusters) containing more than one Unicode scalar?"
>>>>> 
>>>>> It seems that it already does handle such characters:
>>>>> 
>>>>> (done in Objective-C so we can log the length of the range as a count of UTF-16 code units)
>>>>> 
>>>>> #import <Foundation/Foundation.h>
>>>>> 
>>>>> int main(int argc, char *argv[]) {
>>>>>     @autoreleasepool {
>>>>>         NSCharacterSet *bikeSet = [NSCharacterSet characterSetWithCharactersInString:@"🚲"];
>>>>>         NSString *str = @"foo🚲bar";
>>>>>         
>>>>>         NSRange range = [str rangeOfCharacterFromSet:bikeSet];
>>>>>         
>>>>>         NSLog(@"location: %lu length: %lu", (unsigned long)range.location, (unsigned long)range.length);
>>>>>     }
>>>>> }
>>>>> 
>>>>> - - - - - - -
>>>>> 
>>>>> 2016-09-28 22:20:00.622471 test[15577:2433912] location: 3 length: 2
>>>>> Program ended with exit code: 0
>>>>> 
>>>>> - - - - - - -
>>>>> 
>>>>> As we can see, the character from the set is recognized as consisting of two code units. There are a few bugs in the system, though. See the cocoa-dev thread “Where is my bicycle?” from about a year ago: http://prod.lists.apple.com/archives/cocoa-dev/2015/Apr/msg00074.html
>>>> 
>>>> The bike emoji might be two code units, but it is one Unicode scalar (U+1F6B2). However, the Canadian flag emoji, for instance, is two Unicode scalars (U+1F1E8 U+1F1E6) but nonetheless one character.
>>> 
>>> To illustrate in code how CharacterSet doesn't actually handle characters made up of multiple Unicode scalars:
>>> 
>>> ```
>>> import Foundation
>>> 
>>> let str1 = "🇦🇩"
>>> let first = CharacterSet(charactersIn: str1) // this actually crashes corelibs-foundation
>>> let str2 = "🇦🇺"
>>> let second = CharacterSet(charactersIn: str2)
>>> let intersection = first.intersection(second)
>>> print(intersection.isEmpty)
>>> // actual output: false
>>> // obviously, if we were really dealing with characters, the intersection should be empty
>>> ```
>>> 
>>>> 
>>>>> Charles
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> swift-evolution at swift.org
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>> 
>>>> 