[swift-evolution] Trial balloon: Ensure that String always contains valid Unicode

Mon Jan 4 17:08:40 CST 2016

>> But doing lazy checking of strings would end up having to check every string that comes from ObjC

I don’t think that’s necessarily true. Invalid Unicode can only creep into an NSString in a limited set of places, so the lazy check could probably bypass quite a few common cases, an ASCII string for example. Without digging into it, I suspect any NSString created from UTF-8 data can be safely bridged, since well-formed UTF-8 has no way to encode an unpaired surrogate.
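
To make the encoding argument concrete, here is a rough sketch in plain Swift. `isWellFormedUTF16` is a hypothetical helper I made up for illustration, not anything in the standard library or the bridging code, and the example uses Swift's own `String(decoding:as:)` rather than NSString, so it only shows why the UTF-8 route can't carry a lone surrogate, not what Foundation actually does.

```swift
// Hypothetical helper (an assumption for illustration only): reports whether
// a buffer of UTF-16 code units contains no unpaired surrogates. This is the
// O(N) scan that an eager validity check would have to perform.
func isWellFormedUTF16(_ units: [UInt16]) -> Bool {
    var index = 0
    while index < units.count {
        let unit = units[index]
        switch unit {
        case 0xD800...0xDBFF:
            // High surrogate: must be immediately followed by a low surrogate.
            guard index + 1 < units.count else { return false }
            let next = units[index + 1]
            guard next >= 0xDC00 && next <= 0xDFFF else { return false }
            index += 2
        case 0xDC00...0xDFFF:
            // Low surrogate with no preceding high surrogate.
            return false
        default:
            // Any other code unit (including all of ASCII) is valid on its own.
            index += 1
        }
    }
    return true
}

// An ASCII string can never contain a surrogate, so it passes.
let asciiOK = isWellFormedUTF16(Array("hello".utf16))

// A lone high surrogate between "a" and "b" is ill-formed UTF-16.
let loneSurrogateOK = isWellFormedUTF16([0x0061, 0xD800, 0x0062])

// ED A0 80 is what a naive encoder would emit for U+D800. It is not
// well-formed UTF-8, so the decoder substitutes U+FFFD instead of
// producing a surrogate.
let bogusUTF8: [UInt8] = [0x61, 0xED, 0xA0, 0x80, 0x62]
let decoded = String(decoding: bogusUTF8, as: UTF8.self)
let replacement: Unicode.Scalar = "\u{FFFD}"
let wasRepaired = decoded.unicodeScalars.contains(replacement)
let decodedOK = isWellFormedUTF16(Array(decoded.utf16))
```

Here `asciiOK` and `decodedOK` come out true and `loneSurrogateOK` comes out false. If NSString's UTF-8 initializers behave comparably (reject or repair ill-formed input), strings created that way could skip the scan entirely, which is the fast path I have in mind.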

Cheers, P

> On Jan 4, 2016, at 4:43 PM, Kevin Ballard via swift-evolution <swift-evolution at swift.org> wrote:
> 
> That kind of lazy checking of arrays is used pretty rarely (since, as you say, it only occurs with an `as!` expression). But doing lazy checking of strings would end up having to check every string that comes from ObjC (which, in a Swift app that uses Cocoa frameworks, is likely to be most strings the app works with).
>  
> -Kevin Ballard
>  
> On Mon, Jan 4, 2016, at 02:41 PM, Félix Cloutier wrote:
>> There are precedents for lazily checking for validity after bridging. Using `array as! [T]` on an NSArray without generics fails lazily if you access an object that's not a T.
>>  
>> Félix
>>  
>>> On Jan 4, 2016, at 14:59:47, Dmitri Gribenko via swift-evolution <swift-evolution at swift.org> wrote:
>>>  
>>> On Mon, Jan 4, 2016 at 9:37 PM, Kevin Ballard via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>>> I agree in principle that it would be great if String could enforce that it's always valid.
>>>> 
>>>> But unfortunately, in practice, there's no way to do that without making it expensive to bridge from Obj-C. Because, as you've demonstrated, you can create NSStrings that contain things that aren't actually valid Unicode sequences, every single bridge from an NSString to a String would have to be checked for validity. Not only that, but it's not clear what the behavior would be if an invalid string is found, since these bridges are unconditional - would Swift panic? Would it silently replace the invalid sequence with U+FFFD? Or something else entirely? But the question doesn't really matter, because turning these bridges from O(1) into O(N) would be an unacceptable performance penalty anyway.
>>>  
>>> Currently String replaces invalid sequences with U+FFFD lazily during access, but there are corner cases related to Objective-C bridging that can still leak invalid Unicode.
>>>  
>>> Dmitri
>>>  
>>> -- 
>>> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
>>> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
>>>  _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
