[swift-evolution] Trial balloon: Ensure that String always contains valid Unicode

Paul Cantrell cantrell at pobox.com
Mon Jan 4 17:22:07 CST 2016

> On Jan 4, 2016, at 5:11 PM, Kevin Ballard <kevin at sb.org> wrote:
> On Mon, Jan 4, 2016, at 03:08 PM, Paul Cantrell wrote:
>>>> But doing lazy checking of strings would end up having to check every string that comes from ObjC
>> I don’t think that’s necessarily true. There’s a limited set of places where invalid Unicode can creep into an NSString, and so the lazy check could probably bypass quite a few common cases — an ASCII string for example. Without digging into it, I suspect any NSString created from UTF-8 data can be safely bridged, since unpaired surrogate chars can’t make it through UTF-8.
> Every single method you implement that takes a `String` property and is either exposed to Obj-C or is overriding an Obj-C declaration will have to check the String parameter every single time the function is called.
> Every time you call an Obj-C method that returns a String, you'll have to check that String result.

Not necessarily. While it’s true that an NSString is represented as UTF-16 internally (right?), there’s a limited set of operations that can introduce invalid Unicode. In theory, at least, an NSString could keep a flag that tracks whether it could potentially contain invalid Unicode.

This is much better than the doomsday scenario you lay out in two respects:

(1) That flag would start out false in many common situations (including NSStrings decoded from UTF-8, Latin-1, and ASCII), and could stay false with O(1) effort for substring operations. My guess is that this covers the vast majority of strings floating around in a typical app.

(2) Once a string is verified, the flag can be flipped to false. No need to keep revalidating. Yes, there are threading concerns with that, but I trust the team that made the dark magic of Swift’s `weak` work may have some bright ideas on this.

The bottom line is that not every NSString → String bridge needs to be O(n). At least in theory. Someone with more intimate knowledge of NSString can correct me if I’m wrong.
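
To make the flag idea concrete, here is a rough sketch in present-day Swift. Everything in it is hypothetical: `BridgedText`, `mayBeInvalid`, and `isWellFormedUTF16` are names I made up for illustration, not anything NSString or the standard library actually exposes, and it ignores the threading concerns entirely. It only shows the shape of the argument: strings from a known-safe source start out trusted, and strings from raw UTF-16 pay the O(n) check at most once while they remain valid.

```swift
/// Returns true if the UTF-16 code units contain no unpaired surrogates.
func isWellFormedUTF16(_ units: [UInt16]) -> Bool {
    var i = 0
    while i < units.count {
        let unit = units[i]
        if (0xD800...0xDBFF).contains(unit) {
            // A high surrogate must be followed immediately by a low surrogate.
            guard i + 1 < units.count,
                  (0xDC00...0xDFFF).contains(units[i + 1]) else { return false }
            i += 2
        } else if (0xDC00...0xDFFF).contains(unit) {
            // A low surrogate with no preceding high surrogate is unpaired.
            return false
        } else {
            i += 1
        }
    }
    return true
}

/// Hypothetical stand-in for an NSString-like buffer that remembers
/// whether it might still contain invalid Unicode.
struct BridgedText {
    let units: [UInt16]
    private(set) var mayBeInvalid: Bool

    /// Text decoded from UTF-8 cannot carry unpaired surrogates
    /// (String(decoding:as:) repairs malformed input), so it starts trusted.
    init(utf8 bytes: [UInt8]) {
        units = Array(String(decoding: bytes, as: UTF8.self).utf16)
        mayBeInvalid = false
    }

    /// Raw UTF-16 from an unknown source starts untrusted.
    init(rawUTF16 units: [UInt16]) {
        self.units = units
        mayBeInvalid = true
    }

    /// O(1) when the flag is already clear; otherwise one O(n) scan,
    /// after which a valid string never needs rechecking.
    mutating func validate() -> Bool {
        guard mayBeInvalid else { return true }
        let ok = isWellFormedUTF16(units)
        if ok { mayBeInvalid = false }
        return ok
    }
}
```

With this shape, a bridge would call `validate()` once and could skip the scan entirely for any text whose flag was never set.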

>  Basically, any time a String object is backed by an NSString, which is going to be very common in most apps, that backing NSString will have to be checked.

Keep in mind that we’re already incurring that O(n) expense right now for every Swift operation that turns an NSString-backed string into characters — that plus the API burden of having that check deferred, which is what originally motivated this thread.


