[swift-evolution] Strings in Swift 4

Brent Royal-Gordon brent at architechies.com
Sat Jan 21 05:49:22 CST 2017


> On Jan 19, 2017, at 6:56 PM, Ben Cohen via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Below is our take on a design manifesto for Strings in Swift 4 and beyond.
> 
> Probably best read in rendered markdown on GitHub:
> https://github.com/apple/swift/blob/master/docs/StringManifesto.md
> 
> We’re eager to hear everyone’s thoughts.

There is so, so much good stuff here. I'm really looking forward to seeing how these ideas develop and enter the language.

> #### Future Directions
> 
> One of the most common internationalization errors is the unintentional
> presentation to users of text that has not been localized, but regularizing APIs
> and improving documentation can go only so far in preventing this error.
> Combined with the fact that `String` operations are non-localized by default,
> the environment for processing human-readable text may still be somewhat
> error-prone in Swift 4.
> 
> For an audience of mostly non-experts, it is especially important that naïve
> code is very likely to be correct if it compiles, and that more sophisticated
> issues can be revealed progressively.  For this reason, we intend to
> specifically and separately target localization and internationalization
> problems in the Swift 5 timeframe.

I am very glad to see this statement in a Swift design document. I have a few ideas about this, but they can wait until the next version.

> At first blush this just adds work, but consider what it does
> for equality: two strings that normalize the same, naturally, will collate the
> same.  But also, *strings that normalize differently will always collate
> differently*.  In other words, for equality, it is sufficient to compare the
> strings' normalized forms and see if they are the same.  We can therefore
> entirely skip the expensive part of collation for equality comparison.
> 
> Next, naturally, anything that applies to equality also applies to hashing: it
> is sufficient to hash the string's normalized form, bypassing collation keys.

That's a great catch.
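This is easy to see with Swift's existing canonically-equivalent `String` comparison (a runnable sketch; the literals are my own example):

```swift
// Two spellings of "é": one precomposed scalar vs. "e" plus a combining
// acute accent. They normalize identically, so equality (and, by the
// argument above, hashing) can work from normalized forms alone.
let precomposed = "\u{00E9}"       // é as a single scalar
let decomposed  = "e\u{0301}"      // e + U+0301 COMBINING ACUTE ACCENT

assert(precomposed == decomposed)              // equal under normalization
assert(precomposed.unicodeScalars.count == 1)
assert(decomposed.unicodeScalars.count == 2)   // ...despite different scalars
```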

> This leaves us executing the full UCA *only* for localized sorting, and ICU's
> implementation has apparently been very well optimized.

Sounds good to me.

> Because the current `Comparable` protocol expresses all comparisons with binary
> operators, string comparisons—which may require
> additional [options](#operations-with-options)—do not fit smoothly into the
> existing syntax.  At the same time, we'd like to solve other problems with
> comparison, as outlined
> in
> [this proposal](https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e)
> (implemented by changes at the head
> of
> [this branch](https://github.com/CodaFi/swift/commits/space-the-final-frontier)).
> We should adopt a modification of that proposal that uses a method rather than
> an operator `<=>`:
> 
> ```swift
> enum SortOrder { case before, same, after }
> 
> protocol Comparable : Equatable {
> func compared(to: Self) -> SortOrder
> ...
> }
> ```
> 
> This change will give us a syntactic platform on which to implement methods with
> additional, defaulted arguments, thereby unifying and regularizing comparison
> across the library.
> 
> ```swift
> extension String {
> func compared(to: Self) -> SortOrder
> 
> }
> ```

While it's great that `compared(to:case:etc.)` is parallel to `compared(to:)`, you don't actually want to *use* anything like `compared(to:)` if you can help it. Think about the clarity at the use site:

	if foo.compared(to: bar, case: .insensitive, locale: .current) == .before { … }

The operands and sense of the comparison are kind of lost in all this garbage. You really want to see `foo < bar` in this code somewhere, but you don't.

I'm struggling a little with the naming and syntax, but as a general approach, I think we want people to use something more like this:

	if StringOptions(case: .insensitive, locale: .current).compare(foo < bar) { … }

Which might have an implementation like:

	// This protocol might actually be part of your `Unicode` protocol; I'm just breaking it out separately here.
	protocol StringOptionsComparable {
		func compare(to: Self, options: StringOptions) -> SortOrder
	}
	extension StringOptionsComparable {
		static func < (lhs: Self, rhs: Self) -> (lhs: Self, rhs: Self, op: (SortOrder) -> Bool) {
			return (lhs, rhs, { $0 == .before })
		}
		static func == (lhs: Self, rhs: Self) -> (lhs: Self, rhs: Self, op: (SortOrder) -> Bool) {
			return (lhs, rhs, { $0 == .same })
		}
		static func > (lhs: Self, rhs: Self) -> (lhs: Self, rhs: Self, op: (SortOrder) -> Bool) {
			return (lhs, rhs, { $0 == .after })
		}
		// etc.
	}
	
	struct StringOptions {
		// Obvious properties and initializers go here
		
		func compare<StringType: StringOptionsComparable>(_ expression: (lhs: StringType, rhs: StringType, op: (SortOrder) -> Bool)) -> Bool {
			return expression.op( expression.lhs.compare(to: expression.rhs, options: self) )
		}
	}

You could also imagine much less verbose syntaxes using custom operators. Strawman example:

	if foo < bar %% (case: .insensitive, locale: .current) { … }

I think this would make human-friendly comparisons much easier to write and understand than adding a bunch of options to a `compared(to:)` call.

> This quirk aside, every aspect of strings-as-collections-of-graphemes appears to
> comport perfectly with Unicode. We think the concatenation problem is tolerable,
> because the cases where it occurs all represent partially-formed constructs. 
> ...
> Admitting these cases encourages exploration of grapheme composition and is
> consistent with what appears to be an overall Unicode philosophy that “no
> special provisions are made to get marginally better behavior for… cases that
> never occur in practice.”[2]

This sounds good to me.

> ### Unification of Slicing Operations

I think you know what I think about this. :^)

(By the way, I've at least partially let this proposal drop for the moment because it's so dependent on generic subscripts to really be an improvement. I do plan to pick it up when those arrive; ping me then if I don't notice.)

A question, though. We currently have a couple of methods, mostly with `subrange` in their names, that can be thought of as slicing operations but aren't:

	collection.removeSubrange(i..<j)
	collection[i..<j].removeAll()
	
	collection.replaceSubrange(i..<j, with: others)
	collection[i..<j].replaceAll(with: others)		// hypothetically

Should these be changed, too? Can we make them efficient (in terms of e.g. copy-on-write) if we do?
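For concreteness, here's how the existing `subrange` methods behave today (using `Array`, since the semantics are the same):

```swift
var numbers = [1, 2, 3, 4, 5]

// removeSubrange drops the half-open range in place...
numbers.removeSubrange(1..<3)
assert(numbers == [1, 4, 5])

// ...and replaceSubrange splices in a replacement, which may differ in length.
numbers.replaceSubrange(0..<1, with: [9, 8, 7])
assert(numbers == [9, 8, 7, 4, 5])
```

Any slicing spelling would need to preserve these in-place, potentially length-changing semantics to be a real replacement.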

> ### Substrings
> 
> When implementing substring slicing, languages are faced with three options:
> 
> 1. Make the substrings the same type as string, and share storage.
> 2. Make the substrings the same type as string, and copy storage when making the substring.
> 3. Make substrings a different type, with a storage copy on conversion to string.
> 
> We think number 3 is the best choice.

I agree, and I think `Substring` is the right name for it: parallel to `SubSequence`, explains where it comes from, captures the trade-offs nicely. `StringSlice` is parallel to `ArraySlice`, but it strikes me as a "foolish consistency", as the saying goes; it avoids a term of art for little reason I can see.

However, is there a reason we're talking about using a separate `Substring` type at all, instead of using `Slice<String>`? Perhaps I'm missing something, but I *think* it does everything we need here. (Of course, you could say the same thing about `ArraySlice`, and yet we have that, too.)

> The downside of having two types is the inconvenience of sometimes having a
> `Substring` when you need a `String`, and vice-versa. It is likely this would
> be a significantly bigger problem than with `Array` and `ArraySlice`, as
> slicing of `String` is such a common operation. It is especially relevant to
> existing code that assumes `String` is the currency type. To ease the pain of
> type mismatches, `Substring` should be a subtype of `String` in the same way
> that `Int` is a subtype of `Optional<Int>`.

I've seen people struggle with the `Array`/`ArraySlice` issue when writing recursive algorithms, so personally, I'd like to see a more general solution that handles all `Collection`s.
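A concrete instance of that struggle (my own minimal example): the natural recursive formulation only works if the parameter is the slice type, not the collection type.

```swift
// Taking ArraySlice (the SubSequence) lets the recursion feed dropFirst()'s
// result straight back in; a version taking [Int] would copy on every call.
func sum(_ xs: ArraySlice<Int>) -> Int {
    guard let head = xs.first else { return 0 }
    return head + sum(xs.dropFirst())
}

let values = [1, 2, 3, 4]
assert(sum(values[...]) == 10)   // but the caller must remember to slice
```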

Rather than having an implicit copying conversion from `Substring` to `String` (or `ArraySlice` to `Array`, or `Collection.SubSequence` to `Collection`), I wonder if implicitly converting in the other direction might be more useful, at least in some circumstances. Converting from `String` to `Substring` does *not* involve an implicit copy, merely calculating a range, so you won't have the same performance surprises. On the other hand, it's also useful in fewer situations.

(If we did go with consistently using `Slice<T>`, this might merely be a special-cased `T -> Slice<T>` conversion. One type, special-cased until we feel comfortable inventing a general mechanism.)

> A user who needs to optimize away copies altogether should use this guideline:
> if for performance reasons you are tempted to add a `Range` argument to your
> method as well as a `String` to avoid unnecessary copies, you should instead
> use `Substring`.

I do like this as a guideline, though. There's definitely room in the standard library for "a string and a range of that string to operate upon".

> ##### The “Empty Subscript”
> 
> To make it easy to call such an optimized API when you only have a `String` (or
> to call any API that takes a `Collection`'s `SubSequence` when all you have is
> the `Collection`), we propose the following “empty subscript” operation,
> 
> ```swift
> extension Collection {
>  subscript() -> SubSequence { 
>    return self[startIndex..<endIndex] 
>  }
> }
> ```
> 
> which allows the following usage:
> 
> ```swift
> funcThatIsJustLooking(at: person.name[]) // pass person.name as Substring
> ```

That's a little bit funky, but I guess it might work.
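For what it's worth, the proposed extension compiles as written; here's a quick check of the shape (in today's Swift, where `String`'s `SubSequence` is what `Substring` would slot into):

```swift
// The "empty subscript" from the manifesto, verbatim: a no-argument
// subscript returning the whole collection as its SubSequence.
extension Collection {
    subscript() -> SubSequence {
        return self[startIndex..<endIndex]
    }
}

let name = "Alice"
let whole = name[]                 // a slice viewing all of name, no copy
assert(String(whole) == "Alice")
```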

> Therefore, APIs that operate on an `NSString`/`NSRange` pair should be imported
> without the `NSRange` argument.  The Objective-C importer should be changed to
> give these APIs special treatment so that when a `Substring` is passed, instead
> of being converted to a `String`, the full `NSString` and range are passed to
> the Objective-C method, thereby avoiding a copy.
> 
> As a result, you would never need to pass an `NSRange` to these APIs, which
> solves the impedance problem by eliminating the argument, resulting in more
> idiomatic Swift code while retaining the performance benefit.  To help users
> manually handle any cases that remain, Foundation should be augmented to allow
> the following syntax for converting to and from `NSRange`:
> 
> ```swift
> let nsr = NSRange(i..<j, in: s) // An NSRange corresponding to s[i..<j]
> let iToJ = Range(nsr, in: s)    // Equivalent to i..<j
> ```

I sort of like this, but note that if we use `String` -> `Substring` conversion instead of the other way around, there's less magic needed to get this effect: `NSString, NSRange` can be imported as `Substring`, which automatically converts from `String` in exactly the manner we want it to.

> Since Unicode conformance is a key feature of string processing in swift, we
> call that protocol `Unicode`:

I'm sorry, I think the name is too clever by half. It sounds something like what `UnicodeCodec` actually is. Or maybe a type representing a version of the Unicode standard or something. I'd prefer something more prosaic like `StringProtocol`.

> **Note:** `Unicode` would make a fantastic namespace for much of
> what's in this proposal if we could get the ability to nest types and
> protocols in protocols.

I mean, sure, but then you imagine it being used generically:

	func parse<UnicodeType: Unicode>(_ source: UnicodeType) -> UnicodeType
	// which concrete types can `source` be???

> We should provide convenient APIs processing strings by character.  For example,
> it should be easy to cleanly express, “if this string starts with `"f"`, process
> the rest of the string as follows…”  Swift is well-suited to expressing this
> common pattern beautifully, but we need to add the APIs.  Here are two examples
> of the sort of code that might be possible given such APIs:
> 
> ```swift
> if let firstLetter = input.droppingPrefix(alphabeticCharacter) {
>  somethingWith(input) // process the rest of input
> }
> 
> if let (number, restOfInput) = input.parsingPrefix(Int.self) {
>   ...
> }
> ```
> 
> The specific spelling and functionality of APIs like this are TBD.  The larger
> point is to make sure matching-and-consuming jobs are well-supported.

Yes.

> #### Unified Pattern Matcher Protocol
> 
> Many of the current methods that do matching are overloaded to do the same
> logical operations in different ways, with the following axes:
> 
> - Logical Operation: `find`, `split`, `replace`, match at start
> - Kind of pattern: `CharacterSet`, `String`, a regex, a closure
> - Options, e.g. case/diacritic sensitivity, locale.  Sometimes a part of
>  the method name, and sometimes an argument
> - Whole string or subrange.
> 
> We should represent these aspects as orthogonal, composable components,
> abstracting pattern matchers into a protocol like
> [this one](https://github.com/apple/swift/blob/master/test/Prototypes/PatternMatching.swift#L33),
> that can allow us to define logical operations once, without introducing
> overloads, and massively reducing API surface area.

*Very* yes.

> For example, using the strawman prefix `%` syntax to turn string literals into
> patterns, the following pairs would all invoke the same generic methods:
> 
> ```swift
> if let found = s.firstMatch(%"searchString") { ... }
> if let found = s.firstMatch(someRegex) { ... }
> 
> for m in s.allMatches((%"searchString"), case: .insensitive) { ... }
> for m in s.allMatches(someRegex) { ... }
> 
> let items = s.split(separatedBy: ", ")
> let tokens = s.split(separatedBy: CharacterSet.whitespace)
> ```

Very, *very* yes.

If we do this, rather than your `%` operator (or whatever it becomes), I wonder if we can have these extensions:

	// Assuming a protocol like:
	protocol Pattern {
		associatedtype PatternElement
		func matches<CollectionType: Collection>(…) -> … where CollectionType.Iterator.Element == PatternElement
	}
	extension Equatable: Pattern {
		typealias PatternElement = Self
		…
	}
	extension Collection: Pattern where Element: Equatable {
		typealias PatternElement = Element
	}

...although then `Collection` would conform to `Pattern` through both itself and (conditionally) `Equatable`. Hmm.

I suppose we faced this same problem elsewhere and ended up with things like:

	mutating func append(_ element: Element)
	mutating func append<Seq: Sequence>(contentsOf seq: Seq) where Seq.Iterator.Element == Element

So we could do things like:

	str.firstMatch("x")	// single element, so this is a Character
	str.firstMatch(contentsOf("xy"))
	str.firstMatch(anyOf(["x", "y"] as Set))
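The `append`/`append(contentsOf:)` precedent is load-bearing here, so for reference:

```swift
// One overload takes a single Element, the other any Sequence of Elements;
// firstMatch(_:) / firstMatch(contentsOf:) would follow the same split.
var word: [Character] = ["c", "a"]
word.append("t")                    // single element
word.append(contentsOf: "sup")      // a String is a Sequence of Characters
assert(String(word) == "catsup")
```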

> #### Index Interchange Among Views

I really, really, really want this.

> We think random-access
> *code-unit storage* is a reasonable requirement to impose on all `String`
> instances.

Wait, you do? Doesn't that mean either using UTF-32, inventing a UTF-24 to use, or using some kind of complicated side table that adjusts for all the multi-unit characters in a UTF-16 or UTF-8 string? None of these sound ideal.

> Index interchange between `String` and its `unicodeScalars`, `codeUnits`,
> and [`extendedASCII`](#parsing-ascii-structure) views can be made entirely
> seamless by having them share an index type (semantics of indexing a `String`
> between grapheme cluster boundaries are TBD—it can either trap or be forgiving).

I think it should be forgiving, and I think it should be forgiving in a very specific way: It should treat indexing in the middle of a cluster as though you indexed at the beginning.

The reason is `AttributedString`. You can think of `AttributedString` as a type which adds additional views to a `String`; these views are indexed by `String.Index`, just like `String`, `String.UnicodeScalarView`, et al., and advancing an index with these views advances it to the beginning of the next run. But you can also just subscript these views with an arbitrary index in the middle of a run, and it'll work correctly.

I think it would be useful for this behavior to be consistent among all `String` views.

> Having a common index allows easy traversal into the interior of graphemes,
> something that is often needed, without making it likely that someone will do it
> by accident.
> 
> - `String.index(after:)` should advance to the next grapheme, even when the
>   index points partway through a grapheme.
> 
> - `String.index(before:)` should move to the start of the grapheme before
>   the current position.

Good.

> Seamless index interchange between `String` and its UTF-8 or UTF-16 views is not
> crucial, as the specifics of encoding should not be a concern for most use
> cases, and would impose needless costs on the indices of other views.

I don't know about this, at least for the UTF-16 view. Here's why:

> That leaves the interchange of bare indices with Cocoa APIs trafficking in
> `Int`.  Hopefully such APIs will be rare, but when needed, the following
> extension, which would be useful for all `Collections`, can help:
> 
> ```swift
> extension Collection {
>  func index(offset: IndexDistance) -> Index {
>    return index(startIndex, offsetBy: offset)
>  }
>  func offset(of i: Index) -> IndexDistance {
>    return distance(from: startIndex, to: i)
>  }
> }
> ```
> 
> Then integers can easily be translated into offsets into a `String`'s `utf16`
> view for consumption by Cocoa:
> 
> ```swift
> let cocoaIndex = s.utf16.offset(of: String.UTF16Index(i))
> let swiftIndex = s.utf16.index(offset: cocoaIndex)
> ```

I worry that this conversion will be too obscure. In Objective-C, you don't really think very much about what "character" means; it's just an index that points to a location inside the string. I don't think people will know to use the `utf16` view instead of the others—especially the plain `String` version, which would be the most obvious one to use.
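To illustrate why picking the wrong view bites (my own example, using today's APIs):

```swift
// A regional-indicator flag is one Character but two scalars and four
// UTF-16 code units, so a "Cocoa index" is not a Character offset.
let s = "🇺🇸!"
assert(s.count == 2)               // graphemes: the flag and "!"
assert(s.unicodeScalars.count == 3)
assert(s.utf16.count == 5)         // the offsets Cocoa actually wants
```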

I think I'd prefer to see the following:

1. UTF-16 is the storage format, at least for an "ordinary" `Swift.String`.

2. `String.Index` is used down to the `UTF16View`. It stores a UTF-16 offset.

3. With just the standard library imported, `String.Index` does not have any obvious way to convert to or from an `Int` offset; you use `index(_:offsetBy:)` on one of the views. `utf16`'s implementation is just faster than the others.

4. Foundation adds `init(_:)` methods to `String.Index` and `Int`, as well as `Range<String.Index>` and `NSRange`, which perform mutual conversions:

	XCTAssertEqual(Int(String.Index(cocoaIndex)), cocoaIndex)
	XCTAssertEqual(NSRange(Range<String.Index>(cocoaRange)), cocoaRange)

I think this would really help to guide people to the right APIs for the task.

(Also, it would make my `AttributedString` thing work better, too.)

> ### Formatting

Briefly: I am, let's say, 95% on board with your plan to replace format strings with interpolation and format methods. The remaining 5% concern is that we'll need an adequate replacement for the ability to load a format string dynamically and have it reorder or alter the formatting of interpolated values. Obviously dynamic format strings are dangerous and limited, but where you *can* use them, they're invaluable.

> #### String Interpolation
> 
> Swift string interpolation provides a user-friendly alternative to printf's
> domain-specific language (just write ordinary swift code!) and its type safety
> problems (put the data right where it belongs!) but the following issues prevent
> it from being useful for localized formatting (among other jobs):
> 
>  * [SR-2303](https://bugs.swift.org/browse/SR-2303) We are unable to restrict
>    types used in string interpolation.
>  * [SR-1260](https://bugs.swift.org/browse/SR-1260) String interpolation can't
>    distinguish (fragments of) the base string from the string substitutions.

If I find some copious free time, I could try to develop proposals for one or both of these. Would there be interest in them at this point? (Feel free to contact me off-list about this, preferably in a new thread.)

(Okay, one random thought, because I can't resist: Perhaps the "\(…)" syntax can be translated directly into an `init(…)` on the type you're creating. That is, you can write:

	let x: MyString = "foo \(bar) baz \(quux, radix: 16)"

And that translates to:

	let x = MyString(stringInterpolationSegments:
		MyString(stringLiteral: "foo "),
		MyString(bar),
		MyString(stringLiteral: " baz "),
		MyString(quux, radix: 16)
	)

That would require you to redeclare `String` initializers on your own string type, but you probably need some of your own logic anyway, right?)

> In the long run, we should improve Swift string interpolation to the point where
> it can participate in most any formatting job.  Mostly this centers around
> fixing the interpolation protocols per the previous item, and supporting
> localization.

For what it's worth, by using a hacky workaround for SR-1260, I've written (Swift 2.0) code that passes strings with interpolations through the Foundation localized string tables: <https://gist.github.com/brentdax/79fa038c0af0cafb52dd> Obviously that's just a start, but it is incredibly convenient.

> ### C String Interop
> 
> Our support for interoperation with nul-terminated C strings is scattered and
> incoherent, with 6 ways to transform a C string into a `String` and four ways to
> do the inverse.  These APIs should be replaced with the following

These APIs are much better than the status quo, but it's a shame that we can't have them handle non-nul-terminated data, too.

Actually... (Begin shaggy dog story...)

Suppose you introduce an `UnsafeNulTerminatedBufferPointer` type. Then you could write a *very* high-level API which handles pretty much every conversion under the sun:

	extension String {
		/// Constructs a `String` from a sequence of `codeUnits` in an indicated `encoding`.
		/// 
		/// - Parameter codeUnits: A sequence of code units in the given `encoding`.
		/// - Parameter encoding: The encoding the code units are in.
		init<CodeUnits: Sequence, Encoding: UnicodeEncoding>(_ codeUnits: CodeUnits, encoding: Encoding)
			where CodeUnits.Iterator.Element == Encoding.CodeUnit
	}

For UTF-8, at least, that would cover reading from `Array`, `UnsafeBufferPointer`, `UnsafeRawBufferPointer`, `UnsafeNulTerminatedBufferPointer`, `Data`, you name it. Maybe we could have a second one that always takes something producing bytes, no matter the encoding used:

	extension String {
		/// Constructs a `String` from the code units contained in `bytes` in a given `encoding`.
		/// 
		/// - Parameter bytes: A sequence of bytes expressing code units in the given `encoding`.
		/// - Parameter encoding: The encoding the code units are in.
		init<Bytes: Sequence, Encoding: UnicodeEncoding>(_ bytes: Bytes, encoding: Encoding)
			where Bytes.Iterator.Element == UInt8
	}

These two initializers would replace...um, something like eight existing ones, including ones from Foundation. On the other hand, this is *very* generic. And, unless we actually changed the way `char *` is imported to `UnsafeNulTerminatedBufferPointer<CChar>`, the C string call sequence would be pretty complicated:

	String(UnsafeNulTerminatedBufferPointer(start: cString), encoding: UTF8.self)

So you might end up having to wrap it in an `init(cString:)` anyway, just for convenience. Oh well, it was worth exploring.

Prototype of the above: https://gist.github.com/brentdax/8b71f46b424dc64abaa77f18556e607b
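For comparison, the narrow conversion we have today, which the generic initializer above would subsume:

```swift
// Today's nul-terminated entry point: String(cString:) over a CChar pointer.
let bytes: [CChar] = [104, 105, 0]      // "hi" plus the terminating NUL
let s = bytes.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }
assert(s == "hi")
```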

(Hmm...maybe bridge `char *` to a type like this instead?

	struct CCharPointer {
		var baseAddress: UnsafePointer<CChar> { get }
		var nulTerminated: UnsafeNulTerminatedBufferPointer<CChar> { get }
		func ofLength(_ length: Int) -> UnsafeBufferPointer<CChar>
	}

Nah, probably not gonna happen...)

>  init(cString nulTerminatedUTF8: UnsafePointer<CChar>)

By the way, I just noticed an impedance mismatch in current Swift: `CChar` is usually an `Int8`, but `UnicodeScalar` and `UTF8` currently want `UInt8`. It'd be nice to address this somehow, if only by adding some signed variants or something.
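The mismatch is easy to demonstrate, along with the `bitPattern` round-trip it currently forces on callers:

```swift
// char * arrives as Int8 (CChar) on most platforms, but UTF8.CodeUnit is
// UInt8, so decoding today requires a manual bit-pattern conversion.
let cchars: [CChar] = [104, 105]                       // "hi"
let codeUnits = cchars.map { UInt8(bitPattern: $0) }
assert(String(decoding: codeUnits, as: UTF8.self) == "hi")
```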

> ### High-Performance String Processing
> 
> Many strings are short enough to store in 64 bits, many can be stored using only
> 8 bits per unicode scalar, others are best encoded in UTF-16, and some come to
> us already in some other encoding, such as UTF-8, that would be costly to
> translate.  Supporting these formats while maintaining usability for
> general-purpose APIs demands that a single `String` type can be backed by many
> different representations.

Just putting a pin in this, because I'll want to discuss it a little later.

> ### Parsing ASCII Structure
> 
> Although many machine-readable formats support the inclusion of arbitrary
> Unicode text, it is also common that their fundamental structure lies entirely
> within the ASCII subset (JSON, YAML, many XML formats).  These formats are often
> processed most efficiently by recognizing ASCII structural elements as ASCII,
> and capturing the arbitrary sections between them in more-general strings.  The
> current String API offers no way to efficiently recognize ASCII and skip past
> everything else without the overhead of full decoding into unicode scalars.
> 
> For these purposes, strings should supply an `extendedASCII` view that is a
> collection of `UInt32`, where values less than `0x80` represent the
> corresponding ASCII character, and other values represent data that is specific
> to the underlying encoding of the string.

This sounds interesting, but:

1. It doesn't sound like you anticipate there being any way to compare an element of the `extendedASCII` view to a character literal. That seems like it'd be really useful.

2. I don't really understand how you envision using the "data specific to the underlying encoding" sections. Presumably you'll want to convert that data into a string eventually, right?

Do you have pseudocode or something lying around that might help us understand how you think this might be used?

> ### Do we need a type-erasable base protocol for UnicodeEncoding?
> 
> UnicodeEncoding has an associated type, but it may be important to be able to
> traffic in completely dynamic encoding values, e.g. for “tell me the most
> efficient encoding for this string.”

As long as you're here, we haven't talked about `UnicodeEncoding` much. I assume this is a slightly modified version of `UnicodeCodec`? Anything to say about it?

If it *is* similar to `UnicodeCodec`, one thing I will note is that the way `UnicodeCodec` works in code units is rather annoying for I/O. It may make sense to have some sort of type-erasing wrapper around `UnicodeCodec` which always uses bytes. (You then have to worry about endianness, of course...)

> ### Should there be a string “facade?”
>> An interesting variation on this design is possible if defaulted generic
> parameters are introduced to the language:
> 
> ```swift
> struct String<U: Unicode = StringStorage> 
>  : BidirectionalCollection {
> 
>  // ...APIs for high-level string processing here...
> 
>  var unicode: U // access to lower-level unicode details
> }
> 
> typealias Substring = String<StringStorage.SubSequence>
> ```

I think this is a very, very interesting idea. A few notes:

* Earlier, I said I didn't like `Unicode` as a protocol name. If we go this route, I think `StringStorage` is a good name for that protocol. The default storage might be something like `UTF16StringStorage`, or just, you know, `DefaultStringStorage`.

* Earlier, you mentioned the tension between using multiple representations for flexibility and pinning down one representation for speed. One way to handle this might be to have `String`'s default `StringStorage` be a superclass or type-erased wrapper or something. That way, if you just write `String`, you get something flexible; if you write `String<NFCNormalizedUTF16StringStorage>`, you get something fast.

* Could `NSString` be a `StringStorage`, or support a trivial wrapper that converts it into a `StringStorage`? Would that be helpful at all?

* If we do this, does `String.Index` become a type-specific thing? That is, might `String<UTF8Storage>.Index` be different from `String<UTF16Storage>.Index`? What does that mean for `String.Index` unification?

> ### `description` and `debugDescription`
> 
> * Should these be creating localized or non-localized representations?

`debugDescription`, I think, is non-localized; it's something helpful for the programmer, and the programmer's language is not the user's. It's also usually something you don't want to put *too* much effort into, other than to dump a lot of data about the instance.

`description` would have to change to be localizable. (Specifically, it would have to take a locale.) This is doable, of course, but it hasn't been done yet.

> * Is returning a `String` efficient enough?

I'm not sure how important efficiency is for `description`, honestly.

> * Is `debugDescription` pulling the weight of the API surface area it adds?

Maybe? Or maybe it's better off as part of the `Mirror` instead of a property on the instance itself.

> ### `StaticString`
> 
> `StaticString` was added as a byproduct of standard library developed and kept
> around because it seemed useful, but it was never truly *designed* for client
> programmers.  We need to decide what happens with it.  Presumably *something*
> should fill its role, and that should conform to `Unicode`.

Maybe. One complication there is that `Unicode` presumably supports mutation, which `StaticString` doesn't.

Another possibility I've discussed in the past is renaming `StaticString` to `StringLiteral` and using it largely as a way to initialize `String`. (I mentioned that in a thread about the need for public integer and floating-point literal types that are more expressive now that we're supporting larger integer/float types.) It could have just enough API surface to access it as a buffer of UTF-8 bytes and thereby build a `String` or `Data` from it.
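Note that `StaticString` already has most of that surface today (using today's API names):

```swift
// StaticString exposes its contents as a UTF-8 buffer, which is exactly
// the "build a String or Data from it" role described above.
let literal: StaticString = "hello"
let copied = literal.withUTF8Buffer { buffer in
    String(decoding: buffer, as: UTF8.self)
}
assert(copied == "hello")
```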

Well, that's it for this massive email. You guys are doing a hell of a job on this.

Hope this helps,
-- 
Brent Royal-Gordon
Architechies


