[swift-evolution] Pitch: String Index Overhaul
Michael Ilseman
milseman at apple.com
Tue May 30 15:18:11 CDT 2017
> On May 27, 2017, at 10:40 AM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
>
>
> Pretty version: https://github.com/dabrahams/swift-evolution/blob/string-index-overhaul/proposals/NNNN-string-index-overhaul.md
>
> ----
>
> # String Index Overhaul
>
> * Proposal: [SE-NNNN](NNNN-string-index-overhaul.md)
> * Authors: [Dave Abrahams](https://github.com/dabrahams)
> * Review Manager: TBD
> * Status: **Awaiting review**
> * Pull Request Implementing This Proposal: https://github.com/apple/swift/pull/9806
>
> ## Introduction
>
> Today `String` shares an `Index` type with its `CharacterView` but not
> with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`. This
> proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
> and `String.UnicodeScalarView.Index` as typealiases for `String.Index`,
> and exposes a public `encodedOffset` property and initializer that can
> be used to serialize and deserialize positions in a `String` or
> `Substring`.
>
> Swift-evolution thread: [Discussion thread topic for that proposal](https://lists.swift.org/pipermail/swift-evolution/)
>
> ## Motivation
>
> The different index types are supported by a set of `Index`
> initializers, which are failable whenever the source index might not
> correspond to a position in the target view:
>
> ```swift
> if let j = String.UnicodeScalarView.Index(
> someUTF16Position, within: s.unicodeScalars) {
> ...
> }
> ```
>
> The current API is as follows:
>
> ```swift
> public extension String.Index {
> init?(_: String.UnicodeScalarIndex, within: String)
> init?(_: String.UTF16Index, within: String)
> init?(_: String.UTF8Index, within: String)
> }
>
> public extension String.UTF16View.Index {
> init?(_: String.UTF8Index, within: String.UTF16View)
> init(_: String.UnicodeScalarIndex, within: String.UTF16View)
> init(_: String.Index, within: String.UTF16View)
> }
>
> public extension String.UTF8View.Index {
> init?(_: String.UTF16Index, within: String.UTF8View)
> init(_: String.UnicodeScalarIndex, within: String.UTF8View)
> init(_: String.Index, within: String.UTF8View)
> }
>
> public extension String.UnicodeScalarView.Index {
> init?(_: String.UTF16Index, within: String.UnicodeScalarView)
> init?(_: String.UTF8Index, within: String.UnicodeScalarView)
> init(_: String.Index, within: String.UnicodeScalarView)
> }
> ```
>
> These initializers are supplemented by a corresponding set of
> convenience conversion methods:
>
> ```swift
> if let j = someUTF16Position.samePosition(in: s.unicodeScalars) {
> ...
> }
> ```
>
> with the following API:
>
> ```swift
> public extension String.Index {
> func samePosition(in: String.UTF8View) -> String.UTF8View.Index
> func samePosition(in: String.UTF16View) -> String.UTF16View.Index
> func samePosition(
> in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index
> }
>
> public extension String.UTF16View.Index {
> func samePosition(in: String) -> String.Index?
> func samePosition(in: String.UTF8View) -> String.UTF8View.Index?
> func samePosition(
> in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
> }
>
> public extension String.UTF8View.Index {
> func samePosition(in: String) -> String.Index?
> func samePosition(in: String.UTF16View) -> String.UTF16View.Index?
> func samePosition(
> in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
> }
>
> public extension String.UnicodeScalarView.Index {
> func samePosition(in: String) -> String.Index?
> func samePosition(in: String.UTF8View) -> String.UTF8View.Index
> func samePosition(in: String.UTF16View) -> String.UTF16View.Index
> }
> ```
>
> The result is a great deal of API surface area for apparently little
> gain in ordinary code, which normally only interchanges indices among
> views when the positions match up exactly (i.e. when the conversion is
> going to succeed). Also, the resulting code is needlessly awkward.
>
> Finally, the opacity of these index types makes it difficult to record
> `String` or `Substring` positions in files or other archival forms,
> and reconstruct the original positions with respect to a deserialized
> `String` or `Substring`.
>
> ## Proposed solution
>
> All `String` views will use a single index type (`String.Index`), so
> that positions can be interchanged without awkward explicit
> conversions:
>
> ```swift
> let html: String = "See <a href=\"http://swift.org\">swift.org</a>"
>
> // Search the UTF16, instead of characters, for performance reasons:
> let open = "<".utf16.first!, close = ">".utf16.first!
> let tagStart = html.utf16.index(of: open)!
> let tagEnd = html.utf16[tagStart...].index(of: close)!
>
> // Slice the String with the UTF-16 indices to retrieve the tag.
> let tag = html[tagStart...tagEnd]
> ```
>
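A minimal sketch of the search-and-slice pattern quoted above, written so it can be run on its own. The helper name `firstTag(in:)` is hypothetical, and the sketch assumes a toolchain where the pitch's `index(of:)` is spelled `firstIndex(of:)`. It uses optional binding rather than force-unwrapping so that input without a tag yields `nil`.

```swift
// Hypothetical helper (not part of the pitch): returns the first
// "<...>" tag found in `html`, or nil if there is none.
func firstTag(in html: String) -> String? {
    let open = "<".utf16.first!, close = ">".utf16.first!
    // Search the UTF-16 view; the indices it yields are String.Index values.
    guard let tagStart = html.utf16.firstIndex(of: open),
          let tagEnd = html.utf16[tagStart...].firstIndex(of: close)
    else { return nil }
    // Slice the String itself with indices obtained from the UTF-16 view.
    return String(html[tagStart...tagEnd])
}

let tag = firstTag(in: "See <a href=\"http://swift.org\">swift.org</a>")
print(tag ?? "no tag")
```

The point of the sketch is that no explicit conversion sits between the search in `utf16` and the slice of the `String`.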
> A property and an initializer will be added to `String.Index`, exposing
> the offset of the index in code units (currently only UTF-16) from the
> beginning of the string:
>
> ```swift
> let n: Int = html.endIndex.encodedOffset
> let end = String.Index(encodedOffset: n)
> assert(end == html.endIndex)
> ```
>
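One way the archive-and-restore use case might look, as a sketch only. The helper names `archivedOffset(of:in:)` and `restoredIndex(from:in:)` are hypothetical. The sketch also assumes a toolchain that provides `utf16Offset(in:)` and `String.Index(utf16Offset:in:)`, and uses those instead of the `encodedOffset` spelling pitched above so that the stored integer is explicitly a UTF-16 offset.

```swift
// Hypothetical helpers for archiving a position as a plain integer.
// Assumption: utf16Offset(in:) and init(utf16Offset:in:) stand in for
// the `encodedOffset` property and initializer described in the pitch.
func archivedOffset(of position: String.Index, in text: String) -> Int {
    return position.utf16Offset(in: text)
}

func restoredIndex(from offset: Int, in text: String) -> String.Index {
    return String.Index(utf16Offset: offset, in: text)
}

let html = "See <a href=\"http://swift.org\">swift.org</a>"

// Round-trip the end position through an Int, as one would when
// writing it to a file and reading it back.
let n = archivedOffset(of: html.endIndex, in: html)
let end = restoredIndex(from: n, in: html)
assert(end == html.endIndex)
```

The integer is only meaningful together with the same string contents, so an archive would need to store both.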
> ### Comparison and Slicing Semantics
>
> When two indices being compared correspond to positions that are valid
> in any single `String` view, comparison semantics are already fully
> specified by the `Collection` requirements. Where no single `String`
> view contains both index values, the indices compare unequal and
> ordering is determined by comparison of `encodedOffset`s. These index
> values are not totally ordered but do satisfy strict weak ordering
> requirements, which is sufficient for algorithms such as `sort` to
> exhibit sensible behavior. We might consider loosening the specified
> requirements on these algorithms and on `Comparable` to support strict
> weak ordering, but for now we can treat such index pairs as being
> outside the domain of comparison, like any other indices from
> completely distinct collections.
>
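The cross-view ordering described above can be illustrated with a small sketch. It assumes a toolchain in which all `String` views already share `String.Index` and in which indices order by their offset in the underlying code units. The variable names are illustrative only.

```swift
let s = "e\u{301}galite\u{301}" // "égalité"

// Position of the combining accent: valid in the unicodeScalars view,
// but not a Character boundary of `s`.
let accent = s.unicodeScalars.indices.dropFirst().first!

// The two Character boundaries that surround it.
let firstCharacter = s.startIndex
let secondCharacter = s.index(after: firstCharacter)

// The mid-character index compares unequal to both boundaries and
// sorts strictly between them.
print(accent != firstCharacter, firstCharacter < accent, accent < secondCharacter)
```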
> An index that does not fall on an exact boundary in a given `String`
> or `Substring` view will be “rounded down” to the nearest boundary
> when used for slicing or range replacement. So, for example,
>
What about normal subscript? I.e. what would the following print?
print(s[s.unicodeScalars.indices.dropFirst().first!]) // “é”, or just the combining scalar?
Would unifying under the same type require that indices be less stateful than they currently are?
> ```swift
> let s = "e\u{301}galite\u{301}" // "égalité"
> print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
> print(s[..<s.unicodeScalars.indices.last!]) // "égalit"
> ```
>
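The rounding rule can be modeled with a hypothetical helper, `roundedDownToCharacter(_:in:)`, which is not part of the pitch or the standard library. The sketch computes the boundary explicitly instead of relying on how a particular toolchain's subscript treats a mid-character index, so it shows the semantics the pitch describes rather than claiming what any implementation does.

```swift
// Hypothetical helper: returns the nearest Character boundary of `s`
// that is at or before `i`.
func roundedDownToCharacter(_ i: String.Index, in s: String) -> String.Index {
    if i >= s.endIndex { return s.endIndex }
    var result = s.startIndex
    for boundary in s.indices {
        if boundary > i { break }   // walked past `i`; keep the previous boundary
        result = boundary
    }
    return result
}

let s = "e\u{301}galite\u{301}" // "égalité"

// Both scalar positions below point at a combining accent, that is,
// into the middle of a Character.
let lower = roundedDownToCharacter(s.unicodeScalars.indices.dropFirst().first!, in: s)
let upper = roundedDownToCharacter(s.unicodeScalars.indices.last!, in: s)

print(s[lower...])   // "égalité"
print(s[..<upper])   // "égalit"
```

A linear scan keeps the sketch short; it is not meant as an efficient implementation.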
> Replacing the failable APIs listed [above](#motivation) that detect
> whether an index represents a valid position in a given view, and
> enhancements that explicitly round index positions to nearby boundaries
> in a given view, are left to a later proposal. For now, we do not
> propose to remove the existing index conversion APIs.
>
> ## Detailed design
>
> `String.Index` acquires an `encodedOffset` property and initializer:
>
> ```swift
> public extension String.Index {
> /// Creates a position corresponding to the given offset in a
> /// `String`'s underlying (UTF-16) code units.
> init(encodedOffset: Int)
>
> /// The position of this index expressed as an offset from the
> /// beginning of the `String`'s underlying (UTF-16) code units.
> var encodedOffset: Int
> }
> ```
>
> `Index` types of `String.UTF8View`, `String.UTF16View`, and
> `String.UnicodeScalarView` are replaced by `String.Index`:
>
> ```swift
> public extension String.UTF8View {
> typealias Index = String.Index
> }
> public extension String.UTF16View {
> typealias Index = String.Index
> }
> public extension String.UnicodeScalarView {
> typealias Index = String.Index
> }
> ```
>
> Because the index types are collapsing, index conversion methods and
> initializers are reduced to the following:
>
> ```swift
> public extension String.Index {
> init?(_: String.Index, within: String)
> init?(_: String.Index, within: String.UTF8View)
> init?(_: String.Index, within: String.UTF16View)
> init?(_: String.Index, within: String.UnicodeScalarView)
>
> func samePosition(in: String) -> String.Index?
> func samePosition(in: String.UTF8View) -> String.Index?
> func samePosition(in: String.UTF16View) -> String.Index?
> func samePosition(in: String.UnicodeScalarView) -> String.Index?
> }
> ```
>
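A short sketch of how the remaining failable conversion might be used to test whether a position from one view is also a valid position in another. It assumes a toolchain where `samePosition(in:)` on `String.Index` accepts a `String` and returns an optional, as listed above. The variable names are illustrative.

```swift
let s = "e\u{301}galite\u{301}" // "égalité"
let scalarPositions = Array(s.unicodeScalars.indices)

// The first scalar begins a Character, so the conversion succeeds.
let onBoundary = scalarPositions[0].samePosition(in: s)

// The second scalar is a combining accent inside "é", so there is no
// matching Character position and the result is nil.
let insideCharacter = scalarPositions[1].samePosition(in: s)

if let i = onBoundary {
    print(s[i])
}
print(insideCharacter == nil)
```

Unwrapping with `if let`, rather than force-unwrapping, is the pattern the deprecation messages in the source-compatibility section point toward.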
> ## Source compatibility
>
> Because of the collapse of index
> types, [existing non-failable APIs](#motivation) become failable. To
> avoid breaking Swift 3 code, the following overloads of existing
> functions are added, allowing the resulting optional indices to be
> used where previously non-optional indices were used. These overloads
> were driven by making the new APIs work with existing code, including
> the Swift source compatibility test suite, and should be viewed as
> migration aids only, rather than additions to the Swift 3 API.
>
> ```swift
> extension Optional where Wrapped == String.Index {
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
> public static func ..<(
> lhs: String.Index?, rhs: String.Index?
> ) -> Range<String.Index> {
> return lhs! ..< rhs!
> }
>
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
> public static func ...(
> lhs: String.Index?, rhs: String.Index?
> ) -> ClosedRange<String.Index> {
> return lhs! ... rhs!
> }
> }
>
> // backward compatibility for index interchange.
> extension String.UTF16View {
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public func index(after i: Index?) -> Index {
> return index(after: i!)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public func index(
> _ i: Index?, offsetBy n: IndexDistance) -> Index {
> return index(i!, offsetBy: n)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
> public func distance(from i: Index?, to j: Index?) -> IndexDistance {
> return distance(from: i!, to: j!)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public subscript(i: Index?) -> Unicode.UTF16.CodeUnit {
> return self[i!]
> }
> }
>
> extension String.UTF8View {
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public func index(after i: Index?) -> Index {
> return index(after: i!)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
> return index(i!, offsetBy: n)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
> public func distance(
> from i: Index?, to j: Index?) -> IndexDistance {
> return distance(from: i!, to: j!)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public subscript(i: Index?) -> Unicode.UTF8.CodeUnit {
> return self[i!]
> }
> }
>
> // backward compatibility for index interchange.
> extension String.UnicodeScalarView {
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public func index(after i: Index?) -> Index {
> return index(after: i!)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
> return index(i!, offsetBy: n)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
> public func distance(from i: Index?, to j: Index?) -> IndexDistance {
> return distance(from: i!, to: j!)
> }
> @available(
> swift, deprecated: 3.2, obsoleted: 4.0,
> message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
> public subscript(i: Index?) -> Unicode.Scalar {
> return self[i!]
> }
> }
> ```
>
> - **Q**: Will existing correct Swift 3 applications stop compiling due
> to this change?
>
> **A**: It is possible but unlikely. The existing index conversion
> APIs are relatively rarely used, and the overloads listed above
> handle the common cases in Swift 3 compatibility mode.
>
> - **Q**: Will applications still compile but produce
> different behavior than they used to?
>
> **A**: No.
>
> - **Q**: Is it possible to automatically migrate from the old syntax
> to the new syntax?
>
> **A**: Yes, although usages of these APIs may be rare enough that it
> isn't worth the trouble.
>
> - **Q**: Can Swift applications be written in a common subset that works
> both with Swift 3 and Swift 4 to aid in migration?
>
> **A**: Yes, the Swift 4 APIs will all be available in Swift 3 mode.
>
> ## Effect on ABI stability
>
> This proposal changes the ABI of the standard library.
>
> ## Effect on API resilience
>
> This proposal makes no changes to the resilience of any APIs.
>
> ## Alternatives considered
>
> The only alternative considered was no action.
>
>
> --
> -Dave