[swift-evolution] Pitch: String Index Overhaul

Michael Ilseman milseman at apple.com
Tue May 30 15:18:11 CDT 2017


> On May 27, 2017, at 10:40 AM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
> 
> 
> Pretty version: https://github.com/dabrahams/swift-evolution/blob/string-index-overhaul/proposals/NNNN-string-index-overhaul.md
> 
> ----
> 
> # String Index Overhaul
> 
> * Proposal: [SE-NNNN](NNNN-string-index-overhaul.md)
> * Authors: [Dave Abrahams](https://github.com/dabrahams)
> * Review Manager: TBD
> * Status: **Awaiting review**
> * Pull Request Implementing This Proposal: https://github.com/apple/swift/pull/9806 
> 
> *During the review process, add the following fields as needed:*
> 
> ## Introduction
> 
> Today `String` shares an `Index` type with its `CharacterView` but not
> with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`.  This
> proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
> and `String.UnicodeScalarView.Index` as typealiases for `String.Index`,
> and exposes a public `encodedOffset` property and initializer that can
> be used to serialize and deserialize positions in a `String` or
> `Substring`.
> 
> Swift-evolution thread: [Discussion thread topic for that proposal](https://lists.swift.org/pipermail/swift-evolution/)
> 
> ## Motivation
> 
> The different index types are supported by a set of `Index`
> initializers, which are failable whenever the source index might not
> correspond to a position in the target view:
> 
> ```swift
> if let j = String.UnicodeScalarView.Index(
>  someUTF16Position, within: s.unicodeScalars) {
>  ... 
> }
> ```
> 
> The current API is as follows:
> 
> ```swift
> public extension String.Index {
>  init?(_: String.UnicodeScalarIndex, within: String)
>  init?(_: String.UTF16Index, within: String)
>  init?(_: String.UTF8Index, within: String)
> }
> 
> public extension String.UTF16View.Index {
>  init?(_: String.UTF8Index, within: String.UTF16View)
>  init(_: String.UnicodeScalarIndex, within: String.UTF16View)
>  init(_: String.Index, within: String.UTF16View)
> }
> 
> public extension String.UTF8View.Index {
>  init?(_: String.UTF16Index, within: String.UTF8View)
>  init(_: String.UnicodeScalarIndex, within: String.UTF8View)
>  init(_: String.Index, within: String.UTF8View)
> }
> 
> public extension String.UnicodeScalarView.Index {
>  init?(_: String.UTF16Index, within: String.UnicodeScalarView)
>  init?(_: String.UTF8Index, within: String.UnicodeScalarView)
>  init(_: String.Index, within: String.UnicodeScalarView)
> }
> ```
> 
> These initializers are supplemented by a corresponding set of
> convenience conversion methods:
> 
> ```swift
> if let j = someUTF16Position.samePosition(in: s.unicodeScalars) {
>  ... 
> }
> ```
> 
> with the following API:
> 
> ```swift
> public extension String.Index {
>  func samePosition(in: String.UTF8View) -> String.UTF8View.Index
>  func samePosition(in: String.UTF16View) -> String.UTF16View.Index
>  func samePosition(
>    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index
> }
> 
> public extension String.UTF16View.Index {
>  func samePosition(in: String) -> String.Index?
>  func samePosition(in: String.UTF8View) -> String.UTF8View.Index?
>  func samePosition(
>    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
> }
> 
> public extension String.UTF8View.Index {
>  func samePosition(in: String) -> String.Index?
>  func samePosition(in: String.UTF16View) -> String.UTF16View.Index?
>  func samePosition(
>    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
> }
> 
> public extension String.UnicodeScalarView.Index {
>  func samePosition(in: String) -> String.Index?
>  func samePosition(in: String.UTF8View) -> String.UTF8View.Index
>  func samePosition(in: String.UTF16View) -> String.UTF16View.Index
> }
> ```
> 
> The result is a great deal of API surface area for apparently little
> gain in ordinary code, which normally only interchanges indices among
> views when the positions match up exactly (i.e. when the conversion is
> going to succeed).  Also, the resulting code is needlessly awkward.
> 
> Finally, the opacity of these index types makes it difficult to record
> `String` or `Substring` positions in files or other archival forms,
> and reconstruct the original positions with respect to a deserialized
> `String` or `Substring`.
> 
> ## Proposed solution
> 
> All `String` views will use a single index type (`String.Index`), so
> that positions can be interchanged without awkward explicit
> conversions:
> 
> ```swift
> let html: String = "See <a href=\"http://swift.org\">swift.org</a>"
> 
> // Search the UTF-16 view, instead of characters, for performance reasons:
> let open = "<".utf16.first!, close = ">".utf16.first!
> let tagStart = html.utf16.index(of: open)!
> let tagEnd = html.utf16[tagStart...].index(of: close)!
> 
> // Slice the String with the UTF-16 indices to retrieve the tag.
> let tag = html[tagStart...tagEnd]
> ```
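
A minimal sketch of what a single index type allows. It is written against a later Swift toolchain rather than the implementation in the pull request, and the sample string is made up for illustration. One index obtained from the unicode scalar view is used directly in the other views:

```swift
// Sample string; "\u{E9}" is the precomposed "é" (one scalar, two UTF-8 bytes).
let word = "caf\u{E9}"

// Find a position in the unicode scalar view.
let accent = word.unicodeScalars.firstIndex(of: "\u{E9}")!

// The same index value is accepted by the other views without conversion.
let firstByte: UInt8 = word.utf8[accent]    // lead byte of "é" in UTF-8
let codeUnit: UInt16 = word.utf16[accent]   // "é" as a single UTF-16 code unit
let tail = String(word[accent...])          // slice the String itself

print(firstByte, codeUnit, tail)
```

The index here falls on a boundary that is valid in every view, which is the common case the proposal is aimed at.
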
> 
> A property and an initializer will be added to `String.Index`, exposing
> the offset of the index in code units (currently only UTF-16) from the
> beginning of the string:
> 
> ```swift
> let n: Int = html.endIndex.encodedOffset
> let end = String.Index(encodedOffset: n)
> assert(end == html.endIndex)
> ```
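
A hedged sketch of the serialization round trip this is meant to enable. It assumes a later toolchain and uses `utf16Offset(in:)` and `String.Index(utf16Offset:in:)`, which are not part of this pitch; they stand in for `encodedOffset` here because they name the code-unit encoding explicitly:

```swift
let html = "See <a href=\"http://swift.org\">swift.org</a>"

// Locate the opening angle bracket in the UTF-16 view.
let tagStart = html.utf16.firstIndex(of: "<".utf16.first!)!

// "Serialize" the position as a plain integer: the number of UTF-16
// code units between the start of the string and the index.
let offset: Int = tagStart.utf16Offset(in: html)

// "Deserialize": rebuild an index from the stored integer and the string.
let restored = String.Index(utf16Offset: offset, in: html)

print(offset, html[restored])
```

The integer is only meaningful together with a string that has the same contents as the one it was measured in.
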
> 
> ### Comparison and Slicing Semantics
> 
> When two indices being compared correspond to positions that are valid
> in any single `String` view, comparison semantics are already fully
> specified by the `Collection` requirements.  Where no single `String`
> view contains both index values, the indices compare unequal and
> ordering is determined by comparison of `encodedOffset`s.  These index
> values are not totally ordered but do satisfy strict weak ordering
> requirements, which is sufficient for algorithms such as `sort` to
> exhibit sensible behavior.  We might consider loosening the specified
> requirements on these algorithms and on `Comparable` to support strict
> weak ordering, but for now we can treat such index pairs as being
> outside the domain of comparison, like any other indices from
> completely distinct collections.
> 
> An index that does not fall on an exact boundary in a given `String`
> or `Substring` view will be “rounded down” to the nearest boundary
> when used for slicing or range replacement.  So, for example,
> 

What about normal subscript? I.e. what would the following print?

print(s[s.unicodeScalars.indices.dropFirst().first!]) // “é”, or just the combining scalar?

Would unifying under the same type require that indices be less stateful than they currently are?


> ```swift
> let s = "e\u{301}galite\u{301}"                          // "égalité"
> print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
> print(s[..<s.unicodeScalars.indices.last!])              // "égalit"
> ```
> 
> Replacing the failable APIs listed [above](#motivation) that detect
> whether an index represents a valid position in a given view, and
> enhancements that explicitly round index positions to nearby boundaries
> in a given view, are left to a later proposal.  For now, we do not
> propose to remove the existing index conversion APIs.
> 
> ## Detailed design
> 
> `String.Index` acquires an `encodedOffset` property and initializer:
> 
> ```swift
> public extension String.Index {
>  /// Creates a position corresponding to the given offset in a
>  /// `String`'s underlying (UTF-16) code units.
>  init(encodedOffset: Int)
> 
>  /// The position of this index expressed as an offset from the
>  /// beginning of the `String`'s underlying (UTF-16) code units.
>  var encodedOffset: Int
> }
> ```
> 
> `Index` types of `String.UTF8View`, `String.UTF16View`, and
> `String.UnicodeScalarView` are replaced by `String.Index`:
> 
> ```swift
> public extension String.UTF8View {
>  typealias Index = String.Index
> }
> public extension String.UTF16View {
>  typealias Index = String.Index
> }
> public extension String.UnicodeScalarView {
>  typealias Index = String.Index
> }
> ```
> 
> Because the index types are collapsing, index conversion methods and
> initializers are reduced to the following:
> 
> ```swift
> public extension String.Index {
>  init?(_: String.Index, within: String)
>  init?(_: String.Index, within: String.UTF8View)
>  init?(_: String.Index, within: String.UTF16View)
>  init?(_: String.Index, within: String.UnicodeScalarView)
> 
>  func samePosition(in: String) -> String.Index?
>  func samePosition(in: String.UTF8View) -> String.Index?
>  func samePosition(in: String.UTF16View) -> String.Index?
>  func samePosition(in: String.UnicodeScalarView) -> String.Index?
> }
> ```
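
A small sketch of how the failable `samePosition(in:)` conversions behave, assuming a later toolchain where these methods exist with the signatures listed above. The sample string and the choice of byte offsets are illustrative only:

```swift
// "\u{E9}" is the precomposed "é": one scalar, encoded as two UTF-8 bytes.
let word = "caf\u{E9}"
let bytes = word.utf8

// Offset 3 is the lead byte of "é"; offset 4 is its continuation byte.
let onBoundary = bytes.index(bytes.startIndex, offsetBy: 3)
let midScalar = bytes.index(bytes.startIndex, offsetBy: 4)

// A position on a scalar boundary converts; one inside a scalar does not.
let converted = onBoundary.samePosition(in: word.unicodeScalars)
let rejected = midScalar.samePosition(in: word.unicodeScalars)

print(converted != nil, rejected == nil)
```
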
> 
> ## Source compatibility
> 
> Because of the collapse of index
> types, [existing non-failable APIs](#motivation) become failable.  To
> avoid breaking Swift 3 code, the following overloads of existing
> functions are added, allowing the resulting optional indices to be
> used where previously non-optional indices were used.  These overloads
> were driven by making the new APIs work with existing code, including
> the Swift source compatibility test suite, and should be viewed as
> migration aids only, rather than additions to the Swift 3 API.
> 
> ```swift
> extension Optional where Wrapped == String.Index {
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public static func ..<(
>    lhs: String.Index?, rhs: String.Index?
>  ) -> Range<String.Index> {
>    return lhs! ..< rhs!
>  }
> 
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public static func ...(
>    lhs: String.Index?, rhs: String.Index?
>  ) -> ClosedRange<String.Index> {
>    return lhs! ... rhs!
>  }
> }
> 
> // backward compatibility for index interchange.  
> extension String.UTF16View {
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(after i: Index?) -> Index {
>    return index(after: i!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(
>    _ i: Index?, offsetBy n: IndexDistance) -> Index {
>    return index(i!, offsetBy: n)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public func distance(from i: Index?, to j: Index?) -> IndexDistance {
>    return distance(from: i!, to: j!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public subscript(i: Index?) -> Unicode.UTF16.CodeUnit {
>    return self[i!]
>  }
> }
> 
> extension String.UTF8View {
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(after i: Index?) -> Index {
>    return index(after: i!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
>    return index(i!, offsetBy: n)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public func distance(
>    from i: Index?, to j: Index?) -> IndexDistance {
>    return distance(from: i!, to: j!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public subscript(i: Index?) -> Unicode.UTF8.CodeUnit {
>    return self[i!]
>  }
> }
> 
> // backward compatibility for index interchange.  
> extension String.UnicodeScalarView {
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(after i: Index?) -> Index {
>    return index(after: i!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(_ i: Index?,  offsetBy n: IndexDistance) -> Index {
>    return index(i!, offsetBy: n)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public func distance(from i: Index?, to j: Index?) -> IndexDistance {
>    return distance(from: i!, to: j!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public subscript(i: Index?) -> Unicode.Scalar {
>    return self[i!]
>  }
> }
> ```
> 
> - **Q**: Will existing correct Swift 3 applications stop compiling due
>  to this change?
> 
>  **A**: It is possible but unlikely.  The existing index conversion
>  APIs are relatively rarely used, and the overloads listed above
>  handle the common cases in Swift 3 compatibility mode.
> 
> - **Q**: Will applications still compile but produce
>  different behavior than they used to? 
> 
>  **A**: No.
> 
> - **Q**: Is it possible to automatically migrate from the old syntax
>  to the new syntax? 
> 
>  **A**: Yes, although usages of these APIs may be rare enough that it
>  isn't worth the trouble.
> 
> - **Q**: Can Swift applications be written in a common subset that works
>   both with Swift 3 and Swift 4 to aid in migration?
> 
>  **A**: Yes, the Swift 4 APIs will all be available in Swift 3 mode.
> 
> ## Effect on ABI stability
> 
> This proposal changes the ABI of the standard library.
> 
> ## Effect on API resilience
> 
> This proposal makes no changes to the resilience of any APIs.
> 
> ## Alternatives considered
> 
> The only alternative considered was no action.
> 
> 
> -- 
> -Dave
