[swift-evolution] Pitch: String Index Overhaul

Philippe Hausler phausler at apple.com
Wed May 31 13:13:38 CDT 2017


I presume the index type will still be shared between String and Substring. Does this mean we will now be able to express index manipulation in terms of StringProtocol?

I find StringProtocol a bit hard to deal with when attempting range conversions; it would be really nice if we could make this possible, or at least more intuitive. For the life of me, I can't figure out a way to convert indexes generically for StringProtocol adopters.

So let's say you have a function like this:

func foo<S: StringProtocol>(_ str: S, range: Range<S.Index>) {
    range.lowerBound.samePosition(in: str.utf16)
}

This results in the error: value of type 'S.Index' has no member 'samePosition'

This is, of course, exactly the kind of function that works with strings and should handle both strings and substrings uniformly, since it is reasonable to pass either.

In short: are StringProtocol accessors a consideration for conversion in this change?
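
To illustrate the kind of generic code I have in mind, here is a minimal sketch. It assumes a toolchain in which the UTF-16 view of a StringProtocol adopter uses the same index type as the adopter itself (written out here as an explicit constraint); `utf16Offsets(of:in:)` is a hypothetical name, not an existing API:

```swift
// Hypothetical helper: converts a range of indices in any StringProtocol
// adopter into UTF-16 offsets measured from the start of that value.
// Assumes S.UTF16View.Index and S.Index are the same type.
func utf16Offsets<S: StringProtocol>(
  of range: Range<S.Index>, in str: S
) -> Range<Int> where S.UTF16View.Index == S.Index {
  let utf16 = str.utf16
  let lower = utf16.distance(from: utf16.startIndex, to: range.lowerBound)
  let upper = utf16.distance(from: utf16.startIndex, to: range.upperBound)
  return lower..<upper
}

// Works uniformly for a String and for a Substring.
let text = "a\u{1F600}b"
let emoji = text.index(after: text.startIndex)
print(utf16Offsets(of: emoji..<text.index(after: emoji), in: text))

let tail = text.dropFirst()
print(utf16Offsets(of: tail.startIndex..<tail.index(after: tail.startIndex), in: tail))
```

The offsets for the Substring are relative to the start of the Substring, not to its base String, which may or may not be what a caller wants.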

> On May 27, 2017, at 10:40 AM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
> 
> 
> Pretty version: https://github.com/dabrahams/swift-evolution/blob/string-index-overhaul/proposals/NNNN-string-index-overhaul.md
> 
> ----
> 
> # String Index Overhaul
> 
> * Proposal: [SE-NNNN](NNNN-string-index-overhaul.md)
> * Authors: [Dave Abrahams](https://github.com/dabrahams)
> * Review Manager: TBD
> * Status: **Awaiting review**
> * Pull Request Implementing This Proposal: https://github.com/apple/swift/pull/9806 
> 
> *During the review process, add the following fields as needed:*
> 
> ## Introduction
> 
> Today `String` shares an `Index` type with its `CharacterView` but not
> with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`.  This
> proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
> and `String.UnicodeScalarView.Index` as typealiases for `String.Index`,
> and exposes a public `encodedOffset` property and initializer that can
> be used to serialize and deserialize positions in a `String` or
> `Substring`.
> 
> Swift-evolution thread: [Discussion thread topic for that proposal](https://lists.swift.org/pipermail/swift-evolution/)
> 
> ## Motivation
> 
> The different index types are supported by a set of `Index`
> initializers, which are failable whenever the source index might not
> correspond to a position in the target view:
> 
> ```swift
> if let j = String.UnicodeScalarView.Index(
>  someUTF16Position, within: s.unicodeScalars) {
>  ... 
> }
> ```
> 
> The current API is as follows:
> 
> ```swift
> public extension String.Index {
>  init?(_: String.UnicodeScalarIndex, within: String)
>  init?(_: String.UTF16Index, within: String)
>  init?(_: String.UTF8Index, within: String)
> }
> 
> public extension String.UTF16View.Index {
>  init?(_: String.UTF8Index, within: String.UTF16View)
>  init(_: String.UnicodeScalarIndex, within: String.UTF16View)
>  init(_: String.Index, within: String.UTF16View)
> }
> 
> public extension String.UTF8View.Index {
>  init?(_: String.UTF16Index, within: String.UTF8View)
>  init(_: String.UnicodeScalarIndex, within: String.UTF8View)
>  init(_: String.Index, within: String.UTF8View)
> }
> 
> public extension String.UnicodeScalarView.Index {
>  init?(_: String.UTF16Index, within: String.UnicodeScalarView)
>  init?(_: String.UTF8Index, within: String.UnicodeScalarView)
>  init(_: String.Index, within: String.UnicodeScalarView)
> }
> ```
> 
> These initializers are supplemented by a corresponding set of
> convenience conversion methods:
> 
> ```swift
> if let j = someUTF16Position.samePosition(in: s.unicodeScalars) {
>  ... 
> }
> ```
> 
> with the following API:
> 
> ```swift
> public extension String.Index {
>  func samePosition(in: String.UTF8View) -> String.UTF8View.Index
>  func samePosition(in: String.UTF16View) -> String.UTF16View.Index
>  func samePosition(
>    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index
> }
> 
> public extension String.UTF16View.Index {
>  func samePosition(in: String) -> String.Index?
>  func samePosition(in: String.UTF8View) -> String.UTF8View.Index?
>  func samePosition(
>    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
> }
> 
> public extension String.UTF8View.Index {
>  func samePosition(in: String) -> String.Index?
>  func samePosition(in: String.UTF16View) -> String.UTF16View.Index?
>  func samePosition(
>    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
> }
> 
> public extension String.UnicodeScalarView.Index {
>  func samePosition(in: String) -> String.Index?
>  func samePosition(in: String.UTF8View) -> String.UTF8View.Index
>  func samePosition(in: String.UTF16View) -> String.UTF16View.Index
> }
> ```
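> 
> A minimal sketch of one failable case from the listing above: a
> UTF-16 position that falls between the two code units of a surrogate
> pair has no counterpart in the unicode scalar view.  The string and
> the variable names are chosen purely for illustration:
> 
> ```swift
> let s = "\u{1F600}"   // one Unicode scalar, two UTF-16 code units
> let utf16 = s.utf16
> let middle = utf16.index(after: utf16.startIndex)  // between the surrogates
> 
> // No scalar starts between the surrogates, so this conversion fails.
> let failed = middle.samePosition(in: s.unicodeScalars)
> 
> // The start of the string is a boundary in every view, so this succeeds.
> let succeeded = utf16.startIndex.samePosition(in: s.unicodeScalars)
> 
> print(failed == nil, succeeded != nil)
> ```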
> 
> The result is a great deal of API surface area for apparently little
> gain in ordinary code, which normally only interchanges indices among
> views when the positions match up exactly (i.e. when the conversion is
> going to succeed).  Also, the resulting code is needlessly awkward.
> 
> Finally, the opacity of these index types makes it difficult to record
> `String` or `Substring` positions in files or other archival forms,
> and reconstruct the original positions with respect to a deserialized
> `String` or `Substring`.
> 
> ## Proposed solution
> 
> All `String` views will use a single index type (`String.Index`), so
> that positions can be interchanged without awkward explicit
> conversions:
> 
> ```swift
> let html: String = "See <a href=\"http://swift.org\">swift.org</a>"
> 
> // Search the UTF16, instead of characters, for performance reasons:
> let open = "<".utf16.first!, close = ">".utf16.first!
> let tagStart = html.utf16.index(of: open)!
> let tagEnd = html.utf16[tagStart...].index(of: close)!
> 
> // Slice the String with the UTF-16 indices to retrieve the tag.
> let tag = html[tagStart...tagEnd]
> ```
> 
> A property and an initializer will be added to `String.Index`, exposing
> the offset of the index in code units (currently only UTF-16) from the
> beginning of the string:
> 
> ```swift
> let n: Int = html.endIndex.encodedOffset
> let end = String.Index(encodedOffset: n)
> assert(end == html.endIndex)
> ```
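> 
> The same pair of members could be used to record a position in
> archival form and rebuild it later.  The following is a sketch under
> the assumption that the stored integer is reapplied to text with the
> same contents and encoding; the sample string and names are
> illustrative only.  On later toolchains these members are deprecated
> in favor of `utf16Offset(in:)` and `String.Index(utf16Offset:in:)`,
> so expect deprecation warnings there:
> 
> ```swift
> let text = "na\u{EF}ve caf\u{E9}"                        // "naïve café"
> let position = text.index(text.startIndex, offsetBy: 6)  // index of "c"
> 
> // Record the position as a plain integer, e.g. to write it to a file.
> let stored: Int = position.encodedOffset
> 
> // Later: rebuild the index and use it with the same text.
> let restored = String.Index(encodedOffset: stored)
> assert(restored == position)
> assert(text[restored] == "c")
> ```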
> 
> ### Comparison and Slicing Semantics
> 
> When two indices being compared correspond to positions that are valid
> in any single `String` view, comparison semantics are already fully
> specified by the `Collection` requirements.  Where no single `String`
> view contains both index values, the indices compare unequal and
> ordering is determined by comparison of their `encodedOffset` values.  These index
> values are not totally ordered but do satisfy strict weak ordering
> requirements, which is sufficient for algorithms such as `sort` to
> exhibit sensible behavior.  We might consider loosening the specified
> requirements on these algorithms and on `Comparable` to support strict
> weak ordering, but for now we can treat such index pairs as being
> outside the domain of comparison, like any other indices from
> completely distinct collections.
> 
> An index that does not fall on an exact boundary in a given `String`
> or `Substring` view will be “rounded down” to the nearest boundary
> when used for slicing or range replacement.  So, for example,
> 
> ```swift
> let s = "e\u{301}galite\u{301}"                          // "égalité"
> print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
> print(s[..<s.unicodeScalars.indices.last!])              // "égalit"
> ```
> 
> Replacements for the failable APIs listed [above](#motivation) that detect
> whether an index represents a valid position in a given view, and
> enhancements that explicitly round index positions to nearby boundaries
> in a given view, are left to a later proposal.  For now, we do not
> propose to remove the existing index conversion APIs.
> 
> ## Detailed design
> 
> `String.Index` acquires an `encodedOffset` property and initializer:
> 
> ```swift
> public extension String.Index {
>  /// Creates a position corresponding to the given offset in a
>  /// `String`'s underlying (UTF-16) code units.
>  init(encodedOffset: Int)
> 
>  /// The position of this index expressed as an offset from the
>  /// beginning of the `String`'s underlying (UTF-16) code units.
>  var encodedOffset: Int { get }
> }
> ```
> 
> `Index` types of `String.UTF8View`, `String.UTF16View`, and
> `String.UnicodeScalarView` are replaced by `String.Index`:
> 
> ```swift
> public extension String.UTF8View {
>  typealias Index = String.Index
> }
> public extension String.UTF16View {
>  typealias Index = String.Index
> }
> public extension String.UnicodeScalarView {
>  typealias Index = String.Index
> }
> ```
> 
> Because the index types are collapsing, index conversion methods and
> initializers are reduced to the following:
> 
> ```swift
> public extension String.Index {
>  init?(_: String.Index, within: String)
>  init?(_: String.Index, within: String.UTF8View)
>  init?(_: String.Index, within: String.UTF16View)
>  init?(_: String.Index, within: String.UnicodeScalarView)
> 
>  func samePosition(in: String) -> String.Index?
>  func samePosition(in: String.UTF8View) -> String.Index?
>  func samePosition(in: String.UTF16View) -> String.Index?
>  func samePosition(in: String.UnicodeScalarView) -> String.Index?
> }
> ```
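> 
> As a rough illustration of how the reduced API would be expected to
> behave once every view shares `String.Index`, the sketch below takes a
> position that is a boundary in the scalar, UTF-8, and UTF-16 views but
> lies inside a `Character`.  The string and names are illustrative
> assumptions:
> 
> ```swift
> let s = "e\u{301}galite\u{301}"   // "égalité", using combining accents
> let scalars = s.unicodeScalars
> let afterE = scalars.index(after: scalars.startIndex)
> 
> // afterE sits between "e" and its combining accent, i.e. inside the
> // first Character, so it has no exact position in the String itself.
> assert(String.Index(afterE, within: s) == nil)
> assert(afterE.samePosition(in: s) == nil)
> 
> // Every scalar boundary is also a code-unit boundary, so the same
> // index converts successfully to the UTF-8 and UTF-16 views.
> assert(afterE.samePosition(in: s.utf8) != nil)
> assert(afterE.samePosition(in: s.utf16) != nil)
> ```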
> 
> ## Source compatibility
> 
> Because of the collapse of index
> types, [existing non-failable APIs](#motivation) become failable.  To
> avoid breaking Swift 3 code, the following overloads of existing
> functions are added, allowing the resulting optional indices to be
> used where previously non-optional indices were used.  These overloads
> were driven by making the new APIs work with existing code, including
> the Swift source compatibility test suite, and should be viewed as
> migration aids only, rather than additions to the Swift 3 API.
> 
> ```swift
> extension Optional where Wrapped == String.Index {
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public static func ..<(
>    lhs: String.Index?, rhs: String.Index?
>  ) -> Range<String.Index> {
>    return lhs! ..< rhs!
>  }
> 
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public static func ...(
>    lhs: String.Index?, rhs: String.Index?
>  ) -> ClosedRange<String.Index> {
>    return lhs! ... rhs!
>  }
> }
> 
> // backward compatibility for index interchange.  
> extension String.UTF16View {
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(after i: Index?) -> Index {
>    return index(after: i!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(
>    _ i: Index?, offsetBy n: IndexDistance) -> Index {
>    return index(i!, offsetBy: n)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public func distance(from i: Index?, to j: Index?) -> IndexDistance {
>    return distance(from: i!, to: j!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public subscript(i: Index?) -> Unicode.UTF16.CodeUnit {
>    return self[i!]
>  }
> }
> 
> extension String.UTF8View {
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(after i: Index?) -> Index {
>    return index(after: i!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
>    return index(i!, offsetBy: n)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public func distance(
>    from i: Index?, to j: Index?) -> IndexDistance {
>    return distance(from: i!, to: j!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public subscript(i: Index?) -> Unicode.UTF8.CodeUnit {
>    return self[i!]
>  }
> }
> 
> // backward compatibility for index interchange.  
> extension String.UnicodeScalarView {
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(after i: Index?) -> Index {
>    return index(after: i!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
>    return index(i!, offsetBy: n)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>  public func distance(from i: Index?, to j: Index?) -> IndexDistance {
>    return distance(from: i!, to: j!)
>  }
>  @available(
>    swift, deprecated: 3.2, obsoleted: 4.0,
>    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>  public subscript(i: Index?) -> Unicode.Scalar {
>    return self[i!]
>  }
> }
> ```
> 
> - **Q**: Will existing correct Swift 3 applications stop compiling due
>  to this change?
> 
>  **A**: It is possible but unlikely.  The existing index conversion
>  APIs are relatively rarely used, and the overloads listed above
>  handle the common cases in Swift 3 compatibility mode.
> 
> - **Q**: Will applications still compile but produce
>  different behavior than they used to? 
> 
>  **A**: No.
> 
> - **Q**: Is it possible to automatically migrate from the old syntax
>  to the new syntax? 
> 
>  **A**: Yes, although usages of these APIs may be rare enough that it
>  isn't worth the trouble.
> 
> - **Q**: Can Swift applications be written in a common subset that works
>   both with Swift 3 and Swift 4 to aid in migration?
> 
>  **A**: Yes, the Swift 4 APIs will all be available in Swift 3 mode.
> 
> ## Effect on ABI stability
> 
> This proposal changes the ABI of the standard library.
> 
> ## Effect on API resilience
> 
> This proposal makes no changes to the resilience of any APIs.
> 
> ## Alternatives considered
> 
> The only alternative considered was no action.
> 
> 
> -- 
> -Dave
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
