[swift-evolution] Pitch: String Index Overhaul

Dave Abrahams dabrahams at apple.com
Sat May 27 12:40:40 CDT 2017


Pretty version: https://github.com/dabrahams/swift-evolution/blob/string-index-overhaul/proposals/NNNN-string-index-overhaul.md

----

# String Index Overhaul

* Proposal: [SE-NNNN](NNNN-string-index-overhaul.md)
* Authors: [Dave Abrahams](https://github.com/dabrahams)
* Review Manager: TBD
* Status: **Awaiting review**
* Pull Request Implementing This Proposal: https://github.com/apple/swift/pull/9806 


## Introduction

Today `String` shares an `Index` type with its `CharacterView` but not
with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`.  This
proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
and `String.UnicodeScalarView.Index` as typealiases for `String.Index`,
and exposes a public `encodedOffset` property and initializer that can
be used to serialize and deserialize positions in a `String` or
`Substring`.

Swift-evolution thread: [Discussion thread topic for that proposal](https://lists.swift.org/pipermail/swift-evolution/)

## Motivation

The different index types are supported by a set of `Index`
initializers, which are failable whenever the source index might not
correspond to a position in the target view:

```swift
if let j = String.UnicodeScalarView.Index(
  someUTF16Position, within: s.unicodeScalars) {
  ... 
}
```

The current API is as follows:

```swift
public extension String.Index {
  init?(_: String.UnicodeScalarIndex, within: String)
  init?(_: String.UTF16Index, within: String)
  init?(_: String.UTF8Index, within: String)
}

public extension String.UTF16View.Index {
  init?(_: String.UTF8Index, within: String.UTF16View)
  init(_: String.UnicodeScalarIndex, within: String.UTF16View)
  init(_: String.Index, within: String.UTF16View)
}

public extension String.UTF8View.Index {
  init?(_: String.UTF16Index, within: String.UTF8View)
  init(_: String.UnicodeScalarIndex, within: String.UTF8View)
  init(_: String.Index, within: String.UTF8View)
}

public extension String.UnicodeScalarView.Index {
  init?(_: String.UTF16Index, within: String.UnicodeScalarView)
  init?(_: String.UTF8Index, within: String.UnicodeScalarView)
  init(_: String.Index, within: String.UnicodeScalarView)
}
```

These initializers are supplemented by a corresponding set of
convenience conversion methods:

```swift
if let j = someUTF16Position.samePosition(in: s.unicodeScalars) {
  ... 
}
```

with the following API:

```swift
public extension String.Index {
  func samePosition(in: String.UTF8View) -> String.UTF8View.Index
  func samePosition(in: String.UTF16View) -> String.UTF16View.Index
  func samePosition(
    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index
}

public extension String.UTF16View.Index {
  func samePosition(in: String) -> String.Index?
  func samePosition(in: String.UTF8View) -> String.UTF8View.Index?
  func samePosition(
    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
}

public extension String.UTF8View.Index {
  func samePosition(in: String) -> String.Index?
  func samePosition(in: String.UTF16View) -> String.UTF16View.Index?
  func samePosition(
    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
}

public extension String.UnicodeScalarView.Index {
  func samePosition(in: String) -> String.Index?
  func samePosition(in: String.UTF8View) -> String.UTF8View.Index
  func samePosition(in: String.UTF16View) -> String.UTF16View.Index
}
```

The result is a great deal of API surface area for apparently little
gain in ordinary code, which normally interchanges indices among views
only when the positions match up exactly (i.e. when the conversion is
going to succeed).  The resulting code is also needlessly awkward.

Finally, the opacity of these index types makes it difficult to record
`String` or `Substring` positions in files or other archival forms,
and reconstruct the original positions with respect to a deserialized
`String` or `Substring`.

## Proposed solution

All `String` views will use a single index type (`String.Index`), so
that positions can be interchanged without awkward explicit
conversions:

```swift
let html: String = "See <a href=\"http://swift.org\">swift.org</a>"

// Search the UTF16, instead of characters, for performance reasons:
let open = "<".utf16.first!, close = ">".utf16.first!
// `index(of:)` returns an optional; both delimiters are known to be present.
let tagStart = html.utf16.index(of: open)!
let tagEnd = html.utf16[tagStart...].index(of: close)!

// Slice the String with the UTF-16 indices to retrieve the tag.
let tag = html[tagStart...tagEnd]
```

A property and an initializer will be added to `String.Index`, exposing
the offset of the index in code units (currently only UTF-16) from the
beginning of the string:

```swift
let n: Int = html.endIndex.encodedOffset
let end = String.Index(encodedOffset: n)
assert(end == html.endIndex)
```

### Comparison and Slicing Semantics

When two indices being compared correspond to positions that are valid
in any single `String` view, comparison semantics are already fully
specified by the `Collection` requirements.  Where no single `String`
view contains both index values, the indices compare unequal and
ordering is determined by comparing their `encodedOffset`s.  These index
values are not totally ordered but do satisfy strict weak ordering
requirements, which is sufficient for algorithms such as `sort` to
exhibit sensible behavior.  We might consider loosening the specified
requirements on these algorithms and on `Comparable` to support strict
weak ordering, but for now we can treat such index pairs as being
outside the domain of comparison, like any other indices from
completely distinct collections.
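
As a rough illustration of the first case (both positions valid in a
single view), the sketch below compares indices obtained from different
views of one string.  It assumes the unified `String.Index` described
in this proposal; the variable names are illustrative only.

```swift
let s = "e\u{301}galite\u{301}"   // "égalité", written with combining accents

// Position of the first combining accent (U+0301).  It is a valid position
// in the unicode scalar, UTF-16, and UTF-8 views, but it does not fall on
// a Character boundary.
let accent = s.unicodeScalars.index(after: s.unicodeScalars.startIndex)

// Position of "g", which is a valid position in every view.
let g = s.index(after: s.startIndex)

// Both positions are valid in the unicode scalar view, so ordinary
// Collection semantics apply to the comparison.
print(accent < g)              // true
print(accent == s.startIndex)  // false

// The same position reached through a different view compares equal.
print(s.utf16.index(after: s.utf16.startIndex) == accent)  // true
```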

An index that does not fall on an exact boundary in a given `String`
or `Substring` view will be “rounded down” to the nearest boundary
when used for slicing or range replacement.  So, for example,

```swift
let s = "e\u{301}galite\u{301}"                          // "égalité"
print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
print(s[..<s.unicodeScalars.indices.last!])              // "égalit"
```

Replacements for the failable APIs listed [above](#motivation), which
detect whether an index represents a valid position in a given view,
and enhancements that explicitly round index positions to nearby
boundaries in a given view, are left to a later proposal.  For now, we
do not propose to remove the existing index conversion APIs.

## Detailed design

`String.Index` acquires an `encodedOffset` property and initializer:

```swift
public extension String.Index {
  /// Creates a position corresponding to the given offset in a
  /// `String`'s underlying (UTF-16) code units.
  init(encodedOffset: Int)

  /// The position of this index expressed as an offset from the
  /// beginning of the `String`'s underlying (UTF-16) code units.
  var encodedOffset: Int
}
```
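
A minimal sketch of how these two members might be used to archive a
position and restore it later.  The "archive" here is just an `Int`
held in a variable, which is an assumption made for illustration, and
the string is ASCII-only so that the offset does not depend on the
underlying encoding.  Toolchains newer than the one this pitch targets
may report these members as deprecated.

```swift
let text = "See <a href=\"http://swift.org\">swift.org</a>"  // ASCII only

// A position to record: the "<" that opens the tag.
let position = text.index(text.startIndex, offsetBy: 4)

// "Serialize": extract the code unit offset as a plain Int.
let archived: Int = position.encodedOffset

// "Deserialize": rebuild an index and use it with an equal string.
let restored = String.Index(encodedOffset: archived)

print(archived)            // 4
print(text[restored...])   // <a href="http://swift.org">swift.org</a>
```

The restored index is only meaningful for a string with the same
contents as the one the offset was taken from.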

`Index` types of `String.UTF8View`, `String.UTF16View`, and
`String.UnicodeScalarView` are replaced by `String.Index`:

```swift
public extension String.UTF8View {
  typealias Index = String.Index
}
public extension String.UTF16View {
  typealias Index = String.Index
}
public extension String.UnicodeScalarView {
  typealias Index = String.Index
}
```
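
With the typealiases above in place, an index found in one view can be
passed to another view or to the `String` itself without conversion.
The following is a sketch under that assumption; `path` and the loop
are illustrative and not part of the proposal.

```swift
let path = "usr/local/bin"

// Walk the UTF-8 view to find the first "/" byte (0x2F).
var i = path.utf8.startIndex
while i != path.utf8.endIndex && path.utf8[i] != 0x2F {
  i = path.utf8.index(after: i)
}

// String.UTF8View.Index is String.Index, so no conversion is needed.
let j: String.Index = i

print(path[..<j])              // usr
print(path.unicodeScalars[j])  // /
print(path.utf16[j])           // 47
```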

Because the index types are collapsing, index conversion methods and
initializers are reduced to the following:

```swift
public extension String.Index {
  init?(_: String.Index, within: String)
  init?(_: String.Index, within: String.UTF8View)
  init?(_: String.Index, within: String.UTF16View)
  init?(_: String.Index, within: String.UnicodeScalarView)

  func samePosition(in: String) -> String.Index?
  func samePosition(in: String.UTF8View) -> String.Index?
  func samePosition(in: String.UTF16View) -> String.Index?
  func samePosition(in: String.UnicodeScalarView) -> String.Index?
}
```
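
As a hedged example of the reduced API, the sketch below asks whether
a position taken from the unicode scalar view is also a valid position
in other views.  The string and variable names are illustrative.

```swift
let s = "e\u{301}galite\u{301}"   // "égalité", written with combining accents

// Position of the first combining accent (U+0301).
let accent = s.unicodeScalars.index(after: s.unicodeScalars.startIndex)

// The accent is in the middle of the first Character, so there is no
// corresponding position in the String itself.
if let c = accent.samePosition(in: s) {
  print("Character boundary: \(s[c])")
} else {
  print("not a Character boundary")   // this branch runs
}

// It is a valid position in the unicode scalar view.
if let u = accent.samePosition(in: s.unicodeScalars) {
  print(u == accent)   // true
}
```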

## Source compatibility

Because of the collapse of index
types, [existing non-failable APIs](#motivation) become failable.  To
avoid breaking Swift 3 code, the following overloads of existing
functions are added, allowing the resulting optional indices to be
used where previously non-optional indices were used.  These overloads
were driven by making the new APIs work with existing code, including
the Swift source compatibility test suite, and should be viewed as
migration aids only, rather than additions to the Swift 3 API.

```swift
extension Optional where Wrapped == String.Index {
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public static func ..<(
    lhs: String.Index?, rhs: String.Index?
  ) -> Range<String.Index> {
    return lhs! ..< rhs!
  }

  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public static func ...(
    lhs: String.Index?, rhs: String.Index?
  ) -> ClosedRange<String.Index> {
    return lhs! ... rhs!
  }
}

// backward compatibility for index interchange.  
extension String.UTF16View {
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(after i: Index?) -> Index {
    return index(after: i!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(
    _ i: Index?, offsetBy n: IndexDistance) -> Index {
    return index(i!, offsetBy: n)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public func distance(from i: Index?, to j: Index?) -> IndexDistance {
    return distance(from: i!, to: j!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public subscript(i: Index?) -> Unicode.UTF16.CodeUnit {
    return self[i!]
  }
}

extension String.UTF8View {
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(after i: Index?) -> Index {
    return index(after: i!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
    return index(i!, offsetBy: n)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public func distance(
    from i: Index?, to j: Index?) -> IndexDistance {
    return distance(from: i!, to: j!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public subscript(i: Index?) -> Unicode.UTF8.CodeUnit {
    return self[i!]
  }
}

// backward compatibility for index interchange.  
extension String.UnicodeScalarView {
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(after i: Index?) -> Index {
    return index(after: i!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(_ i: Index?,  offsetBy n: IndexDistance) -> Index {
    return index(i!, offsetBy: n)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public func distance(from i: Index?, to j: Index?) -> IndexDistance {
    return distance(from: i!, to: j!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public subscript(i: Index?) -> Unicode.Scalar {
    return self[i!]
  }
}
```
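
For code being migrated, the deprecation messages above suggest
unwrapping the optional indices explicitly rather than relying on
these overloads.  One way that might look is sketched below; the
string and names are hypothetical.

```swift
let s = "h\u{E9}llo"   // "héllo"

// A position found in the UTF-16 view: the first "l".
let u16 = s.utf16.index(s.utf16.startIndex, offsetBy: 2)

// Every conversion now yields an optional, so unwrap both ends of the
// range before forming it.
if let start = u16.samePosition(in: s.unicodeScalars),
   let end = s.utf16.endIndex.samePosition(in: s.unicodeScalars) {
  print(s[start..<end])   // llo
} else {
  print("position is not on a unicode scalar boundary")
}
```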

- **Q**: Will existing correct Swift 3 applications stop compiling due
  to this change?

  **A**: It is possible but unlikely.  The existing index conversion
  APIs are relatively rarely used, and the overloads listed above
  handle the common cases in Swift 3 compatibility mode.
  
- **Q**: Will applications still compile but produce
  different behavior than they used to? 

  **A**: No.
  
- **Q**: Is it possible to automatically migrate from the old syntax
  to the new syntax? 

  **A**: Yes, although usages of these APIs may be rare enough that it
  isn't worth the trouble.

- **Q**: Can Swift applications be written in a common subset that works
   both with Swift 3 and Swift 4 to aid in migration?

  **A**: Yes, the Swift 4 APIs will all be available in Swift 3 mode.

## Effect on ABI stability

This proposal changes the ABI of the standard library.

## Effect on API resilience

This proposal makes no changes to the resilience of any APIs.

## Alternatives considered

The only alternative considered was no action.


-- 
-Dave


