[swift-users] Unexpected results when using String.CharacterView.Index
Ole Begemann
ole at oleb.net
Thu Mar 9 05:54:11 CST 2017
On 09/03/2017 08:27, Zhao Xin via swift-users wrote:
> When using the subscript of `String.CharacterView`, I got an unexpected error.
>
> fatal error: Can't form a Character from an empty String
>
> func test() {
>     let s = "Original Script:"
>     let cs = s.characters
>     // let startIndex = cs.startIndex
>     let nextIndex = "Original ?".characters.endIndex
>     let nextCharacter = cs[nextIndex] // above error
> }
>
> test()
First of all, it's not guaranteed that an index derived from one string
can be used to subscript another string. Don't rely on that.
endIndex is also special because it is a past-the-end position, and this is why you're seeing a crash here.
Let's inspect nextIndex with dump(nextIndex):
▿ Swift.String.CharacterView.Index
  ▿ _base: Swift.String.UnicodeScalarView.Index
    - _position: 10
  - _countUTF16: 0
You can see that _countUTF16 is 0, i.e. internally, String.CharacterView
assigns its endIndex a length of 0 (in terms of UTF-16 code units). This
is why it traps when you use the index for subscripting. The endIndex is
not a valid index for subscripting, neither for the string it was derived
from nor for any other string.
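If you want a nil result instead of a trap, you can guard against past-the-end indices before subscripting. This is a minimal sketch written for later Swift versions, where String is itself a collection of Characters and `.characters` is not needed. `character(at:)` is a hypothetical helper, not a standard-library method. It only checks that the index lies in startIndex..<endIndex; it does not make an index taken from a different string valid.

```swift
extension String {
    /// Hypothetical helper (not a standard-library API): returns the Character
    /// at `i`, or nil if `i` is outside startIndex..<endIndex.
    /// endIndex is a past-the-end position, so it never yields a Character.
    /// This does NOT make an index obtained from a different string valid.
    func character(at i: String.Index) -> Character? {
        guard i >= startIndex && i < endIndex else { return nil }
        return self[i]
    }
}

let title = "Original Script:"
let atEnd = title.character(at: title.endIndex)     // nil instead of a trap
let atStart = title.character(at: title.startIndex) // "O"
```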
> However, if I choose another way to get the nextIndex, it works.
>
> func test() {
>     let s = "Original Script:"
>     let cs = s.characters
>     let startIndex = cs.startIndex
>     // let nextIndex = "Original ?".characters.endIndex
>     let nextIndex01 = cs.index(startIndex, offsetBy: "Original ?".characters.count)
>     let nextCharacter = cs[nextIndex01]
> }
>
> test()
Here, dump(nextIndex01) prints this:
▿ Swift.String.CharacterView.Index
  ▿ _base: Swift.String.UnicodeScalarView.Index
    - _position: 10
  - _countUTF16: 1
Notice that _countUTF16 is 1, so it looks like a valid index from the
perspective of cs. But again, don't rely on this! The results of
subscripting a collection with an index derived from another collection
are undefined unless the collection explicitly documents otherwise.
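If what you really want is "the same position in another string", a safer approach is to carry an integer Character offset across and re-derive the index from the target string itself. This is a minimal sketch in later Swift syntax; `translate(_:from:to:)` is a hypothetical helper name. `index(_:offsetBy:limitedBy:)` returns nil instead of walking past `endIndex`, but the result can still equal `endIndex`, so the caller checks that before subscripting.

```swift
/// Hypothetical helper: finds the index in `target` that is the same number of
/// Characters from the start as `i` is in `source`. Returns nil if `target`
/// is too short. `i` must be a valid index of `source`.
func translate(_ i: String.Index, from source: String, to target: String) -> String.Index? {
    // Measure the offset in the string that produced `i`.
    let offset = source.distance(from: source.startIndex, to: i)
    // Walk that many Characters in `target`, never past its endIndex.
    return target.index(target.startIndex, offsetBy: offset, limitedBy: target.endIndex)
}

let script = "Original Script:"
let prefix = "Original ?"
// The result may be script.endIndex, so check before subscripting.
if let i = translate(prefix.endIndex, from: prefix, to: script), i < script.endIndex {
    print(script[i]) // prints c
}
```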
> Furthermore, I compared the two `nextIndex` values. They were equal.
>
> func test() {
>     let s = "Original Script:"
>     let cs = s.characters
>     let startIndex = cs.startIndex
>     let nextIndex = "Original ?".characters.endIndex
>     let nextIndex01 = cs.index(startIndex, offsetBy: "Original ?".characters.count)
>     let nextCharacter = cs[nextIndex01]
>     print(nextIndex01 == nextIndex) // true
> }
>
> test()
It looks like String.Index only takes the position into account to
determine equality, not its _countUTF16. This makes sense for the way
endIndex and index(_:offsetBy:) are implemented. After all, nextIndex
and nextIndex01 _should be equal_. It would certainly be possible to
implement it differently (where endIndex and index(_:offsetBy:) returned
identical indices, including _countUTF16), and I don't know why the
stdlib team chose to do it this way (maybe performance?).
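To make the equality behavior concrete, here is a toy model built only from the dump() output in this thread. `ToyCharacterIndex` is hypothetical and is not the actual standard-library source; it just shows how two indices can compare equal while carrying different cached lengths.

```swift
/// Hypothetical stand-in for Swift 3's String.CharacterView.Index,
/// modeled on the dump() output in this thread (not the real stdlib type).
struct ToyCharacterIndex: Comparable {
    /// Offset from the start of the string, in UTF-16 code units.
    let position: Int
    /// Cached UTF-16 length of the Character at `position` (0 for endIndex).
    let countUTF16: Int

    // Equality looks only at the position; the cached length is ignored.
    static func == (lhs: ToyCharacterIndex, rhs: ToyCharacterIndex) -> Bool {
        return lhs.position == rhs.position
    }

    // Ordering is also by position alone.
    static func < (lhs: ToyCharacterIndex, rhs: ToyCharacterIndex) -> Bool {
        return lhs.position < rhs.position
    }
}

// Values copied from the two dump() outputs above.
let endOfOther = ToyCharacterIndex(position: 10, countUTF16: 0) // like nextIndex
let derived = ToyCharacterIndex(position: 10, countUTF16: 1)    // like nextIndex01
print(endOfOther == derived) // true
```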
In any case, much of this implementation may change with the work going
into strings for Swift 4.