[swift-users] Unexpected results when using String.CharacterView.Index

Zhao Xin owenzx at gmail.com
Thu Mar 9 06:31:00 CST 2017


Thanks a lot, Ole. I understand now.

Zhaoxin

On Thu, Mar 9, 2017 at 7:54 PM, Ole Begemann <ole at oleb.net> wrote:

> On 09/03/2017 08:27, Zhao Xin via swift-users wrote:
>
>> When using subscript of `String.CharacterView`, I got an unexpected error.
>>
>>     fatal error: Can't form a Character from an empty String
>>
>> func test() {
>>     let s = "Original Script:"
>>     let cs = s.characters
>> //    let startIndex = cs.startIndex
>>     let nextIndex = "Original ?".characters.endIndex
>>     let nextCharacter = cs[nextIndex]// above error
>> }
>>
>> test()
>>
>
> First of all, it's not guaranteed that an index derived from one string
> can be used to subscript another string. Don't rely on that.
>
> endIndex is also special: it marks the position one past the last
> character, and this is why you're seeing a crash here. Let's inspect
> nextIndex with dump(nextIndex):
>
> ▿ Swift.String.CharacterView.Index
>   ▿ _base: Swift.String.UnicodeScalarView.Index
>     - _position: 10
>   - _countUTF16: 0
>
> You see that _countUTF16 is 0, i.e. internally, String.CharacterView
> assigns its endIndex a length of 0 (in terms of UTF-16 code units). This is
> why it traps when you use the index for subscripting. The endIndex is not a
> valid index for subscripting, not for the string it was derived from and
> not for any other string.
>
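
A minimal sketch of guarding against this, assuming the index was derived from the same string it is used with. The helper `character(in:at:)` is hypothetical, not a standard library API; it only checks the range startIndex..<endIndex before subscripting, so endIndex yields nil instead of a trap.

```swift
// Hypothetical helper (not part of the standard library): returns the
// character at `index`, or nil when `index` is outside startIndex..<endIndex.
// Assumes `index` was derived from `string` itself.
func character(in string: String, at index: String.Index) -> Character? {
    guard index >= string.startIndex, index < string.endIndex else {
        return nil
    }
    return string[index]
}

let s = "Original Script:"
let firstCharacter = character(in: s, at: s.startIndex) // "O"
let pastTheEnd = character(in: s, at: s.endIndex)       // nil, no trap
```
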
>> However, if I choose another way to get the nextIndex, it works.
>>
>> func test() {
>>     let s = "Original Script:"
>>     let cs = s.characters
>>     let startIndex = cs.startIndex
>> //    let nextIndex = "Original ?".characters.endIndex
>>     let nextIndex01 = cs.index(startIndex, offsetBy: "Original ?".characters.count)
>>     let nextCharacter = cs[nextIndex01]
>> }
>>
>> test()
>>
>
> Here, dump(nextIndex01) prints this:
>
> ▿ Swift.String.CharacterView.Index
>   ▿ _base: Swift.String.UnicodeScalarView.Index
>     - _position: 10
>   - _countUTF16: 1
>
> Notice that _countUTF16 is 1, so it looks like a valid index from the
> perspective of cs. But again, don't rely on this! The results of
> subscripting a collection with an index derived from another collection are
> undefined unless the collection explicitly documents otherwise.
>
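
One way to avoid cross-collection indices altogether is to carry over only the character count and walk that distance in the target string. This is a sketch, assuming Swift 4 or later, where String has `count` (in Swift 3 that would be `.characters.count`). The helper `character(in:atOffset:)` is hypothetical.

```swift
// Hypothetical helper: the character `offset` positions from the start of
// `string`, or nil if that position is at or beyond the end.
func character(in string: String, atOffset offset: Int) -> Character? {
    guard offset >= 0,
          // Returns nil instead of trapping if the offset passes endIndex.
          let i = string.index(string.startIndex, offsetBy: offset,
                               limitedBy: string.endIndex),
          // Landing exactly on endIndex is allowed by index(...), but
          // endIndex is not valid for subscripting.
          i < string.endIndex
    else { return nil }
    return string[i]
}

let s = "Original Script:"
let skipped = "Original ?".count                          // 10 characters
let nextCharacter = character(in: s, atOffset: skipped)   // "c"
let atEnd = character(in: s, atOffset: s.count)           // nil
```

The index is always computed from `s`, so it never depends on how another string's indices happen to be laid out internally.
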
>> Furthermore, I compared the two `nextIndex` values. They were equal.
>>
>> func test() {
>>     let s = "Original Script:"
>>     let cs = s.characters
>>     let startIndex = cs.startIndex
>>     let nextIndex = "Original ?".characters.endIndex
>>     let nextIndex01 = cs.index(startIndex, offsetBy: "Original ?".characters.count)
>>     let nextCharacter = cs[nextIndex01]
>>     print(nextIndex01 == nextIndex) // true
>> }
>>
>> test()
>>
>
> It looks like String.Index only takes the position into account to
> determine equality, not its _countUTF16. This makes sense for the way
> endIndex and index(_:offsetBy:) are implemented. After all, nextIndex and
> nextIndex01 _should be equal_. It would certainly be possible to implement
> it differently (where endIndex and index(_:offsetBy:) returned identical
> indices, including _countUTF16:) and I don't know why the stdlib team chose
> to do it this way (maybe performance?).
>
> In any case, much of this implementation may change with the work going
> into strings for Swift 4.
>
>
>