[swift-dev] Make offset index available for String

Ole Begemann ole at oleb.net
Wed Jan 3 10:40:23 CST 2018


On 03.01.18 01:19, Karl Wagner via swift-dev wrote:
> Swift used to do this, but we switched it around so indexes couldn’t 
> self-increment.
>
> One of the problems was that strings are value-types. So you would get 
> an index, then append stuff to the string, but when you tried to 
> advance the index again it would blow up. The index retained the 
> backing, which means the “append” caused a copy, and the index was 
> suddenly pointing to a different String backing.
>
> Basically, self-incrementing indexes require that the Collection has 
> reference semantics. Otherwise there simply is no concept of an 
> independent “owning” Collection which your Index can hold a reference to.
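
A minimal sketch of that failure mode, assuming a hypothetical `OwningIndex` wrapper (not a standard library type) that stores the string it came from. Since String is a value type, the stored copy never sees later appends. The `limitedBy:` variant is used so the stale case returns nil instead of trapping:

```swift
// Hypothetical index wrapper that keeps its "owner" by value.
struct OwningIndex {
    let owner: String
    let wrapped: String.Index

    // Advances within the stored copy; returns nil when the copy is too short.
    func advanced(by n: Int) -> OwningIndex? {
        guard let next = owner.index(wrapped, offsetBy: n, limitedBy: owner.endIndex) else {
            return nil
        }
        return OwningIndex(owner: owner, wrapped: next)
    }
}

var text = "abc"
let start = OwningIndex(owner: text, wrapped: text.startIndex)
text.append("def")                       // mutates `text`, not `start.owner`

let ownerSnapshot = start.owner          // the index still sees the old value
let pastOldEnd = start.advanced(by: 4)   // nil: the stored copy has only 3 characters
let withinNew = text.index(text.startIndex, offsetBy: 4, limitedBy: text.endIndex)
```

The same offset is valid in the mutated `text` but not in the copy the index holds, which is the mismatch described above.
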
>
> Anyway, that doesn’t mean you’re wrong. Collection-slicing syntax is 
> still way too ugly. We need to keep it safe, and communicative, but it 
> should also be obvious and not tiring.
>
> Currently, you have to write:
>
>     <collection>[<collection>.index(<collection>.<member>, offsetBy:
>     <distance>)]
>
> And an example...
>
>
>     results[results.index(results.startIndex, offsetBy: 3)]
>
>
> Which is safe, and communicative, and obvious, but also really, really 
> tiring. There are ways we can make it less tiring without sacrificing 
> the good parts:
>
> 1) Add a version of index(_: offsetBy:) which takes a KeyPath<Self, 
> Self.Index> as its first argument. That’s a minor convenience you can 
> add today in your own projects. It removes one repetition of 
> <collection>, in many common cases.
>
>     extension Collection {
>       func index(_ i: KeyPath<Self, Index>, offsetBy n: IndexDistance) -> Index {
>         return index(self[keyPath: i], offsetBy: n)
>       }
>       func index(_ i: KeyPath<Self, Index>, offsetBy n: IndexDistance,
>                  limitedBy: Index) -> Index? {
>         return index(self[keyPath: i], offsetBy: n, limitedBy: limitedBy)
>       }
>     }
>
>
>     results[results.index(\.startIndex, offsetBy: 3)]
>
> Seriously, man, KeyPaths are just /the business/. I love them.
>
> 2) Bind <collection> to something like an anonymous closure argument 
> within the subscript. Or just allow “.” syntax, as for static members. 
> That removes another <collection>.
>
>     results[.index(\.startIndex, offsetBy: 3)]
>
>     or
>
>     results[$.index(\.startIndex, offsetBy: 3)]
>
>
> If anybody’s interested, I was playing around with an 
> “IndexExpression” type for this kind of thing. The language lets you 
> get pretty far, but it doesn’t work and I can’t figure out why. It 
> looks like a simple-enough generic struct, but it fails with a cyclic 
> metadata dependency.
>
> https://gist.github.com/karwa/04cc43431951f24ae9334ba8a25e6a31

I'm not 100% sure why, but moving the IndexType enum out of the 
IndexExpression struct makes it work (Xcode 9.2, Swift 4.0.3):

enum IndexType<C: Collection> {
     case keypath(KeyPath<C, C.Index>)
     case index(C.Index)
}

struct IndexExpression<C: Collection> {
     let base: IndexType<C>
     let distance: C.IndexDistance

     func resolve(in collection: C) -> C.Index {
         let baseIdx: C.Index
         switch base {
         case .index(let idx):  baseIdx = idx
         case .keypath(let kp): baseIdx = collection[keyPath: kp]
         }
         return collection.index(baseIdx, offsetBy: distance)
     }
}

let string = "hello everybody!"
let myIdx = string.startIndex
string[myIdx + 2] // "l"
string[\.endIndex - 3] // "d"
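
The two subscripts above depend on operator and subscript overloads that live in the gist and are not repeated here. Below is a rough, self-contained sketch of what such overloads might look like. The names `OffsetBase` and `OffsetExpression` and the exact signatures are assumptions for illustration, not a copy of the gist, and Int replaces IndexDistance so that it should also build with newer toolchains:

```swift
// Assumed stand-ins for IndexType / IndexExpression, using Int distances.
enum OffsetBase<C: Collection> {
    case keyPath(KeyPath<C, C.Index>)
    case index(C.Index)
}

struct OffsetExpression<C: Collection> {
    let base: OffsetBase<C>
    let distance: Int

    // Turns the base (a concrete index or a key path) into an index, then offsets it.
    func resolve(in collection: C) -> C.Index {
        let baseIdx: C.Index
        switch base {
        case .index(let idx):  baseIdx = idx
        case .keyPath(let kp): baseIdx = collection[keyPath: kp]
        }
        return collection.index(baseIdx, offsetBy: distance)
    }
}

// index + n, index - n
func + <C: Collection>(lhs: C.Index, rhs: Int) -> OffsetExpression<C> {
    return OffsetExpression(base: .index(lhs), distance: rhs)
}
func - <C: Collection>(lhs: C.Index, rhs: Int) -> OffsetExpression<C> {
    return OffsetExpression(base: .index(lhs), distance: -rhs)
}

// \.startIndex + n, \.endIndex - n
func + <C: Collection>(lhs: KeyPath<C, C.Index>, rhs: Int) -> OffsetExpression<C> {
    return OffsetExpression(base: .keyPath(lhs), distance: rhs)
}
func - <C: Collection>(lhs: KeyPath<C, C.Index>, rhs: Int) -> OffsetExpression<C> {
    return OffsetExpression(base: .keyPath(lhs), distance: -rhs)
}

extension Collection {
    // The expression is only resolved here, where the collection is known.
    subscript(expr: OffsetExpression<Self>) -> Element {
        return self[expr.resolve(in: self)]
    }
}

let greeting = "hello everybody!"
let first = greeting.startIndex
let third: Character = greeting[first + 2]
let fromEnd: Character = greeting[\.endIndex - 3]
```

The expression never stores the collection, so it avoids the value-semantics problem Karl describes: nothing is resolved until the subscript supplies `self`.
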

> - Karl
>
>> On 19. Dec 2017, at 08:38, Cao, Jiannan via swift-dev 
>> <swift-dev at swift.org <mailto:swift-dev at swift.org>> wrote:
>>
>> I implemented the second approach: SuperIndex
>>
>> https://github.com/frogcjn/SuperStringIndex/
>>
>> SuperString is a special version of String. Its SuperIndex keeps a 
>> reference to the string, which lets the index calculate the offset.
>>
>>
>> struct SuperIndex : Comparable, Strideable, CustomStringConvertible {
>>
>>     var owner: Substring
>>     var wrapped: String.Index
>>
>>     ...
>>
>>     // Offset
>>     var offset: Int {
>>         return owner.distance(from: owner.startIndex, to: wrapped)
>>     }
>>
>>     // Strideable
>>     func advanced(by n: SuperIndex.Stride) -> SuperIndex {
>>         return SuperIndex(owner.index(wrapped, offsetBy: n), owner)
>>     }
>>
>>     static func +(lhs: SuperIndex, rhs: SuperIndex.Stride) -> SuperIndex {
>>         return lhs.advanced(by: rhs)
>>     }
>> }
>>
>> let a: SuperString = "01234"
>> let o = a.startIndex
>> let o1 = o + 4
>> print(a[o]) // 0
>> print(a[...]) // 01234
>> print(a[..<(o+2)]) // 01
>> print(a[...(o+2)]) // 012
>> print(a[(o+2)...]) // 234
>> print(a[o+2..<o+3]) // 2
>> print(a[o1-2...o1-1]) // 23
>>
>> if let number = a.index(of: "1") {
>>     print(number) // 1
>>     print(a[number...]) // 1234
>> }
>>
>> if let number = a.index(where: { $0 > "1" }) {
>>     print(number) // 2
>> }
>>
>> let b = a[(o+1)...]
>> let z = b.startIndex
>> let z1 = z + 4
>> print(b[z]) // 1
>> print(b[...]) // 1234
>> print(b[..<(z+2)]) // 12
>> print(b[...(z+2)]) // 123
>> print(b[(z+2)...]) // 34
>> print(b[z+2...z+3]) // 34
>> print(b[z1-2...z1-2]) // 3
>>
>>
>>> On Dec 18, 2017, at 4:53 PM, Cao, Jiannan <frogcjn at 163.com 
>>> <mailto:frogcjn at 163.com>> wrote:
>>>
>>> Or we can copy the design of std::vector::iterator in C++. The index 
>>> could keep a reference to the collection.
>>> When the index is offset with the + operator, it can ask its owner to 
>>> compute the new index, since it keeps a reference to the owning collection.
>>>
>>> let startIndex = s.startIndex
>>> s[startIndex+1]
>>>
>>>     public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
>>>         public let owner: T
>>>         ...
>>>         public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
>>>             return lhs.owner.index(lhs, offsetBy: rhs)
>>>         }
>>>     }
>>>
>>>
>>>> On Dec 15, 2017, at 9:34 AM, Michael Ilseman <milseman at apple.com 
>>>> <mailto:milseman at apple.com>> wrote:
>>>>
>>>> Yes, I was trying to highlight that they are different and should 
>>>> be treated differently. This was because it seemed you were 
>>>> conflating the two in your argument. You claim that people expect 
>>>> it, and I’m pointing out that what people actually expect (assuming 
>>>> they’re coming from C or languages with a similar model) already 
>>>> exists as those models deal in encoded offsets.
>>>>
>>>> More important than expectations surrounding what to provide to a 
>>>> subscript are expectations surrounding algorithmic complexity. This 
>>>> has security implications. The expectation of subscript is that it 
>>>> is “constant-ish”, for a fuzzy hand-wavy definition of 
>>>> “constant-ish” which includes amortized constant or logarithmic.
>>>>
>>>> Now, I agree with the overall sentiment that `index(offsetBy:)` is 
>>>> unwieldy. I am interested in approaches to improve this. But, we 
>>>> cannot throw linear complexity into subscript without extreme 
>>>> justification.
>>>>
>>>>
>>>>> On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn at 163.com 
>>>>> <mailto:frogcjn at 163.com>> wrote:
>>>>>
>>>>> This offset is the Unicode (encoding) offset, not the offset of the 
>>>>> element. For example, index(startIndex, offsetBy: 1) may have 
>>>>> encodedOffset 4 or 8, not 1.
>>>>>
>>>>> The offset index is based on counting elements/indices. It gives the 
>>>>> same result as s.index(s.startIndex, offsetBy: i).
>>>>> The encodedOffset is the underlying offset into the Unicode string, 
>>>>> which is not the same concept as the offset index of a collection.
>>>>>
>>>>> The offset index is meaningful for the elements and indices of a 
>>>>> collection (the i-th element of the collection). It is not related to 
>>>>> the Unicode offset (which is the underlying data offset, meaningful 
>>>>> for the UTF-16 String).
>>>>>
>>>>> These two offsets are totally different.
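
To make the two notions concrete, here is a small sketch. It uses `utf16Offset(in:)`, the Swift 5 replacement for the since-deprecated `encodedOffset`, so it assumes a newer toolchain than the one discussed in this thread. The sample string is an arbitrary choice:

```swift
// "\u{1F1E8}\u{1F1F3}" is a flag: one Character built from two scalars,
// each of which needs two UTF-16 code units.
let flagThenA = "\u{1F1E8}\u{1F1F3}a"
let second = flagThenA.index(flagThenA.startIndex, offsetBy: 1)

// Element offset: how many Characters precede `second`.
let characterOffset = flagThenA.distance(from: flagThenA.startIndex, to: second)

// Encoded offset: how many UTF-16 code units precede `second`.
let utf16Offset = second.utf16Offset(in: flagThenA)

let characterAtSecond = flagThenA[second]
```

Here the element offset is 1 while the UTF-16 offset is 4, matching the "4 or 8, not 1" remark above.
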
>>>>>
>>>>> Best,
>>>>> Jiannan
>>>>>
>>>>>> On Dec 15, 2017, at 9:17 AM, Michael Ilseman <milseman at apple.com 
>>>>>> <mailto:milseman at apple.com>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev 
>>>>>>> <swift-dev at swift.org <mailto:swift-dev at swift.org>> wrote:
>>>>>>>
>>>>>>> People are used to the offset index system rather than 
>>>>>>> String.Index. Using offset indices to name and count the 
>>>>>>> elements is normal and natural.
>>>>>>>
>>>>>>
>>>>>> The offset system that you’re referring to is totally available 
>>>>>> in String today, if you’re willing for it to be the offset into 
>>>>>> the encoding. That’s the offset “people” you’re referring to are 
>>>>>> likely used to and consider normal and natural. On String.Index, 
>>>>>> there is the following:
>>>>>>
>>>>>> init(encodedOffset offset: Int)
>>>>>>
>>>>>> and
>>>>>>
>>>>>> var encodedOffset: Int { get }
>>>>>>
>>>>>>
>>>>>> [1] https://developer.apple.com/documentation/swift/string.index
>>>>>>
>>>>>>
>>>>>>> *This offset index system has a long history and a real meaning 
>>>>>>> to the collection.* The subscript s[i] has a fixed meaning of 
>>>>>>> "getting the i-th element in this collection", which is normal 
>>>>>>> and direct. Getting a range with offset indices is also direct: 
>>>>>>> it means the substring runs from the i-th character up to the 
>>>>>>> j-th character of the original string.
>>>>>>>
>>>>>>> People are used to subscripting and forming ranges with offset 
>>>>>>> indices. Using string[string.index(i, offsetBy: 5)] is not as 
>>>>>>> direct and easy as string[i + 5]. Likewise, Range<String.Index> 
>>>>>>> is not as direct as Range<Offset>. Developers need to convert the 
>>>>>>> Range<String.Index> result of string.range(of:) to 
>>>>>>> Range<OffsetIndex> to know the exact range of the substring. 
>>>>>>> Range<String.Index> has a real meaning to the machine and to the 
>>>>>>> underlying data location of the substring, but 
>>>>>>> Range<OffsetIndex> also carries direct location information for 
>>>>>>> human beings, and represents *the abstract location concept of the 
>>>>>>> collection (this is the most UNIMPEACHABLE REASON I could provide)*.
>>>>>>>
>>>>>>> *The offset index system is based on the nature of a collection. 
>>>>>>> Each element of the collection can be located by offset, which is 
>>>>>>> a direct and simple concept for any collection. Right?* Even 
>>>>>>> String with String.Index has some offset index properties within 
>>>>>>> it. For example, the `count` of the String is the offset index 
>>>>>>> of the endIndex. enumerated() generates a sequence whose 
>>>>>>> elements carry the same offsets that the offset index system 
>>>>>>> provides. And when we apply Array(string), the string is divided 
>>>>>>> into characters, and offset indices become available on the new 
>>>>>>> array.
>>>>>>>
>>>>>>> *The offset index system is just an assistant for the collection, 
>>>>>>> not a replacement for String.Index.* We use String.Index to 
>>>>>>> represent the underlying representation of the String. We could 
>>>>>>> also use offset indices to represent the nature of the Collection 
>>>>>>> and its elements. Providing the offset index as a second way 
>>>>>>> to access elements in collections is not only for the String 
>>>>>>> struct; it is for all collections, since *it is the nature of the 
>>>>>>> collection concept*, and developers could choose whether to use it.
>>>>>>>
>>>>>>> We don't make String.Index O(1); we translate the offset 
>>>>>>> indices to the underlying String.Index. Each time a subscript 
>>>>>>> is used with an offset index, we just need to translate between 
>>>>>>> offset indices and underlying indices using c.index(c.startIndex, 
>>>>>>> offsetBy: i) and c.distance(from: c.startIndex, to: i).
>>>>>>>
>>>>>>> We can make the offset indices available through extension to 
>>>>>>> Collection (as my GitHub repo demo: 
>>>>>>> https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-).
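
A rough sketch of that translation as a Collection extension. The labels `offset:` and `offsets:` and the `offset(of:)` helper are hypothetical names chosen for illustration and are not the API of the linked repository. `firstIndex(of:)` is the later spelling of the `index(of:)` used elsewhere in this thread:

```swift
// Hypothetical helpers, not the API of the linked repository.
extension Collection {
    // Element at the n-th position; O(n) unless the collection is random-access.
    subscript(offset n: Int) -> Element {
        return self[index(startIndex, offsetBy: n)]
    }

    // Elements in an offset range, e.g. s[offsets: 2..<4].
    subscript(offsets r: Range<Int>) -> SubSequence {
        let lower = index(startIndex, offsetBy: r.lowerBound)
        let upper = index(lower, offsetBy: r.upperBound - r.lowerBound)
        return self[lower..<upper]
    }

    // Offset of an existing index, counted from startIndex.
    func offset(of i: Index) -> Int {
        return distance(from: startIndex, to: i)
    }
}

let digits = "01234"
let thirdDigit = digits[offset: 2]
let middle = String(digits[offsets: 2..<4])
let offsetOfOne = digits.firstIndex(of: "1").map { digits.offset(of: $0) }
```

The explicit argument labels keep these subscripts from colliding with Int-indexed collections such as Array, and they make the linear walk visible at the call site, which bears on the complexity concern raised in this thread.
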
>>>>>>>
>>>>>>> Or we could do it at compile time.
>>>>>>> For example:
>>>>>>>
>>>>>>> c[1...]
>>>>>>> compile to
>>>>>>> c[c.index(c.startIndex, offsetBy: 1)...]
>>>>>>>
>>>>>>> let index: Int? = s.index(of: "a")
>>>>>>> compile to
>>>>>>> let index: Int? = s.index(of: "a").map { s.distance(from: s.startIndex, to: $0) }
>>>>>>>
>>>>>>> let index = 1 // if used in s only
>>>>>>> s[index..<index+2]
>>>>>>> compile to
>>>>>>> let index = s.index(s.startIndex, offsetBy: 1)
>>>>>>> s[index..<s.index(index, offsetBy: 2)]
>>>>>>>
>>>>>>> let index = 1 // if used both in s1, s2
>>>>>>> s1[index..<index+2]
>>>>>>> s2[index..<index+2]
>>>>>>> compile to
>>>>>>> let index = 1
>>>>>>> let index1 = s1.index(s1.startIndex, offsetBy: index)
>>>>>>> let index2 = s2.index(s2.startIndex, offsetBy: index)
>>>>>>> s1[index1..<s1.index(index1, offsetBy: 2)]
>>>>>>> s2[index2..<s2.index(index2, offsetBy: 2)]
>>>>>>>
>>>>>>> I really want the team to consider providing the offset index 
>>>>>>> system as an assistant to the collection. It is a necessary, 
>>>>>>> basic concept of Collection.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Jiannan
>>>>>>>
>>>>>>>> On Dec 15, 2017, at 2:13 AM, Jordan Rose <jordan_rose at apple.com 
>>>>>>>> <mailto:jordan_rose at apple.com>> wrote:
>>>>>>>>
>>>>>>>> We really don't want to make subscripting a non-O(1) operation. 
>>>>>>>> That just provides false convenience and encourages people to 
>>>>>>>> do the wrong thing with Strings anyway.
>>>>>>>>
>>>>>>>> I'm always interested in why people want this kind of ability. 
>>>>>>>> Yes, it's nice for teaching programming to be able to split 
>>>>>>>> strings on character boundaries indexed by integers, but where 
>>>>>>>> does it come up in real life? The most common cases I see are 
>>>>>>>> trying to strip off the first or last character, or a known 
>>>>>>>> prefix or suffix, and I feel like we should have better answers 
>>>>>>>> for those than "use integer indexes" anyway.
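
For reference, those common cases can already be written without integer indexes using existing standard library calls. This is only a sketch of what is available today, not the "better answers" asked for above, and the sample string is made up:

```swift
let line = "#include <stdio.h>;"

// Strip the first or last character without computing any index.
let withoutFirst = String(line.dropFirst())
let withoutLast = String(line.dropLast())

// Strip a known prefix or suffix, leaving the string unchanged if it is absent.
let knownPrefix = "#include "
let rest = line.hasPrefix(knownPrefix) ? String(line.dropFirst(knownPrefix.count)) : line
let knownSuffix = ";"
let trimmed = rest.hasSuffix(knownSuffix) ? String(rest.dropLast(knownSuffix.count)) : rest
```
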
>>>>>>>>
>>>>>>>> Jordan
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Dec 13, 2017, at 22:30, Cao, Jiannan via swift-dev 
>>>>>>>>> <swift-dev at swift.org <mailto:swift-dev at swift.org>> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I would like to discuss the String.Index problem within Swift. 
>>>>>>>>> I know the current design of String.Index is based on the 
>>>>>>>>> nature of the underlying data structure of the string.
>>>>>>>>>
>>>>>>>>> But could we just make String.Index contain offset 
>>>>>>>>> information? Or make an offset index subscript available for 
>>>>>>>>> accessing characters in a String?
>>>>>>>>>
>>>>>>>>> for example:
>>>>>>>>>
>>>>>>>>>     let a = "01234"
>>>>>>>>>     print(a[0]) // 0
>>>>>>>>>     print(a[0...4]) // 01234
>>>>>>>>>     print(a[...]) // 01234
>>>>>>>>>     print(a[..<2]) // 01
>>>>>>>>>     print(a[...2]) // 012
>>>>>>>>>     print(a[2...]) // 234
>>>>>>>>>     print(a[2...3]) // 23
>>>>>>>>>     print(a[2...2]) // 2
>>>>>>>>>     if let number = a.index(of: "1") {
>>>>>>>>>         print(number) // 1
>>>>>>>>>         print(a[number...]) // 1234
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 0 corresponds to the Collection.Index collection.index(startIndex, 
>>>>>>>>> offsetBy: 0)
>>>>>>>>> 1 corresponds to the Collection.Index collection.index(startIndex, 
>>>>>>>>> offsetBy: 1)
>>>>>>>>> ...
>>>>>>>>> We keep String.Index, but allow another kind of index, 
>>>>>>>>> called "offsetIndex", to access the String.Index and 
>>>>>>>>> the characters in the string.
>>>>>>>>> Any Collection could use the offset index to access its 
>>>>>>>>> elements, regardless of its real index type.
>>>>>>>>>
>>>>>>>>> I have made the Collection OffsetIndexable protocol available 
>>>>>>>>> here, and made it more accessible for StringProtocol 
>>>>>>>>> by taking into account all the index-related APIs.
>>>>>>>>>
>>>>>>>>> https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-
>>>>>>>>>
>>>>>>>>> If someone wants to make the offset index/range available for 
>>>>>>>>> any collection, they just need to extend the collection:
>>>>>>>>> extension String : OffsetIndexableCollection {
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> extension Substring : OffsetIndexableCollection {
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I hope the Swift core team will consider bringing the offset 
>>>>>>>>> index to String, or making it available to other collections, 
>>>>>>>>> thus letting developers decide whether their collection should 
>>>>>>>>> use offset indices as an assistant for the real index of the 
>>>>>>>>> collection.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Jiannan
