[swift-dev] Make offset index available for String

Cao, Jiannan frogcjn at 163.com
Tue Dec 19 01:37:51 CST 2017


I implemented the second approach: SuperIndex

https://github.com/frogcjn/SuperStringIndex/

SuperString is a special version of String. Its SuperIndex keeps a reference to the string, which lets the index calculate its own offset.


struct SuperIndex : Comparable, Strideable, CustomStringConvertible {

    var owner: Substring        // the string (slice) this index belongs to
    var wrapped: String.Index   // the underlying String.Index

    ...

    // Offset: walks from the start of the owner, so this is O(n)
    var offset: Int {
        return owner.distance(from: owner.startIndex, to: wrapped)
    }

    // Strideable: the owner computes the new underlying index
    func advanced(by n: SuperIndex.Stride) -> SuperIndex {
        return SuperIndex(owner.index(wrapped, offsetBy: n), owner)
    }

    static func +(lhs: SuperIndex, rhs: SuperIndex.Stride) -> SuperIndex {
        return lhs.advanced(by: rhs)
    }
}

let a: SuperString = "01234"
let o = a.startIndex
let o1 = o + 4
print(a[o]) // 0
print(a[...]) // 01234
print(a[..<(o+2)]) // 01
print(a[...(o+2)]) // 012
print(a[(o+2)...]) // 234
print(a[o+2..<o+3]) // 2
print(a[o1-2...o1-1]) // 23

if let number = a.index(of: "1") {
    print(number) // 1
    print(a[number...]) // 1234
}

if let number = a.index(where: { $0 > "1" }) {
    print(number) // 2
}

let b = a[(o+1)...]
let z = b.startIndex
let z1 = z + 4
print(b[z]) // 1
print(b[...]) // 1234
print(b[..<(z+2)]) // 12
print(b[...(z+2)]) // 123
print(b[(z+2)...]) // 34
print(b[z+2...z+3]) // 34
print(b[z1-2...z1-2]) // 3
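
For readers who do not want to open the repository, here is a minimal, self-contained sketch of the same idea: an index that carries its owner and lets the owner do the walking. All names in it (OwnedIndex, ownedStart, element(at:), slice(_:)) are hypothetical and are not taken from SuperStringIndex. It assumes a plain Substring as the owner rather than a SuperString wrapper.

```swift
// Sketch only: hypothetical names, not the SuperStringIndex implementation.
struct OwnedIndex: Comparable, CustomStringConvertible {
    let owner: Substring        // the text this index belongs to
    let wrapped: String.Index   // the real index into `owner`

    // Linear cost: walks from the start of `owner` to `wrapped`.
    var offset: Int {
        return owner.distance(from: owner.startIndex, to: wrapped)
    }

    var description: String { return String(offset) }

    static func == (lhs: OwnedIndex, rhs: OwnedIndex) -> Bool {
        return lhs.wrapped == rhs.wrapped
    }

    static func < (lhs: OwnedIndex, rhs: OwnedIndex) -> Bool {
        return lhs.wrapped < rhs.wrapped
    }

    // The owner advances the underlying index; cost is linear in |rhs|.
    static func + (lhs: OwnedIndex, rhs: Int) -> OwnedIndex {
        return OwnedIndex(owner: lhs.owner,
                          wrapped: lhs.owner.index(lhs.wrapped, offsetBy: rhs))
    }

    static func - (lhs: OwnedIndex, rhs: Int) -> OwnedIndex {
        return lhs + (-rhs)
    }
}

extension Substring {
    // Entry point: an owner-carrying index for the first character.
    var ownedStart: OwnedIndex {
        return OwnedIndex(owner: self, wrapped: startIndex)
    }

    func element(at i: OwnedIndex) -> Character {
        return self[i.wrapped]
    }

    func slice(_ r: Range<OwnedIndex>) -> Substring {
        return self[r.lowerBound.wrapped..<r.upperBound.wrapped]
    }
}

let text = Substring("01234")
let start = text.ownedStart
print(text.element(at: start + 2))            // 2
print(text.slice((start + 1)..<(start + 3)))  // 12
print((start + 4).offset)                     // 4
```

Note that every `+` and every read of `offset` still walks the string, so this gives the integer-like spelling without changing the linear cost of reaching the i-th character.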


> On Dec 18, 2017, at 4:53 PM, Cao, Jiannan <frogcjn at 163.com> wrote:
> 
> Or we can copy the design of std::vector::iterator in C++. The index could keep a reference to the collection.
> When the index is offset with the + operator, it can ask its owner to compute the new index, since it keeps a reference to the owning collection.
> 
> let startIndex = s.startIndex
> s[startIndex+1]
> 
> public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
>     public let owner: T
> ...
>     public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
>         return lhs.owner.index(lhs, offsetBy: rhs)
>     }
> }
> 
>> On Dec 15, 2017, at 9:34 AM, Michael Ilseman <milseman at apple.com> wrote:
>> 
>> Yes, I was trying to highlight that they are different and should be treated differently. This was because it seemed you were conflating the two in your argument. You claim that people expect it, and I’m pointing out that what people actually expect (assuming they’re coming from C or languages with a similar model) already exists, as those models deal in encoded offsets.
>> 
>> More important than expectations surrounding what to provide to a subscript are expectations surrounding algorithmic complexity. This has security implications. The expectation of subscript is that it is “constant-ish”, for a fuzzy hand-wavy definition of “constant-ish” which includes amortized constant or logarithmic.
>> 
>> Now, I agree with the overall sentiment that `index(offsetBy:)` is unwieldy. I am interested in approaches to improve this. But, we cannot throw linear complexity into subscript without extreme justification.
>> 
>> 
>>> On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn at 163.com> wrote:
>>> 
>>> That offset is the Unicode (encoded) offset, not the offset of the element.
>>> For example, index(startIndex, offsetBy: 1) may have an encodedOffset of 4 or 8, not 1.
>>> 
>>> The offset index counts elements: offset i names the same position as s.index(s.startIndex, offsetBy: i).
>>> encodedOffset is the offset into the underlying Unicode storage of the string, which is a different concept from the offset index of a collection.
>>> 
>>> The offset index is meaningful in terms of the collection's elements and indices (the i-th element of the collection). It is unrelated to the Unicode offset, which is an offset into the underlying data (the UTF-16 storage of the String).
>>> 
>>> These two offsets are totally different.
>>> 
>>> Best,
>>> Jiannan
>>> 
>>>> On Dec 15, 2017, at 9:17 AM, Michael Ilseman <milseman at apple.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev <swift-dev at swift.org> wrote:
>>>>> 
>>>>> People are used to the offset index system rather than String.Index. Using offset indices to name and count elements is normal and natural.
>>>>> 
>>>> 
>>>> The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:
>>>> 
>>>> init(encodedOffset offset: Int)
>>>> 
>>>> and 
>>>> 
>>>> var encodedOffset: Int { get }
>>>> 
>>>> 
>>>> [1] https://developer.apple.com/documentation/swift/string.index
>>>> 
>>>> 
>>>>> This offset index system has a long history and a real meaning for collections. The subscript s[i] has the fixed meaning of "getting the i-th element of this collection", which is normal and direct. Getting a range with offset indices is just as direct: it means the substring from the i-th character up to the j-th character of the original string.
>>>>> 
>>>>> People are used to subscripting and forming ranges with offset indices. string[string.index(i, offsetBy: 5)] is not as direct or as easy as string[i + 5]. Likewise, Range<String.Index> is not as direct as Range<Offset>: developers need to convert the Range<String.Index> returned by string.range(of:) into a Range<OffsetIndex> to know the exact range of the substring. Range<String.Index> is meaningful to the machine, as the location of the substring in the underlying data, but Range<OffsetIndex> gives location information that is directly meaningful to a human and represents the abstract notion of position in the collection (this is the most UNIMPEACHABLE REASON I can provide).
>>>>> 
>>>>> The offset index system is based on the nature of a collection. Each element of a collection can be located by its offset, which is a direct and simple concept for any collection. Right? Even String, with its String.Index, already has some offset-index properties. For example, the `count` of a String is the offset index of its endIndex. enumerated() generates a sequence whose elements carry the same offsets that the offset index system provides. And when we write Array(string), the string is divided into characters, and offset indices become available on the new array.
>>>>> 
>>>>> The offset index system is just an assistant for the collection, not a replacement for String.Index. We use String.Index to represent the underlying storage of the String. We could also use offset indices to represent the nature of the Collection and its elements. Providing the offset index as a second way to access elements is not only for the String struct but for all collections, since it is in the nature of the collection concept, and developers can choose whether to use it.
>>>>> 
>>>>> We don't make String.Index O(1); we translate between offset indices and the underlying String.Index. Each time a subscript is used with an offset index, we just translate using c.index(c.startIndex, offsetBy: i) and c.distance(from: c.startIndex, to: i).
>>>>> 
>>>>> We can make the offset indices available through an extension to Collection (as in my GitHub repo demo: https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-).
>>>>> 
>>>>> Or we could do it at compile time. For example,
>>>>> 
>>>>> 	c[1...]
>>>>> compiles to
>>>>> 	c[c.index(c.startIndex, offsetBy: 1)...]
>>>>> 
>>>>> 	let index: Int? = s.index(of: "a")
>>>>> compiles to
>>>>> 	let index: Int? = s.index(of: "a").map { s.distance(from: s.startIndex, to: $0) }
>>>>> 
>>>>> 	let index = 1 // if used in s only
>>>>> 	s[index..<index+2]
>>>>> compiles to
>>>>> 	let index = s.index(s.startIndex, offsetBy: 1)
>>>>> 	s[index..<s.index(index, offsetBy: 2)]
>>>>> 
>>>>> 	let index = 1 // if used in both s1 and s2
>>>>> 	s1[index..<index+2]
>>>>> 	s2[index..<index+2]
>>>>> compiles to
>>>>> 	let index = 1
>>>>> 	let index1 = s1.index(s1.startIndex, offsetBy: index)
>>>>> 	let index2 = s2.index(s2.startIndex, offsetBy: index)
>>>>> 	s1[index1..<s1.index(index1, offsetBy: 2)]
>>>>> 	s2[index2..<s2.index(index2, offsetBy: 2)]
>>>>> 
>>>>> I really want the team to consider providing the offset index system as an assistant to the collection. It is a fundamental concept of Collection.
>>>>> 
>>>>> Thanks!
>>>>> Jiannan
>>>>> 
>>>>>> On Dec 15, 2017, at 2:13 AM, Jordan Rose <jordan_rose at apple.com> wrote:
>>>>>> 
>>>>>> We really don't want to make subscripting a non-O(1) operation. That just provides false convenience and encourages people to do the wrong thing with Strings anyway.
>>>>>> 
>>>>>> I'm always interested in why people want this kind of ability. Yes, it's nice for teaching programming to be able to split strings on character boundaries indexed by integers, but where does it come up in real life? The most common cases I see are trying to strip off the first or last character, or a known prefix or suffix, and I feel like we should have better answers for those than "use integer indexes" anyway.
>>>>>> 
>>>>>> Jordan
>>>>>> 
>>>>>> 
>>>>>>> On Dec 13, 2017, at 22:30, Cao, Jiannan via swift-dev <swift-dev at swift.org> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I would like to discuss the String.Index problem in Swift. I know the current design of String.Index is based on the nature of the string's underlying data structure.
>>>>>>> 
>>>>>>> But could we make String.Index contain offset information? Or make an offset-index subscript available for accessing the characters in a String?
>>>>>>> 
>>>>>>> for example:
>>>>>>> let a = "01234"
>>>>>>> print(a[0]) // 0
>>>>>>> print(a[0...4]) // 01234
>>>>>>> print(a[...]) // 01234
>>>>>>> print(a[..<2]) // 01
>>>>>>> print(a[...2]) // 012
>>>>>>> print(a[2...]) // 234
>>>>>>> print(a[2...3]) // 23
>>>>>>> print(a[2...2]) // 2
>>>>>>> if let number = a.index(of: "1") {
>>>>>>>     print(number) // 1
>>>>>>>     print(a[number...]) // 1234
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> 0 is equivalent to the Collection.Index returned by collection.index(startIndex, offsetBy: 0)
>>>>>>> 1 is equivalent to the Collection.Index returned by collection.index(startIndex, offsetBy: 1)
>>>>>>> ...
>>>>>>> We keep String.Index, but allow another kind of index, called "offsetIndex", to reach the String.Index and the character in the string.
>>>>>>> Any Collection could use the offset index to access its elements, regardless of its real index type.
>>>>>>> 
>>>>>>> I have made the OffsetIndexableCollection protocol available here, and made it more convenient for StringProtocol by covering all the index-related APIs.
>>>>>>> 
>>>>>>> https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-
>>>>>>> 
>>>>>>> If you want to make the offset index/range available for a collection, you just need to extend that collection:
>>>>>>> extension String : OffsetIndexableCollection {
>>>>>>> }
>>>>>>> 
>>>>>>> extension Substring : OffsetIndexableCollection {
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> I hope the Swift core team will consider bringing the offset index to String, or making it available to other collections, so that developers can decide whether their collection should offer offset indices as an assistant to its real index.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> Jiannan
>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> swift-dev mailing list
>>>>>>> swift-dev at swift.org
>>>>>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
