[swift-dev] Make offset index available for String
Ole Begemann
ole at oleb.net
Wed Jan 3 10:40:23 CST 2018
On 03.01.18 01:19, Karl Wagner via swift-dev wrote:
> Swift used to do this, but we switched it around so indexes couldn’t
> self-increment.
>
> One of the problems was that strings are value-types. So you would get
> an index, then append stuff to the string, but when you tried to
> advance the index again it would blow up. The index retained the
> backing, which means the “append” caused a copy, and the index was
> suddenly pointing to a different String backing.
>
> Basically, self-incrementing indexes require that the Collection has
> reference semantics. Otherwise there simply is no concept of an
> independent “owning” Collection which your Index can hold a reference to.
>
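
One way to picture the problem described above is the following minimal sketch. OwningIndex is a hypothetical type made up for illustration; it is not how the old standard library implemented indices. Because String is a value type, the index ends up holding its own copy, and a later append to the original is invisible to it.

```swift
// Hypothetical illustration only: an index that "owns" its collection.
struct OwningIndex {
    let owner: String          // value type, so this is an independent copy
    let position: String.Index

    // Self-increment by walking the owner the index captured.
    func advanced(by n: Int) -> OwningIndex {
        return OwningIndex(owner: owner,
                           position: owner.index(position, offsetBy: n))
    }
}

var text = "abc"
let start = OwningIndex(owner: text, position: text.startIndex)

// Mutating `text` does not touch the copy held by `start`.
text.append("def")

// The index still walks the old three-character string, so three
// steps already reach the end of *its* owner, although `text` now
// has six characters.
let moved = start.advanced(by: 3)
print(start.owner, text, moved.position == start.owner.endIndex)
```

Advancing `start` by more than three would trap, which is roughly the "blow up" behaviour: the index and the string the caller is looking at no longer agree.
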
> Anyway, that doesn’t mean you’re wrong. Collection-slicing syntax is
> still way too ugly. We need to keep it safe, and communicative, but it
> should also be obvious and not tiring.
>
> Currently, you have to write:
>
> <collection>[<collection>.index(<collection>.<member>, offsetBy:
> <distance>)]
>
> And an example...
>
>
> results[results.index(results.startIndex, offsetBy: 3)]
>
>
> Which is safe, and communicative, and obvious, but also really, really
> tiring. There are ways we can make it less tiring without sacrificing
> the good parts:
>
> 1) Add a version of index(_: offsetBy:) which takes a KeyPath<Self,
> Self.Index> as its first argument. That’s a minor convenience you can
> add today in your own projects. It removes one repetition of
> <collection>, in many common cases.
>
> extension Collection {
>     func index(_ i: KeyPath<Self, Index>, offsetBy n: IndexDistance)
>         -> Index {
>         return index(self[keyPath: i], offsetBy: n)
>     }
>     func index(_ i: KeyPath<Self, Index>, offsetBy n: IndexDistance,
>                limitedBy: Index) -> Index? {
>         return index(self[keyPath: i], offsetBy: n, limitedBy: limitedBy)
>     }
> }
>
>
> results[results.index(\.startIndex, offsetBy: 3)]
>
> Seriously, man, KeyPaths are just /the business/. I love them.
>
> 2) Bind <collection> to something like an anonymous closure argument
> within the subscript. Or just allow “.” syntax, as for static members.
> That removes another <collection>.
>
> results[.index(\.startIndex, offsetBy: 3)]
>
> or
>
> results[$.index(\.startIndex, offsetBy: 3)]
>
>
> If anybody’s interested, I was playing around with an
> “IndexExpression” type for this kind of thing. The language lets you
> get pretty far, but it doesn’t work and I can’t figure out why. It
> looks like a simple-enough generic struct, but it fails with a cyclic
> metadata dependency.
>
> https://gist.github.com/karwa/04cc43431951f24ae9334ba8a25e6a31

I'm not 100% sure why, but moving the IndexType enum out of the
IndexExpression struct makes it work (Xcode 9.2, Swift 4.0.3):

    enum IndexType<C: Collection> {
        case keypath(KeyPath<C, C.Index>)
        case index(C.Index)
    }

    struct IndexExpression<C: Collection> {
        let base: IndexType<C>
        let distance: C.IndexDistance

        func resolve(in collection: C) -> C.Index {
            let baseIdx: C.Index
            switch base {
            case .index(let idx): baseIdx = idx
            case .keypath(let kp): baseIdx = collection[keyPath: kp]
            }
            return collection.index(baseIdx, offsetBy: distance)
        }
    }

    let string = "hello everybody!"
    let myIdx = string.startIndex
    string[myIdx + 2] // "l"
    string[\.endIndex - 3] // "d"
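
The last two lines rely on operators and a subscript that are not shown in this message. A minimal sketch of what they could look like follows. It is an assumption, not a copy of the gist: it targets a newer toolchain where the distance type is plain Int rather than C.IndexDistance, and it spells out the key path root (\String.endIndex) to keep type inference simple.

```swift
enum IndexType<C: Collection> {
    case keypath(KeyPath<C, C.Index>)
    case index(C.Index)
}

struct IndexExpression<C: Collection> {
    let base: IndexType<C>
    let distance: Int   // C.IndexDistance in Swift 4.0

    func resolve(in collection: C) -> C.Index {
        let baseIdx: C.Index
        switch base {
        case .index(let idx): baseIdx = idx
        case .keypath(let kp): baseIdx = collection[keyPath: kp]
        }
        return collection.index(baseIdx, offsetBy: distance)
    }
}

// Assumed helpers: build an expression from a key path or an index.
func + <C: Collection>(lhs: KeyPath<C, C.Index>, rhs: Int) -> IndexExpression<C> {
    return IndexExpression(base: .keypath(lhs), distance: rhs)
}

func - <C: Collection>(lhs: KeyPath<C, C.Index>, rhs: Int) -> IndexExpression<C> {
    return IndexExpression(base: .keypath(lhs), distance: -rhs)
}

// C is inferred from the context in which the result is used.
func + <C: Collection>(lhs: C.Index, rhs: Int) -> IndexExpression<C> {
    return IndexExpression(base: .index(lhs), distance: rhs)
}

extension Collection {
    // Resolve the expression against self, then do a normal index lookup.
    subscript(expression: IndexExpression<Self>) -> Element {
        return self[expression.resolve(in: self)]
    }
}

let string = "hello everybody!"
let myIdx = string.startIndex
let third: Character = string[myIdx + 2]
let fromEnd: Character = string[\String.endIndex - 3]
print(third, fromEnd)
```

The expression only records a base and a distance. The walk happens inside the subscript, so the lookup is still linear in the distance for String. Negative distances need a collection that can move backwards, which String can.
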
> - Karl
>
>> On 19. Dec 2017, at 08:38, Cao, Jiannan via swift-dev
>> <swift-dev at swift.org> wrote:
>>
>> I implemented the second approach: SuperIndex
>>
>> https://github.com/frogcjn/SuperStringIndex/
>>
>> SuperString is a special version of String. Its SuperIndex keeps a
>> reference to the string, which lets the index calculate its own offset.
>>
>>
>> struct SuperIndex: Comparable, Strideable, CustomStringConvertible {
>>
>>     var owner: Substring
>>     var wrapped: String.Index
>>
>>     ...
>>
>>     // Offset
>>     var offset: Int {
>>         return owner.distance(from: owner.startIndex, to: wrapped)
>>     }
>>
>>     // Strideable
>>     func advanced(by n: SuperIndex.Stride) -> SuperIndex {
>>         return SuperIndex(owner.index(wrapped, offsetBy: n), owner)
>>     }
>>
>>     static func + (lhs: SuperIndex, rhs: SuperIndex.Stride) -> SuperIndex {
>>         return lhs.advanced(by: rhs)
>>     }
>> }
>>
>> let a: SuperString = "01234"
>> let o = a.startIndex
>> let o1 = o + 4
>> print(a[o]) // 0
>> print(a[...]) // 01234
>> print(a[..<(o+2)]) // 01
>> print(a[...(o+2)]) // 012
>> print(a[(o+2)...]) // 234
>> print(a[o+2..<o+3]) // 2
>> print(a[o1-2...o1-1]) // 23
>>
>> if let number = a.index(of: "1") {
>>     print(number) // 1
>>     print(a[number...]) // 1234
>> }
>>
>> if let number = a.index(where: { $0 > "1" }) {
>>     print(number) // 2
>> }
>>
>> let b = a[(o+1)...]
>> let z = b.startIndex
>> let z1 = z + 4
>> print(b[z]) // 1
>> print(b[...]) // 1234
>> print(b[..<(z+2)]) // 12
>> print(b[...(z+2)]) // 123
>> print(b[(z+2)...]) // 34
>> print(b[z+2...z+3]) // 34
>> print(b[z1-2...z1-2]) // 3
>>
>>
>>> On Dec 18, 2017, at 4:53 PM, Cao, Jiannan <frogcjn at 163.com>
>>> wrote:
>>>
>>> Or we can copy the design of std::vector::iterator in C++. The index
>>> could keep a reference to the collection.
>>> When the index is offset by the + operator, it can ask its owner to
>>> compute the new index, since it keeps a reference to the owning
>>> collection.
>>>
>>> let startIndex = s.startIndex
>>> s[startIndex+1]
>>>
>>> public struct MyIndex<T: Collection>: Comparable where T.Index == MyIndex {
>>>     public let owner: T
>>>     ...
>>>     public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
>>>         return lhs.owner.index(lhs, offsetBy: rhs)
>>>     }
>>> }
>>>
>>>
>>>> On Dec 15, 2017, at 9:34 AM, Michael Ilseman <milseman at apple.com>
>>>> wrote:
>>>>
>>>> Yes, I was trying to highlight that they are different and should
>>>> be treated different. This was because it seemed you were
>>>> conflating the two in your argument. You claim that people expect
>>>> it, and I’m pointing out that what people actually expect (assuming
>>>> they’re coming from C or languages with a similar model) already
>>>> exists as those models deal in encoded offsets.
>>>>
>>>> More important than expectations surrounding what to provide to a
>>>> subscript are expectations surrounding algorithmic complexity. This
>>>> has security implications. The expectation of subscript is that it
>>>> is “constant-ish”, for a fuzzy hand-wavy definition of
>>>> “constant-ish” which includes amortized constant or logarithmic.
>>>>
>>>> Now, I agree with the overall sentiment that `index(offsetBy:)` is
>>>> unwieldy. I am interested in approaches to improve this. But, we
>>>> cannot throw linear complexity into subscript without extreme
>>>> justification.
>>>>
>>>>
>>>>> On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn at 163.com>
>>>>> wrote:
>>>>>
>>>>> That offset is the Unicode (encoded) offset, not the offset of
>>>>> the element. For example, index(startIndex, offsetBy: 1) may have
>>>>> encodedOffset 4 or 8, not 1.
>>>>>
>>>>> The offset index is based on counting elements/indices. It gives
>>>>> the same result as s.index(s.startIndex, offsetBy: i).
>>>>> The encodedOffset is the underlying offset into the Unicode
>>>>> string, which is not the same concept as the offset index of a
>>>>> collection.
>>>>>
>>>>> The offset index refers to the elements and indices of the
>>>>> collection (the i-th element of the collection). It is not
>>>>> related to the Unicode offset, which is the underlying data
>>>>> offset into the UTF-16 string.
>>>>>
>>>>> These two offsets are totally different.
>>>>>
>>>>> Best,
>>>>> Jiannan
>>>>>
>>>>>> On Dec 15, 2017, at 9:17 AM, Michael Ilseman <milseman at apple.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev
>>>>>>> <swift-dev at swift.org> wrote:
>>>>>>>
>>>>>>> People are used to the offset index system rather than
>>>>>>> String.Index. Using offset indices to name and count the
>>>>>>> elements is normal and natural.
>>>>>>>
>>>>>>
>>>>>> The offset system that you’re referring to is totally available
>>>>>> in String today, if you’re willing for it to be the offset into
>>>>>> the encoding. That’s the offset “people” you’re referring to are
>>>>>> likely used to and consider normal and natural. On String.Index,
>>>>>> there is the following:
>>>>>>
>>>>>> init(encodedOffset offset: Int)
>>>>>>
>>>>>> and
>>>>>>
>>>>>> var encodedOffset: Int { get }
>>>>>>
>>>>>>
>>>>>> [1] https://developer.apple.com/documentation/swift/string.index
>>>>>>
>>>>>>
>>>>>>> *This offset index system has a long history and a real meaning
>>>>>>> for the collection.* The subscript s[i] has a fixed meaning of
>>>>>>> "getting the i-th element in this collection", which is normal
>>>>>>> and direct. Getting a range with offset indices is also direct:
>>>>>>> it means the substring runs from the i-th character up to the
>>>>>>> j-th character of the original string.
>>>>>>>
>>>>>>> People are used to subscripts and ranges with offset indices.
>>>>>>> string[string.index(i, offsetBy: 5)] is not as direct or as easy
>>>>>>> as string[i + 5]. Likewise, Range<String.Index> is not as direct
>>>>>>> as Range<Offset>. Developers need to convert the
>>>>>>> Range<String.Index> result of string.range(of:) to
>>>>>>> Range<OffsetIndex> to know the exact range of the substring.
>>>>>>> Range<String.Index> has a real meaning for the machine and the
>>>>>>> underlying data location of the substring, but
>>>>>>> Range<OffsetIndex> also carries direct location information for
>>>>>>> human beings, and represents *the abstract location concept of
>>>>>>> the collection (this is the most UNIMPEACHABLE REASON I could
>>>>>>> provide)*.
>>>>>>>
>>>>>>> *The offset index system is based on the nature of a collection.
>>>>>>> Each element of the collection can be located by offset, which
>>>>>>> is a direct and simple concept for any collection. Right?* Even
>>>>>>> String with String.Index has some offset index properties
>>>>>>> within it. For example, the `count` of the String is the offset
>>>>>>> index of the endIndex. enumerated() generates a sequence whose
>>>>>>> elements contain the same offsets that the offset index system
>>>>>>> provides. And when we apply Array(string), the string is divided
>>>>>>> into characters and the offset indices become available on the
>>>>>>> new array.
>>>>>>>
>>>>>>> *The offset index system is just an assistant for the collection,
>>>>>>> not a replacement for String.Index.* We use String.Index to
>>>>>>> represent the underlying storage of the String. We could also
>>>>>>> use offset indices to represent the nature of the Collection
>>>>>>> and its elements. Providing the offset index as a second way
>>>>>>> to access elements is not only for the String struct but for
>>>>>>> all collections, since *it is the nature of the collection
>>>>>>> concept*, and developers can choose whether to use it.
>>>>>>>
>>>>>>> We don't make String.Index O(1); we translate the offset
>>>>>>> indices to the underlying String.Index. Each time a subscript
>>>>>>> is used with an offset index, we just translate between offset
>>>>>>> indices and underlying indices using c.index(c.startIndex,
>>>>>>> offsetBy: i) and c.distance(from: c.startIndex, to: i).
>>>>>>>
>>>>>>> We can make the offset indices available through an extension to
>>>>>>> Collection (as my GitHub repo demonstrates:
>>>>>>> https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-).
>>>>>>>
>>>>>>> Or we could do it at compile time. For example:
>>>>>>>
>>>>>>> c[1...]
>>>>>>> compiles to
>>>>>>> c[c.index(c.startIndex, offsetBy: 1)...]
>>>>>>>
>>>>>>> let index: Int = s.index(of: "a")
>>>>>>> compiles to
>>>>>>> let index: Int = s.distance(from: s.startIndex, to: s.index(of: "a"))
>>>>>>>
>>>>>>> let index = 1 // if used in s only
>>>>>>> s[index..<index+2]
>>>>>>> compiles to
>>>>>>> let index = s.index(s.startIndex, offsetBy: 1)
>>>>>>> s[index..<s.index(index, offsetBy: 2)]
>>>>>>>
>>>>>>> let index = 1 // if used both in s1, s2
>>>>>>> s1[index..<index+2]
>>>>>>> s2[index..<index+2]
>>>>>>> compiles to
>>>>>>> let index = 1
>>>>>>> let index1 = s1.index(s1.startIndex, offsetBy: index)
>>>>>>> let index2 = s2.index(s2.startIndex, offsetBy: index)
>>>>>>> s1[index1..<s1.index(index1, offsetBy: 2)]
>>>>>>> s2[index2..<s2.index(index2, offsetBy: 2)]
>>>>>>>
>>>>>>> I really want the team to consider providing the offset index
>>>>>>> system as an assistant to the collection. It is a necessary,
>>>>>>> basic concept of Collection.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Jiannan
>>>>>>>
>>>>>>>> On Dec 15, 2017, at 2:13 AM, Jordan Rose <jordan_rose at apple.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> We really don't want to make subscripting a non-O(1) operation.
>>>>>>>> That just provides false convenience and encourages people to
>>>>>>>> do the wrong thing with Strings anyway.
>>>>>>>>
>>>>>>>> I'm always interested in why people want this kind of ability.
>>>>>>>> Yes, it's nice for teaching programming to be able to split
>>>>>>>> strings on character boundaries indexed by integers, but where
>>>>>>>> does it come up in real life? The most common cases I see are
>>>>>>>> trying to strip off the first or last character, or a known
>>>>>>>> prefix or suffix, and I feel like we should have better answers
>>>>>>>> for those than "use integer indexes" anyway.
>>>>>>>>
>>>>>>>> Jordan
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Dec 13, 2017, at 22:30, Cao, Jiannan via swift-dev
>>>>>>>>> <swift-dev at swift.org> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I would like to discuss the String.Index problem within Swift.
>>>>>>>>> I know the current design of String.Index is based on the
>>>>>>>>> nature of the underlying data structure of the string.
>>>>>>>>>
>>>>>>>>> But could we just make String.Index contain offset
>>>>>>>>> information? Or make an offset index subscript available for
>>>>>>>>> accessing characters in a String?
>>>>>>>>>
>>>>>>>>> for example:
>>>>>>>>>
>>>>>>>>> let a = "01234"
>>>>>>>>> print(a[0]) // 0
>>>>>>>>> print(a[0...4]) // 01234
>>>>>>>>> print(a[...]) // 01234
>>>>>>>>> print(a[..<2]) // 01
>>>>>>>>> print(a[...2]) // 012
>>>>>>>>> print(a[2...]) // 234
>>>>>>>>> print(a[2...3]) // 23
>>>>>>>>> print(a[2...2]) // 2
>>>>>>>>> if let number = a.index(of: "1") {
>>>>>>>>>     print(number) // 1
>>>>>>>>>     print(a[number...]) // 1234
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 0 is equivalent to the Collection.Index returned by
>>>>>>>>> collection.index(startIndex, offsetBy: 0)
>>>>>>>>> 1 is equivalent to the Collection.Index returned by
>>>>>>>>> collection.index(startIndex, offsetBy: 1)
>>>>>>>>> ...
>>>>>>>>> We keep String.Index, but allow another kind of index,
>>>>>>>>> called "offsetIndex", to access the String.Index and
>>>>>>>>> the characters in the string.
>>>>>>>>> Any Collection could use the offset index to access its
>>>>>>>>> elements, regardless of its real index type.
>>>>>>>>>
>>>>>>>>> I have made the OffsetIndexableCollection protocol available
>>>>>>>>> here, and made it more accessible for StringProtocol by
>>>>>>>>> covering all APIs related to the index.
>>>>>>>>>
>>>>>>>>> https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-
>>>>>>>>>
>>>>>>>>> If someone wants to make the offset index/range available for
>>>>>>>>> a collection, they just need to extend the collection:
>>>>>>>>>
>>>>>>>>> extension String: OffsetIndexableCollection {
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> extension Substring: OffsetIndexableCollection {
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I hope the Swift core team will consider bringing the offset
>>>>>>>>> index to String, or making it available to other collections,
>>>>>>>>> so that developers can decide whether their collection should
>>>>>>>>> use offset indices as an assistant for the real index of the
>>>>>>>>> collection.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Jiannan