[swift-evolution] Strings in Swift 4

Matthew Johnson matthew at anandabits.com
Tue Jan 31 20:46:50 CST 2017



> On Jan 31, 2017, at 7:28 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
> 
>> On Tue, Jan 31, 2017 at 7:08 PM, Matthew Johnson <matthew at anandabits.com> wrote:
>> 
>>> On Jan 31, 2017, at 6:54 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>> 
>>>> On Tue, Jan 31, 2017 at 6:40 PM, Matthew Johnson <matthew at anandabits.com> wrote:
>>>> 
>>>>> On Jan 31, 2017, at 6:15 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>>>> 
>>>>>> On Tue, Jan 31, 2017 at 6:09 PM, Matthew Johnson <matthew at anandabits.com> wrote:
>>>>>> 
>>>>>>> On Jan 31, 2017, at 5:35 PM, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>> 
>>>>>>>> On Tue, Jan 31, 2017 at 5:28 PM, David Sweeris <davesweeris at mac.com> wrote:
>>>>>>>> 
>>>>>>>>> On Jan 31, 2017, at 2:04 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> On Tue, Jan 31, 2017 at 3:36 PM, David Sweeris via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On Jan 31, 2017, at 11:32, Jaden Geller via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I think that is perfectly reasonable, but then it seems weird to be able to iterate over it (with no upper bound) independently of a collection. It would surprise me if
>>>>>>>>>>> ```
>>>>>>>>>>> for x in arr[arr.startIndex...] { print(x) }
>>>>>>>>>>> ```
>>>>>>>>>>> yielded different results than
>>>>>>>>>>> ```
>>>>>>>>>>> for i in arr.startIndex... { print(arr[i]) } // CRASH
>>>>>>>>>>> ```
>>>>>>>>>>> which it does under this model.
>>>>>>>>>> 
>>>>>>>>>> (I think this is how it works... semantically, anyway) Since the upper bound isn't specified, it's inferred from the context.
>>>>>>>>>> 
>>>>>>>>>> In the first case, the context is as an index into an array, so the upper bound is inferred to be the last valid index.
>>>>>>>>>> 
>>>>>>>>>> In the second case, there is no context, so it goes to Int.max. Then, after the "wrong" context has been established, you try to index an array with numbers from the too-large range.
>>>>>>>>>> 
>>>>>>>>>> Semantically speaking, they're pretty different operations. Why is it surprising that they have different results?
>>>>>>>>> 
>>>>>>>>> I must say, I was originally rather fond of `0...` as a spelling, but IMO, Jaden and others have pointed out a real semantic issue.
>>>>>>>>> 
>>>>>>>>> A range is, to put it simply, the "stuff" between two end points. A "range with no upper bound" _has to be_ one that continues forever. The upper bound _must_ be infinity.
>>>>>>>> 
>>>>>>>> Depends… Swift doesn’t allow partial initializations, and neither the `.endIndex` nor the `.upperBound` property of a `Range` is optional. From a strictly syntactic PoV, a “Range without an upperBound” can’t exist without getting into undefined behavior territory.
>>>>>>>> 
>>>>>>>> Plus, mathematically speaking, an infinite range would be written “[x, ∞)”, with an open upper bracket. If you write “[x, ∞]”, with a closed upper bracket, that’s kind of a meaningless statement. I would argue that if we’re going to represent that “infinite” range, the closest Swift spelling would be “x..<”. That leaves the mathematically undefined notation “[x, ∞]”, spelled “x...” in Swift, free: we can let “x...” or “...x” (which by similar reasoning can’t mean “(-∞, x]”) return one of these:
>>>>>>>> ```
>>>>>>>> enum IncompleteRange<T> {
>>>>>>>>     case upperValue(T)
>>>>>>>>     case lowerValue(T)
>>>>>>>> }
>>>>>>>> ```
>>>>>>>> which we could then pass to the subscript function of a collection to create the actual Range like this:
>>>>>>>> ```
>>>>>>>> extension Collection {
>>>>>>>>     subscript(_ ir: IncompleteRange<Index>) -> SubSequence {
>>>>>>>>         switch ir {
>>>>>>>>         case .lowerValue(let lower): return self[lower ..< self.endIndex]
>>>>>>>>         case .upperValue(let upper): return self[self.startIndex ..< upper]
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }
>>>>>>>> ```
>>>>>>> 
>>>>>>> I understand that you can do this from a technical perspective. But I'm arguing it's devoid of semantics.  That is, it's a spelling to dress up a number.
>>>>>> 
>>>>>> It’s not any more devoid of semantics than a partially applied function.
>>>>> 
>>>>> Yes, but this here is not a partially applied type.
>>>>> 
>>>>> Nor does it square with your proposal that you should be able to use `for i in 0...` to mean something different from `array[0...]`. We don't have partially applied functions doubling as function calls with default arguments.
>>>> 
>>>> I’m not trying to say it’s *exactly* like a partially applied function.
>>> 
>>> I'm not saying you're arguing that point. I'm saying that there is a semantic distinction between (1) a range with two bounds where you've only specified the one, and (2) a range with one bound. There must be an answer to the question: what is the nature of the upper bound of `0...`? Either it exists but is not yet known, or it is known that it does not exist (or, it is not yet known whether or not it exists). But these are not the same thing!
>>> 
>>>>>> It is a number or index with added semantics that it provides a lower (or upper) bound on the possible value specified by its type.
>>>>>> 
>>>>>>> 
>>>>>>> What is such an `IncompleteRange<T>` other than a value of type T? It's not an upper bound or lower bound of anything until it's used to index a collection. Why have a new type (IncompleteRange<T>), a new set of operators (prefix and postfix range operators), and these muddied semantics for something that can be written `subscript(upTo upperBound: Index) -> SubSequence { ... }`? _That_ has unmistakable semantics and requires no new syntax.
>>>>>> 
>>>>>> Arguing that it adds too much complexity relative to the value it provides is reasonable. The value in this use case is mostly syntactic sugar, so it’s relatively easy to make the case that it doesn’t carry its weight here.
>>>>>> 
>>>>>> The value in Ben’s use case is a more composable alternative to `enumerated`.  I find this to be a reasonably compelling example of the kind of thing a partial range might enable.
>>>>> 
>>>>> Ben's use case is not a "partial range." It's a bona fide range with no upper bound.
>>>> 
>>>> Ok, fair enough.  Let’s call it an infinite range then.
>>>> 
>>>> We can form an infinite range with an Index even if it’s an opaque type that can’t be incremented or decremented. All we need is a `Comparable` bound, a requirement that all indices meet. We can test whether other indices are contained within that infinite range and can clamp it to a tighter range as well. When an infinite range is passed to a collection subscript, the collection would need to clamp it by providing an upper bound.
>>>> 
>>>> The only thing unusual about this is that we don’t usually do a bounds check of any kind when subscripting a collection.
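
A minimal sketch of that idea, assuming a hypothetical `InfiniteRange` type (the name and both methods are illustrative, not an existing standard library API). It asks only for `Comparable` from its bound, so it works with an opaque index such as `String.Index`:

```swift
// Hypothetical type: a range with a lower bound and no upper bound.
struct InfiniteRange<Bound: Comparable> {
    let lowerBound: Bound

    // Containment needs only comparison, never incrementing.
    func contains(_ value: Bound) -> Bool {
        return value >= lowerBound
    }

    // Clamp to a finite half-open range once an upper bound is supplied.
    // Forming the Range traps if upperBound < lowerBound.
    func clamped(upTo upperBound: Bound) -> Range<Bound> {
        return lowerBound..<upperBound
    }
}

let text = "hello world"
let start = text.firstIndex(of: "w")!   // an opaque String.Index
let fromW = InfiniteRange(lowerBound: start)

let containsStart = fromW.contains(text.startIndex)
let containsW = fromW.contains(start)
// The collection supplies its own endIndex as the upper bound.
let slice = String(text[fromW.clamped(upTo: text.endIndex)])
```

Here the clamping step is written out by hand; in the subscript form under discussion the collection would perform it on the caller's behalf.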
>>> 
>>> Precisely. This would be inconsistent. If lenient subscripts as once proposed were accepted, however, then perhaps `arr[lenient: 0...]` would make sense.
>>> 
>>> But that's not getting to the biggest hitch with your proposal. If subscript were lenient, then `arr[lenient: 42...]` would also have to give you a result even if `arr.count == 21`.
>>> 
>>> This is not at all what Dave Abrahams was proposing, though (unless I totally misunderstand). He truly doesn't want an infinite range. He wants to use a terser notation for saying: I want x to be the lower bound of a range for which I don't yet know (or haven't bothered to find out) the finite upper bound. It would be plainly clear, if spelled as `arr[from: 42]`, that if `arr.count < 43` then this expression will trap, but if `arr.count >= 43` then this expression will give you the rest of the elements.
>> 
>> Right.  I was not making the necessary distinction between incomplete ranges and infinite ranges.  Jaden provided an accurate description of what I was trying to get at and it *does* require both `IncompleteRange` and `InfiniteRange` to do it properly.
> 
> Cool, I think we broadly agree on the conclusion here. The reason I'm harping on this point is that one obviously needs to demonstrate compelling use cases. By conflating different concepts together, we're inflating all the wonderful things that you can do.

That's fair.  Thanks for sticking with it until we were all on the same page.

> 
>> I’m not necessarily trying to argue that we *should* do this, only that there isn’t a fundamental semantic problem with it. In a language like Swift there is no fundamental reason that `0...` must have semantics independent of context. Allowing context to provide the semantics doesn’t seem any more magical than allowing context to define the type of literals like `0`.
> 
> Hmm, disagree here. Literals aren't typed, they aren't instances of anything, and thus they do not have any particular semantics. When they are used to express a value, that value has a particular type with particular semantics.
> 
> That we have been talking about `0...` clouds the fact that we are talking about a function that takes a single argument which doesn't have to be a literal, and which must return a value of a particular type. (That is, unless you want to overload the function, in which case every naked `0...` would need to be written `0... as IncompleteRange` or `0... as UnboundedRange`.) And since you're going to get an instance of some particular type, this implies some particular semantics. Given that `arr[upTo: 42]` is perfectly nice-looking and does exactly what you'd want it to do, it is hard to argue that a superior alternative is one that requires new types, new operators, context-dependent semantics, and compiler magic.
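
For concreteness, a sketch of the labeled alternative. The `from:` and `upTo:` labels come from this thread; the bodies are my assumption about how such subscripts would be written, not an existing standard library API:

```swift
extension Collection {
    // Everything from lowerBound through the end of the collection.
    // Forming the range traps if lowerBound is past endIndex.
    subscript(from lowerBound: Index) -> SubSequence {
        return self[lowerBound..<endIndex]
    }

    // Everything from the start up to, but not including, upperBound.
    subscript(upTo upperBound: Index) -> SubSequence {
        return self[startIndex..<upperBound]
    }
}

let arr = Array(0..<50)
let rest = Array(arr[from: 42])
let head = Array(arr[upTo: 3])
```

No new types or operators are involved: each subscript forwards to the existing `Range<Index>` subscript, so the usual bounds checking applies.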

One important point Jaden made is that the different use cases for `0...` each have different syntactic contexts.  This context (collection subscript or something that requires a sequence) can drive inference to determine the correct type.  So yes, we are talking about overloading postfix `...` for some types, notably `Int` (for array slices and infinite integer ranges).

I can't think of any cases where we would want an overload for both `IncompleteRange` and `InfiniteRange`.  Users would only need to supply a type when they wish to use postfix `...` with an argument of a type which has both an incomplete and an infinite range overload of postfix `...` and there is no immediate syntactic context to resolve the ambiguity.  Without thinking about it too deeply it seems like such uses would be rare and the annotation perfectly acceptable when necessary.

Return type overloading is a tool that should be used rarely and carefully, but this is a case where it makes sense conceptually and may make sense to adopt.
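
A rough sketch of what that overloading could look like. Everything here is hypothetical: the operator is spelled `...>` only to keep the example clear of any existing `...` declarations, `InfiniteRange` is specialized to `Int` for brevity, and neither type is an existing API:

```swift
postfix operator ...>

// Hypothetical: a lower bound waiting for a collection to supply the upper one.
struct IncompleteRange<Bound: Comparable> {
    let lowerBound: Bound
}

// Hypothetical: an integer range that really has no upper bound.
// Iterating far enough would eventually trap on overflow at Int.max.
struct InfiniteRange: Sequence, IteratorProtocol {
    var current: Int
    mutating func next() -> Int? {
        defer { current += 1 }
        return current
    }
}

// Two overloads that differ only in return type.
postfix func ...> (lower: Int) -> IncompleteRange<Int> {
    return IncompleteRange(lowerBound: lower)
}
postfix func ...> (lower: Int) -> InfiniteRange {
    return InfiniteRange(current: lower)
}

extension Collection {
    // The collection completes the range with its own endIndex.
    subscript(r: IncompleteRange<Index>) -> SubSequence {
        return self[r.lowerBound..<endIndex]
    }
}

let letters = ["a", "b", "c", "d"]
// Subscript context: only the IncompleteRange overload fits.
let tail = Array(letters[2...>])
// Sequence context: only the InfiniteRange overload fits.
let numbered = Array(zip(1...>, letters))
```

With no surrounding context, something like `let r = 0...>` should be ambiguous and need an annotation such as `0...> as InfiniteRange`, which is the cost you pointed out.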

All of that said, I don't have a particularly strong opinion about which way we should go, and you do make a good case for just using parameter labels. I like the aesthetics and syntactic elegance of the operator, and its uniformity with other collection subscript overloads, but it may not be the right choice.

In addition to everything you have brought up, there is also the open question of how it might impact variadic generics and tuple unpacking. If that question were resolved without concern I would probably lean towards the operator approach, but not strongly. Until then I'm neutral.

> 
>>>>>> I also tend to find concise notation important for clarity as long as it isn’t obscure or idiosyncratic. With that in mind, I think I lean in favor of `...` so long as we’re confident we won’t regret it if / when we take up variadic generics and / or tuple unpacking.
>>>>>> 