[swift-evolution] [Pitch] Normalize Slice Types for Unsafe Buffers

Thu Dec 8 18:22:54 CST 2016

> On Dec 8, 2016, at 12:50 PM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
> 
> 
> on Thu Dec 08 2016, Ben Cohen <ben_cohen-AT-apple.com> wrote:
> 
>>> On Dec 2, 2016, at 8:27 PM, Nate Cook <natecook at gmail.com> wrote:
>>> 
>>>> On Dec 2, 2016, at 2:12 PM, Ben Cohen via swift-evolution
>>>> <swift-evolution at swift.org>
>>>> wrote:
>> 
>>>> 
>>>>> On Dec 1, 2016, at 11:33 PM, Nate Cook via swift-evolution
>>>>> <swift-evolution at swift.org>
>>>>> wrote:
>>>>> 
>>>>> 3) Make all buffer pointers their own slices but use a different
>>>>> index type. If the indices were just wrapped pointers, that would
>>>>> handle the index sharing without needing an additional property on
>>>>> the buffer. We could also maintain integer-based stridable
>>>>> conformance (which greatly simplifies index arithmetic), since the
>>>>> indices would just offset by a byte for raw buffers or a stride
>>>>> for typed buffers.
>>>>> 
>>>> 
>>>> Unfortunately, switching to non-integer indices would change this
>>>> from being mildly source-breaking to being extremely
>>>> source-breaking, as there’s lots of code out there using buffers
>>>> today indexing them with integers (including integer literals).
>>>> 
>>>> The big win with UnsafeBufferPointer having an integer index is
>>>> it’s a drop-in replacement for arrays, so when you hit a
>>>> performance problem using an array you can quickly switch to using
>>>> a buffer under most circumstances instead without having to change
>>>> much of your code – including code that uses for i in
>>>> 0..<myArray.count, of which there is a lot out there in the
>>>> wild. Switching to an opaque index would break anyone doing that.
>>> 
>>> It is definitely very source-breaking, though with relatively simple fixits:
>>> 
>>> 	buf[0] ---> buf[buf.startIndex]
>>> 	buf[3] ---> buf[buf.startIndex + 3]
>>> 	buf[i] ---> buf[buf.startIndex + i]
>>> 
>>> Any integer arithmetic happening outside the subscript could be left
>>> unchanged. If that cost isn't worth the benefit, then making
>>> UnsafeRawBufferPointer use Slice as its slice type is probably the
>>> best way to resolve that issue.
>>> 
>>> Nate
>> 
>> The fixits aren’t quite that simple for slices, though:
>> 
>> 	let slice = buf[3..<6]
>> 	slice[3] ---> slice[slice.startIndex + 0] // fixit would somehow need to know this is 0 not 3
>> 	slice[i] ---> slice[slice.startIndex + ??] // or even need to know this is, erm, I haven’t had enough coffee this morning
>> 
>> The other downside is it would thwart speculatively switching an Array
>> to an UnsafeBuffer to see if that was a bottleneck, then switching
>> back.
>> 
>>> On Dec 1, 2016, at 11:33 PM, Nate Cook via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>> 1) Switch to using Slice as a wrapper for UnsafeRawBufferPointer.
>>> 
>> 
>> Based on the above, it seems like this is the least bad option, and we
>> need to do this ASAP as currently UnsafeRawBufferPointer is
>> non-compliant with the requirements of slicing and needs changing
>> before it’s more widely adopted.
> 
> Or we could say that UnsafeRawBufferPointer isn't a Collection.  Making
> it a Collection in the first place has always seemed suspect to me.
> 
> -- 
> -Dave

UnsafeRawBufferPointer does not need to be a Collection, but should at least be a Sequence. It is a Collection now simply because it fits the criteria (nondestructively accessed and subscriptable).

In practice, it needs to be able to interoperate with [UInt8] and be interchangeable in the same generic context.

e.g. `byteBuffer += rawBuffer[payloadIndex..<endIndex]` is typical.

I think Sequence is sufficient for that purpose.
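
As a rough sketch of that interop case: the helper below constrains its argument only to Sequence, and a raw buffer range can still be appended to a [UInt8]. The function name `append(_:to:)` and the sample values are made up for illustration, and this assumes a current Swift toolchain.

```swift
// Hypothetical helper: requires only Sequence, not Collection.
func append<S: Sequence>(_ bytes: S, to byteBuffer: inout [UInt8])
    where S.Element == UInt8
{
    byteBuffer += bytes
}

var byteBuffer: [UInt8] = [0xFF]
let payload: [UInt8] = [1, 2, 3, 4]

payload.withUnsafeBytes { rawBuffer in
    // Append bytes 1 and 2 of the raw buffer (values 2 and 3).
    append(rawBuffer[1..<3], to: &byteBuffer)
}

assert(byteBuffer == [0xFF, 2, 3])
```

Nothing in the helper relies on indices, so it is indifferent to what the range subscript returns as long as the result is a Sequence of UInt8.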

I can't see any reason that generic Collection algorithms (aside from simply copying elements) would apply to UnsafeRawBufferPointer. If you don't know the element's type, you can't really apply any logic to the collection. e.g. filtering makes no sense.

We definitely do not want Collection's semantics for Slice indices (a slice sharing indices with its base), because they obviously make no sense when working with integer indices.

So, I would be happy to make UnsafeRawBufferPointer a Sequence rather than a Collection. That does fix the correctness issue with generic algorithms. But it leaves some chaos in the world.

Range subscript is an important feature for manual data layout, so we should not get rid of that. And you would never want to see that subsequence of bytes as an unnormalized view over the original buffer. (That's blatantly wrong.)

With range subscript working the way it should, we still have an inconsistency between [UInt8] and UnsafeRawBufferPointer in a nongeneric context.

buffer[i..<n][0] == buffer[i]
array[i..<n][0] -> out-of-bounds (for i > 0)
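
The array half of that can be checked directly. The sketch below shows an ArraySlice keeping its base array's indices, plus an offset-based accessor that does not assume a zero start index. The helper `element(atOffset:of:)` is a made-up name, and this assumes a current Swift toolchain.

```swift
let array: [UInt8] = [10, 20, 30, 40, 50]
let i = 2, n = 4
let slice = array[i..<n]

// An ArraySlice shares indices with its base array.
assert(slice.startIndex == i)
assert(slice[i] == array[i])
// slice[0] would trap here, because 0 is outside slice.indices.

// Hypothetical helper: offset-based access that works for both
// zero-based collections and index-sharing slices.
func element<C: Collection>(atOffset k: Int, of c: C) -> C.Element {
    return c[c.index(c.startIndex, offsetBy: k)]
}

assert(element(atOffset: 0, of: slice) == array[i])
assert(element(atOffset: 0, of: array) == array[0])
```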

To some extent, this ship has sailed. I can see a few options now:

1. We make UnsafeRawBufferPointer a Sequence and just live with that inconsistency. Users need to discern the importance of range subscript's result type. If it's a Slice, then they should not be using integer offsets.

2. We allow UnsafeRawBufferPointer to be a Collection, use the Slice wrapper for range subscript, but add a conversion from Slice back to UnsafeRawBufferPointer that normalizes byte offsets. That way normal use cases are somewhat supported (otherwise there's no point in supporting range subscript):

readBuffer(UnsafeRawBufferPointer(subRange: buffer[i..<n]))

Is at least better than:

readBuffer(UnsafeRawBufferPointer(
  start: buffer.baseAddress?.advanced(by: i),
  count: n - i))

3. We disallow range subscript in UnsafeRawBufferPointer and add some kind of `subRange` getter:

Before:
  buffer[i..<n] = buffer[j..<(j + n - i)]

After:
  buffer.subRange(i..<n).copyBytes(from: buffer.subRange(j..<(j + n - i)))
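
To make the normalization in #2 and #3 concrete, here is a minimal sketch of what such a `subRange` helper might look like. It is written as an extension purely for illustration: `subRange` is not a standard library API, and the bounds check and sample values are my own assumptions.

```swift
// Hypothetical helper, not a standard library API: returns a rebased view
// of `range` whose own indices start at 0.
extension UnsafeMutableRawBufferPointer {
    func subRange(_ range: Range<Int>) -> UnsafeMutableRawBufferPointer {
        precondition(range.lowerBound >= 0 && range.upperBound <= count,
                     "range out of bounds")
        return UnsafeMutableRawBufferPointer(
            start: baseAddress?.advanced(by: range.lowerBound),
            count: range.count)
    }
}

var bytes: [UInt8] = [0, 0, 0, 0, 9, 8, 7, 6]
let i = 0, n = 3, j = 4
var firstOfSource: UInt8 = 0

bytes.withUnsafeMutableBytes { buffer in
    let source = buffer.subRange(j..<(j + n - i))
    // The rebased view is zero-based: source[0] is buffer[j].
    firstOfSource = source[0]
    // Copy the (non-overlapping) source range over bytes i..<n.
    buffer.subRange(i..<n).copyBytes(from: source)
}

assert(firstOfSource == 9)
assert(bytes == [9, 8, 7, 0, 9, 8, 7, 6])
```

The same pointer arithmetic would sit behind a `subRange:` initializer as in #2; the only difference is whether the caller starts from a Slice or from a range.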

This hinges on a question that I can't answer. What's the intention behind the Slice API? Is it effectively a bug to use integer indices on a Slice in a nongeneric context? If so, then we can live with the inconsistency in #1. If not, then I think UnsafeRawBufferPointer's range subscript should work the same way as other Collections, and for maximum consistency with other APIs it really should be a Collection (#2).

-Andy