[swift-evolution] Pitch: Improved Swift pointers

Fri Jul 14 00:30:55 CDT 2017

On Fri, Jul 14, 2017 at 12:22 AM, Andrew Trick <atrick at apple.com> wrote:

>
> On Jul 13, 2017, at 6:55 PM, Taylor Swift <kelvin13ma at gmail.com> wrote:
>
>
>
> On Thu, Jul 13, 2017 at 6:56 PM, Andrew Trick <atrick at apple.com> wrote:
>
>>
>> On Jul 12, 2017, at 12:16 PM, Taylor Swift via swift-evolution <
>> swift-evolution at swift.org> wrote:
>>
>> Hi all, I’ve written up a proposal to modify the unsafe pointer API for
>> greater consistency, safety, and ease of use.
>>
>> ~~~
>>
>> Swift currently offers two sets of pointer types — singular pointers such
>> as UnsafeMutablePointer, and vector (buffer) pointers such as
>> UnsafeMutable*Buffer*Pointer. This implies a natural separation of tasks
>> the two kinds of pointers are meant to do. For example, buffer pointers
>> implement Collection conformance, while singular pointers do not.
>>
>> However, some aspects of the pointer design contradict these implied
>> roles. It is possible to allocate an arbitrary number of instances from a
>> type method on a singular pointer, but not from a buffer pointer. The
>> result of such an operation returns a singular pointer, even though a
>> buffer pointer would be more appropriate to capture the information about
>> the *number* of instances allocated. It’s possible to subscript into a
>> singular pointer, even though they are not real Collections. Some parts
>> of the current design turn UnsafePointers into downright *Dangerous*Pointers,
>> leading users to believe that they have allocated or freed memory when in
>> fact, they have not.
>>
>> This proposal seeks to iron out these inconsistencies, and offer a more
>> convenient, more sensible, and less bug-prone API for Swift pointers.
>>
>> <https://gist.github.com/kelvin13/a9c033193a28b1d4960a89b25fbffb06>
>>
>> ~~~
>>
>>
>> Thanks for taking time to write this up.
>>
>> General comments:
>>
>> UnsafeBufferPointer is an API layer on top of UnsafePointer. The role
>> of UnsafeBufferPointer is direct memory access sans lifetime
>> management with Collection semantics. The role of UnsafePointer is
>> primarily C interop. Those C APIs should be wrapped in Swift APIs that
>> take UnsafeBufferPointer whenever the pointer represents a C array. I
>> suppose making UnsafePointer less convenient would push developers
>> toward UnsafeBufferPointer. I don't think that's worth outright
>> breaking source, but gradual deprecation of convenience methods, like
>> `subscript`, might be acceptable.
>>
>
> Gradual deprecation is exactly what I am proposing. As the document states
> <https://gist.github.com/kelvin13/a9c033193a28b1d4960a89b25fbffb06#proposed-solution>,
> the only methods which should be marked immediately as unavailable are the
> `deallocate(capacity:)` methods, for safety and source compatibility
> reasons. Removing `deallocate(capacity:)` now and forcing a loud compiler
> error prevents catastrophic *silent* source breakage in the future, or
> worse, from having to *support our own bug*.
>
>>
>>
>> I have mixed feelings about stripping UnsafePointer of basic
>> functionality. Besides breaking source, doing that would be
>> inconsistent with its role as a lower API layer. The advantage would
>> just be decreasing API surface area and forcing developers to use a
>> higher-level API.
>>
>
> UnsafePointer is as much a high level API as UnsafeBufferPointer is.
>
>
> No it isn’t. We don’t have support for importing certain function
> signatures as taking UnsafeBufferPointer and UnsafePointer doesn't conform
> to Collection even though it nearly always represents an array.
>

C functions get imported as taking UnsafePointers because buffer pointers
in C are represented as pointer–length pairs. UnsafePointer can’t conform
to Collection because there’s no way to automatically associate that length
information with the pointer. So why do you think UnsafePointers should
support doing Collection-y things if a precondition for performing
collection-y operations is knowing the length? No sense exposing a `capacity`
argument if you don’t have a `count` to go in it. That’s why I don’t really
agree with viewing UnsafePointers as arrays, since trying to do random
access into something you don’t know the length of seems off to me.
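
To make the pointer-length point concrete, here is a minimal sketch. The `sum` function is a hypothetical stand-in for an imported C function (it is not a real API), and the code assumes only the existing `UnsafeBufferPointer(start:count:)` initializer. The plain pointer gains Collection behaviour only once the length is attached to it:

```swift
// Hypothetical stand-in for an imported C function that takes a
// pointer-length pair, e.g. `int32_t sum(const int32_t *base, size_t count)`.
func sum(_ base: UnsafePointer<Int32>, _ count: Int) -> Int32 {
    // Pairing the pointer with its length yields a real Collection,
    // so ordinary Collection methods such as reduce become available.
    let buffer = UnsafeBufferPointer(start: base, count: count)
    return buffer.reduce(0, +)
}

let values: [Int32] = [1, 2, 3, 4]
let total = values.withUnsafeBufferPointer { buffer in
    // The array is non-empty here, so baseAddress is not nil.
    sum(buffer.baseAddress!, buffer.count)
}
print(total)
```

Without `count`, nothing inside `sum` could know where the storage ends.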

>
> You wouldn’t create a buffer pointer of length 1 just so you can “stick
> with the high level API”. UnsafePointer and UnsafeBufferPointer are two
> tools that do related but different things and they can exist at whatever
> abstract level you need them at. After all, UnsafeBufferPointer is nothing
> but an UnsafePointer? with a length value attached to it. If you’re
> allocating more than one instance of memory, you almost certainly need to
> track the length of the buffer anyway.
>
>
> You could call this a proposal to "make unsafe pointer APIs easier to use
> safely". I just want to put an end to the fallacy that the buffer type is
> for multiple values and the plain old pointer represents single instances.
>

I mean, parts of the current API do heavily suggest that the plain pointer
is for single instances — there’s the `pointee` property after all. `move()`
vacates and returns a single element. In fact, I’m not sure when you would
initialize multiple values, and then only vacate the one at offset 0.
`successor()` and `predecessor()` only make sense if a pointer represents
one single thing — it’d be weird if they represented some kind of weird
sliding window where one end is `pointee` and the other end is… ???.

My point being, you can only (safely) use the multiple-instance API if you
have access to the count of elements. But if you have access to the count, *you
effectively have a BufferPointer*. So why not put the buffer tools in the
buffer shed?
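
As a rough illustration of the single-instance reading, here is a sketch of one value's life cycle through a plain pointer. It assumes the argument-free `deallocate()` spelling from later Swift releases, which differs from the `deallocate(capacity:)` under discussion here:

```swift
// A single instance: allocate, initialize, read, then vacate with move().
let p = UnsafeMutablePointer<String>.allocate(capacity: 1)
p.initialize(to: "hello")

let peeked = p.pointee   // reads the one value the pointer refers to
let moved = p.move()     // returns the value and leaves the memory uninitialized

// The memory is uninitialized again, so it can be freed directly.
// Assumption: `deallocate()` without a capacity argument (later Swift spelling).
p.deallocate()
```

None of these calls needs a count, which is why they read naturally on a pointer to one thing.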

>
> The additive changes you propose are fairly obvious. See [SR-3088]
>> UnsafeMutableBufferPointer doesn't have an allocating init.
>>
>> I haven't wanted to waste review cycles on small additive
>> changes. It may make sense to batch them up into one coherent
>> proposal. Here are a few more to consider.
>>
>> - [SR-3929] UnsafeBufferPointer should have init from mutable
>> - [SR-4340] UnsafeBufferPointer needs a withMemoryRebound method
>> - [SR-3087] No way to arbitrarily initialise an Array's storage
>>
>
> The feature requests you mention are all very valuable, however with
> Michael’s point about fixing the memorystate API’s, the size of this
> proposal has already grown to encompass dozens of methods in five types. I
> think this says a lot about just how broken the current system is, but I
> think it’s better to try to fix one class of problems at a time, and save
> the less closely-related issues for separate proposals.
>
>
>>
>> Point by point:
>>
>> > drop the capacity parameter from UnsafeMutablePointer.allocate() and
>> deallocate().
>>
>> I do not agree with removing the capacity parameter and adding a
>> single-instance allocation API. UnsafePointer was not designed for
>> single instances, it was primarily designed for C-style arrays. I
>> don't see the value in providing a different unsafe API for single
>> vs. multiple values.
>>
>
> Although it’s common to *receive* Unsafe__Pointers from C API’s, it’s rare
> to *create* them from the Swift side. 95% of the time your Swift data lives
> in a Swift Array, and you use withUnsafePointer(_:) to send them to the C
> API, or just pass them directly with Array bridging.
>
> The only example I can think of where I had to allocate memory from the
> Swift side to pass to a C API is when I was using the Cairo C library and I
> wanted the Swift code to own the image buffer backing the Cairo C structs
> and I wanted to manage the memory manually to prevent the buffer backing
> from getting deallocated prematurely. I think I ended up using
> UnsafeMutableBufferPointer and extracting baseAddresses to manage the
> memory. This proposal tries to mitigate that pain of extracting
> baseAddresses by giving buffer pointers their own memory management methods.
>
>
> The usability issue with Optional baseAddress is a very real one. I'm
> unsure why that hasn't been fixed yet (I think that’s between Jordan and
> Dave). I don't see that as a justification for the broader changes in this
> proposal.
>

This was actually part of the first drafts
<https://gist.github.com/kelvin13/a9c033193a28b1d4960a89b25fbffb06/cc3d7c349e5f5600ae592ee394438f68068df15b>
of the proposal. I was told this had already been discussed at length, and
the community supposedly decided against it a long time ago, so it was
removed from the proposal.

>
> As for the UnsafePointers you get from C APIs, they almost always come
> with a size (or you specify it beforehand with a parameter) so you’re
> probably going to be turning them into UnsafeBufferPointers anyway.
>
> I also have to say it’s not common to deallocate something in Swift that
> you didn’t previously allocate in Swift.
>
>
> Yes. You have a good argument for removing allocate/deallocate completely.
> My point was that I don't want to add a single instance allocate method.
> UnsafePointer should not be viewed as a single instance pointer, because
> that's not how it's used.
>
>
So, should `UnsafeMutableBufferPointer<Element>.allocate(count: 1).baseAddress!`
be the preferred way to allocate a single instance? Or
`UnsafeMutablePointer<T>.allocate(count: 1)` if we keep 2 sets of APIs?
Writing “count: 1” seems kind of strange to me.
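
For comparison, here is a sketch of the two routes side by side. It is written with the `capacity:` label and the argument-free `deallocate()` that later Swift releases use, so the spellings are an assumption relative to the `count:` label in this message:

```swift
// Route 1: go through the buffer type, then unwrap baseAddress.
// Assumption: the allocating method is spelled allocate(capacity:).
let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: 1)
let viaBuffer: UnsafeMutablePointer<Int> = buffer.baseAddress!
viaBuffer.initialize(to: 42)

// Route 2: allocate straight from the plain pointer type.
let direct = UnsafeMutablePointer<Int>.allocate(capacity: 1)
direct.initialize(to: 42)

let same = viaBuffer.pointee == direct.pointee

// Deinitialize each value, then free each block exactly once.
viaBuffer.deinitialize(count: 1)
buffer.deallocate()
direct.deinitialize(count: 1)
direct.deallocate()
```

Both routes end up with the same kind of pointer; the first just carries an extra unwrap and a length of 1 that is never consulted.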

> I agree the primary allocation API should be
>> UnsafeMutableBufferPointer.allocate(capacity:). There is an argument
>> to be made for removing UnsafeMutablePointer.allocate(capacity:)
>> entirely. But, as Michael Ilseman pointed out, that would involve
>> reevaluating several other members of the UnsafePointer API. I think
>> it's reasonable for UnsafePointer to retain all its functionality as a
>> lower level API.
>>
>>
> I think duplication of functionality is something to be avoided if
> possible.
>
>
> The issue is whether we need to revisit all the
> initialize/deinitialize/move API surface if we decide that all the uses
> that can be moved to UnsafeBufferPointer really should be.
>

I was working that out earlier today. The latest version
<https://gist.github.com/kelvin13/a9c033193a28b1d4960a89b25fbffb06>
outlines exactly what I think should happen to all the memorystate
functions.

>
>
> I don't understand what is misleading about
>> UnsafePointer.deallocate(capacity:). It *is* inconvenient for the
>> user to keep track of memory capacity. Presumably that was done so
>> either the implementation can move away from malloc/free or some sort
>> of memory tracking can be implemented on the standard library
>> side. Obviously, UnsafeBufferPointer.deallocate() would be cleaner in
>> most cases.
>>
>
> It’s misleading because it plain doesn’t deallocate `capacity` instances.
> It deletes the whole memory block regardless of what you pass in the
> capacity argument. If the implementation is ever “fixed” so that it
> actually deallocates `capacity` instances, suddenly every source that uses
> `deallocate(capacity:)` will break, and *no one will know* until their app
> starts mysteriously crashing. If the method is not removed, we will have to
> support this behavior to avoid breaking sources, and basically say “yes the
> argument label says it deallocates a capacity, but what it *really* does is
> free the whole block and we can’t fix it because existing code assumes this
> behavior”.
>
>
> You could have the same problem with slicing up an UnsafeBufferPointer. I
> agree that this reinforces the argument for eliminating
> UnsafeMutablePointer.allocate/deallocate. It also reinforces my argument
> for not adding a single-instance allocate/deallocate.
>
>
You have a good point here; it’s still not very safe, but I hold that it is
still far, far safer than what we have currently. Getting rid of the
`capacity` label helps drive home that swift_slowDealloc doesn’t care what
number of instances you want it to free. Unsafe__Pointers will always be
unsafe, but if we can make them less unsafe, we should.
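
Here is a sketch of what a label-free deallocate looks like at the call site. It assumes the argument-free `deallocate()` spelling from later Swift releases rather than the current `deallocate(capacity:)`:

```swift
let capacity = 4
let base = UnsafeMutablePointer<Int>.allocate(capacity: capacity)

// Initialize each slot in turn: 0, 1, 4, 9.
for i in 0..<capacity {
    (base + i).initialize(to: i * i)
}
let lastValue = (base + (capacity - 1)).pointee

base.deinitialize(count: capacity)

// One call frees the whole block; there is no count to pass. Calling
// deallocate() on an interior pointer such as `base + 1` would be
// undefined behavior, so only the originally allocated address is freed.
base.deallocate()
```

With no `capacity` argument there is nothing to suggest that part of the block could be released.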

I’ll admit this is a good argument against “single instance” deallocate. We
don’t want people trying to free each address in a buffer pointer 😨. But
it’s not a direct argument against single instance allocate, which I think
would be very useful, and it would be weird to have single instance allocate
but no corresponding deallocate. We also have to consider what happens to
the proposed single-instance memorystate functions, as those *are* safe and
*are* useful. Should single-instance deallocate be the one missing
function? I agree this is definitely something to think carefully about.

> add an allocate(count:) type method to UnsafeMutableBufferPointer
>>
>> `capacity` should be used for allocating uninitialized memory not
>> `count`. `count` should only refer to a number of initialized objects!
>>
>
> We can decide on what the correct term should be, but the current state of
> Swift pointers is that *neither* convention is being followed. Just look at
> the API for UnsafeMutableRawPointer. It’s a mess. This proposal at the
> minimum establishes a consistent convention. It can be revised if you feel
> `capacity` is more appropriate than `count`. If what you mean is that it’s
> important to maintain the distinction between “initialized counts” and
> “uninitialized counts”, well, that can be revised in, too.
>
>
> You lost me. It’s always been clear to me that
>
> a. There are a lot of redundant initializers to avoid relying on automatic
> conversion. Those should probably be removed now (to the extent that it
> doesn’t break source).
>
> b. There are a number of convenience methods we should add to the API. But
> it’s better to keep the API minimal until more developers, such as yourself,
> have had a chance to offer feedback.
>
> I’m not aware of messiness or inconsistent conventions at the API level.
>
> -Andy
>
>
I’m confused; I thought we were talking about the naming choices for the
argument labels in those functions. I think defining and abiding by
consistent meanings for `count`, `capacity`, and `bytes` is a good idea,
and it’s part of what this proposal tries to accomplish. Right now half the
time we use `count` to refer to “bytes” and half the time we use it to
refer to “instances”. The same goes for the word “capacity”. This is all
laid out in the document:

“““
*Finally, the naming and design of some UnsafeMutableRawPointer members
deserves to be looked at. The usage of capacity, bytes, and count as
argument labels is wildly inconsistent and confusing. In
copyBytes(from:count:), count refers to the number of bytes, while in
initializeMemory<T>(as:at:count:to:) and
initializeMemory<T>(as:from:count:), count refers to the number of strides.
Meanwhile bindMemory<T>(to:capacity:) uses capacity to refer to this
quantity. The always-problematic deallocate(bytes:alignedTo:) method and
allocate(bytes:alignedTo:) type method use bytes to refer to
byte-quantities. Adding to the confusion, UnsafeMutableRawBufferPointer
offers an allocate(count:) type method (the same signature method we’re
trying to add to UnsafeMutableBufferPointer), except the count in this
method refers to bytes. This kind of API naming begets stride bugs and
makes Swift needlessly difficult to learn.*
”””
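
To show the kind of stride bug I mean, here is a small sketch that keeps byte quantities and instance quantities in separately named values. It uses the raw pointer labels from later Swift releases (`byteCount:`, `repeating:`), which are an assumption relative to the labels quoted above:

```swift
let n = 4                                        // number of Int32 instances
let byteCount = n * MemoryLayout<Int32>.stride   // the same quantity in bytes
let alignment = MemoryLayout<Int32>.alignment

let raw = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: alignment)
// Here `count` means instances (strides), not bytes.
let typed = raw.initializeMemory(as: Int32.self, repeating: 7, count: n)

let copy = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: alignment)
// Here the quantity is in bytes; passing `n` instead would copy only 4 bytes.
copy.copyMemory(from: raw, byteCount: byteCount)

// Read the last element back out of the copy by byte offset.
let last = copy.load(fromByteOffset: (n - 1) * MemoryLayout<Int32>.stride,
                     as: Int32.self)

typed.deinitialize(count: n)
raw.deallocate()
copy.deallocate()
```

When both quantities travel under the same label, mixing up `n` and `byteCount` compiles without complaint, which is the confusion the excerpt describes.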

The only convenience methods this proposal is trying to add are the
ones on the buffer pointer types. There seems to be broad support
for adding this functionality as no one has really opposed that part of the
proposal yet. Any other new methods like `UnsafeMutablePointer.assign(to:)`
are there for API consistency.

This proposal also calls for getting rid of one of those “redundant
initializers” :)

>
>
> > add a deallocate() instance method to UnsafeMutableBufferPointer
>>
>> Yes, of course! I added a mention of that in SR-3088.
>>
>> > remove subscripts from UnsafePointer and UnsafeMutablePointer
>>
>> It's often more clear to perform arithmetic on C array indices rather
>> than pointers. That said, I'm happy to push developers to use
>> UnsafeBufferPointer whenever they have a known capacity. To me, this
>> is a question of whether the benefit of making a dangerous thing less
>> convenient is worth breaking source compatibility.
>>
>
> Again, I think this is more about what the real use patterns are. If you
> are subscripting into a C array with integers, then UnsafeBufferPointer is
> the tool for the job, since it gives you Collection conformance. If you
> can’t make an UnsafeBufferPointer, it’s probably because you don’t know the
> length of the array, and so you’re probably iterating through it one
> element at a time. UnsafeMutablePointer.successor() is perfect for this
> job. If you want to extract or set fields at fixed but irregular offsets,
> UnsafeRawPointer is the tool for the job. But I’m hard-pressed to think of
> a use case for random access into a singular typed pointer.
>
>
>
Thanks for your feedback on my proposal. You’ve given some very helpful
considerations about some of these changes.