[swift-evolution] [Draft] UnsafeRawPointer API

Tue Jun 28 15:02:05 CDT 2016

> On Jun 27, 2016, at 10:18 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
> 
> Hi Andy,
> 
> Thank you for the proposal!  A few comments from me.

Thanks for the feedback (again)! I updated the language in the proposal (again).

> - In the "Custom memory allocation section" you write:
> 
>> Note: The same allocated raw memory cannot be used both for this custom memory allocation case and for the C buffer case above because the C buffer requires that the allocated raw memory is always initialized to the same type.
> 
> Could you provide more explanation?  I'm not quite getting it.  Is
> this because of binding -- are you saying that there is no way to
> un-bind the type?

It now reads:

Note: The same allocated raw memory cannot be used both for this
custom memory allocation case and for the C buffer case above because
the C buffer binds the allocated memory to an element type. Binding
the type applies to the allocation lifetime and requires that the
allocated raw memory is always initialized to the same type.

This should make sense after reading the earlier section on initializing memory that I’ve updated as explained below...

> - In the "Accessing uninitialized memory with a typed pointer (binding
> the type)" section you write
> 
>> This cast explicitly signals the intention to bind the raw memory to the destination type.
> 
> I think that "signals the intention" is not a strong enough wording,
> it is open to interpretations.  Either it is a no-op "intention" (that
> can be retracted) or it is the actual binding.  From other discussions
> in this thread, I think you are proposing that the .toType() method
> actually binds the memory to a type.  Is this right?  Here's what made
> me think this way:
> 
>> The following code is undefined:
>> ```
>> ptrA = rawPtr.cast(to: UnsafePointer<A>.self)
>> ptrA.initialize(with: A())
>> ptrA.deinitialize()
>> ptrB = rawPtr.cast(to: UnsafePointer<B>.self)
>> ptrB.initialize(with: B())
>> ```
>> It is hard to spot the difference between the two styles without drawing attention to the unsafe cast.
> 
> - In the same section, in the table, it is not clear whether the
> "tptr.deinitialize" operation un-binds the memory type, or does not
> have effect on it.  Which way is it?  Can I replace
> "tptr.initialize(t2: T)" with "tptr.initialize(u1: U)"?

I reworded this section:

https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsaferawpointer.md#initializing-memory-with-a-typed-pointer-binding-the-type

The simple rule is: if you use a typed pointer to initialize memory, that memory is bound to that type for the lifetime of the allocation. The semantics are not temporal.

I think that's very clear in the proposal now.
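As a minimal sketch of the C buffer case under that rule: the code below uses `bindMemory(to:capacity:)` from the Swift standard library rather than the cast spelling in this draft, so treat the exact method names as an assumption relative to the proposal text. The element type and `count` are made up for illustration.

```swift
// Hypothetical buffer of four Int32 elements.
let count = 4
let rawPtr = UnsafeMutableRawPointer.allocate(
    byteCount: count * MemoryLayout<Int32>.stride,
    alignment: MemoryLayout<Int32>.alignment)

// Bind the whole allocation to Int32, as a C-style buffer would be.
let buffer = rawPtr.bindMemory(to: Int32.self, capacity: count)

// Every element is initialized and accessed as the bound type only.
buffer.initialize(repeating: 0, count: count)
for i in 0..<count {
    buffer[i] = Int32(i * 10)
}
let total = (0..<count).reduce(Int32(0)) { $0 + buffer[$1] }

// Tear down in reverse order: deinitialize the elements, then free the memory.
buffer.deinitialize(count: count)
rawPtr.deallocate()
```

The point is only that the buffer is used as a single element type from allocation to deallocation.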

To fully answer your question "can memory be unbound": technically, if you reinitialize the memory using a raw pointer, then accesses on either side of that initialization point are protected from strict aliasing guarantees. But that's just a confusing and mostly useless artifact of the implementation. That's not the memory model as it is specified in the proposal. So forget I said that.

> - Is it valid to access an ARC reference to a class MyClass that
> conforms to MyProtocol with aliasing pointers,
> UnsafeMutablePointer<MyClass>, UnsafeMutablePointer<MyProtocol>, and
> 'UnsafeMutablePointer<AnyObject>' ?  What about
> 'UnsafeMutablePointer<AnyObject?>' ?

That's discussed in the "Type Safe Memory Access" documentation. I wrote this doc for the sake of discussion, but it’s a little out of date and needs to be updated before putting it up for review again:

https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst

There are two aspects of the types to consider: whether they are related for the purpose of strict aliasing, and whether they are mutually layout compatible.

MyClass and MyProtocol are related, so there's no problem with aliasing. If MyProtocol is an AnyObject existential, then they also have the same representation (layout), so it's safe. If MyProtocol is not class constrained, then they are not layout compatible.

Regarding AnyObject and AnyObject?:

They are related because "one type may be a tuple, enum, or struct that contains the other type as part of its own storage".

They are mutually layout compatible because we know per the ABI that Optional class types have the same representation as class types. It's a bit of a special case. In general, fragile enums with single payloads have some layout guarantees. You can read the payload from the enum type, but can't read the enum from the payload type.

Layout compatibility also pertains to the location of references within the type (they need to be ARC-compatible). I have not adequately explained that in the doc but have it on my TODO list.
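The representation claim for AnyObject and AnyObject? can be checked with `MemoryLayout`. This is only a sketch: it compares size, stride, and alignment, and says nothing about aliasing relatedness or ARC compatibility.

```swift
// An optional class reference is expected to use the same storage as the
// reference itself, so all three layout properties should match.
let sameSize = MemoryLayout<AnyObject>.size == MemoryLayout<AnyObject?>.size
let sameStride = MemoryLayout<AnyObject>.stride == MemoryLayout<AnyObject?>.stride
let sameAlignment = MemoryLayout<AnyObject>.alignment == MemoryLayout<AnyObject?>.alignment

print(sameSize, sameStride, sameAlignment)
```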

> - There's no API to convert from UnsafeMutableRawPointer to
> UnsafeMutablePointer<T> without either doing an initialization, or
> binding the type.  Is this on purpose?  The reason why I'm asking is
> that initialization does not seem to be binding the type (I couldn't
> find that in the proposal), but still performs the conversion,
> allowing further code to use typed memory access.  If this allows the
> optimizer to get the desired guarantees about memory, why is binding
> important?  (I'm probably completely confused about this point.)

This is important. I think it is clear now in the proposal (if not, please suggest some better language to use):

https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsaferawpointer.md#initializing-memory-with-a-typed-pointer-binding-the-type

In short, initializing via a raw pointer has different semantics than initializing via a typed pointer (just like other operations on raw pointers have different semantics). Initializing via a raw pointer changes the memory state to "initialized with some type" for the lifetime of that value in memory. Deinitializing the memory then returns it to a pristine state. Initializing via a raw pointer does not impose any type on the allocated memory. I propose that this should be the normal, "type safe" way to work with unsafe pointers.

Initializing via a typed pointer, in addition to changing the temporal memory state, also imposes a type on the allocated memory for the entire lifetime of the memory itself, from allocation to deallocation. This is effectively a performance optimization and works well for an important use case (C buffer), but it is less safe, which is why casting a raw pointer to a typed pointer needs to be an explicit cast.

As I keep saying, the type safe way to get a typed pointer is by initializing the raw pointer:

  let ptrToA = rawPtr.initialize(A.self, A())
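Spelled out a little further, the raw-pointer pattern looks roughly like the sketch below. It uses `initializeMemory(as:repeating:count:)` from the Swift standard library, which is an assumption relative to the `initialize(_:_:)` spelling in this draft, and the structs `A` and `B` are hypothetical stand-ins.

```swift
// Hypothetical value types that fit in the same allocation.
struct A { var value: Int32 }
struct B { var value: Float }

// One raw allocation, large and aligned enough for either type.
let byteCount = max(MemoryLayout<A>.size, MemoryLayout<B>.size)
let alignment = max(MemoryLayout<A>.alignment, MemoryLayout<B>.alignment)
let rawPtr = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: alignment)

// Initializing through the raw pointer yields a typed pointer to the new value.
let ptrToA = rawPtr.initializeMemory(as: A.self, repeating: A(value: 42), count: 1)
let valueA = ptrToA.pointee.value

// Deinitialize before storing a value of a different type at the same address.
ptrToA.deinitialize(count: 1)

// Go back through the raw pointer to initialize the memory as B.
let ptrToB = rawPtr.initializeMemory(as: B.self, repeating: B(value: 1.5), count: 1)
let valueB = ptrToB.pointee.value
ptrToB.deinitialize(count: 1)

rawPtr.deallocate()
```

Each typed pointer here comes from an initialization through `rawPtr`, so no explicit pointer cast appears anywhere in the code.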

Explicit pointer casts should only be used for optimizing certain data structures. Unfortunately, interoperability is another reason that developers will need to cast pointers in practice. Some reasoning about type safety is needed in those cases, but at least with this proposal it will be much easier to audit the risky pointer casts.

> - Just wanted to mention that we'd probably need 'raw' variants of
> atomic operations for stdlib-internal use, but you probably already
> noticed that while working on the branch.

Yes, I saw that. Thanks.
-Andy

> 
> Dmitri
> 
> -- 
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/