[swift-evolution] [swift-evolution-announce] [Review] SE-0107: UnsafeRawPointer API

Andrew Trick atrick at apple.com
Tue Jul 5 14:06:22 CDT 2016


> On Jul 2, 2016, at 10:10 PM, Brent Royal-Gordon via swift-evolution <swift-evolution at swift.org> wrote:
> 
> More concrete issues:
> 
> * Is there a reason there's a `load` that takes a byte offset, but not a `storeRaw`?

A couple of those methods were added by request, but I was reluctant to add more methods just because they looked like they should be there.
But since you mention it, if we decide to keep the purely additive `load(fromByteOffset:as:)` method, then I’ll also add this for symmetry:

  func storeRaw<T>(_: T, toByteOffset: Int)

Should it be "toByteOffset" or "atByteOffset"?

https://github.com/apple/swift-evolution/pull/410/files
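
For illustration, here is a minimal sketch of the symmetric load/store pair at a byte offset. It assumes the spellings that later shipped in the Swift standard library (`allocate(byteCount:alignment:)` and `storeBytes(of:toByteOffset:as:)`), not the draft `storeRaw` name discussed here. The function name `roundTripAtOffset` is hypothetical.

```swift
// Sketch only: uses the shipped standard-library names, which differ
// from the draft `storeRaw(_:toByteOffset:)` spelling in this thread.
func roundTripAtOffset() -> UInt32 {
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: 16, alignment: MemoryLayout<UInt32>.alignment)
    defer { raw.deallocate() }

    // Write a trivial value 8 bytes past the base address...
    raw.storeBytes(of: 0xCAFE_F00D as UInt32, toByteOffset: 8, as: UInt32.self)
    // ...and read it back from the same offset.
    return raw.load(fromByteOffset: 8, as: UInt32.self)
}
```

The offset must keep the access aligned for the loaded type; 8 is a multiple of the alignment of `UInt32` here.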

> * I'm also a little nervous about the fact that `storeRaw` (and `load`?) is documented to only work properly on "trivial types", but it doesn't have any sort of constraints to ensure it's used correctly. (One could imagine, for instance, the compiler automatically conforming trivial types to a `Trivial` protocol.)

Noted. There's absolutely no way to enforce that any overwritten value is trivial, but a sanitizer could catch it. When we have a "trivial" protocol we can add a debugPrecondition on the destination type.

`load` simply does not need to take a trivial type. Although it reads from raw memory, it is not a "raw" operation. It knows how to retain things.

I'd rather not introduce a symmetric `store` that handles nontrivial types, until we really need it, because of the serious potential for misuse. People should try to use typed pointers for assignment semantics.

There's a discussion on this in the proposal now:
https://github.com/apple/swift-evolution/blob/master/proposals/0107-unsaferawpointer.md#raw-memory-access
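
A small sketch of the distinction drawn above: `load` copies a nontrivial value out of raw memory (retaining it), while replacing a nontrivial value goes through a typed pointer. It assumes the shipped names `initializeMemory(as:repeating:count:)` and `load(as:)`. `Label` and `loadAndReassign` are hypothetical names for this example.

```swift
// Hypothetical class type, used only to have a nontrivial (reference) value.
final class Label {
    let text: String
    init(_ text: String) { self.text = text }
}

func loadAndReassign() -> (String, String) {
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: MemoryLayout<Label>.stride,
        alignment: MemoryLayout<Label>.alignment)
    defer { raw.deallocate() }

    // Initializing raw memory binds it to Label and returns a typed pointer.
    let typed = raw.initializeMemory(as: Label.self, repeating: Label("first"), count: 1)

    // load is fine for nontrivial types: the copy holds its own strong reference.
    let loaded = raw.load(as: Label.self)

    // Assignment semantics (release old, retain new) go through the typed pointer.
    typed.pointee = Label("second")
    let current = typed.pointee

    typed.deinitialize(count: 1)
    return (loaded.text, current.text)
}
```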

> * I don't think I understand `initialize(toContiguous:atIndex:with:)`. Does it return a typed pointer to the whole buffer, or just the one instance it initialized? In the `stringFromBytes` example, shouldn't we either subscript the typed pointer from the previous `initialize(_:with:count:)` call, or call `storeRaw(toContiguous:atIndex:with:)`, rather than initializing memory twice? If this isn't a good use case for `initialize(toContiguous:atIndex:with:)`, what would be?

The latest proposal has this example, which actually ignores the returned value:

  let rawBuffer = UnsafeMutableRawPointer.allocate(bytes: size + 1)
  rawBuffer.initialize(UInt8.self, with: value, count: size)
  rawBuffer.initialize(toContiguous: UInt8.self, atIndex: size, with: 0)

This was requested as a convenience. As mentioned in a previous email, I'm happy to drop it for now.

`initialize(toContiguous:with:count:)` returns a typed pointer to all the initialized elements.

Subscripting the typed pointer to write the null terminator would be wrong because that memory has never been bound to a type.

I do agree that the example should just be:

  let rawBuffer = UnsafeMutableRawPointer.allocate(bytes: size + 1)
  rawBuffer.initialize(UInt8.self, with: value, count: size)
  rawBuffer.initialize(contiguous: UInt8.self, at: size, to: 0)
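
For comparison, a runnable sketch of the same raw-memory pattern, assuming the API names that eventually shipped (`allocate(byteCount:alignment:)`, `initializeMemory(as:repeating:count:)`) rather than the draft names above. The function `makeRepeatedCString` is hypothetical.

```swift
// Sketch only: builds a NUL-terminated buffer in raw memory.
func makeRepeatedCString(repeating value: UInt8, count size: Int) -> String {
    let rawBuffer = UnsafeMutableRawPointer.allocate(byteCount: size + 1, alignment: 1)
    defer { rawBuffer.deallocate() }

    // Initialize (and bind) the first `size` bytes as UInt8.
    let bytes = rawBuffer.initializeMemory(as: UInt8.self, repeating: value, count: size)

    // The terminator byte has not been bound yet, so initialize it through
    // the raw pointer instead of subscripting `bytes`.
    let terminator = rawBuffer + size
    terminator.initializeMemory(as: UInt8.self, repeating: 0, count: 1)

    return String(cString: bytes)
}
```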

But the easy, common way to initialize a C string will simply be:

  let cstr = UnsafeMutablePointer<CChar>.allocate(capacity: size + 1)
  // The whole string is now bound to CChar
  for i in 0..<size { cstr[i] = … }
  cstr[size] = 0
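
A complete version of that typed-pointer approach might look like the sketch below. The elided loop body is filled with a copy from a `String`'s UTF-8 bytes purely as an assumption for the example, and `copyAsCString` is a hypothetical name.

```swift
// Sketch only: the source of the bytes (a Swift String) is an assumption.
func copyAsCString(_ text: String) -> String {
    let source = Array(text.utf8)
    let size = source.count

    let cstr = UnsafeMutablePointer<CChar>.allocate(capacity: size + 1)
    defer { cstr.deallocate() }

    // The whole allocation is bound to CChar, so plain subscript
    // assignment is fine for this trivial element type.
    for i in 0..<size {
        cstr[i] = CChar(truncatingIfNeeded: source[i])
    }
    cstr[size] = 0

    return String(cString: cstr)
}
```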

> I'm quite concerned by the "moveInitialize should be more elegant" section at the bottom.
> 
> Since the types are so close, `moveInitialize` could require mutating arguments and actually swap the pointers. For instance:
> 
> 	func grow(buffer: UnsafePointer<Int>, count: Int, toNewCapacity capacity: Int) -> UnsafeBuffer<Int> {
> 		var buffer = buffer
> 		var uninitializedBuffer = UnsafeRawPointer.allocate(capacity: capacity, of: Int.self)
> 		
> 		uninitializedBuffer.swapPointersAfterMoving(from: &buffer, count: count)
> 		// `buffer` now points to the new allocation, filled in with the Ints.
> 		// `uninitializedBuffer` now points to the old allocation, which is deinitialized.
> 		
> 		uninitializedBuffer.deallocate()
> 		return buffer
> 	}
> 
> This is *such* a strange semantic, however, that I'm not at all sure how to name this function.
> 
> `moveAssign(from:count:)` could do something much simpler, returning a raw version of `from`:
> 
> 	target.moveAssign(from: source).deallocate()
> 
> `move()`, on the other hand, I don't see a good way to fix like this.
> 
> One ridiculous thing we could do for `moveAssign(from:count:)` and perhaps `move()` is to deliberately make `self` invalid by setting it to address 0. If it were `Optional`, this would nil `self`. If it weren't...well, something would probably fail eventually.

For a while, I was trying to force a convention where deinitialization always returned a raw pointer because it's safer to initialize that raw pointer. With the latest proposal I'm not as concerned about that. The majority of the time, it will be fine to reinitialize using the typed pointer. If the user wants a raw pointer back after the move, it is trivial just to cast the typed pointer into a raw pointer.

So, while move-semantics would be cool, I really don’t think it’s necessary or even desired in this case. Clear doc comments should be sufficient.
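
As a rough sketch of the "just cast it" approach, the quoted `grow` example could be written with ordinary calls, assuming the shipped `moveInitialize(from:count:)` and the `UnsafeMutableRawPointer` initializer that takes a typed pointer. The signature of `grow` here is adapted for the example and is not part of the proposal.

```swift
// Sketch only: moves `count` Ints into a larger allocation and frees the old one.
func grow(_ old: UnsafeMutablePointer<Int>, count: Int,
          toNewCapacity capacity: Int) -> UnsafeMutablePointer<Int> {
    precondition(capacity >= count)
    let fresh = UnsafeMutablePointer<Int>.allocate(capacity: capacity)

    // After the move, `old` points at deinitialized memory.
    fresh.moveInitialize(from: old, count: count)

    // If a raw view of the old allocation is wanted, the cast is one line.
    let rawOld = UnsafeMutableRawPointer(old)
    rawOld.deallocate()

    return fresh
}
```

The caller is responsible for not touching `old` afterwards; nothing in the type system enforces that, which is why the doc comments matter.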

> * * *
> 
> I notice that many APIs require type parameters merely to force the user to explicitly state the types involved. I wonder if we could instead introduce an attribute which you could place on a parameter or return type indicating that there must be an explicit `as` cast specifying its type:
> 
> 	func storeRaw<T>(_: @explicit T)
> 	func load<T>() -> @explicit T
> 	func cast<T>() -> @explicit UnsafePointer<T>
> 	
> 	rawPointer.storeRaw(3 as Int)
> 	rawPointer.load() as Int
> 	rawPointer.cast() as UnsafePointer<Int>
> 
> This would also be useful on `unsafeBitCast`, and on user APIs which are prone to type inference issues.

I'm not as irritated by explicit type arguments as some, but the feeling I get is that we really want a language feature that forces certain generic parameters to be explicit. When that happens, we'll likely phase out the old-style type arguments in favor of angle brackets, and I'll be sad because I dislike angle brackets.
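
To make the trade-off concrete, here is a sketch contrasting an "old-style" metatype argument with a generic parameter inferred from context. Both helpers (`readValue`, `readValueInferred`) and `compareStyles` are hypothetical and exist only for this comparison.

```swift
// Old-style type argument: the call site must name the type.
func readValue<T>(from raw: UnsafeRawPointer, as type: T.Type) -> T {
    return raw.load(as: type)
}

// Inferred style: T comes from whatever the call site's context demands.
func readValueInferred<T>(from raw: UnsafeRawPointer) -> T {
    return raw.load(as: T.self)
}

func compareStyles() -> (Int, Int) {
    let number = 7
    return withUnsafeBytes(of: number) { bytes -> (Int, Int) in
        let base = bytes.baseAddress!
        let explicit = readValue(from: base, as: Int.self)   // type is visible
        let inferred: Int = readValueInferred(from: base)    // type is implied
        return (explicit, inferred)
    }
}
```

With the inferred form, changing the annotation on `inferred` silently changes how the memory is reinterpreted, which is the hazard the explicit argument guards against.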

> * * * 
> 
> In the long run, however, I wonder if we might end up removing `UnsafeRawPointer`. If `Never` becomes a subtype-of-all-types, then `UnsafePointer<Never>` would gain the basic properties of an `UnsafeRawPointer`:
> 
> * Because `Never` is a subtype of all types, `UnsafePointer<Never>` could alias any other pointer.
> 
> * Accessing `pointee` would be inherently invalid (it would either take or return a `Never`), and APIs which initialize or set `pointee` would be inherently uncallable.
> 
> * `Never` has no intrinsic size, so it could be treated as having a one-byte size, allowing APIs which normally allocate, deallocate, or do pointer arithmetic by instance size to automatically do so by byte size instead.
> 
> * APIs for casting an `UnsafePointer<T>` to `UnsafePointer<supertype of T>` or `<subtype of T>` would do the right thing with `UnsafePointer<Never>`.
> 
> Thus, I could imagine `Unsafe[Mutable]RawPointer` becoming `Unsafe[Mutable]Pointer<Never>` in the future, with some APIs being generalized and moving to all `UnsafePointer`s while others are in extensions on `UnsafePointer where Pointee == Never`.
> 
> It might be worth taking a look at the current API designs and thinking about how they would look in that world:
> 
> * Is `nsStringPtr.casting(to: UnsafePointer<NSObject>)` how you would want to write a pointee upcast? How about `UnsafePointer<NSString>(nsObjectPtr)` for a pointee downcast?
> 
> * Would you want `initialize<T>(_: T.Type, with: T, count: Int = 1) -> UnsafeMutablePointer<T>` in the `Never` extension, or (with a supertype-of-Pointee constraint on `T`) would it be something you'd put on other `UnsafeMutablePointer`s too? What does that mean for `UnsafeMutablePointer.initialize(with:)`?
> 
> * Are `load` or `storeRaw` things that might make sense on any `UnsafeMutablePointer` if they were constrained to supertypes only?
> 
> * Are there APIs which are basically the same on `Unsafe[Mutable]Pointer`s and their `Raw` equivalents, except that the `Raw` versions are "dumb" because they don't know what type they're operating on? If so, should they be given the same name?

Very early on I considered a special `Never` element type for all of the excellent reasons that you laid out (nice job explaining that), but the pointer conversion rules that we want are not implementable.

Since then, the proposal has evolved so much that it makes sense to have a nominal raw pointer type. The pointer type itself is distinctly different, not just the element type, and the type system needs to be aware of that. It's also critical that the raw and typed pointers have a distinct API. Moving both of their functionality into extensions would just be a workaround. In reality, since the semantics are different, there's almost no shared implementation.

In short, raw pointers are deliberately a different type and we want developers and APIs to be cognizant of that.
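
A minimal sketch of how that distinction shows up at the call site, assuming the shipped `assumingMemoryBound(to:)` spelling: going from typed to raw is a plain conversion, while going from raw back to typed requires naming the type. `roundTripThroughRaw` is a hypothetical name.

```swift
// Sketch only: typed -> raw -> typed round trip on a single Int32.
func roundTripThroughRaw() -> Int32 {
    let typed = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
    defer { typed.deallocate() }
    typed.initialize(to: 7)

    // Typed to raw: always allowed, the element type is simply forgotten.
    let raw = UnsafeMutableRawPointer(typed)

    // Raw to typed: the caller must state the type the memory is bound to.
    let back = raw.assumingMemoryBound(to: Int32.self)
    back.pointee += 1

    let result = typed.pointee
    typed.deinitialize(count: 1)
    return result
}
```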

-Andy
