[swift-evolution] [RFC] UnsafeBytePointer API for In-Memory Layout

Geordie J geojay at gmail.com
Mon May 9 16:43:41 CDT 2016


> On 09.05.2016 at 23:04, Andrew Trick <atrick at apple.com> wrote:
> 
> 
>> On May 9, 2016, at 12:38 PM, Geordie Jay <geojay at gmail.com> wrote:
>> 
>> I read this proposal and I'm a bit unsure what its purpose would be:
>> 
>> Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>) conversions and/or vice-versa? And you'd achieve this by replacing UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?
> 
> I want to prevent UnsafePointer<U>(UnsafePointer<T>) *except* when the destination is UnsafePointer<Void>.
> 
> UnsafePointer<Void>(UnsafePointer<T>) is fine.
> 
> UnsafeBytePointer provides two things:
> - A means to prevent the conversion above
> - An API for legal type punning, which does not exist today

So you mean to keep UnsafePointer<Void>(UnsafePointer<T>), respelled as UnsafeBytePointer(UnsafePointer<T>), but disable other type-to-type pointer recasts? I guess that’s a worthy goal at some level, but is there anything stopping someone from just writing UnsafePointer(UnsafeBytePointer(myPointerToMemoryContainingTypeT), toPointee: U.self)?

It still just seems like we can do the same thing spelled differently. I don’t see how changing how that happens could benefit us or the compiler, but maybe this is one we should just take your word on.

Assuming the likely case that this is just beyond my understanding, I do wonder why we’d need to change the API. I guess there are a lot of assumptions made about both UnsafePointer<Void> and UnsafePointer<T> that don’t necessarily apply to both to an equal degree?

> 
>> In one sense the change seems fine to me, but as someone who uses a lot of C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't really see what benefit it'd bring. Presumably we'd still want an option of converting UnsafeBytePointer to UnsafePointer<SomeActualType> for things like C function pointer callback "context"/"userInfo" uses, so it's not like we'd be preventing programmer error in that way.
> 
> It’s possible to cast UnsafeBytePointer to UnsafePointer<SomeActualType>. I want the programmer to make their intent explicit by writing a cast and spelling SomeActualType at the point of the cast. In the proposal, that’s done using a labeled initializer.

How is this different from what we do now, namely UnsafePointer<SomeActualType>(myUnsafePointer)? I’m also spelling out SomeActualType there. I think I’m still misunderstanding something critical here.
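
To make the comparison concrete, here is a minimal sketch of the labeled spelling, assuming a current Swift toolchain. UnsafeBytePointer below is a hypothetical stand-in built on UnsafeRawPointer, and the toPointee initializer only approximates the one listed in the proposal; neither is an existing standard library API.

```swift
// Hypothetical stand-in for the proposed UnsafeBytePointer (not a stdlib type).
struct UnsafeBytePointer {
    let raw: UnsafeRawPointer

    // Typed-to-untyped conversion: needs no label.
    init<T>(_ pointer: UnsafePointer<T>) {
        raw = UnsafeRawPointer(pointer)
    }
}

extension UnsafePointer {
    // Approximation of the proposed labeled cast: the destination
    // pointee type has to be named at the call site.
    init(_ from: UnsafeBytePointer, toPointee: Pointee.Type) {
        self = from.raw.assumingMemoryBound(to: Pointee.self)
    }
}

var value: Int32 = 42
let roundTripped: Int32 = withUnsafePointer(to: &value) {
    (typed: UnsafePointer<Int32>) -> Int32 in
    let untyped = UnsafeBytePointer(typed)                    // T -> bytes
    let back = UnsafePointer(untyped, toPointee: Int32.self)  // bytes -> T, labeled
    return back.pointee
}
```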

From your email that just came in:

> if converting UMP types leads to undefined behavior, then it should be prohibited in the API, unless the programmer explicitly requests the conversion


This is the point I’d really like to try and understand: can you clarify how the new API is any more or less explicit than the old one?

> 
>> Call me conservative but to me the current system seems to work as well as it can. If anything it's already enough boilerplate going through hoops converting an UnsafeMutablePointer<Void> into a [Float] even when I know and the C API knows perfectly well what it actually contains... Would happily be convinced otherwise about this proposal though, I'm pretty new at all this.
> 
> I think you are asking for implicit conversions when calling C APIs. That’s good feedback. When implementing this proposal I tried to allow implicit conversions in reasonable cases, but leaned toward being conservative. I would rather see more explicit casts now and eliminate them if people find it awkward.

Maybe, but I’m not sure how that’d look under this proposal. Strings and literals currently being accepted as UnsafePointer<CChar> is a nice touch, and last I checked I can use [T, T, T, ...] array literals in place of UnsafePointer<T>. I certainly wouldn’t want to go below that level of conservatism here.
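
For reference, these are the two implicit argument conversions I mean, as a small sketch. The functions cStringLength and sum are hypothetical helpers written only for this example; the conversions themselves are existing Swift behavior.

```swift
// Hypothetical helper: counts bytes up to the NUL terminator.
func cStringLength(_ s: UnsafePointer<CChar>) -> Int {
    var n = 0
    while s[n] != 0 { n += 1 }
    return n
}

// Hypothetical helper: sums `count` elements starting at `p`.
func sum(_ p: UnsafePointer<Int32>, count: Int) -> Int32 {
    var total: Int32 = 0
    for i in 0..<count { total += p[i] }
    return total
}

// A String literal is passed as a temporary NUL-terminated C string.
let length = cStringLength("hello")

// An array literal is passed as a pointer to its temporary storage.
let total = sum([1, 2, 3], count: 3)
```

In both calls the pointer is only valid for the duration of the call, which is why this works for arguments but not for pointers that get stored.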

> 
> I'm looking for some consensus on core aspects of the proposal, then we can take into consideration precisely which implicit conversions should be supported.
> 
> -Andy
> 
>> Geordie
>> Andrew Trick via swift-evolution <swift-evolution at swift.org> wrote on Mon, 9 May 2016 at 20:15:
>> Hello Swift evolution,
>> 
>> I sent this to swift-dev last week. Sorry to post on two lists!
>> 
>> Swift does a great job of protecting against undefined behavior--as long as you avoid "unsafe" APIs, that is. However, unsafe APIs are important for giving developers control over implementation details and performance. Naturally, the contract between unsafe APIs and the optimizer is crucial. When a developer uses an unsafe API, the rules governing safe, well-defined behavior must be clear. On the opposite end, the optimizer must know which assumptions it can make based on those rules. Simply saying that anything goes because "unsafe" is in the name is not helpful to this effort.
>> 
>> For a long time, I've wanted these rules nailed down. We have more users taking advantage of advanced features, and more optimizations that take advantage of assumptions guided by the type system. This seems like a particularly good time to resolve UnsafePointer semantics, considering the type system and UnsafePointer work that's been going on recently. Strict aliasing is something I would like addressed. If we do nothing here, then we will end up by default inheriting C/C++ semantics, as with any language that relies on a C/C++ backend. In other words, developers will be forced to write code with technically undefined behavior and rely on the compiler to be smart enough to recognize and recover from common patterns. Or we can take advantage of this opportunity and instead adopt a sound memory model with respect to aliasing.
>> 
>> This proposal is only an RFC at this point. I'm sending it out now to allow for plenty of time for discussion (or advance warning). Keep in mind that it could change considerably before it goes up for review.
>> 
>> -Andy
>> 
>> 
>> UnsafeBytePointer API for In-Memory Layout
>> 
>> Proposal: SE-NNNN <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md>
>> Author(s): Andrew Trick <https://github.com/atrick>
>> Status: Awaiting review <https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#rationale>
>> Review manager: TBD
>> Introduction
>> 
>> UnsafePointer and UnsafeMutablePointer refer to a typed region of memory, and the compiler must be able to assume that the UnsafePointer element (Pointee) type is consistent with other access to the same memory. See the proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>. Consequently, inferred conversion between UnsafePointer element types exposes an easy way to abuse the type system. No alternative currently exists for manual memory layout and direct access to untyped memory, and that leads to an overuse of UnsafePointer. These uses of UnsafePointer, which depend on pointer type conversion, make accidental type punning likely. Type punning via UnsafePointer is semantically undefined behavior and de facto undefined behavior given the optimizer's long-time treatment of UnsafePointer.
>> 
>> In this document, all mentions of UnsafePointer also apply to UnsafeMutablePointer.
>> 
>> Motivation
>> 
>> To avoid accidental type punning, we should prohibit inferred conversion between UnsafePointer<T> and UnsafePointer<U> unless the target of the conversion is an untyped or nondereferenceable pointer (currently represented as UnsafePointer<Void>).
>> 
>> To support this change we should introduce a new pointer type that does not bind the type of its Pointee. Such a new pointer type would provide an ideal foundation for an API that allows byte-wise pointer arithmetic and a legal, well-defined means to access an untyped region of memory.
>> 
>> As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This is a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, they cannot use UnsafePointer to access the memory without risking undefined behavior.
>> 
>> An UnsafeBytePointer example, using a new proposed API is included below.
>> 
>> Proposed solution
>> 
>> Introduce an UnsafeBytePointer type along with an API for obtaining an UnsafeBytePointer value at a relative byte offset and loading and storing arbitrary types at that location.
>> 
>> Statically prohibit inferred UnsafePointer conversion while allowing inferred UnsafePointer to UnsafeBytePointer conversion.
>> 
>> UnsafeBytePointer meets multiple requirements:
>> 
>> - An untyped pointer to memory
>> - Pointer arithmetic within byte-addressable memory
>> - Type-unsafe access to memory (legal type punning)
>> 
>> UnsafeBytePointer will replace UnsafeMutablePointer<Void> as the representation for untyped memory. For API clarity we could consider a typealias for VoidPointer. I don't think a separate VoidPointer type would be useful--there's no danger that UnsafeBytePointer will be casually dereferenced, and I don't see the danger in allowing pointer arithmetic since the only reasonable interpretation is that of a byte-addressable memory.
>> 
>> Providing an API for type-unsafe memory access would not serve a purpose without the ability to compute byte offsets. Of course, we could require users to convert back and forth using bitPatterns, but I think that would be awkward and only obscure the purpose of the UnsafeBytePointer type.
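
A minimal sketch of how byte offsets and typed loads combine, assuming a current Swift toolchain. UnsafeBytePointer here is a hypothetical stand-in built on UnsafeRawPointer that mirrors only the load and + members from the proposed API, and the buffer layout (a UInt32 at offset 0 followed by a Float at offset 4) is invented for illustration.

```swift
// Hypothetical stand-in for the proposed UnsafeBytePointer (not a stdlib type).
struct UnsafeBytePointer {
    let raw: UnsafeRawPointer

    init(raw: UnsafeRawPointer) {
        self.raw = raw
    }

    // Load a value of type T from the bytes at this address.
    func load<T>(_ type: T.Type) -> T {
        return raw.load(as: type)
    }

    // Byte-wise pointer arithmetic.
    static func + (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer {
        return UnsafeBytePointer(raw: lhs.raw + rhs)
    }
}

// Assumed layout for this example: UInt32 at offset 0, Float at offset 4.
let buffer = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
buffer.storeBytes(of: 7, as: UInt32.self)
(buffer + 4).storeBytes(of: 1.5, as: Float.self)

let base = UnsafeBytePointer(raw: UnsafeRawPointer(buffer))
let tag = base.load(UInt32.self)            // read at byte offset 0
let payload = (base + 4).load(Float.self)   // read at byte offset 4

buffer.deallocate()
```

Each load names its type at the point of access, so no typed pointer is ever bound to memory holding a different type.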
>> 
>> In this proposal, UnsafeBytePointer does not specify mutability. Adding an UnsafeMutableBytePointer would be straightforward, but adding another pointer type needs strong justification. I expect to get input from the community on this. If we agree that the imported type for const void* should be UnsafeBytePointer, then we probably need UnsafeMutableBytePointer to handle interoperability.
>> 
>> Detailed design
>> 
>> The public API is shown here. For details and comments, see the unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>.
>> 
>> struct UnsafeBytePointer : Hashable, _Pointer {
>> 
>>   let _rawValue: Builtin.RawPointer
>> 
>>   var hashValue: Int {...}
>> 
>>   init<T>(_ : UnsafePointer<T>)
>>   init<T>(_ : UnsafeMutablePointer<T>)
>>   init?<T>(_ : UnsafePointer<T>?)
>>   init?<T>(_ : UnsafeMutablePointer<T>?)
>> 
>>   init<T>(_ : OpaquePointer<T>)
>>   init?<T>(_ : OpaquePointer<T>?)
>> 
>>   init?(bitPattern: Int)
>>   init?(bitPattern: UInt)
>> 
>>   func load<T>(_ : T.Type) -> T
>> 
>>   @warn_unused_result
>>   init(allocatingBytes size: Int, alignedTo: Int)
>> 
>>   @warn_unused_result
>>   init<T>(allocatingCapacity count: Int, of: T.Type)
>> 
>>   func deallocateBytes(_ size: Int, alignedTo: Int)
>> 
>>   func deallocateCapacity<T>(_ num: Int, of: T.Type)
>> 
>>   // Returns a pointer one byte after the initialized memory.
>>   func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer
>> 
>>   // Returns a pointer one byte after the initialized memory.
>>   func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer
>> 
>>   func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)
>> 
>>   func deinitialize<T>(_ : T.Type, count: Int = 1)
>> }
>> 
>> extension OpaquePointer {
>>   init(_ : UnsafeBytePointer)
>> }
>> 
>> extension Int {
>>   init(bitPattern: UnsafeBytePointer)
>> }
>> 
>> extension UInt {
>>   init(bitPattern: UnsafeBytePointer)
>> }
>> 
>> extension UnsafeBytePointer : RandomAccessIndex {
>>   typealias Distance = Int
>> 
>>   func successor() -> UnsafeBytePointer
>>   func predecessor() -> UnsafeBytePointer
>>   func distance(to : UnsafeBytePointer) -> Int
>>   func advanced(by : Int) -> UnsafeBytePointer
>> }
>> 
>> func == (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool
>> 
>> func < (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool
>> 
>> func + (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer
>> 
>> func + (lhs: Int, rhs: UnsafeBytePointer) -> UnsafeBytePointer
>> 
>> func - (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer
>> 
>> func - (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Int
>> 
>> func += (lhs: inout UnsafeBytePointer, rhs: Int)
>> 
>> func -= (lhs: inout UnsafeBytePointer, rhs: Int)
>> 
>> Occasionally, we need to convert from an UnsafeBytePointer to an UnsafePointer. This should only be done in very rare circumstances when the author understands the compiler's strict type rules for UnsafePointer. Although this could be done by casting through an OpaquePointer, an explicit, designated unsafe pointer cast API would make the risks more obvious and self-documenting. For example:
>> 
>> extension UnsafePointer {
>>   init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
>> }
>> extension UnsafeMutablePointer {
>>   init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
>> }
>> 
>> Similarly, conversion between UnsafePointer types must now be spelled with an explicit Pointee type:
>> 
>> extension UnsafePointer {
>>   init<U>(_ from: UnsafePointer<U>, toPointee: Pointee.Type)
>>   init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
>> }
>> extension UnsafeMutablePointer {
>>   init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
>> }
>> 
>> Impact on existing code
>> 
>> The largest impact of this change is that void* and const void* are imported as UnsafeBytePointer. This impacts many public APIs, but with implicit argument conversion should not affect typical uses of those APIs.
>> 
>> Any Swift projects that rely on type inference to convert between UnsafePointer types will need to take action. The developer needs to determine whether type punning is necessary. If so, they must migrate to the UnsafeBytePointer API. Otherwise, they can work around the new restriction by using a toPointee or mutating label.
>> 
>> Disallowing inferred UnsafePointer direct conversion requires some standard library code to use an explicit toPointee label for unsafe conversions that may violate strict aliasing.
>> 
>> All occurrences of Unsafe[Mutable]Pointer<Void> in the standard library are converted to UnsafeBytePointer. For example, unsafeAddress() now returns UnsafeBytePointer, not UnsafePointer<Void>.
>> 
>> Some occurrences of Unsafe[Mutable]Pointer<Pointee> in the standard library are replaced with UnsafeBytePointer, either because the code was playing too loosely with strict aliasing rules, or because the code actually wanted to perform pointer arithmetic on byte-addresses.
>> 
>> StringCore.baseAddress changes from OpaquePointer to UnsafeBytePointer because it is computing byte offsets and accessing the memory. OpaquePointer is meant for bridging, but should be truly opaque; that is, nondereferenceable and not involved in address computation.
>> 
>> The StringCore implementation does a considerable amount of casting between different views of the String storage. The current implementation already demonstrates some awareness of strict aliasing rules. The rules are generally followed by ensuring that the StringBuffer only be accessed using the appropriate CodeUnit within Swift code. For interoperability and optimization, String buffers frequently need to be cast to and from CChar. This is valid as long as access to the buffer from Swift is guarded by dynamic checks of the encoding type. These unsafe but dynamically legal conversion points will now be labeled with toPointee.
>> 
>> CoreAudio utilities now use an UnsafeBytePointer.
>> 
>> Implementation status
>> 
>> On my unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>, I've made most of the necessary changes to support the addition of UnsafeBytePointer and the removal of inferred UnsafePointer conversion.
>> 
>> There are several things going on here in order to make it possible to build the standard library with the changes:
>> 
>> A new UnsafeBytePointer type is defined.
>> 
>> The type system imports void* as UnsafeBytePointer.
>> 
>> The type system handles implicit conversions to UnsafeBytePointer.
>> 
>> UnsafeBytePointer replaces both UnsafePointer<Void> and UnsafeMutablePointer<Void>.
>> 
>> The standard library was relying on inferred UnsafePointer conversion in over 100 places. Most of these conversions now take an explicit label, such as 'toPointee' or 'mutating'. Some have been rewritten.
>> 
>> Several places in the standard library that were playing loosely with strict aliasing or doing bytewise pointer arithmetic now use UnsafeBytePointer instead.
>> 
>> Explicit labeled Unsafe[Mutable]Pointer initializers are added.
>> 
>> The inferred Unsafe[Mutable]Pointer conversion is removed.
>> 
>> TODO:
>> 
>> Once this proposal is accepted, and the rules for casting between pointer types have been decided, we need to finish implementing the type system support. The current implementation (intentionally) breaks a few tests in pointer_conversion.swift. We also need to ensure that interoperability requirements are met. Currently, many argument casts need to be explicitly labeled. The current implementation also makes it easy for users to hit an "ambiguous use of 'init'" error when relying on implicit argument conversion.
>> 
>> Additionally:
>> 
>> A name mangled abbreviation needs to be created for UnsafeBytePointer.
>> 
>> The StringAPI tests should probably be rewritten with UnsafeBytePointer.
>> 
>> The NSStringAPI utilities and tests may need to be ported to UnsafeBytePointer.
>> 
>> The CoreAudio utilities and tests may need to be ported to UnsafeBytePointer.
>> 
>> Alternatives considered
>> 
>> Existing workaround
>> 
>> In some cases, developers can safely reinterpret values to achieve the same effect as type punning:
>> 
>> let ptrI32 = UnsafeMutablePointer<Int32>(allocatingCapacity: 1)
>> ptrI32[0] = Int32()
>> let u = unsafeBitCast(ptrI32[0], to: UInt32.self)
>> 
>> Note that all access to the underlying memory is performed with the same element type. This is perfectly legitimate, but simply isn't a complete solution. It also does not eliminate the inherent danger in declaring a typed pointer and expecting it to point to values of a different type.
>> 
>> Discarded alternatives
>> 
>> We considered adding a typePunnedMemory property to the existing Unsafe[Mutable]Pointer API. This would provide a legal way to access a potentially type punned Unsafe[Mutable]Pointer. However, it would certainly cause confusion without doing much to reduce the likelihood of programmer error. Furthermore, there are no good use cases for such a property evident in the standard library.
>> 
>> The opaque _RawByte struct is a technique that allows for byte-addressable buffers while hiding the dangerous side effects of type punning (a _RawByte could be loaded but its value cannot be directly inspected). UnsafePointer<_RawByte> is a clever alternative to UnsafeBytePointer. However, it doesn't do enough to prevent undefined behavior. The loaded _RawByte would naturally be accessed via unsafeBitCast, which would mislead the author into thinking that they have legally bypassed the type system. In actuality, this API blatantly violates strict aliasing. It theoretically results in undefined behavior as it stands, and may actually exhibit undefined behavior if the user recovers the loaded value.
>> 
>> To solve the safety problem with UnsafePointer<_RawByte>, the compiler could associate special semantics with an UnsafePointer bound to this concrete generic parameter type. Statically enforcing casting rules would be difficult if not impossible without new language features. It would also be impossible to distinguish between typed and untyped pointer APIs. For example, UnsafePointer<T>.load<U> would be a nonsensical vestige.
>> 
>> Alternate proposal for void* type
>> 
>> Changing the imported type for void* will be somewhat disruptive. Furthermore, this proposal currently drops the distinction between void* and const void*--an obvious loss of API information.
>> 
>> We could continue to import void* as UnsafeMutablePointer<Void> and const void* as UnsafePointer<Void>, which will continue to serve as an "opaque" untyped pointer. Converting to UnsafeBytePointer would be necessary to perform pointer arithmetic or to conservatively handle possible type punning.
>> 
>> This alternative is much less disruptive, but we are left with two forms of untyped pointer, one of which (UnsafePointer) the type system somewhat conflates with typed pointers.
>> 
>> Given the current restrictions of the language, it's not clear how to statically enforce the necessary rules for casting UnsafePointer<Void> once general
>> 
> 
