[swift-evolution] [Pitch] Move UnsafeMutablePointer.initialize(from:) to UnsafeMutableBufferPointer

Ben Cohen ben_cohen at apple.com
Wed Nov 30 18:48:15 CST 2016

Hi swift-evolution,

Below is a draft proposal for a change to facilitate fixing some memory safety issues in the standard library.

If you’re interested in the implementation, PRs can be found here: [stdlib] [WIP] Eliminate version of Array.append(contentsOf:) that takes a Collection <https://github.com/apple/swift/pull/5521> and here: [stdlib] [WIP] Add UnsafeRawBufferPointer.initialize(as:from:) <https://github.com/apple/swift/pull/5718>
(they need a bit of performance work before they’re ready to land)


The version of UnsafeMutablePointer.initialize(from:) that takes a Collection should be deprecated in favour of a new method on UnsafeMutableBufferPointer that takes a Sequence, with a goal of improving memory safety and enabling faster initialization of memory from sequences. Similarly, UnsafeMutableRawPointer.initializeMemory(as:from:) should be deprecated in favour of a new UnsafeMutableRawBufferPointer.initialize(as:from:).


UnsafeMutablePointer.initialize(from:) underpins implementations of collections, such as Array, which are backed by a buffer of contiguous memory. When operations such as Array.append are implemented, they first ensure their backing store can accommodate the number of elements in a source collection, then pass the that collection into the initialize method of their backing store.

Unfortunately there is a major flaw in this design: a collection's count might not accurately reflect the number of elements returned by its iterator. For example, some collections can be misused to return different results on each pass. Or a collection could just be implemented incorrectly.

If the collection's count ends up being lower than the actual number of elements yielded by its iterator, the caller may not allocate enough memory for them. Since UnsafeMutablePointer.initialize(from:) does not receive a limiting capacity, this method would then scribble past the end of the buffer, resulting in undefined behaviour.

Normally when using Unsafe... constructs in Swift the burden of ensuring safety falls on the caller. When using this method with something known to have correct behavior, like an Array, you can do that. But when used in a generic context like Array.append(contentsOf:), where the caller of initialize does not know exactly what kind of collection they are passing in, it is impossible to use this method safely. You can see the impact of this by running the following code. which exhibits memory-unsafe behavior despite only using “safe” constructs from the standard library, something that shouldn’t be possible:

var i = 0
let c = repeatElement(42, count: 10_000).lazy.filter { _ in 
    // capture i and use it to exhibit inconsistent
    // behavior across iteration of c
    i += 1; return i > 10_000 
var a: [Int] = []
// a will allocate insufficient memory before
// calling self._buffer.initialize(from: c)
a.append(contentsOf: c) // memory access violation
While a collection returning an inconsistent count is a programming error (in this case, use of the lazy filter in combination with an logically impure function, breaking value semantics), and it would be reasonable for the standard library to trap under these circumstances, undefined behavior like this is not OK.

In addition, the requirement to pre-allocate enough memory to accommodate from.count elements rules out using this method to initialize memory from a sequence, since sequences don't have a count property (they have an underestimatedCount but this isn't enough since underestimated counts are exactly the problem described above). The proposed solution would allow for this, enabling some internal performance optimizations.

 <https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#proposed-solution>Proposed solution

The existing initialize method should be altered to receive a count of allocated memory to avoid running beyond what the caller has allocated. Given UnsafeMutableBufferPointer already exists to encapsulate "pointer plus a count", the method should be moved to that type and the old method deprecated.

This new method should take a Sequence as its from argument, and handle possible element overflow, returning an Iterator of any elements not written due to a lack of space. It should also return an index into the buffer to indicate where the elements were written up to in cases of underflow.

Once this has been done, the version of Array.append(contentsOf:) that takes a collection can be eliminated, since the performance benefits of the collection version could be incorporated into the implementation of the one that takes a sequence.

The intention of this change is to add memory safety, not to allow the flexibility of collections giving inconsistent counts. Therefore the precondition should remain that the caller should ensure enough memory is allocated to accommodate source.underestedCount elements. The only difference is if they don’t, the behaviour should be well-defined (ideally by trapping, if this can be done efficiently). 


Under-allocating the destination buffer relative to underestimatedCount may trap at runtime. May rather than will because this is an O(n) operation on some collections, so probably should only be enforced in debug builds. But when not enforced, the behavior would still be memory-safe when the precondition is broken.
Over-allocating the destination buffer relative to underestimatedCount is valid and simply results in sequence underflow with potentially uninitialized buffer memory (a likely case with arrays that reserve more than they need).
The source sequence's actual count may exceed both underEstimatedCount and the destination buffer size, resulting in sequence overflow. This is also valid and handled by returning an iterator to the uncopied elements as an overflow sequence.
A matching change should also be made to UnsafeRawBufferPointer.initializeMemory(from:). The one difference is that for convenience this should return an UnsafeMutableBufferPointer of the (typed) intialized elements instead of an index into the raw buffer.

 <https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#detailed-design>Detailed design

The following API changes would be made:

extension UnsafeMutablePointer {
  @available(*, deprecated, message: "it will be removed in Swift 4.0.  Please use 'UnsafeMutableBufferPointer.initialize(from:)' instead")
  public func initialize<C : Collection>(from source: C)
      where C.Iterator.Element == Pointee 

extension UnsafeMutableBufferPointer {
  /// Initializes memory in the buffer with the elements of `source`.
  /// Returns an iterator to any elements of `source` that didn't fit in the 
  /// buffer, and an index to the point in the buffer one past the last element
  /// written (so `startIndex` if no elements written, `endIndex` if the buffer 
  /// was completely filled).
  /// - Precondition: The memory in `self` is uninitialized. The buffer must contain
  ///   sufficient uninitialized memory to accommodate `source.underestimatedCount`.
  /// - Postcondition: The returned iterator
  /// - Postcondition: The `Pointee`s at `self[startIndex..<initializedUpTo]` 
  ///   are initialized.
  public func initialize<S: Sequence>(
    from source: S
  ) -> (unwritten: S.Iterator, initializedUpTo: Index)
    where S.Iterator.Element == Iterator.Element

extension UnsafeMutableRawPointer {
  @available(*, deprecated, message: "it will be removed in Swift 4.0.  Please use 'UnsafeMutableRawBufferPointer.initialize(from:)' instead")
  public func initializeMemory<C : Collection>(
   as: C.Iterator.Element.Type, from source: C
  ) -> UnsafeMutablePointer<C.Iterator.Element>

extension UnsafeMutableRawBufferPointer {
  /// Initializes memory in the buffer with the elements of
  /// `source` and binds the initialized memory to type `T`.
  /// Returns an iterator to any elements of `source` that didn't fit in the 
  /// buffer, and an index into the buffer one past the last byte written.
  /// - Precondition: The memory in `self` is uninitialized or initialized to a
  ///   trivial type.
  /// - Precondition: The buffer must contain sufficient memory to
  ///   accommodate at least `source.underestimatedCount` elements.
  /// - Postcondition: The memory at `self[startIndex..<initialized.count *
  ///   MemoryLayout<S.Iterator.Element>.stride] is bound to type `S.Iterator`.
  /// - Postcondition: The memory at `self[startIndex..<initialized.count *
  ///   MemoryLayout<S.Iterator.Element>.stride] are initialized..
  public func initializeMemory<S: Sequence>(
     as: S.Iterator.Element.Type, from source: S
  ) -> (unwritten: S.Iterator, initialized: UnsafeRawBufferPointer<S.Iterator.Element>)
The += operators and append<C : Collection>(contentsOf newElements: C) methods on Array, ArraySlice and ContiguousArray will be removed as no-longer needed, since the implementation that takes a sequence can be made to be as efficient. They can be replaced by a generic one that calls RangeReplaceableCollection.append(contenstsOf:):

(note, because it forwards on to a protocol requirement, it itself does not need to be a static operator protocol requirement)

/// Appends the elements of a sequence to a range-replaceable collection.
/// Use this operator to append the elements of a sequence to the end of
/// range-replaceable collection with same `Element` type. This example
/// appends the elements of a `Range<Int>` instance to an array of
/// integers.
///     var numbers = [1, 2, 3, 4, 5]
///     numbers += 10...15
///     print(numbers)
///     // Prints "[1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]"
/// - Parameters:
///   - lhs: The array to append to.
///   - rhs: A collection or finite sequence.
/// - Complexity: O(*n*), where *n* is the length of the resulting array.
public func += <
  R : RangeReplaceableCollection, S : Sequence
>(lhs: inout R, rhs: S) 
 where R.Iterator.Element == S.Iterator.Element
 <https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#source-compatibility>Source compatibility

The addition of the new method does not affect source compatibility. The deprecation of the old method does, but since this is a fundamentally unsound operation that cannot be fixed except via a source-breaking change, it should be aggressively deprecated and then removed.

The knock-on ability to remove the version of Array.append(contentsOf:) that takes a collection does not affect source compatability since the version for sequences will be called for collections instead.

 <https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#effect-on-abi-stability>Effect on ABI stability

This change must be made prior to declaring ABI stability, since it is currently called from the Array.append method, which is inlineable.

 <https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#alternatives-considered>Alternatives considered

Overflow (elements remain on the returned iterator) and underflow (initializedUpTo != endIndex) are almost but not quite mutually exclusive – if the buffer is exactly used, the caller must call .next() to check for any unwritten elements, which means the returned value must be declared var, and the check can't be chained. This is a little ugly, but is the unavoidable consequence of how iterators work: since iterating is consuming, the initialize method cannot easily test for this and indicate it back to the caller in some other way (such as returning an optional iterator).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20161130/0802e7e4/attachment.html>

More information about the swift-evolution mailing list