[swift-evolution] Contiguous Memory and the Effect of Borrowing on Safety

Thu Nov 10 22:42:10 CST 2016

> On Nov 10, 2016, at 5:16 PM, Dave Abrahams <dabrahams at apple.com> wrote:
> on Thu Nov 10 2016, John McCall <rjmccall-AT-apple.com> wrote:
> 
>>> On Nov 10, 2016, at 9:31 AM, Joe Groff <jgroff at apple.com> wrote:
>>>> On Nov 8, 2016, at 9:29 AM, John McCall <rjmccall at apple.com> wrote:
>>>> 
>>>>> On Nov 8, 2016, at 7:44 AM, Joe Groff via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>> On Nov 7, 2016, at 3:55 PM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> on Mon Nov 07 2016, John McCall <swift-evolution at swift.org> wrote:
>>>>>> 
>>>>>>>> On Nov 6, 2016, at 1:20 PM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Given that we're headed for ABI (and thus stdlib API) stability, I've
>>>>>>>> been giving lots of thought to the bottom layer of our collection
>>>>>>>> abstraction and how it may limit our potential for efficiency.  In
>>>>>>>> particular, I want to keep the door open for optimizations that work on
>>>>>>>> contiguous memory regions.  Every cache-friendly data structure, even if
>>>>>>>> it is not an array, contains contiguous memory regions over which
>>>>>>>> operations can often be vectorized, that should define boundaries for
>>>>>>>> parallelism, etc.  Throughout Cocoa you can find patterns designed to
>>>>>>>> exploit this fact when possible (NSFastEnumeration).  Posix I/O bottoms
>>>>>>>> out in readv/writev, and MPI datatypes essentially boil down to
>>>>>>>> identifying the contiguous parts of data structures.  My point is that
>>>>>>>> this is an important class of optimization, with numerous real-world
>>>>>>>> examples.
>>>>>>>> 
>>>>>>>> If you think about what it means to build APIs for contiguous memory
>>>>>>>> into abstractions like Sequence or Collection, at least without
>>>>>>>> penalizing the lowest-level code, it means exposing UnsafeBufferPointers
>>>>>>>> as a first-class part of the protocols, which is really
>>>>>>>> unappealing... unless you consider that *borrowed* UnsafeBufferPointers
>>>>>>>> can be made safe.  
>>>>>>>> 
>>>>>>>> [Well, it's slightly more complicated than that because
>>>>>>>> UnsafeBufferPointer is designed to bypass bounds checking in release
>>>>>>>> builds, and to ensure safety you'd need a BoundsCheckedBuffer—or
>>>>>>>> something—that checks bounds unconditionally... but] the point remains
>>>>>>>> that
>>>>>>>> 
>>>>>>>> A thing that is unsafe when it's arbitrarily copied can become safe if
>>>>>>>> you ensure that it's only borrowed (in accordance with well-understood
>>>>>>>> lifetime rules).
>>>>>>> 
>>>>>>> UnsafeBufferPointer today is a copyable type.  Having a borrowed value
>>>>>>> doesn't prevent you from making your own copy, which could then escape
>>>>>>> the scope that was guaranteeing safety.
>>>>>>> 
>>>>>>> This is fixable, of course, but it's a more significant change to the
>>>>>>> type and how it would be used.
>>>>>> 
>>>>>> It sounds like you're saying that, to get static safety benefits from
>>>>>> ownership, we'll need a whole parallel universe of safe move-only
>>>>>> types. Seems a cryin' shame.
>>>>> 
>>>>> We've discussed the possibility of types being able to control
>>>>> their "borrowed" representation. Even if this isn't something we
>>>>> generalize, arrays and contiguous buffers might be important
>>>>> enough to the language that your safe BufferPointer could be
>>>>> called 'borrowed ArraySlice<T>', with the owner backreference
>>>>> optimized out of the borrowed representation. Perhaps Array's own
>>>>> borrowed representation would benefit from acting like a slice
>>>>> rather than a whole-buffer borrow too.
>>>> 
>>>> The disadvantage of doing this is that it much more heavily
>>>> penalizes the case where we actually do a copy from a borrowed
>>>> reference — it becomes an actual array copy, not just a reference
>>>> bump.
>>> 
>>> Fair point, though the ArraySlice/Array dichotomy strikes me as
>>> already kind of encouraging this—you might pass ArraySlices down
>>> into your algorithm, but we encourage people to use Array at storage
>>> and API boundaries, forcing copies.
>> 
>> Fair point.  In practice, though, I think most algorithms won't need
>> to "escape" that array slice.
> 
> I disagree. I'm working on some generic matching algorithms (to lay the
> foundation for String search and regexes).  There's going to be a broad
> category of functions in this area that work on Strings and return
> SubStrings, or work on Collections and return slices thereof.  Often
> they'll be called from a context where the resultant slices don't
> outlive the collection, but they still do need to be returned.

Ok.  You're right, I was thinking about arrays more than I was thinking about strings.
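
For illustration only, here is a minimal sketch of the kind of function Dave
describes: it works on any Collection and returns a slice of it. It is not
code from this thread. It is written against current Swift rather than the
2016 toolchain, and the name `firstRun(in:where:)` is a hypothetical helper,
not a standard library API.

```swift
// Hypothetical helper (an assumption, not a stdlib API): returns the first
// contiguous run of elements satisfying `predicate`, as a slice that shares
// storage and indices with `source`.
func firstRun<C: Collection>(
    in source: C,
    where predicate: (C.Element) -> Bool
) -> C.SubSequence {
    // No match: return an empty slice positioned at the end of the collection.
    guard let start = source.firstIndex(where: predicate) else {
        return source[source.endIndex..<source.endIndex]
    }
    // The run ends at the first element after `start` that fails the predicate.
    let end = source[start...].firstIndex(where: { !predicate($0) })
        ?? source.endIndex
    return source[start..<end]
}

let text = "abc123def"
let digits = firstRun(in: text, where: { $0.isNumber })          // Substring
let numbers = firstRun(in: [1, 2, 8, 9, 3], where: { $0 > 5 })   // ArraySlice<Int>
```

Here the result never needs to outlive `text` or the array literal, yet it
still has to be returned to the caller, which is the case under discussion.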

Anyway, if you're talking about returning the value back, you're talking about
something we can't support as just a borrowed value without some sort of
lifetime-qualification system.
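
As a rough sketch of why this matters (current Swift, not code from this
thread): the closure-scoped accessor below is the nearest thing today to a
borrow of contiguous storage. It stays safe only as long as nothing but
ordinary values leave the closure, and the compiler does not check that the
buffer itself stays inside.

```swift
let values = [3, 1, 4, 1, 5]

// Scoped access: `buffer` is valid only inside the closure, and only a
// plain Int (not the pointer) is returned from it.
let total = values.withUnsafeBufferPointer { buffer in
    buffer.reduce(0, +)
}

// The hazard: UnsafeBufferPointer is copyable, so the lines below would
// compile, but `escaped` must not be used once the closure has returned.
// They are left commented out on purpose.
// var escaped: UnsafeBufferPointer<Int>?
// values.withUnsafeBufferPointer { escaped = $0 }
```

Returning the buffer, or a slice of it, from a function safely would need the
result's lifetime to be tied to the owner in the type system.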

John.

> 
>>> From a philosophical perspective of making systems Swift feel like
>>> "the same language" as Swift today, it feels better to me to try to
>>> express this as making our high-level safe abstractions efficient
>>> rather than making our low-level unsafe abstractions safe. Given our
>>> short-term goals for the borrow model as I understand them, I don't
>>> think we can really make a BufferPointer-like type safe in the way
>>> Dave is suggesting, since the pointer fields *inside* the struct
>>> need to be first class lifetime-qualified rather than the value of
>>> the struct itself. Since Array and ArraySlice already communicate an
>>> ownership stake in the memory they reference, a borrowed Array or
>>> ArraySlice value *would* safely and efficiently provide access to
>>> contiguous memory with only support for first-order
>>> borrowed/consumed property declarations and not full first class
>>> lifetime support.
>> 
>> I agree.
>> 
>> John.
> 
> -- 
> -Dave