[swift-evolution] RFC: Proposed rewrite of Unmanaged<T>

Mon Feb 22 08:52:36 CST 2016

> On 21 Feb 2016, at 12:28, Dave Abrahams <dabrahams at apple.com> wrote:
> 
> 
> Back to this after a long hiatus, sorry.

No problem, you've got bigger fish to fry. :-)

And thanks again for your time and questions. Having someone ask
the hard questions really helps me reevaluate and better understand
my own thoughts and opinions.

> on Tue Dec 29 2015, Janosch Hildebrand <jnosh-AT-jnosh.com> wrote:
> 
>>> On 19 Dec 2015, at 22:09, Dave Abrahams <dabrahams at apple.com> wrote:
>>> 
>>> 
>>>> On Dec 18, 2015, at 6:18 PM, Janosch Hildebrand via swift-evolution
>>>> <swift-evolution at swift.org>
>>>> wrote:
>>>> ...
>>> 
>>> I don't see any point in having "manuallyRelease()" if we already
>>> have "release()"; they'd do the same thing if you dropped the return
>>> value.
>> 
>> I like the proposed idea of having two separate types for handling
>> unannotated CF APIs and MRC which would also nicely resolve this
>> issue.
>> 
>> Would a separate type for MRC also fall under this proposal or would
>> that require a separate proposal?
> 
> Considering that we're just RFC'ing here, we can certainly talk about
> that.
> 
>> And speaking of a separate type for MRC, how about `ManagedReference`
>> as a name? Seems much better than `Unmanaged`, nicely contrasts with
>> `UnsafeReference` and `ManuallyManagedReference` is a bit of a
>> mouthful...
> 
> I think we want “managed” to mean “managed for you,” not “managed by
> you.”  It's also quite unsafe because you can overrelease it, etc., so
> it would have to have “unsafe” in the name somewhere I think.

That makes sense. Something like `UnsafeReferenceCountedPointer` but
shorter?

>>>> I don't think this use case even needs to be described in the
>>>> documentation for `UnsafeReference` and it's fine if its use is
>>>> very much discouraged.
>>>> 
>>>> Personally I prefer the proposed
>>>> `manuallyRetain()`/`manuallyRelease()` over plain
>>>> `retain()`/`release()` as it clearly separates the returning and
>>>> more generally applicable `release()` from the MRC
>>>> methods. `retain()` would probably also have to return the object
>>>> which would interfere with the max safe usage pattern.
>>> 
>>> I don't understand your last sentence; care to clarify?
>> 
>> My main reason for preferring `manuallyRetain()`/`manuallyRelease()`
>> over `retain()`/`release()` would be that the former would *not*
>> return the object, thus more cleanly separating them from the current
>> `release()` which returns the object to be used from now on, with the
>> `UnsafeReference` to be discarded at that point.
>> 
>> I just think it might be more confusing to also use `release()` for
>> MRC and also introducing `retain()` would only exacerbate the
>> issue. For symmetry reasons `retain()` would likely also return the
>> object. 
> 
> There might be other reasons to do it, but I don't think symmetry is
> necessarily a design goal here.
> 
>> That would make it very similar to `release()` and `.object` which it
>> really shouldn't be as it shouldn't ever be used for handling objects
>> from unannotated CF APIs.
>> 
>> I think having a third method/property with a very similar signature
>> would likely cause confusion regarding the "Maximally Safe Usage" pattern
>> you described.
>> 
>> But as mentioned above I would actually prefer having two separate
>> types which would also make this a non-issue.
> 
> Questions:
> 
> 1. How would these types interact?  Does one need to be able to convert
>   between them liberally, or is it sufficient to use strong references
>   as the common currency?

If we were to have two separate types I think it would be more than fine to
use strong references as a go-between. The use cases for the two types are
so different that I doubt it would be an issue.

> 2. Do you really want a type at all?  Why not just retain() and
>   release() as free functions?

I assume these would be unsafeRetain() and unsafeRelease() ;-)

But yeah, that would work as well and might be a nicer solution overall.
(And you could easily create your own type from these + unowned(unsafe))

The downside I see is that being free functions and working with
AnyObject makes them much more discoverable than being hidden
inside some other type. But given the other unsafe* free functions
that already exist that might be fine.

Also having a predefined wrapper type has some use, e.g. when you
want to store your unowned(unsafe) objects into some collection.
But I'd guess you'll end up with a dedicated wrapper type anyway
in most circumstances where this is an issue so it's probably OK.
And this gets rid of having a separate type that also plays the role
of unowned(unsafe) which seems like a plus.
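
A rough sketch of what I have in mind, assuming the free functions are
simply layered on top of today's Unmanaged. unsafeRetain(),
unsafeRelease(), Node, ManualRef and liveNodes are names I made up for
illustration, not a proposed API:

```swift
// Hypothetical free functions, sketched on top of the existing Unmanaged API.
func unsafeRetain<T: AnyObject>(_ object: T) {
    _ = Unmanaged.passRetained(object)           // +1, deliberately left unbalanced
}

func unsafeRelease<T: AnyObject>(_ object: T) {
    Unmanaged.passUnretained(object).release()   // -1, balances an earlier unsafeRetain
}

var liveNodes = 0   // counts instances that have not been deinitialized yet

final class Node {
    let value: Int
    init(value: Int) { self.value = value; liveNodes += 1 }
    deinit { liveNodes -= 1 }
}

// A do-it-yourself wrapper: storage is exempt from ARC, lifetime is manual.
struct ManualRef {
    unowned(unsafe) let node: Node

    init(_ node: Node) {
        self.node = node
        unsafeRetain(node)
    }

    // Must be called exactly once; using `node` afterwards is undefined.
    func destroy() { unsafeRelease(node) }
}

func makeRef() -> ManualRef {
    return ManualRef(Node(value: 42))   // the only ARC-managed reference ends here
}

let ref = makeRef()
let valueWhileAlive = ref.node.value    // still valid thanks to the manual +1
let liveBeforeDestroy = liveNodes
ref.destroy()
let liveAfterDestroy = liveNodes
```

The wrapper is the "create your own type" part: it is only a handful of
lines, which is why I think a predefined type is not strictly needed.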

I have some more answers below but I'll summarize my opinion here.
Preferences (in descending order):

1) unsafeRetain() + unsafeRelease() + unowned(unsafe)
2) "UnsafeReferenceCountedPointer"
3) No dedicated functionality so just abuse UnsafeReference instead of
    Unmanaged

I think ultimately it's a question of whether we want to expose the
reference counting implementation to manual use...

Is it something I can live without? Absolutely.
Is it something I would use often? Absolutely not.
But then again, it is going to be exposed through UnsafeReference anyway.

And having access to a solid manual reference counting solution
that integrates very well with the rest of the language is kinda neat in
my opinion. And the integration with ARC is something an external
library cannot provide without becoming even more "hacky".
And I don't think it's any more dangerous than any other manual
memory management we have access to. 

I also wonder if it has some minor use for learning/teaching.
Yes, ARC is great because most of the time you don't need to think
about this but if you're trying to understand reference counting it's
kinda nice to be able to actually interact and play around with it.
Then again I'm the kind of person that likes doing that but YMMV...

>>>> As Joe mentioned, `Unmanaged` has a use for manual ref counting
>>>> beyond immediate transfer from un-annotated APIs.
>>>> 
>>>> I have used it for performance reasons myself (~ twice) and while I
>>>> think it's a pretty small use case there isn't really any
>>>> alternative.
>>>> If it would help I can also describe my use-cases in more detail.
>>> 
>>> Yes please!
>> 
>> One place I used Unmanaged is in a small project where I experiment
>> with binary heaps in Swift. I've put the project on GitHub
>> (https://github.com/Jnosh/SwiftBinaryHeapExperiments) but basically
>> I'm using `Unmanaged` in two places here:
>> 
>> 1) Testing the 'overhead' of (A)RC.
>> Basically comparing the performance of using ARC-managed objects in
>> the heaps vs. using 'unmanaged' objects. In Swift 1.2 the difference
>> was still ~2x but with Swift 2+ it's likely approaching the cost of
>> the retain/release when entering and exiting the collection.
>> 
>> Now this could also be accomplished using `unowned(unsafe)` but
>> `Unmanaged` has some minor advantages:
>> 	a) I can keep the objects alive without keeping them in a
>> separate collection. Not a big issue here since I'm doing that anyway
>> but I also find that `Unmanaged` makes it clearer that & how the
>> objects are (partly) manually managed.
>> 	b) I had previously experimented with using `unowned(unsafe)`
>> for this purpose but found that `Unmanaged` performed better. However,
>> that was in a more complex example and in the Swift 1.2 era. A quick
>> test indicates that in this case and with Swift 2.1 `unowned(unsafe)`
>> and `Unmanaged` perform about equally.
> 
> They should.  unowned(unsafe) var T is essentially just an
> UnsafePointer.  unowned/unowned(safe) do incur reference-counting cost
> in exchange for their safety.

I'll come back to this further down.

>> 2) A (object only) binary heap that uses `Unmanaged` internally
>> Not much practical use either in this case since the compiler seems to
>> do quite well by itself but still a somewhat interesting exercise.
>> `Unmanaged` is pretty much required here to make CoW work by manually
>> retaining the objects.
> 
> It's hard for me to imagine why that would be the case.  Would I have
> needed to use Unmanaged in implementing Arrays of objects, if it were?

Sorry, I wasn't clear enough. I (ab)use Unmanaged for two different reasons here.

1) To have a performance baseline where the ARC overhead inside the collection
is essentially zero beyond the mandatory retain on insert, i.e. as if the compiler was
able to eliminate all (redundant) retains and releases.

One part of this is exempting the objects from ARC, which is done by storing the
elements in Unmanaged instances but a wrapper type using unowned(unsafe)
would work just as well.

However, I still need a strong reference to the objects to keep them alive. Using a
separate data structure would work but that has a space, time and code complexity
cost.
Instead I use Unmanaged to manually retain the objects on insert and release on
removal. unowned cannot do that on its own hence the need for something like
unsafeRetain() & unsafeRelease().

2) I then abuse Unmanaged's capabilities a second time to retain the elements
when the collection is copied (which would happen 'automatically' with ARC).
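
To make the two points above concrete, here is a stripped-down sketch.
It is a plain stack rather than the binary heap from the project, and
Element, Storage, UnmanagedStack and liveElements are illustrative names
only. Elements are retained manually on insert, handed back to ARC on
removal, and retained once more whenever the storage is copied:

```swift
var liveElements = 0   // instances not yet deinitialized

final class Element {
    let key: Int
    init(key: Int) { self.key = key; liveElements += 1 }
    deinit { liveElements -= 1 }
}

// Hypothetical storage class: owns a manual +1 on every element it holds.
final class Storage<T: AnyObject> {
    var slots: [Unmanaged<T>] = []

    init() {}

    // Copying retains every element once more; ARC would do this
    // implicitly for an array of strong references.
    init(copying other: Storage<T>) {
        slots = other.slots
        for slot in slots { _ = slot.retain() }
    }

    deinit {
        for slot in slots { slot.release() }
    }
}

struct UnmanagedStack<T: AnyObject> {
    private var storage = Storage<T>()

    // Copy-on-write: clone the storage if another value still shares it.
    private mutating func makeUnique() {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(copying: storage)
        }
    }

    mutating func push(_ object: T) {
        makeUnique()
        storage.slots.append(Unmanaged.passRetained(object))   // manual +1 on insert
    }

    mutating func pop() -> T? {
        makeUnique()
        // takeRetainedValue() hands the stored +1 back to ARC on removal.
        return storage.slots.popLast()?.takeRetainedValue()
    }

    var count: Int { return storage.slots.count }
}

func demo() -> (poppedKey: Int?, originalCount: Int, copyCount: Int, liveDuring: Int) {
    var original = UnmanagedStack<Element>()
    original.push(Element(key: 1))
    original.push(Element(key: 2))
    var copy = original          // shares storage until the first mutation
    let popped = copy.pop()      // triggers the copy, then removes key 2
    return (popped?.key, original.count, copy.count, liveElements)
}

let result = demo()
let liveAfter = liveElements     // every manual retain has been balanced
```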

Btw, with Swift 2.2 under WMO the performance of a normal ManagedBuffer
is on par with this "hack". Go Swift team!

>> The other project was a simple 2D sprite engine (think a simplified
>> version of SpriteKit) I experimented with about a year ago.
>> Textures and Shaders were abstracted as value types privately backed
>> by reference types that managed the underlying OpenGL objects,
>> i.e. destroy the OpenGL texture object on deinit, etc...
>> 
>> I found this to be quite nice to use but ARC overhead during batching
>> & rendering amounted to something like 20-30% of CPU time IIRC. (This
>> was under Swift 1.2 and with WMO). Using `Unmanaged` was one of the
>> things I played around with to get around this and it worked very
>> well.
> 
> Another case where you can use unowned(unsafe), is it not?

Indeed, and that was what I originally tried to use.
Ultimately I settled on Unmanaged however. Now it's been a long time and
I don't recall the exact details so take this with a grain of salt: 

One reason certainly was that I ended up needing Unmanaged anyway
to perform manual retain & releases at which point why not also use it
for 'storage'...

But I also vaguely recall that Unmanaged had more of a performance impact.
Now one possibility is that there was some issue with unowned(unsafe) (this
was with 1.2β1) but much more likely is that Unmanaged was easier to apply
consistently and correctly.
e.g. assume you have some struct that contains an unowned(unsafe) variable.
Now if you extract that into a local variable you add a perhaps unwanted
retain/release so you might need to mark the local variable as unowned(unsafe)
as well, etc...
What I'm trying to say is that with unowned you need to be careful and considerate
with how you use it at all times since the 'obvious' thing generally leads to
retain/release.
Unmanaged is fine to pass around, store in a local variable, etc... and any ARC
related interactions are obvious because they manifest as method calls on the
Unmanaged instance.
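
A small sketch of the difference, with made-up types (Texture,
UnownedSprite, UnmanagedSprite). Whether the compiler actually emits a
retain/release for the strong local depends on the optimizer; the point
is only where the ARC interaction is visible in the source:

```swift
final class Texture {
    let name: UInt32
    init(name: UInt32) { self.name = name }
}

// Variant A: the field is exempt from ARC, but copies of it are not.
struct UnownedSprite {
    unowned(unsafe) let texture: Texture
}

// Variant B: every ARC interaction is an explicit call on Unmanaged.
struct UnmanagedSprite {
    let texture: Unmanaged<Texture>
}

let texture = Texture(name: 7)   // the strong reference that keeps it alive

let a = UnownedSprite(texture: texture)
let strongLocal = a.texture                    // ordinary strong local: ARC may retain/release
unowned(unsafe) let unsafeLocal = a.texture    // annotation must be repeated to stay outside ARC
let namesAgree = strongLocal.name == unsafeLocal.name

let b = UnmanagedSprite(texture: Unmanaged.passUnretained(texture))
let passedAlong = b.texture                    // copying the wrapper never touches the count
let nameViaUnmanaged = passedAlong.takeUnretainedValue().name
```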

For their main application, breaking retain cycles, weak and unowned work fine
because you want to retain the objects when they are not 'at rest'.
But if you want to avoid retains even when working with the objects, a type is
just a much more comfortable way to handle this.
Like I mentioned before, I imagine that in many cases you'll end up making a
custom wrapper anyway but it's something I'm a bit apprehensive about.

Still, I think unowned(unsafe) together with unsafeRetain() and unsafeRelease()
free functions makes for a nicer API and I don't think I can adequately judge it
beyond that. I've barely used this in its current form, have no real experience
with a potential future form and I hopefully won't use it (often) anyway.
So I think it's more than appropriate to prioritize the general API over making
this esoteric use case more comfortable to use.

>> The `Unmanaged` instances were created when draw commands are
>> submitted to the renderer so they were only used inside the rendering
>> pipeline.
>> I eventually switched to using the OpenGL names (i.e. UInts) directly
>> inside the renderer since they are already available anyway but that
>> also requires extra logic to ensure the resources are not destroyed
>> prematurely (e.g. retaining the object until the end of the frame or
>> delaying the cleanup of the OpenGL resources until the end of the
>> frame, ...). In many ways it's quite a bit messier than just using
>> `Unmanaged`.
> 
> I don't see how Unmanaged could have been less messy; don't you still
> need a strong reference somewhere to ensure the lifetime?

Absolutely. You can retain the object directly through Unmanaged
(via passRetained() or retain()) and make the Unmanaged instance
a strong reference in effect.

Not a big difference to retaining by putting the objects in some container.
Just a different set of tradeoffs.

I don't think it's the best solution for this case but it's pretty simple. Retain
when creating the draw command, release when discarding the draw
command - nothing different than malloc/free.
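
Roughly like this, as a simplified sketch and not the actual engine
code. Shader, DrawCommand, submit() and liveShaders are illustrative
names:

```swift
var liveShaders = 0   // instances not yet deinitialized

final class Shader {
    let glName: UInt32
    init(glName: UInt32) { self.glName = glName; liveShaders += 1 }
    deinit { liveShaders -= 1 }
}

// Hypothetical draw command that owns a manual +1 on its shader.
struct DrawCommand {
    let shader: Unmanaged<Shader>

    init(shader: Shader) {
        self.shader = Unmanaged.passRetained(shader)   // retain on creation
    }

    // Must be called exactly once per command, like free() after malloc().
    func discard() {
        shader.release()
    }
}

func submit() -> [DrawCommand] {
    let shader = Shader(glName: 3)
    // The same shader used twice in one frame is retained twice.
    return [DrawCommand(shader: shader), DrawCommand(shader: shader)]
}

let frame = submit()   // no ARC-managed reference to the shader is left
let liveWhileQueued = liveShaders
let queuedName = frame[0].shader.takeUnretainedValue().glName
for command in frame { command.discard() }
let liveAfterFrame = liveShaders
```

Forgetting a discard() leaks the shader and calling it twice
over-releases it, which is exactly the malloc/free kind of tradeoff.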

Collecting the objects in some collection is likely to be a cleaner solution
and more efficient too, since you don't retain objects multiple times if they are
used multiple times in the same frame (which is likely for shaders, textures).
But then someone somewhere needs to manage this, and you need to
access that state when creating or submitting the draw command.

Or perhaps make sure the referenced resources stay valid until the frame
is drawn so you don't need to retain here at all but now you need to track
all the scene contents, etc...

Hopefully I don't come across as too petulant. :-)
I don't really want to argue in favor of or defend these approaches.
I'm merely trying to give some examples of when and what for I actually
used this stuff, not to prove the merits of these cases but instead to argue
for the existence of better justified uses based on the same ideas.
Not sure if that makes any sense but there you go :-)

>> I don't think these are particularly great examples and I could
>> certainly live without 'native' MRC but ultimately I think it's an
>> interesting capability so I'd like to keep it around. 
>> Although I'd be in favor of keeping it out of the stdlib but I don't
>> think that's really an option just yet...
>> 
>> It would also be interesting to be able to do the same with indirect
>> enum instances and closures but it's not like I have a particular use
>> case for that ;-)
> 
> I don't understand what you might be hinting at here.

Just that AFAIK closures and indirect enum instances also use ARCed
references under the hood. So in theory they could potentially also be
stored unowned and manually retained/released.

I just find it slightly interesting that with (Any)Objects certain things are
exposed (unsafeAddressOf, retain/release, ...) whereas with indirect enums
and closures they are not.

I don't want to imply that that would be a good idea and it would certainly
be hard, complicated, and annoying to implement with essentially no
benefit so I don't want to go anywhere with this other than the partial similarity.

Basically it's just my brain going:
"Oh look, some pyramids. Hmm, you could store these much more efficiently
if you stacked them up against each other" ;-)

- Janosch

> -- 
> -Dave