[swift-evolution] [Proposal] Property behaviors

Thu Dec 17 17:44:41 CST 2015

On Thu, Dec 17, 2015, at 09:37 AM, Joe Groff via swift-evolution wrote:
> Hi everyone. Chris stole my thunder already—yeah, I've been working on
> a design for allowing properties to be extended with user-defined
> delegates^W behaviors. Here's a draft proposal that I'd like to open
> up for broader discussion. Thanks for taking a look!

Thanks for posting this! I just read through it, and there's a lot to
like in here, but I also have a bunch of concerns. I'll go back through
the document in order and respond to bits of it.

I apologize in advance for the massive size of this email, and for its
rambling nature. It is a bit of stream-of-consciousness. I also
apologize if anything in here has already been addressed in this thread,
as I've been writing it over several hours and I know the thread has had
discussion during that time.

> A var or let declaration can specify its behavior in parens after
> the keyword

I like this syntax.

> Furthermore, the behavior can provide additional operations, such
> as clear-ing a lazy property, by accessing it with
> property.behavior syntax:

You already mentioned this at the end, but I'm concerned about the
ambiguity between `foo.behavior` and `foo.someProp`. If the compiler
always resolves ambiguity in one way, that makes it impossible to
explicitly choose the alternative resolution (e.g. if `foo.lazy`
resolves in favor of a property of the type, how do you access the lazy
behavior? If it resolves in favor of the behavior, how do you get at the
property instead?). Not just that, but it's also ambiguous to any
reader; if I see `self.foo.bar` I have to know up-front whether "bar" is
a behavior or a property of the variable's type.

I'm mildly tempted to say we should use

`foo.lazy`.reset()

but I admit it does look a bit odd, especially if accessing methods of
behaviors ends up being common. Another idea might look like

foo.(lazy).reset()

Or maybe we could even come up with a syntax that lets you omit the
behavior name if it's unambiguous (e.g. only one behavior, or if the
method/property you're accessing only exists on one behavior). Being
able to omit the behavior name would be nice for defining resettable
properties because saying something like `foo.resettable.reset()` is
annoyingly redundant. Maybe something like `foo::reset()` or
`foo#reset()`, which would be shorthand for `foo::lazy.reset()` or
`foo#lazy.reset()`.

> public subscript<Container>(varIn _: Container,
> initializer initial: () -> Value) -> Value {

I'm a bit concerned about passing in the Container like this. For class
types it's probably fine, but for value types, it means we're passing a
copy of the value in to the property, which just seems really weird
(both because it's a copy, and because that copy includes a copy of the
property).

Also the only example you gave that actually uses the container is
Synchronized, but even there it's not great, because it means every
synchronized property in the class all share the same lock. But that's
not how Obj-C atomic properties work, and there's really no benefit at
all to locking the entire class when accessing a single property because
it doesn't provide any guarantees about access to multiple properties
(as the lock is unlocked in between each access).

FWIW, the way Obj-C atomic properties work is for scalars it uses atomic
unordered loads/stores (which is even weaker than memory_order_relaxed,
all it guarantees is that every load sees a value that was written at
some point, i.e. no half-written values). For structs it calls the function
objc_copyStruct(), which uses a bank of 128 spinlocks and picks two of
them based on the hash of the src/dst addresses (there's a comment
saying the API was designed wrong, hence the need for 2 spinlocks;
ideally it would only use one lock based on the address of the property
because the other address is a local stack value). For objects it calls
objc_getProperty() / objc_setProperty() which uses a separate bank of
128 spinlocks (and picks one based on the address of the ivar). The
getter retains the object with the spinlock held and then autoreleases
it outside of the spinlock. The setter just uses the spinlock to protect
writing to the ivar, doing any retains/releases outside of it. I haven't
tested but it appears that Obj-C++ properties containing C++ objects
use yet another bank of 128 spinlocks, using the spinlock around the
C++ copy operation.

Ultimately, the point here is that the only interesting synchronization
that can be done at the property level is unordered atomic access, and
for any properties that can't actually use an atomic load/store (either
because they're aggregates or because they're reference-counted objects)
you really do want to use a spinlock to minimize the cost. But adding a
spinlock to every single property is a lot of wasted space (especially
because safe spinlocks on iOS require a full word), which is why the Obj-
C runtime uses those banks of spinlocks.
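
To make the "bank of locks" idea concrete, here is a minimal sketch in
Swift. Everything in it is an assumption for illustration: `LockBank`,
`slot(for:)` and the shift-and-modulo hash are hypothetical names and
choices, and it uses `NSLock` rather than a real spinlock. It is not how
the Obj-C runtime is actually written, only the general shape of picking
a shared lock from the address of the storage being protected.

```swift
import Foundation

// Hypothetical lock bank: a fixed pool of locks shared by many properties,
// so that no property has to carry its own lock.
final class LockBank {
    private let locks: [NSLock]

    init(count: Int = 128) {
        locks = (0..<count).map { _ in NSLock() }
    }

    // Map a storage address to a slot in the bank. This hash is an
    // assumption made for the sketch, not the runtime's actual function.
    func slot(for address: UnsafeRawPointer) -> Int {
        let bits = UInt(bitPattern: address)
        return Int((bits >> 4) % UInt(locks.count))
    }

    // Run `body` while holding the lock that guards `address`.
    func withLock<R>(for address: UnsafeRawPointer, _ body: () throws -> R) rethrows -> R {
        let lock = locks[slot(for: address)]
        lock.lock()
        defer { lock.unlock() }
        return try body()
    }
}

let bank = LockBank()
let cell = UnsafeMutablePointer<Int>.allocate(capacity: 1)
cell.initialize(to: 0)

bank.withLock(for: cell) { cell.pointee = 42 }        // guarded store
let stored = bank.withLock(for: cell) { cell.pointee } // guarded load
let slot = bank.slot(for: cell)

cell.deinitialize(count: 1)
cell.deallocate()
```

Two unrelated properties can land in the same slot, which is harmless
for plain loads and stores but matters as soon as arbitrary code runs
while the lock is held.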

In any case, I guess what I'm saying is we should ditch the Container
argument. It's basically only usable for classes, and even then it's
kind of strange for a property to actually care about its container.

> var `foo.lazy` = lazy(var: Int.self, initializer: { 1738 })

This actually won't work to replace existing lazy properties. It's legal
today to write

lazy var x: Int = self.y + 1

This works because the initializer expression isn't actually run until
the property is accessed. But if the initializer is passed to the
behavior function, then it can't possibly reference `self` as that runs
before stage-1 initialization.

So we need some way to distinguish behaviors that initialize immediately
vs behaviors that initialize later. The former want an initializer on
the behavior function, and may or may not care about having an
initializer on the getter/setter. The latter don't want an initializer
on the behavior, and do want one on the getter/setter. In theory you
could use the presence of a declared `initializer` argument on the
behavior function to distinguish between eager-initialized and lazy-
initialized, though that feels a little odd.
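
For reference, this is the delayed-initializer behavior of today's
`lazy` written out as a small runnable sketch. The `Model` class and the
`initializerRuns` counter are made up for the example; the point is only
that the initializer can use `self` because it does not run until the
first access.

```swift
final class Model {
    var y = 1
    var initializerRuns = 0

    // The initializer may refer to `self` because it only runs on first
    // access, after the instance has been fully initialized.
    lazy var x: Int = {
        self.initializerRuns += 1
        return self.y + 1
    }()
}

let model = Model()
model.y = 41
let runsBeforeAccess = model.initializerRuns  // initializer has not run yet
let firstRead = model.x                       // runs the initializer now
let secondRead = model.x                      // served from storage
```

An initializer handed to a behavior function at declaration time would
have to be evaluated, or at least captured, before `self` is available.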

> let (memoized) address = "\(street)\n\(city) \(postalCode)"

You're using un-qualified accesses to properties on self in the
initializer here. I'm not actually against allowing that, but `lazy`
properties today require you to use `self.`, otherwise any unqualified
property access is resolved against the type instead of the value. I
believe the current behavior is because non-lazy properties resolve
unqualified properties this way, so `lazy` properties do too in order to
allow you to add `lazy` to any property without breaking the existing
initializer.

This property declaration also runs into the eager-vs-delayed
initializer issue I mentioned above.

> A property behavior can model "delayed" initialization behavior, where
> the DI rules for var and let properties are enforced dynamically
> rather than at compile time

It looks to me that the only benefit this has versus IOUs is you can use
a `let` instead of a `var`. It's worth pointing out that this actually
doesn't even replace IOUs for @IBOutlets because it's commonly useful to
use optional-chaining on outlets for code that might run before the view
is loaded (and while optional chaining is possible with behavior access,
it's a lot more awkward).

> let (delayed) x: Int ... self.x.delayed.initialize(x) ...

Allowing `let` here is actually a violation of Swift's otherwise-strict
rules about `let`. Specifically, Delayed here is a struct, but
initializing it requires it to be mutable. So `let (delayed) x: Int`
can't actually ever be initialized. You could make it a class, but
that's a fairly absurd performance penalty for something that provides
basically the same behavior as IOUs. You do remark later in detailed
design about how the backing storage is always `var`, which solves this
at a technical level, but it still appears to the user as though they're
mutating a `let` property and that's strictly illegal today.

I think the right resolution here is just to remove the `letIn`
constructor and use `var` for these properties. The behavior itself
(e.g. delayed) can document write-once behavior if it wants to. Heck,
that behavior was only enforcing write-once in a custom initialize()
method anyway, so nothing about the API would actually change.
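
A rough sketch of what that looks like with an ordinary struct, assuming
a hypothetical `Delayed` type with an `initialize(_:)` method as in the
proposal's example (the details here are mine, not the proposal's):

```swift
// Hypothetical write-once storage. The write-once rule is enforced at
// run time by initialize(_:), not by `let`.
struct Delayed<Value> {
    private var storage: Value?

    init() {}

    var isInitialized: Bool { return storage != nil }

    var value: Value {
        guard let value = storage else {
            fatalError("property read before initialize(_:) was called")
        }
        return value
    }

    // `mutating`, so it can only be called on a `var`.
    mutating func initialize(_ newValue: Value) {
        precondition(storage == nil, "property initialized more than once")
        storage = newValue
    }
}

var x = Delayed<Int>()
let wasInitialized = x.isInitialized
x.initialize(7)

// let y = Delayed<Int>()
// y.initialize(7)   // rejected by the compiler: `y` is a `let` constant
```

The commented-out lines are the struct-level version of the problem:
a `let` of this type can never be initialized.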

> Resettable properties

The implementation here is a bit weird. If the property is nil, it
invokes the initializer expression, every single time it's accessed. And
the underlying value is optional. This is really implemented basically
like a lazy property that doesn't automatically initialize itself.

Instead I'd expect a resettable property to have eager-initialization,
and to just eagerly re-initialize the property whenever it's reset. This
way the underlying storage isn't Optional, the initializer expression is
invoked at more predictable times, and it only invokes the initializer
once per reset.

The problem with this change is the initializer expression needs to be
provided to the behavior when reset() is invoked rather than when the
getter/setter is called.
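
Here is a minimal sketch of the eager variant I have in mind, written as
a plain struct rather than as a behavior. `Resettable`, `CallCounter`
and `timeout` are names invented for the example. The sketch sidesteps
the problem above by storing the initializer closure, which is exactly
the extra per-property storage a real design would probably want to
avoid.

```swift
// Hypothetical eager resettable storage: non-Optional value, and the
// initializer runs once at creation and once per reset().
struct Resettable<Value> {
    private let initializer: () -> Value
    var value: Value

    init(initializer: @escaping () -> Value) {
        self.initializer = initializer
        self.value = initializer()   // eager: runs right away
    }

    mutating func reset() {
        value = initializer()        // runs exactly once per reset
    }
}

final class CallCounter { var count = 0 }
let calls = CallCounter()

var timeout = Resettable<Int> {
    calls.count += 1
    return 30
}

timeout.value = 5
let afterSet = timeout.value     // reading does not run the initializer
timeout.reset()
let afterReset = timeout.value
```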

> NSCopying

We really need to support composition. Adding NSCopying to a property
doesn't really change the behavior of the property itself, it just makes
assignments to it automatically call copy() on the new value before
doing the actual assignment. Composition in general is good, but
NSCopying seems like an especially good example of where adding this
kind of behavior should work fine with everything else.

Based on the examples given here, there's really several different
things behaviors do:

* Behaviors that "decorate" the getter/setter, without actually changing
  the underlying value get/set. This includes property observers and
  Synchronized (although atomic properties ideally should alter the
  get/set to use atomic instructions when possible, but semantically
  it's the same as taking a per-property spinlock).
* Behaviors that transform the value. This is basically NSCopying,
  because it copies the value but otherwise wants to preserve any
  existing property behavior (just with the new value instead of the
  old). But e.g. lazy can also be thought of as doing this where the
  transform is from T to T? (the setter converts T into T? and assigns
  it to the underlying value; the getter unwraps the T? or initializes
  it if nil and returns T). Of course there is probably a difference
  between transformers that keep the same type and ones that change the
  type; e.g. property observers with NSCopying may want to invoke
  willSet with the initial uncopied value (in case the observer wants to
  change the assigned value), but didSet should of course be invoked
  with the resulting copied value. But transformers where the
  transformation is an implementation detail (such as lazy, which
  transforms T to T?) don't want to expose that implementation detail to
  the property observers. So maybe there's two types of transformers;
  one that changes the underlying type, and one that doesn't.
* Behaviors that don't alter the getter/setter but simply provide
  additional functionality. This is exemplified by Resettable (at least,
  with my suggested change to make it eagerly initialize), because it
  really just provides a .reset() function.
* The lazy vs eager initialized thing from before

I suspect that we really should have a behavior definition that
acknowledges these differences and makes them explicit in the API.

There's also a lot of composition concerns here. For example,
synchronized should probably always be the innermost decorator, because
the lock is really only protecting the storage of the value and
shouldn't e.g. cover property observers or NSCopying. Property observers
should probably always be the outermost decorator (and willSet should
even fire before NSCopying, partially because it should be whatever
value the user actually tried to assign, and because willSet observers
can actually change the value being assigned and any such new value
should then get copied by NSCopying).
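
The ordering I'm describing for a setter can be sketched as follows.
This is not a proposed API; `ComposedProperty`, `beforeSet`, `afterSet`
and the `copy` closure are hypothetical stand-ins for willSet, didSet
and NSCopying, and `NSLock` stands in for whatever synchronized would
really use.

```swift
import Foundation

// Hypothetical hand-composed property: observers outermost, copying in
// the middle, the lock innermost around the store itself.
final class ComposedProperty<Value> {
    private var storage: Value
    private let lock = NSLock()
    private let copy: (Value) -> Value

    var beforeSet: ((Value) -> Value)?   // may substitute a different value
    var afterSet: ((Value) -> Void)?

    init(_ initial: Value, copy: @escaping (Value) -> Value) {
        self.copy = copy
        self.storage = initial   // stored as given, to keep the sketch short
    }

    var value: Value {
        get {
            lock.lock()
            defer { lock.unlock() }
            return storage
        }
        set {
            // 1. Observer sees the value the caller tried to assign.
            let requested = beforeSet?(newValue) ?? newValue
            // 2. Whatever the observer settled on is what gets copied.
            let copied = copy(requested)
            // 3. The lock covers only the store.
            lock.lock()
            storage = copied
            lock.unlock()
            // 4. Observer sees the value that was actually stored.
            afterSet?(copied)
        }
    }
}

var log: [String] = []
let title = ComposedProperty<String>("", copy: { value in
    log.append("copy \(value)")
    return value
})
title.beforeSet = { value in
    log.append("willSet \(value)")
    return value.uppercased()
}
title.afterSet = { value in
    log.append("didSet \(value)")
}

title.value = "draft"
```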

Speaking of composition, mixing lazy and synchronized seems really
problematic. If Synchronized uses a bank of locks like the obj-c
runtime, then lazy can't execute inside of the lock because the
initializer might access something else that hits the same lock and
causes an unpredictable deadlock. But it can't execute outside of the
lock either because the initializer might then get executed twice
(which would surprise everyone). So really the combination of lazy +
synchronized needs to actually use a completely separate combined
LazySynchronized type, one that provides the expected dispatch_once-
like behavior.
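
A sketch of what such a combined type could look like, under the
assumption that it owns a dedicated lock instead of borrowing one from a
shared bank. The class body and the `RunCounter` helper are illustrative
only.

```swift
import Foundation

// Hypothetical combined lazy + synchronized storage with its own lock,
// so the initializer can safely run while the lock is held.
final class LazySynchronized<Value> {
    private let lock = NSLock()   // dedicated to this one property
    private var storage: Value?
    private let initializer: () -> Value

    init(_ initializer: @escaping () -> Value) {
        self.initializer = initializer
    }

    var value: Value {
        lock.lock()
        defer { lock.unlock() }
        if let existing = storage { return existing }
        let fresh = initializer()   // at most one caller ever gets here
        storage = fresh
        return fresh
    }
}

final class RunCounter { var runs = 0 }
let counter = RunCounter()

let config = LazySynchronized<Int> {
    counter.runs += 1
    return 1738
}

// Hammer the property from several threads at once.
DispatchQueue.concurrentPerform(iterations: 8) { _ in
    _ = config.value
}
```

The cost is a lock per property, and the initializer still must not read
the same property again, since `NSLock` is not recursive.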

> Referencing Properties with Pointers ... A production-quality stdlib
> implementation could use compiler magic to ensure the property is
> stored in-line in an addressable way.

Sounds like basically an implementation that just stores the value
inline as a value and uses Builtin.addressOf(). This behavior is
problematic for composition. It also doesn't work at all for computed
properties (although any behavior that directly controls value storage,
such as lazy, also has the same limitation). The behavior design should
acknowledge the split between behaviors that work on computed properties
and those that don't.

More thoughts on composition: The "obvious" way to compose behaviors is
to just have a chain of them where each behavior wraps the next one,
e.g. Copying<Resettable<Synchronized<NSString>>>. But this doesn't
actually work for properties like Lazy that change the type of the
underlying value, because the "underlying value" in this case is the
wrapped behavior, and you can't have a nil behavior (it would break most
of the functionality of behaviors, as well as break the ability to say
`foo.behavior.bar()`).

Based on the previous behavior categories, I'm tempted to say that we
need to model behaviors with a handful of protocols (e.g. one for
decorators, one for transformers, etc), and have the logic of the
property itself call the appropriate methods on the collection of
protocols at the appropriate times. Transformer behaviors could have an
associated type that is the transformed value type (and the behavior
itself would be generic, taking the value type as its parameter, as you
already have). The compiler can then calculate the ordering of
behaviors, and use the associated types to figure out the "real"
underlying value, and pass appropriately-transformed value types to the
various behaviors depending on where in the chain they execute. By that
I mean a chain of (observed, lazy, sync) for a property of type Int
(ignoring for a moment the issues with sync + lazy) would create an
Observed<Int>, a Lazy<Int>, and a Sync<Int?> (because the Lazy<Int>'s
associated type says it transforms to Int?). The problem with this model
is the behavior can no longer actually contain the underlying value as a
property. And that's actually fine. If we can split up any stored values
the behavior needs from the storage of the property itself, that's
probably a good thing.

> Property Observers

Property Observers need to somehow support the behavior of letting
accessors reassign to the property without causing an infinite loop.
They also need to support subclassing such that the observers are called
in the correct order in the nested classes (and again, with
reassignment, such that the reassigned value is visible to the future
observers without starting the observer chain over again).
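
The reassignment half of this is what the built-in observers already do
today, as this small example shows (`Volume` and `observerRuns` are
invented for the illustration). A behavior-based replacement would need
to reproduce it.

```swift
struct Volume {
    var observerRuns = 0

    var level: Int = 0 {
        didSet {
            observerRuns += 1
            // Assigning to the property from its own observer replaces
            // the value without invoking the observer again.
            if level > 10 { level = 10 }
        }
    }
}

var volume = Volume()
volume.level = 50
```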

Property Observers also pose a special challenge for subclasses.
Overriding a property to add a behavior in many cases would actually
want to create brand new underlying storage (e.g. adding lazy to a
property needs different storage). But property observers explicitly
don't want to do that, they just want to observe the existing property.
I suspect this may actually line up quite well with the distinction
between decorators and other behaviors.

On a similar note, I'm not sure if there's any other behaviors where
overriding actually wants to preserve any existing behaviors. Property
observers definitely want to, but if I have a lazy property and I
override it in a subclass for any reason beyond adding observers, the
subclass property probably shouldn't be lazy. Conversely, if I have an
observed property and I override it to be lazy, it should still preserve
the property observers (but no other behaviors). This actually suggests
to me that Property Observers are unique among behaviors, and are
perhaps worthy of leaving as a language feature instead of as a
behavior. Of course, I can always override a property with a computed
property and call `super` in the getter/setter, at which point any
behaviors of the superclass property are expected to apply, but I don't
think there's any actual problems there.

Speaking of that, how do behaviors interact with computed properties? A
lazy computed property doesn't make sense (which is why the language
doesn't allow it). But an NSCopying computed property is fine (the
computed getter would be handed the copied value).

> The backing property has internal visibility by default

In most cases I'd recommend private by default. Just because I have an
internal property doesn't mean the underlying implementation detail
should be internal. In 100% of the cases where I've written a computed
property backed by a second stored property (typically named with a _
prefix), the stored property is always private, because nobody has any
business looking at it except for the class/struct it belongs to.

Although actually, having said that, there's at least one behavior
(resettable) that only makes sense if it's just as visible as the
property itself (e.g. so it should be public on a public property).

And come to think of it, just because the class designer didn't
anticipate a desire to access the underlying storage of a lazy property
(e.g. to check if it's been initialized yet) doesn't mean the user of
the property doesn't have a reason to get at that.

So I'm actually now leaning to making it default to the same
accessibility as the property itself (e.g. public, if the property
is public). Any behaviors that have internal implementation details
that should never be exposed (e.g. memoized should never expose its
box, but maybe it should expose an accessor to check if it's
initialized) can mark those properties/methods as internal or
private and that accessibility modifier would be obeyed. Which is to
say, the behavior itself should always be accessible on a property,
but implementation details of the behavior are subject to the normal
accessibility rules there.

The proposed (public lazy) syntax can still be used to lower visibility,
e.g. (private lazy).

> Defining behavior requirements using a protocol

As mentioned above, I think we should actually model behaviors using a
family of protocols. This will let us represent decorators vs value
transformers (and a behavior could even be both, by implementing both
protocols). We could also use protocols for eager initialization vs lazy
initialization (which is distinguished only by the presence of the
initializer closure in the behavior initializer). We'd need to do
something like

protocol Behavior { init(...) }
protocol LazyBehavior { init(...) }
protocol DecoratorBehavior : Behavior { ... }
protocol LazyDecoratorBehavior : LazyBehavior { ... }
protocol TransformerBehavior : Behavior { ... }
protocol LazyTransformerBehavior : LazyBehavior { ... }

and that way a type could conform to both DecoratorBehavior and
TransformerBehavior without any collision in init (because the init
requirement comes from a shared base protocol).

As for actually defining the behavior name, you still do need the global
function, but it could maybe return the behavior type, e.g. behavior
functions are functions that match either of the following:

func name<T: Behavior>(...) -> T.Type
func name<T: LazyBehavior>(...) -> T.Type

I'm not really a big fan of having two "root" protocols here, but I also
don't like magical arguments (e.g. treating the presence of an argument
named "initializer" as meaningful) which is why the protocols take
initializers. I guess the protocols also need to declare typealiases for
the Value type (and TransformerBehavior can declare a separate typealias
for the TransformedValue, i.e. the underlying storage, e.g. T? for lazy).
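
To show how one type could conform to both a decorator and a transformer
protocol without an init collision, here is a compilable sketch of the
eager half of that family. The requirement names (`beforeSet`,
`afterSet`, `toStorage`, `fromStorage`), the argument-less `init()`, and
the `LoggedOptionalBox` type are all assumptions filled in for the
example, and it is written with `associatedtype` as current compilers
spell it.

```swift
protocol Behavior {
    associatedtype Value
    init()   // stand-in for the elided init(...) requirement
}

protocol DecoratorBehavior: Behavior {
    mutating func beforeSet(_ newValue: Value)
    mutating func afterSet(_ value: Value)
}

protocol TransformerBehavior: Behavior {
    associatedtype TransformedValue   // the underlying storage type
    func toStorage(_ value: Value) -> TransformedValue
    func fromStorage(_ stored: TransformedValue) -> Value
}

// One type conforming to both; the single init() satisfies the shared
// base protocol.
struct LoggedOptionalBox: DecoratorBehavior, TransformerBehavior {
    typealias Value = Int
    typealias TransformedValue = Int?

    var log: [String] = []

    init() {}

    mutating func beforeSet(_ newValue: Int) { log.append("before \(newValue)") }
    mutating func afterSet(_ value: Int) { log.append("after \(value)") }

    func toStorage(_ value: Int) -> Int? { return value }
    func fromStorage(_ stored: Int?) -> Int { return stored ?? 0 }
}

var behavior = LoggedOptionalBox()
behavior.beforeSet(3)
let stored = behavior.toStorage(3)
behavior.afterSet(3)
let loaded = behavior.fromStorage(stored)
```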

> A behavior declaration

This has promise as well. By using a declaration like this, you can have
basically a DSL (using contextual keywords) to specify things like
whether it's lazy-initialized, decorators, and transformers. Same
benefits as the protocol family (e.g. good compiler checking of the
behavior definition before it's even used anywhere), allows for code
completion too, and it doesn't litter the global function namespace with
behavior names.

The more I think about this, the more I think it's a good idea.
Especially because it won't litter the global function namespace with
behavior names. Behavior constructors should not be callable by the
user, and behaviors may be named things we would love to use as function
names anyway (if a behavior implements some functionality that is useful
to be exposed to the user anyway, it can vend a type like your proposal
has and people can just instantiate that type directly).

> Can properties with behaviors be initialized from init rather than
> with inline initializers?

I think the answer to this has to be "absolutely". Especially if
property observers are a behavior (as the initial value may need to be
computed from init args or other properties, which can't be done as an
inline initializer).

-Kevin Ballard