[swift-evolution] [Proposal] Random Unification

Xiaodi Wu xiaodi.wu at gmail.com
Fri Nov 24 23:39:55 CST 2017


On Fri, Nov 24, 2017 at 10:59 PM, TellowKrinkle <tellowkrinkle at gmail.com>
wrote:

> So why is it more important for the random method on a collection to have
> a special method that guarantees a discrete uniform distribution than it is
> for an Int?  If you’re going to split on guaranteed-discrete-uniform vs
> maybe-discrete-uniform, why not split on discrete-uniform vs
> not-discrete-uniform (note: I would not want either of these)?
>
> Why not just let everything be maybe-discrete-uniform and then specify:
> - Things involving discrete sets (including collections and ranges of
> discrete values like ints) return a discrete uniform distribution
> - Things involving continuous ranges (including ranges of floating-point
> types) return a continuous uniform distribution
> I don’t really see the point in differentiating between a discrete and
> continuous distribution, since it makes no sense to use a continuous
> distribution for things that are discrete, and it also makes no sense to
> use a discrete distribution for things that are continuous.
>

One of the arguments that others have raised against the proposed
`Randomizable` protocol and the static method `random` is precisely this:
that static `random` guarantees no semantics about the nature of the random
value, including the distribution from which it is drawn. I agree with this
criticism; so you are correct: I do not want a "maybe-discrete-uniform"
method at all, let alone one that shares its name with methods that do
guarantee a particular distribution.
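
To make the distinction concrete, here is a minimal sketch of two facilities whose names state the distribution they draw from. It assumes a toolchain that has the standard-library `Int.random(in:)` and `Double.random(in:)` (Swift 4.2 or later), which are not the spellings under discussion in this thread. The names `discreteUniform(in:)` and `continuousUniform(in:)` are hypothetical and exist only for illustration.

```swift
// Hypothetical name: samples a *discrete* uniform distribution,
// so every integer in the range is equally likely.
func discreteUniform(in range: ClosedRange<Int>) -> Int {
    return Int.random(in: range)
}

// Hypothetical name: samples a *continuous* uniform distribution
// over the interval (up to floating-point representability).
func continuousUniform(in range: ClosedRange<Double>) -> Double {
    return Double.random(in: range)
}

let die = discreteUniform(in: 1...6)
let fraction = continuousUniform(in: 0.0...1.0)
print(die, fraction)
```

With names like these, the call site tells the reader which guarantee applies, and a type with no meaningful distribution has no similarly named method to offer.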

> As for optional vs non-optional, I’d say this is similar to conforming to
> RawRepresentable (where you can implement its `init?(rawValue:)` with an
> `init(rawValue:)` if your type doesn’t ever fail to initialize) where
> you’re simply indicating that for whatever reason, your type is less likely
> to fail than whatever the most likely to fail type is.
>
> Personally, I don’t care whether or not `Int.random` stays, but it’s
> functionally identical to `Int.random(in:)` with a default argument so it
> doesn’t make much of a difference for this decision since removing it
> wouldn’t affect the issue you’re having between `Int.random(in:)` and
> `Collection.random`.
>

There is certainly a difference. `Int.random(in:)` is failable, like
`Collection.random` is failable, because it is selecting one of a set of
values and that set may be empty. `Int.random` is not failable. Moreover,
as I wrote earlier, I'm concerned about this multiplication of methods that
do in fact do the same thing. It is unclear to me in what way
`Int.random(in: [1, 2, 3])` differs from `[1, 2, 3].random`. If there is no
semantic distinction, there should only be one facility. If there is a
semantic distinction, then there should be two facilities with distinct
names. In either case, there should not be two facilities with the same
name.
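
The failability point can be sketched as follows. This is an illustration only, assuming the standard-library `randomElement()` and `Int.random(in:)` from Swift 4.2 or later rather than the names proposed here. In that API, selection from a collection returns an Optional, while selection from a closed range (which can never be empty) returns a plain value.

```swift
let values = [1, 2, 3]
let empty: [Int] = []

// Selecting from a collection is failable: an empty collection
// has no element to return, so the result is Optional.
let picked: Int? = values.randomElement()
let nothing: Int? = empty.randomElement()

// A ClosedRange always contains at least one value, so selecting
// from it cannot fail and the result is non-optional.
let rolled: Int = Int.random(in: 1...3)

print(picked as Any, nothing as Any, rolled)
```

Both `picked` and `rolled` are drawn uniformly from the same three values; the only difference visible to the caller is the return type, which is the question of whether these are one facility or two.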

On 2017/11/24 21:39, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>
> On Fri, Nov 24, 2017 at 9:05 PM, TellowKrinkle <tellowkrinkle at gmail.com>
> wrote:
>
>> You say that all the `.random`s have different semantics, but to me (at
>> least), they are all very similar.
>>
>
> Of course they are _similar_: this is precisely why it's so important to
> be clear about the differences in the naming.
>
>
>> All the methods can be summarized as selecting a single random element
>> from a collection:
>> `[0, 2, 3].random` selects a single element from the given collection
>> `Int.random(in: 0...8)` selects a single element from the given range
>> `Int.random` has no range, but selects a single element from the
>> collection of all ints (equivalent to if the above method had a default
>> value for its range)
>> So to me these are all doing the same operation, just with different
>> types of inputs
>>
>
> There are many subtle but important differences. For example:
>
> `[1, 2, 3].random` is a sampling operation based on a discrete uniform
> distribution. All operations that choose an element from a Collection would
> behave similarly: that is, instance `random` guarantees sampling based on a
> discrete uniform distribution. It does so happen that `Int.random` gives
> values in a discrete uniform distribution. However, `Float.random` most
> certainly does not: it would sample from a _continuous_ uniform
> distribution. In general, static `random` does not guarantee any particular
> distribution at all. This is a huge semantic distinction.
>
> Static `random` (e.g., `Int.random`) will always return a value, whereas
> instance `random` (e.g., `[1, 2, 3].random`) might not. This is because all
> types that implement static `random` must be instantiable, whereas
> collections can be empty. One might conclude that it makes sense for static
> `random` to be of type `T`, whereas instance `random` would be most
> fittingly of type `T?`. However, because they're both named "random",
> people have been misled into thinking that they're in fact the same
> operation and must therefore have the same return type. Alejandro has
> argued that `[1, 2, 3].random` should be of type `T` *because* it would not
> be ergonomic for `Int.random` to be of type `T?`. Meanwhile, others have
> argued that, because `[].random` should be failable, `Int.random` should be
> as well. This perceived need for the two distinct facilities to return the
> same type is completely due to them having the same proposed name. However,
> as described above, one is failable and the other is not *because of their
> differing semantics*.
>
> Meanwhile, we have had a debate as to whether `random` should be spelled
> as a property or a function. Alejandro has argued that `random` is like
> `first` or `last` and is a property of a collection, while others have
> argued that `Int.random()` should be spelled like a function because it
> instantiates a different value each time. Notionally, of course, instance
> `random` selects one already-existing element from a collection, whereas
> static `random` creates a new value that doesn't exist yet and truly could
> be considered like a factory method. However, because again they've both
> been proposed to have the name "random", people are using arguments about
> one type of "random" to decide questions of syntax for the other type of
> "random".
>
> All of this goes away when we clarify that these two are distinct
> facilities: they have different semantics. Of course, elsewhere, I've
> advocated for `Int.random` to be removed altogether due to large potential
> for incorrect use. If so, then that's one fewer "random" to be confused
> with one another.
>
>
>> On 2017/11/24 20:07, Alejandro Alonso <aalonso128 at outlook.com> wrote:
>>
>>
>> - Alejandro
>>
>> ---------- Forwarded message ----------
>> *From:* Xiaodi Wu <xiaodi.wu at gmail.com>
>> *Date:* Nov 24, 2017, 3:05 PM -0600
>> *To:* Alejandro Alonso <aalonso128 at outlook.com>
>> *Cc:* Brent Royal-Gordon <brent at architechies.com>, Steve Canon via
>> swift-evolution <swift-evolution at swift.org>
>> *Subject:* Re: [swift-evolution] [Proposal] Random Unification
>>
>> On Fri, Nov 24, 2017 at 2:55 PM, Alejandro Alonso
>> <aalonso128 at outlook.com> wrote:
>>
>>> Regarding naming too many things “random”, I’ve talked to many
>>> developers on my end and none of them find it confusing. This proposal
>>> aims to make it obvious what each operation involving randomness is doing.
>>> I still believe that the proposed solution does just that, and in practice
>>> it feels good to write.
>>>
>>
>> I must disagree quite strongly here. The various facilities you name
>> "random" have different semantics, and differences in semantics should be
>> reflected in differences in names. It doesn't matter that some people don't
>> find it confusing; it is objectively the case that you have named multiple
>> distinct facilities with the same name, which leads to confusion. I, for
>> one, get confused, and you can see on this list that people are using
>> arguments about one property named "random" to discuss another property
>> named "random". This is quite an intolerable situation.
>>
>>> I disagree that sample is the correct naming to use here. Getting a
>>> sample is a verb in this context which would make it break API guidelines
>>> just as well as `pick()`. To sample is to “take a sample or samples of
>>> (something) for analysis.” I can agree to use `sampling()` which follows
>>> API guidelines. This would result in the following grammar for
>>> `["hi", "hello", "hey"].sampling(2)`: “From array, get a sampling of 2”.
>>>
>>
>> "Sampling" is fine.
>>
>>
>> On Nov 23, 2017, 12:54 AM -0600, Xiaodi Wu wrote:
>>>
>>> On Wed, Nov 22, 2017 at 23:01 Alejandro Alonso <aalonso128 at outlook.com>
>>> wrote:
>>>
>>>> Like I’ve said, python has different syntax grammar. We have to read
>>>> each call site and form a sentence from it. `random.choice([1, 2, 3])` to
>>>> me this reads, “Get a random choice from array”. This makes sense. Slapping
>>>> the word choice as an instance property like `[1, 2, 3].choice` reads,
>>>> “From array, get choice”. What is choice? This doesn’t make sense at all to
>>>> me. To me, the only good solution is `[1, 2, 3].random` which reads, “From
>>>> array, get random”. I actually think most users will be able to understand
>>>> this at first glance rather than choice (or any or some).
>>>>
>>>
>>> Again, my concern here is that you are proposing to name multiple things
>>> "random". If this property should be called "random"--which I'm fine
>>> with--then the static method "random(in:)" should be named something else,
>>> and the static property "random" should be dropped altogether (as I
>>> advocate for reasons we just discussed) or renamed as well. It is simply
>>> too confusing that there are so many different "random" methods or
>>> properties. Meanwhile, isn't your default RNG also going to be called
>>> something like "DefaultRandom"?
>>>
>>>> In regards to the sample() function on collections, I have added this as
>>>> I do believe this is something users need. The name I gave it was pick() as
>>>> this reads, “From array, pick 2”.
>>>>
>>>
>>> The name "sample" has been used to good effect in other languages, has a
>>> well understood meaning in statistics, and is consistent with Swift
>>> language guidelines. The operation here is a sampling, and per Swift
>>> guidelines the name must be a noun: therefore, 'sample' is fitting. "Pick"
>>> does not intrinsically suggest randomness, whereas sample does, and your
>>> proposed reading uses it as a verb, whereas Swift guidelines tell us it
>>> must be a noun. I would advocate strongly for using well-established
>>> terminology and sticking with "sample."
>>>
>>>
>>> On Nov 17, 2017, 8:32 PM -0600, Xiaodi Wu via swift-evolution
>>> <swift-evolution at swift.org> wrote:
>>>>
>>>> On Fri, Nov 17, 2017 at 7:11 PM, Brent Royal-Gordon <
>>>> brent at architechies.com> wrote:
>>>>
>>>>> On Nov 17, 2017, at 3:09 PM, Xiaodi Wu via swift-evolution <
>>>>> swift-evolution at swift.org> wrote:
>>>>>
>>>>> But actually, Int.random followed by % is the much bigger issue and a
>>>>> very good cautionary tale for why T.random is not a good idea. Swift should
>>>>> help users do the correct thing, and getting a random value across the full
>>>>> domain and computing an integer modulus is never the correct thing to do
>>>>> because of modulo bias, yet it's a very common error to make. We are much
>>>>> better off eliminating this API and encouraging use of the correct API,
>>>>> thereby reducing the likelihood of users making this category of error.
>>>>>
>>>>>
>>>>> Amen.
>>>>>
>>>>> If (and I agree with this) the range-based notation is less intuitive
>>>>> (`(0..<10).random` is certainly less discoverable than `Int.random`), then we
>>>>> ought to offer an API in the form of `Int.random(in:)` but not
>>>>> `Int.random`. This does not preclude a `Collection.random` API as Alejandro
>>>>> proposes, of course, and that has independent value as Gwendal says.
>>>>>
>>>>>
>>>>> If we're not happy with the range syntax, maybe we should put
>>>>> `random(in:)`-style methods on the RNG protocol as extension methods
>>>>> instead. Then there's a nice, uniform style:
>>>>>
>>>>> let diceRoll = rng.random(in: 1...6)
>>>>> let card = rng.random(in: deck)
>>>>> let isHeads = rng.random(in: [true, false])
>>>>> let probability = rng.random(in: 0.0...1.0) // Special FloatingPoint overload
>>>>>
>>>>> The only issue is that this makes the default RNG's name really
>>>>> important. Something like:
>>>>>
>>>>> DefaultRandom.shared.random(in: 1...6)
>>>>>
>>>>> Will be a bit of a pain for users.
>>>>>
>>>>
>>>> I did in fact implement this style of RNG in NumericAnnex, but I'm not
>>>> satisfied with the design myself. Not only is it a bit of an ergonomic
>>>> thorn, there's also another drawback that actually has weighty implications:
>>>>
>>>> Users aren't conditioned to reuse RNG instances. Perhaps it is because
>>>> it can "feel" wrong that multiple random instances should come from the
>>>> *same* RNG. Instead, it "feels" more right to initialize a new RNG for
>>>> every random number. After all, if one RNG is random, two must be randomer!
>>>> This error is seen with some frequency in other languages that adopt this
>>>> design, and they sometimes resort to educating users through documentation
>>>> that isn't consistently heeded.
>>>>
>>>> Of course, you and I both know that this is not ideal for performance.
>>>> Moreover, for a number of PRNG algorithms, the first few hundred or
>>>> thousand iterations can be more predictable than later iterations. (Some
>>>> algorithms discard the first n iterations, but whether that's adequate
>>>> depends on the quality of the seed, IIUC.) Neither of these issues applies
>>>> specifically to a default RNG type that cannot be initialized and always
>>>> uses entropy from the global pool, but that's not enough to vindicate the
>>>> design, IMO. By emphasizing *which* RNG instance is being used for random
>>>> number generation, the design encourages non-reuse of non-default RNGs,
>>>> which is precisely where this common error matters for performance (and
>>>> maybe security).
>>>>
>>>>> Maybe we call the default RNG instance `random`, and then give the
>>>>> `random(in:)` methods another name, like `choose(in:)`?
>>>>>
>>>>> let diceRoll = random.choose(in: 1...6)
>>>>> let card = random.choose(in: deck)
>>>>> let isHeads = random.choose(in: [true, false])
>>>>> let probability = random.choose(in: 0.0...1.0)
>>>>> let diceRoll = rng.choose(in: 1...6)
>>>>> let card = rng.choose(in: deck)
>>>>> let isHeads = rng.choose(in: [true, false])
>>>>> let probability = rng.choose(in: 0.0...1.0)
>>>>>
>>>>> This would allow us to keep the default RNG's type private and expose
>>>>> it only as an existential—which means more code will treat RNGs as black
>>>>> boxes, and people will extend the RNG protocol instead of the default RNG
>>>>> struct—while also putting our default random number generator under the
>>>>> name `random`, which is probably where people will look for such a thing.
>>>>>
>>>>
>>>> I've said this already in my feedback, but it can get lost in the long
>>>> chain of replies, so I'll repeat myself here because it's relevant to the
>>>> discussion. I think one of the major difficulties of discussing the
>>>> proposed design is that Alejandro has chosen to use a property called
>>>> "random" to name multiple distinct functions which have distinct names in
>>>> other languages. In fact, almost every method or function is being named
>>>> "random." We are tripping over ourselves and muddling our thinking (or at
>>>> least, I find myself doing so) because different things have the exact same
>>>> name, and if I'm having this trouble after deep study of the design, I
>>>> think it's a good sign that this is going to be greatly confusing to users
>>>> generally.
>>>>
>>>> First, there's Alejandro's _static random_, which he proposes to return
>>>> an instance of type T given a type T. In Python, this is named `randint(a,
>>>> b)` for integers, and `random` (between 0 and 1) or `uniform(a, b)` for
>>>> floating-point types. The distinct names reflect the fact that `randint` and
>>>> `uniform` are mathematically quite different (one samples a *discrete*
>>>> uniform distribution and the other a *continuous* uniform distribution),
>>>> and I'm not aware of non-numeric types offering a similar API in Python.
>>>> These distinct names accurately reflect critiques from others on this list
>>>> that the proposed protocol `Randomizable` lumps together types that don't
>>>> share any common semantics for their _static random_ method, and that the
>>>> protocol is of questionable utility because types in general do not share
>>>> sufficient semantics such that one can do interesting work in generic code
>>>> with such a protocol.
>>>>
>>>> Then there's Alejandro's _instance random_, which he proposes to return
>>>> an element of type T given an instance of a collection of type T. In Python,
>>>> this is named "choice(seq)" (for one element, or else throws an error) and
>>>> "sample(seq, k)" (for up to k elements). As I noted, Alejandro was right to
>>>> draw an analogy between _instance random_ and other instance properties of
>>>> a Collection such as `first` and `last`. In fact, the behavior of Python's
>>>> "choice" (if modified to return an Optional) and "sample", as a pair, would
>>>> fit in very well next to Swift's existing pairs of `first` and `prefix(k)`
>>>> and `last` and `suffix(k)`. We could trivially Swiftify the names here; for
>>>> example:
>>>>
>>>> ```
>>>> [1, 2, 3].first
>>>> [1, 2, 3].any // or `choice`, or `some`, or...
>>>> [1, 2, 3].last
>>>>
>>>> [1, 2, 3].prefix(2)
>>>> [1, 2, 3].sample(2)
>>>> [1, 2, 3].suffix(2)
>>>> ```
>>>>
>>>> I'm going to advocate again for _not_ naming all of these distinct
>>>> things "random". Even in conducting this discussion, it's so hard to keep
>>>> track of what particular function a person is giving feedback about.
>>>>
>>>>
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> swift-evolution at swift.org
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>
>>>>
>>>
>>
>>
>
>