[swift-evolution] [Proposal] Random Unification

Wed Oct 4 04:41:30 CDT 2017

On Wed, Oct 4, 2017 at 02:39 Félix Cloutier <felixcloutier at icloud.com>
wrote:

> I'm really not enthusiastic about `random() -> Self?` or `random() throws
> -> Self` when the only possible error is that some global object hasn't
> been initialized.
>
> The idea of having `random` straight on integers and floats and
> collections was to provide a simple interface, but using a global CSPRNG
> for those operations comes at a significant usability cost. I think that
> something has to go:
>
>
>    1. Drop the random methods on FixedWidthInteger, FloatingPoint
>       - ...or drop the CSPRNG as a default
>    2. Drop the optional/throws, and trap on error
>
>
> I know I wouldn't use the `Int.random()` method if I had to unwrap every
> single result, when getting one non-nil result guarantees that the program
> won't see any other nil result again until it restarts.
>

From the perspective of an app that can be suspended and resumed at any
time, “until it restarts” could be as soon as the next invocation of
`Int.random()`, could it not?

> Félix
>
> On Oct 3, 2017, at 23:44, Jonathan Hull <jhull at gbis.com> wrote:
>
> I like the idea of splitting it into 2 separate “Random” proposals.
>
> The first would have Xiaodi’s built-in CSPRNG which only has the interface:
>
> On FixedWidthInteger:
> static func random() throws -> Self
> static func random(in range: ClosedRange<Self>) throws -> Self
>
> On Double:
> static func random() throws -> Double
> static func random(in range: ClosedRange<Double>) throws -> Double
>
> (Everything else we want, like shuffled(), could be built in later
> proposals by calling those functions)
>
> The other option would be to remove the ‘throws’ from the above functions
> (perhaps fatalError-ing), and provide an additional function which can be
> used to check that there is enough entropy (so as to avoid the crash or
> fall back to a worse source when the CSPRNG is unavailable).
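
A minimal sketch of the two shapes described above, for comparison. Everything
here is hypothetical: `EntropyError`, `EntropySource`, and the function names
are invented for illustration, and the `isAvailable` flag merely simulates a
platform source that can fail. `SystemRandomNumberGenerator`, from today's
standard library, stands in for the real platform call.

```swift
// Hypothetical error type; the thread does not name one.
enum EntropyError: Error { case unavailable }

// Stand-in for the platform entropy source. `isAvailable` simulates failure.
struct EntropySource {
    var isAvailable: Bool

    func next() throws -> UInt64 {
        guard isAvailable else { throw EntropyError.unavailable }
        var generator = SystemRandomNumberGenerator()
        let bits: UInt64 = generator.next()
        return bits
    }
}

// Shape 1: every call can throw, so every call site needs `try`.
func randomThrowing(from source: EntropySource) throws -> UInt64 {
    return try source.next()
}

// Shape 2: a separate check, plus a call that traps when the source is missing.
func hasEnoughEntropy(_ source: EntropySource) -> Bool {
    return source.isAvailable
}

func randomTrapping(from source: EntropySource) -> UInt64 {
    guard let value = try? source.next() else {
        fatalError("entropy source unavailable")
    }
    return value
}
```

With the second shape, a caller that expects trouble checks
`hasEnoughEntropy(_:)` first and falls back to another source; everyone else
calls `randomTrapping(from:)` without any unwrapping.
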
>
>
>
> Then a second proposal would bring in the concept of RandomSources
> (whatever we call them), which can return however many random bytes you ask
> for… and a protocol for types which know how to initialize themselves from
> those bytes.  That might be spelled like 'static func random(using:
> RandomSource)->Self'.  As a convenience, the source would also be able to
> create FixedWidthIntegers and Doubles (both with and without a range), and
> would also have the coinFlip() and oneIn(UInt)->Bool functions. Most types
> should be able to build themselves off of that.  There would be a default
> source which is built from the first protocol.
>
> I also really think we should have a concept of Repeatably-Random as a
> subprotocol for the second proposal.  I see far too many shipping apps
> which have bugs due to using arc4Random when they really needed a
> repeatable source (e.g. patterns and lines jump around when you resize
> things). If it was an easy option, people would use it when appropriate.
> This would just mean a sub-protocol which has an initializer which takes a
> seed, and the ability to save/restore state (similar to CGContexts).
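
One way the source protocol and its repeatable sub-protocol could look, as a
rough sketch only. The protocol names, the `state` property, and the choice of
a `UInt64`-returning `next()` (rather than "however many bytes you ask for")
are assumptions made to keep the example short. The generator uses
SplitMix64-style mixing, which is deterministic and not cryptographically
secure.

```swift
// Hypothetical names; the thread leaves the naming open.
protocol RandomSource {
    mutating func next() -> UInt64
}

extension RandomSource {
    // Convenience built on the one requirement.
    mutating func coinFlip() -> Bool {
        return (next() & 1) == 1
    }

    // True roughly one time in `n`; modulo bias is ignored for brevity.
    mutating func oneIn(_ n: UInt) -> Bool {
        precondition(n > 0, "n must be positive")
        return next() % UInt64(n) == 0
    }
}

// The repeatable variant adds a seed initializer and save/restore of state.
protocol RepeatableRandomSource: RandomSource {
    init(seed: UInt64)
    var state: UInt64 { get set }
}

struct SplitMix64: RepeatableRandomSource {
    var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    mutating func next() -> UInt64 {
        state = state &+ 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Saving and restoring state replays the same values, e.g. across a resize.
var layoutSource = SplitMix64(seed: 2017)
let savedState = layoutSource.state
let firstPass = [layoutSource.next(), layoutSource.next()]
layoutSource.state = savedState
let secondPass = [layoutSource.next(), layoutSource.next()]
```

Here `firstPass` and `secondPass` hold the same values, which is the property
that keeps patterns from jumping around when a view is redrawn.
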
>
> The second proposal would also include things like shuffled() and
> shuffled(using:).
>
> Thanks,
> Jon
>
>
>
> On Oct 3, 2017, at 9:31 PM, Alejandro Alonso <aalonso128 at outlook.com>
> wrote:
>
> I really like the schedule here. After reading for a while, I do agree
> with Brent that the stdlib should be very primitive in the functionality it
> provides. I also agree that the most important part right now is designing
> the internal crypto that the numeric types use to return their respective
> random numbers. On the question of how to handle insufficient entropy from
> the device random source: from a user's perspective it makes sense that
> calling .random should just give me a random number, but from a developer's
> perspective I see Optional as the best choice here. While I think blocking
> could, in most cases, give the user an easier API, we have to do this right
> and be safe by providing a value that indicates that there is room for
> error. As for the generator abstraction, I believe there should be a
> bare-bones protocol that sets a layout for new generators, and we should
> focus on its requirements.
>
> Whether or not RandomAccessCollection and MutableCollection should get
> .random and .shuffle/.shuffled in this first proposal is completely up in
> the air for me. It makes sense, to me, to include .random in this proposal
> and open another one for .shuffle/.shuffled, but I can see arguments for
> creating something separate for both, or for including all of it in this
> proposal.
>
> - Alejandro
>
> On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <xiaodi.wu at gmail.com>, wrote:
>
>
> On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <felixcloutier at icloud.com>
> wrote:
>
>> On Sep 26, 2017, at 16:14, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>
>> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <
>> felixcloutier at icloud.com> wrote:
>>
>>>
>>> It's possible to use a CSPRNG-grade algorithm and seed it once to get a
>>> reproducible sequence, but when you use it as a CSPRNG, you typically feed
>>> entropy back into it at nondeterministic points to ensure that even if you
>>> started with a bad seed, you'll eventually get to an alright state. Unless
>>> you keep track of when entropy was mixed in and what the values were,
>>> you'll never get a reproducible CSPRNG.
>>>
>>> We would give developers a false sense of security if we provided them
>>> with CSPRNG-grade algorithms that we called CSPRNGs and that they could
>>> seed themselves. Just because it says "crypto-secure" in the name doesn't
>>> mean that it'll be crypto-secure if it's seeded with time(). Therefore,
>>> "reproducible" vs "non-reproducible" looks like a good distinction to me.
>>>
>>
>> I disagree here, in two respects:
>>
>> First, whether or not a particular PRNG is cryptographically secure is an
>> intrinsic property of the algorithm; whether it's "reproducible" or not is
>> determined by the published API. In other words, the distinction between
>> CSPRNG vs. non-CSPRNG is important to document because it's semantics that
>> cannot be deduced by the user otherwise, and it is an important one for
>> writing secure code because it tells you whether an attacker can predict
>> future outputs based only on observing past outputs. "Reproducible" in the
>> sense of seedable or not is trivially noted by inspection of the published
>> API, and it is rather immaterial to writing secure code.
>>
>>
>> Cryptographically secure is not a property that I'm comfortable applying
>> to an algorithm. You cannot say that you've made a cryptographically secure
>> thing just because you've used all the right algorithms: you also have to
>> use them right, and one of the most critical components of a
>> cryptographically secure PRNG is its seed.
>>
>
> A cryptographically secure algorithm isn’t sufficient, but it is
> necessary. That’s why it’s important to mark them as such. If I'm a careful
> developer, then it is absolutely important to me to know that I’m using a
> PRNG with a cryptographically secure algorithm, and that the particular
> implementation of that algorithm is correct and secure.
>
> It is a *feature* of a lot of modern CSPRNGs that you can't seed them:
>>
>>
>>    - You cannot seed or add entropy to std::random_device
>>
>>
> Although std::random_device may in practice be backed by a software
> CSPRNG, IIUC, the intention is that it can provide access to a hardware
> non-deterministic source when available.
>
>
>>    - You cannot seed or add entropy to CryptGenRandom
>>    - You can only add entropy to /dev/(u)random
>>    - You can only add entropy to BSD's arc4random
>>
>>
> Ah, I see. I think we mean different things when we say PRNG. A PRNG is an
> entirely deterministic algorithm; the output is non-random and the
> algorithm itself requires no entropy. If a PRNG is seeded with a random
> sequence of bits, its output can "appear" to be random. A CSPRNG is a PRNG
> that fulfills certain criteria such that its output can be appropriate for
> use in cryptographic applications in place of a truly random sequence *if*
> the input to the CSPRNG is itself random.
>
> The examples you give above *incorporate* a CSPRNG, environment entropy,
> and a set of rules about when to mix in additional entropy in order to
> produce output indistinguishable from a random sequence, but they are *not*
> themselves really *pseudorandom* generators because they are not
> deterministic. Not only do such sources of random numbers not require an
> interface to allow seeding, they do not even have to be publicly
> instantiable: Swift need only expose a single thread-safe instance (or an
> instance per thread) of a single type that provides access to
> CryptGenRandom/urandom/arc4random, since after all the output of multiple
> instances of that type should be statistically indistinguishable from the
> output of only one.
>
> What I was trying to respond to, by contrast, is the design of a hierarchy
> of protocols CSPRNG : PRNG (or, in Alejandro's proposal, UnsafeRandomSource
> : RandomSource) and the appropriate APIs to expose on each. This is
> entirely inapplicable to your examples. It stands to reason that a
> non-instantiable source of random numbers does not require a protocol of
> its own (a hypothetical RNG : CSPRNG), since there is no reason to
> implement (if done correctly) more than a single publicly non-instantiable
> singleton type that could conform to it. For that matter, the concrete type
> itself probably doesn't need *any* public API at all. Instead, extensions
> to standard library types such as Int that implement conformance to the
> protocol that Alejandro names "Randomizable" could call internal APIs to
> provide all the necessary functionality, and third-party types that need to
> conform to "Randomizable" could then in turn use `Int.random()` or
> `Double.random()` to implement their own conformance. In fact, the concrete
> random number generator type doesn't need to be public at all. All public
> interaction could be through APIs such as `Int.random()`.
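
A rough sketch of that layering, under several assumptions. `Randomizable` is
the protocol name from the thread, but its single `random() -> Self?`
requirement is a guess at its shape. `systemRandomBits()` is a hypothetical
non-public helper, and `SystemRandomNumberGenerator` from today's standard
library stands in for CryptGenRandom/urandom/arc4random (it never fails, so
the `nil` path is not exercised here). `Point` is an invented third-party
type.

```swift
// Name from the thread; the requirement's shape is an assumption.
protocol Randomizable {
    static func random() -> Self?
}

// Hypothetical non-public entry point to the platform source.
private func systemRandomBits() -> UInt64? {
    var generator = SystemRandomNumberGenerator()
    let bits: UInt64 = generator.next()
    return bits
}

extension Int: Randomizable {
    static func random() -> Int? {
        guard let bits = systemRandomBits() else { return nil }
        return Int(truncatingIfNeeded: bits)
    }
}

extension Double: Randomizable {
    // Uniform in [0, 1), built from the top 53 bits.
    static func random() -> Double? {
        guard let bits = systemRandomBits() else { return nil }
        return Double(bits >> 11) / Double(UInt64(1) << 53)
    }
}

// A third-party type conforms using only the public `random()` calls.
struct Point: Randomizable {
    var x: Double
    var y: Double

    static func random() -> Point? {
        guard let x = Double.random(), let y = Double.random() else {
            return nil
        }
        return Point(x: x, y: y)
    }
}
```

Nothing outside the file can name or instantiate the generator; the only
public surface is `random()` on each conforming type.
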
>
>
>> Just because we can expose a seed interface doesn't mean we should, and
>> in this case I believe that it would go against the prime objective of
>> providing secure random numbers.
>>
>>
> If we're talking about a Swift interface to a non-deterministic source of
> random numbers like urandom or arc4random, then, as I write above, not only
> do I agree that it doesn't need to be seedable, it also does not need to be
> instantiable at all, does not need to conform to a protocol that
> specifically requires the semantics of a non-deterministic source, does not
> need to expose any public interface whatsoever, and doesn't itself even
> need to be public. (Does it even need to be a type, as opposed to simply a
> free function?)
>
> In fact, having reasoned through all of this, we can split the design task
> into two. The most essential part, which definitely should be part of the
> stdlib, would be an internal interface to a cryptographically secure
> platform-specific entropy source, a public protocol named something like
> Randomizable (to be bikeshedded), and the appropriate implementations on
> Boolean, binary integer, and floating point types to conform them to
> Randomizable so that users can write `Bool.random()` or `Int.random()`. The
> second part, which can be a separate proposal or even a standalone core
> library or third-party library, would be the protocols and concrete types
> that implement pseudorandom number generators, allowing for reproducible
> pseudorandom sequences. In other words, instead of PRNGs and CSPRNGs being
> the primitives on which `Int.random()` is implemented, `Int.random()`
> should be the standard library primitive which allows PRNGs and CSPRNGs to
> be seeded.
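
To illustrate the inversion, here is a small sketch in which a seedable
generator draws its default seed from the standard library primitive.
`SeededLCG` is a hypothetical type using a plain 64-bit linear congruential
step (Knuth's MMIX constants), which is not cryptographically secure.
`UInt64.random(in:)` from today's standard library stands in for the
`Int.random()` under discussion.

```swift
// Hypothetical seedable generator; a simple LCG, not a CSPRNG.
struct SeededLCG {
    private var state: UInt64

    // Explicit seed: gives a reproducible sequence.
    init(seed: UInt64) {
        state = seed
    }

    // Default: the seed comes from the standard library primitive.
    init() {
        self.init(seed: UInt64.random(in: UInt64.min ... UInt64.max))
    }

    mutating func next() -> UInt64 {
        state = state &* 6364136223846793005 &+ 1442695040888963407
        return state
    }
}

var reproducible = SeededLCG(seed: 7)   // same sequence on every run
var unpredictable = SeededLCG()         // seed differs from run to run
```
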
>
>> If your attacker can observe your seeding once, chances are that they can
>> observe your reseeding too; then, they can use their own implementation of
>> the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom
>> sequence whether or not Swift exposes any particular API.
>>
>>
>> On Linux, the random devices are initially seeded with machine-specific
>> but rather invariant data that makes /dev/urandom spit out predictable
>> numbers. It is considered "seeded" after a root process writes POOL_SIZE
>> bytes to it. On most implementations, this initial seed is stored on disk:
>> when the computer shuts down, it reads POOL_SIZE bytes from /dev/urandom
>> and saves them in a file, and the contents of that file are loaded back into
>> /dev/urandom when the computer starts. A scenario where someone can read
>> that file is certainly not less likely than a scenario where /dev/urandom
>> was deleted. That doesn't mean that they have kernel code execution or that
>> they can pry into your process, but they have a good shot at guessing your
>> seed and subsequent RNG results if no stirring happens.
>>
>
> Sorry, I don't understand what you're getting at here. Again, I'm talking
> about deterministic algorithms, not non-deterministic sources of random
> numbers.
>
> Secondly, I see no reason to justify the notion that, simply because a
>> PRNG is cryptographically secure, we ought to hide the seeding initializer
>> (because one has to exist internally anyway) from the public. Obviously,
>> one use case for a deterministic PRNG is to get reproducible sequences of
>> random-appearing values; this can be useful whether the underlying
>> algorithm is cryptographically secure or not. There are innumerably many
>> ways to use data generated from a CSPRNG in non-cryptographically secure
>> ways and omitting or including a public seeding initializer does not change
>> that; in other words, using a deterministic seed for a CSPRNG would be a
>> bad idea in certain applications, but it's a deliberate act, and someone
>> who would mistakenly do that is clearly incapable of *using* the output
>> from the PRNG in a secure way either; put a third way, you would be hard
>> pressed to find a situation where it's true that "if only Swift had not
>> made the seeding initializer public, this author would have written secure
>> code, but instead the only security hole that existed in the code was
>> caused by the availability of a public seeding initializer mistakenly
>> used." The point of having both explicitly instantiable PRNGs and a layer
>> of simpler APIs like "Int.random()" is so that the less experienced user
>> can get the "right thing" by default, and the experienced user can
>> customize the behavior; any user that instantiates his or her own
>> ChaCha20Random instance is already calling for the power user interface; it
>> is reasonable to expose the underlying primitive operations (such as
>> seeding) so long as there are legitimate uses for it.
>>
>>>
>> Nothing prevents us from using the same algorithm for a CSPRNG that is
>> safely pre-seeded and a PRNG that people seed themselves, mind you.
>> However, especially when it comes to security, there is a strong
>> responsibility to drive developers into a pit of success: the most obvious
>> thing to do has to be the right one, and suggesting to
>> cryptographically-unaware developers that they have everything they need to
>> manage their own seed is not a step in that direction.
>>
>> I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly
>> calling it cryptographically-secure, because it is not unless you know what
>> to do with it. It is emphatically not far-fetched to imagine a developer
>> who thinks that they can outdo the standard library by using their own
>> ChaCha20Random instance after it's been seeded with time() if we let them
>> know that it's "cryptographically secure". If you're a power user and you
>> don't like the default, known-good CSPRNG, then you're hopefully good
>> enough to know that ChaCha20 is considered a cryptographically-secure
>> algorithm without help labels from the language, and you know how to
>> operate it.
>>
>> I'm fully aware of the myths surrounding /dev/urandom and /dev/random.
>> /dev/urandom might never run out, but it is also possible for it not to be
>> initialized at all, as in the case of some VM setups. In some older
>> versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems
>> where it is available, it can also be deleted, since it is a file. The
>> point is, all of these scenarios cause an error during seeding of a CSPRNG.
>> The question is, how to proceed in the face of inability to access entropy.
>> We must do something, because in that case we cannot return a
>> cryptographically secure answer. Rare trapping on invocation of
>> Int.random() or permanently waiting for a never-to-be-initialized
>> /dev/urandom would be terrible to debug, but returning an optional or
>> throwing all the time would be verbose. How to design this API?
>>
>>
>> If the only concern is that the system might not be initialized enough,
>> I'd say that whatever returns an instance of a global, framework-seeded
>> CSPRNG should return an Optional, and the random methods that use the
>> global CSPRNG can trap and scream that the system is not initialized
>> enough. If this is a likely error for you, you can check if the CSPRNG
>> exists or not before jumping.
>>
>> Also note that there is only one system for which Swift is officially
>> distributed (Ubuntu 14.04) on which the only way to get entropy from the OS
>> is to open a random device and read from it.
>>
>
> Again, I'm not only talking about urandom. As far as I'm aware, every API
> to retrieve cryptographically secure sequences of random bits on every
> platform for which Swift is distributed can potentially return an error
> instead of random bits. The question is, what design for our API is the
> most sensible way to deal with this contingency? On rethinking, I do
> believe that consistently returning an Optional is the best way to go about
> it, allowing the user to either (a) supply a deterministic fallback; (b)
> raise an error of their own choosing; or (c) trap--all with a minimum of
> fuss. This seems very Swifty to me.
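
The three options read roughly as follows at the call site. This is a sketch
only: `secureRandomInt(entropyAvailable:)` is a hypothetical stand-in for an
Optional-returning `Int.random()`, its flag simulates the failure case, and
`AppError` is invented for the example.

```swift
// Hypothetical stand-in for an Optional-returning `Int.random()`.
func secureRandomInt(entropyAvailable: Bool = true) -> Int? {
    guard entropyAvailable else { return nil }
    return Int.random(in: Int.min ... Int.max)
}

// Invented error type for option (b).
enum AppError: Error { case noEntropy }

// (a) Supply a deterministic fallback.
let withFallback = secureRandomInt(entropyAvailable: false) ?? 4

// (b) Raise an error of the caller's choosing.
func tokenOrThrow(entropyAvailable: Bool) throws -> Int {
    guard let value = secureRandomInt(entropyAvailable: entropyAvailable) else {
        throw AppError.noEntropy
    }
    return value
}

// (c) Trap if no value is available.
let orTrap = secureRandomInt()!
```
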
>
>
>>
>> * What should the default CSPRNG be? There are good arguments for using a
>>> cryptographically secure device random. (In my proposed implementation, for
>>> device random, I use Security.framework on Apple platforms (because
>>> /dev/urandom is not guaranteed to be available due to the sandbox, IIUC).
>>> On Linux platforms, I would prefer to use getrandom() and avoid using file
>>> system APIs, but getrandom() is new and unsupported on some versions of
>>> Ubuntu that Swift supports. This is an issue in and of itself.) Now, a
>>> number of these facilities strictly limit or do not guarantee availability
>>> of more than a small number of random bytes at a time; they are recommended
>>> for seeding other PRNGs but *not* as a routine source of random numbers.
>>> Therefore, although device random should be available to users, it probably
>>> shouldn’t be the default for the Swift standard library as it could have
>>> negative consequences for the system as a whole. There follows the
>>> significant task of implementing a CSPRNG correctly and securely for the
>>> default PRNG.
>>>
>>>
>>> Theo gave a talk a few years ago
>>> <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how
>>> these problems are approached in LibreSSL.
>>>
>>
>> Certainly, we can learn a lot from those like Theo who've dealt with the
>> issue. I'm not in a position to watch the talk at the moment; can you
>> summarize what the tl;dr version of it is?
>>
>>
>> I saw it three years ago, so I don't remember all the details. The gist
>> is that:
>>
>>
>>    - OpenBSD's random is available from extremely early in the boot
>>    process with reasonable entropy
>>
>>
>>    - LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG
>>    (which doesn't actually use ARC4)
>>    - That implementation of arc4random is good because it is fool-proof
>>    and it has basically no failure mode
>>    - Stirring is good, having multiple components take random numbers
>>    from the same source probably makes results harder to guess too
>>    - Getrandom/getentropy is in all ways better than reading from random
>>    devices
>>
>>
> Vigorously agree on all points. Thanks for the summary.
>
>
>
>