[swift-evolution] [Proposal] Random Unification

Wed Sep 27 00:18:53 CDT 2017

> Le 26 sept. 2017 à 16:14, Xiaodi Wu <xiaodi.wu at gmail.com> a écrit :
> 
> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <felixcloutier at icloud.com <mailto:felixcloutier at icloud.com>> wrote:
> 
> It's possible to use a CSPRNG-grade algorithm and seed it once to get a reproducible sequence, but when you use it as a CSPRNG, you typically feed entropy back into it at nondeterministic points to ensure that even if you started with a bad seed, you'll eventually get to an alright state. Unless you keep track of when entropy was mixed in and what the values were, you'll never get a reproducible CSPRNG.
> 
> We would give developers a false sense of security if we provided them with CSPRNG-grade algorithms that we called CSPRNGs and that they could seed themselves. Just because it says "crypto-secure" in the name doesn't mean that it'll be crypto-secure if it's seeded with time(). Therefore, "reproducible" vs "non-reproducible" looks like a good distinction to me.
> 
> I disagree here, in two respects:
> 
> First, whether or not a particular PRNG is cryptographically secure is an intrinsic property of the algorithm; whether it's "reproducible" or not is determined by the published API. In other words, the distinction between CSPRNG vs. non-CSPRNG is important to document because it's semantics that cannot be deduced by the user otherwise, and it is an important one for writing secure code because it tells you whether an attacker can predict future outputs based only on observing past outputs. "Reproducible" in the sense of seedable or not is trivially noted by inspection of the published API, and it is rather immaterial to writing secure code.

Cryptographically secure is not a property that I'm comfortable applying to an algorithm. You cannot say that you've made a cryptographically secure thing just because you've used all the right algorithms: you also have to use them right, and one of the most critical components of a cryptographically secure PRNG is its seed. It is a *feature* of a lot of modern CSPRNGs that you can't seed them:

You cannot seed or add entropy to std::random_device
You cannot seed or add entropy to CryptGenRandom
You can only add entropy to /dev/(u)random
You can only add entropy to BSD's arc4random

Just because we can expose a seed interface doesn't mean we should, and in this case I believe that it would go against the prime objective of providing secure random numbers.

> If your attacker can observe your seeding once, chances are that they can observe your reseeding too; then, they can use their own implementation of the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom sequence whether or not Swift exposes any particular API.

On Linux, the random devices are initially seeded with machine-specific but rather invariant data that makes /dev/urandom spit out predictable numbers. It is considered "seeded" after a root process writes POOL_SIZE bytes to it. On most implementations, this initial seed is stored on disk: when the computer shuts down, it reads POOL_SIZE bytes from /dev/urandom and saves it in a file, and the contents of that file is loaded back into /dev/urandom when the computer starts. A scenario where someone can read that file is certainly not less likely than a scenario where /dev/urandom was deleted. That doesn't mean that they have kernel code execution or that they can pry into your process, but they have a good shot at guessing your seed and subsequent RNG results if no stirring happens.

> Secondly, I see no reason to justify the notion that, simply because a PRNG is cryptographically secure, we ought to hide the seeding initializer (because one has to exist internally anyway) from the public. Obviously, one use case for a deterministic PRNG is to get reproducible sequences of random-appearing values; this can be useful whether the underlying algorithm is cryptographically secure or not. There are innumerably many ways to use data generated from a CSPRNG in non-cryptographically secure ways and omitting or including a public seeding initializer does not change that; in other words, using a deterministic seed for a CSPRNG would be a bad idea in certain applications, but it's a deliberate act, and someone who would mistakenly do that is clearly incapable of *using* the output from the PRNG in a secure way either; put a third way, you would be hard pressed to find a situation where it's true that "if only Swift had not made the seeding initializer public, this author would have written secure code, but instead the only security hole that existed in the code was caused by the availability of a public seeding initializer mistakenly used." The point of having both explicitly instantiable PRNGs and a layer of simpler APIs like "Int.random()" is so that the less experienced user can get the "right thing" by default, and the experienced user can customize the behavior; any user that instantiates his or her own ChaCha20Random instance is already calling for the power user interface; it is reasonable to expose the underlying primitive operations (such as seeding) so long as there are legitimate uses for it.

Nothing prevents us from using the same algorithm for a CSPRNG that is safely pre-seeded and a PRNG that people seed themselves, mind you. However, especially when it comes to security, there is a strong responsibility to drive developers into a pit of success: the most obvious thing to do has to be the right one, and suggesting to cryptographically-unaware developers that they have everything they need to manage their own seed is not a step in that direction.

I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly calling it cryptographically-secure, because it is not unless you know what to do with it. It is emphatically not far-fetched to imagine a developer who thinks that they can outdo the standard library by using their own ChaCha20Random instance after it's been seeded with time() if we let them know that it's "cryptographically secure". If you're a power user and you don't like the default, known-good CSPRNG, then you're hopefully good enough to know that ChaCha20 is considered a cryptographically-secure algorithm without help labels from the language, and you know how to operate it.

> I'm fully aware of the myths surrounding /dev/urandom and /dev/random. /dev/urandom might never run out, but it is also possible for it not to be initialized at all, as in the case of some VM setups. In some older versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems where it is available, it can also be deleted, since it is a file. The point is, all of these scenarios cause an error during seeding of a CSPRNG. The question is, how to proceed in the face of inability to access entropy. We must do something, because we cannot therefore return a cryptographically secure answer. Rare trapping on invocation of Int.random() or permanently waiting for a never-to-be-initialized /dev/urandom would be terrible to debug, but returning an optional or throwing all the time would be verbose. How to design this API?

If the only concern is that the system might not be initialized enough, I'd say that whatever returns an instance of a global, framework-seeded CSPRNG should return an Optional, and the random methods that use the global CSPRNG can trap and scream that the system is not initialized enough. If this is a likely error for you, you can check if the CSPRNG exists or not before jumping.

Also note that there is only one system for which Swift is officially distributed (Ubuntu 14.04) on which the only way to get entropy from the OS is to open a random device and read from it.

>> * What should the default CSPRNG be? There are good arguments for using a cryptographically secure device random. (In my proposed implementation, for device random, I use Security.framework on Apple platforms (because /dev/urandom is not guaranteed to be available due to the sandbox, IIUC). On Linux platforms, I would prefer to use getrandom() and avoid using file system APIs, but getrandom() is new and unsupported on some versions of Ubuntu that Swift supports. This is an issue in and of itself.) Now, a number of these facilities strictly limit or do not guarantee availability of more than a small number of random bytes at a time; they are recommended for seeding other PRNGs but *not* as a routine source of random numbers. Therefore, although device random should be available to users, it probably shouldn’t be the default for the Swift standard library as it could have negative consequences for the system as a whole. There follows the significant task of implementing a CSPRNG correctly and securely for the default PRNG.
> 
> Theo give a talk a few years ago <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how these problems are approached in LibreSSL.
> 
> Certainly, we can learn a lot from those like Theo who've dealt with the issue. I'm not in a position to watch the talk at the moment; can you summarize what the tl;dr version of it is?

I saw it three years ago, so I don't remember all the details. The gist is that:

OpenBSD's random is available from extremely early in the boot process with reasonable entropy
LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which doesn't actually use ARC4)
That implementation of arc4random is good because it is fool-proof and it has basically no failure mode
Stirring is good, having multiple components take random numbers from the same source probably makes results harder to guess too
Getrandom/getentropy is in all ways better than reading from random devices

Félix

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170926/cfff64fb/attachment.html>