[swift-evolution] [Draft] Hasher & HashVisitable

Brent Royal-Gordon brent at architechies.com
Wed Mar 15 07:04:32 CDT 2017


> On Mar 13, 2017, at 10:35 PM, Károly Lőrentey via swift-evolution <swift-evolution at swift.org> wrote:
> 
> One important point about SipHash and similar secure hash functions is that they work better if all nested components of a hashable value are (recursively) appended to a single hasher, rather than having each component create its own separate hasher instance in its hashValue implementation. Initializing and finalizing the hasher state is not free; it has some nontrivial cost. So ideally only collections would create and finalize hashers; composite types that implement Hashable should not do that. 

Are there any decent hash functions which *don't* have this property? Specifically, are there any which are transitive (but not commutative), so this property holds?

	hash(a) ++ (hash(b) ++ hash(c))  == (hash(a) ++ hash(b)) ++ hash(c)

(Where `++` is a "combine two hashes" function.)

What I sort of envision is a `Hash` type in the standard library, with a `Hashable` protocol which simply looks like:

	protocol Hashable: Equatable {
		var hashValue: Hash { get }
	}

A Hashable conformance would probably just look like:

	extension Person: Hashable {
		var hashValue: Hash {
			return Hash(firstName, lastName, birthDate)
		}
	}

Where all three types themselves were `Hashable` and returned `Hash`es of their own.

A transitive function would surely not be cryptographically strong—but we're not looking for cryptographic strength here, just good mixing and entropy preservation with some distinction between the left-hand and right-hand sides of the operation. I'm not about to do a literature search when I need to be awake in four hours, but I suspect somebody has tackled this problem before. If they have, we can get much better hashing without using generics or radically changing the way people hash—

> Therefore, implementing SipHash as a separate, optional facility for use inside individual hashValue implementations (as opposed to being an outright replacement for it) would be somewhat inefficient. It also wouldn't discourage/prevent people from continuing to use xor and similar ad hoc hashing methods; people would need to discover these useful new hash utilities on their own.

and a design where you returned a special Hash type, rather than a plain Int, would largely avoid this very real human factors concern.

-- 
Brent Royal-Gordon
Architechies



More information about the swift-evolution mailing list