[swift-evolution] [Draft] Hasher & HashVisitable
Brent Royal-Gordon
brent at architechies.com
Wed Mar 15 07:04:32 CDT 2017
> On Mar 13, 2017, at 10:35 PM, Károly Lőrentey via swift-evolution <swift-evolution at swift.org> wrote:
>
> One important point about SipHash and similar secure hash functions is that they work better if all nested components of a hashable value are (recursively) appended to a single hasher, rather than having each component create its own separate hasher instance in its hashValue implementation. Initializing and finalizing the hasher state is not free; it has some nontrivial cost. So ideally only collections would create and finalize hashers; composite types that implement Hashable should not do that.
Are there any decent hash functions which *don't* have this property? Specifically, are there any which are transitive (but not commutative), so this property holds?
hash(a) ++ (hash(b) ++ hash(c)) == (hash(a) ++ hash(b)) ++ hash(c)
(Where `++` is a "combine two hashes" function.)
What I sort of envision is a `Hash` type in the standard library, with a `Hashable` protocol which simply looks like:
protocol Hashable: Equatable {
var hashValue: Hash { get }
}
A Hashable conformance would probably just look like:
extension Person: Hashable {
var hashValue: Hash {
return Hash(firstName, lastName, birthDate)
}
}
Where all three types themselves were `Hashable` and returned `Hash`es of their own.
A transitive function would surely not be cryptographically strong—but we're not looking for cryptographic strength here, just good mixing and entropy preservation with some distinction between the left-hand and right-hand sides of the operation. I'm not about to do a literature search when I need to be awake in four hours, but I suspect somebody has tackled this problem before. If they have, we can get much better hashing without using generics or radically changing the way people hash—
> Therefore, implementing SipHash as a separate, optional facility for use inside individual hashValue implementations (as opposed to being an outright replacement for it) would be somewhat inefficient. It also wouldn't discourage/prevent people from continuing to use xor and similar ad hoc hashing methods; people would need to discover these useful new hash utilities on their own.
and a design where you returned a special Hash type, rather than a plain Int, would largely avoid this very real human factors concern.
--
Brent Royal-Gordon
Architechies
More information about the swift-evolution
mailing list