[swift-evolution] [Draft] Hasher & HashVisitable

Vincent Esche regexident.mailinglists at gmail.com
Mon Mar 13 10:38:14 CDT 2017

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable
protocol and would love to hear your feedback on it!

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog
Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>


Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal. 👍


   - Proposal: SE-NNNN <https://gist.github.com/regexident/NNNN-filename.md>
   - Authors: Vincent Esche <https://github.com/regexident>
   - Review Manager: TBD
   - Status: Awaiting review


Replace the Hashable protocol by two new procotols (Hasher and HashVisitable)
to improve safety, versatility and learnability.

Implementing Hashable is difficult and the consequences if not done well
have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation
<https://developer.apple.com/reference/swift/hashable> of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
    var x: Int
    var y: Int

extension GridPoint: Hashable {
    var hashValue: Int {
        return x.hashValue ^ y.hashValue

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y

Calculating the hashes of all GridPoints (given the above implementation)
on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
    for y in 0..<height {
        hashes.insert(GridPoint(x: x, y: y).hashValue)
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to
trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min
and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily
use hashValue like the ones in Dictionaryand Set. Furthermore, it increases
the vulnerability to DDOS attacks when exposed to the web

If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?

In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.

Instead of coupling the hashing algorithm with each and every Swift type,
we should provide a hashing API based on the visitor-pattern. By freeing
application developers from the burden of having to implement hashing
algorithms, the Standard Library can provide default ones which fulfill
collision, performance and security goals. Furthermore, it would allow
developers to swap to different algorithms based on the use case.

The proposal deprecates the Hashable protocol and introduces the following

protocol Hasher {
    mutating func finish() -> Int
    mutating func write(bytes: UnsafeRawBufferPointer)
protocol HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H)

Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
    func hash<H: Hasher>(_ hasher: inout H) {
        var mutableSelf = self
        try! Swift.withUnsafeBytes(of: &mutableSelf) {
            hasher.write(bytes: $0)

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations
specific to each purpose. A possible choice for hashing algorithms would be
the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>,
and the reasonably secure SipHash-4-8

FNV-1A is another popular semi-secure but blazingly fast hash algorithm,
which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash {
    fileprivate var state: UInt
    init(seed: UInt) {
        self.state = seed &+ 14695981039346656037
extension Fnv1aHash: Hasher {
    mutating func write(bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            self.state = (self.state ^ UInt(byte)) &* 1099511628211
    mutating func finish() -> Int {
        return unsafeBitCast(self.state, to: Int.self)

Coming back to the sample code present in the Hashable documentation, the
new implementation would look like:

extension GridPoint: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {


Making use of "extending protocols to conform to protocols

extension Hashable: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {

on ABI stability

on API resilience

This feature should be possible to add/remove without breaking ABI.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170313/36d084df/attachment.html>

More information about the swift-evolution mailing list