[swift-evolution] [Draft] Hasher & HashVisitable

Vincent Esche regexident.mailinglists at gmail.com
Mon Mar 13 10:38:14 CDT 2017


Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable
protocol and would love to hear your feedback on it!

Rendered
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog
Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal. 👍

HashVisitable

   - Proposal: SE-NNNN <https://gist.github.com/regexident/NNNN-filename.md>
   - Authors: Vincent Esche <https://github.com/regexident>
   - Review Manager: TBD
   - Status: Awaiting review

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable)
to improve safety, versatility and learnability.
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>
Motivation

Implementing Hashable is difficult and the consequences if not done well
have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation
<https://developer.apple.com/reference/swift/hashable> of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
    var x: Int
    var y: Int
}

extension GridPoint: Hashable {
    var hashValue: Int {
        return x.hashValue ^ y.hashValue
    }

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}

Calculating the hashes of all GridPoints (given the above implementation)
on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
    for y in 0..<height {
        hashes.insert(GridPoint(x: x, y: y).hashValue)
    }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to
trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min
and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily
use hashValue like the ones in Dictionaryand Set. Furthermore, it increases
the vulnerability to DDOS attacks when exposed to the web
<https://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/>
.

If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?

In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#proposed-solution>Proposed
solution

Instead of coupling the hashing algorithm with each and every Swift type,
we should provide a hashing API based on the visitor-pattern. By freeing
application developers from the burden of having to implement hashing
algorithms, the Standard Library can provide default ones which fulfill
collision, performance and security goals. Furthermore, it would allow
developers to swap to different algorithms based on the use case.
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#detailed-design>Detailed
design

The proposal deprecates the Hashable protocol and introduces the following
two:

protocol Hasher {
    mutating func finish() -> Int
    mutating func write(bytes: UnsafeRawBufferPointer)
}
protocol HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H)
}

Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
    func hash<H: Hasher>(_ hasher: inout H) {
        var mutableSelf = self
        try! Swift.withUnsafeBytes(of: &mutableSelf) {
            hasher.write(bytes: $0)
        }
    }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations
specific to each purpose. A possible choice for hashing algorithms would be
the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>,
and the reasonably secure SipHash-4-8
<https://en.wikipedia.org/wiki/SipHash>.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm,
which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash {
    fileprivate var state: UInt
    init(seed: UInt) {
        self.state = seed &+ 14695981039346656037
    }
}
extension Fnv1aHash: Hasher {
    mutating func write(bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            self.state = (self.state ^ UInt(byte)) &* 1099511628211
        }
    }
    mutating func finish() -> Int {
        return unsafeBitCast(self.state, to: Int.self)
    }
}

Coming back to the sample code present in the Hashable documentation, the
new implementation would look like:

extension GridPoint: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.x.hash(&hasher)
        self.y.hash(&hasher)
    }
}

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#source-compatibility>Source
compatibility

Making use of "extending protocols to conform to protocols
<https://github.com/apple/swift-evolution/blob/d33c129f0920af0404f42219db56981411b20e76/proposals/0143-conditional-conformances.md#extending-protocols-to-conform-to-protocols>
":

extension Hashable: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.hashValue.hash(&hasher)
    }
}

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-abi-stability>Effect
on ABI stability

n/a
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-api-resilience>Effect
on API resilience

This feature should be possible to add/remove without breaking ABI.
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#alternatives-considered>Alternatives
considered

n/a
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170313/36d084df/attachment.html>


More information about the swift-evolution mailing list