[swift-evolution] [Draft] Hasher & HashVisitable

Haravikk swift-evolution at haravikk.me
Mon Mar 13 12:25:19 CDT 2017


I really like the proposal!

Only thing I'm wondering is whether a convenience function on Hasher might be nice to have, like so:

	extension Hasher {
		mutating func hash<H:Hashable>(_ hashable:H) -> Int {
			hashable.hash(self)
			return self.finish()
		}
	}

This just feels like it would be easier for the most common case where you just need to hash a single value right away, as you'd just do:

	let hashValue = hasher.hash(foo)

While leaving the inout style available for when you do need to combine multiple values together or otherwise feed in data gradually. This way we're not losing as much of the convenience of the .hashValue that's being deprecated. On that note, should Hashable be updated to do something like the above as well, so it still immediately returns a value from a default hasher?

Anyway, definitely in favour of this change, as you're right; .hashValue is a hard thing to do right, and is often an after-thought, which can make it of limited usefulness to types that depend upon it!

> On 13 Mar 2017, at 15:38, Vincent Esche via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Hi there,
> 
> I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!
> 
> Rendered <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>
> 
> Cheers,
> Vincent
> 
> Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. 👍
> 
> HashVisitable
> 
> Proposal: SE-NNNN <https://gist.github.com/regexident/NNNN-filename.md>
> Authors: Vincent Esche <https://github.com/regexident>
> Review Manager: TBD
> Status: Awaiting review
>  <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>Introduction
> 
> Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.
> 
>  <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>Motivation
> 
> Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.
> 
> The documentation of Hashable lists a sample implementation <https://developer.apple.com/reference/swift/hashable> of var hashValue:
> 
> /// A point in an x-y coordinate system.
> struct GridPoint {
>     var x: Int
>     var y: Int
> }
> 
> extension GridPoint: Hashable {
>     var hashValue: Int {
>         return x.hashValue ^ y.hashValue
>     }
> 
>     static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>         return lhs.x == rhs.x && lhs.y == rhs.y
>     }
> }
> Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …
> 
> let (width, height) = (1000, 1000)
> let total = width * height
> var hashes = Set<Int>()
> for x in 0..<width {
>     for y in 0..<height {
>         hashes.insert(GridPoint(x: x, y: y).hashValue)
>     }
> }
> print("\(hashes.count) unique hashes out of a total of \(total).")
> … results in just 1024 unique hash values for 1_000_000 unique values.
> 
> In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.
> 
> Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.
> 
> The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web <https://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/>.
> 
> If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?
> 
> In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!
> 
> Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.
> 
>  <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#proposed-solution>Proposed solution
> 
> Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.
> 
>  <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#detailed-design>Detailed design
> 
> The proposal deprecates the Hashable protocol and introduces the following two:
> 
> protocol Hasher {
>     mutating func finish() -> Int
>     mutating func write(bytes: UnsafeRawBufferPointer)
> }
> 
> protocol HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H)
> }
> Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:
> 
> protocol ContiguouslyHashable: HashVisitable {}
> 
> extension ContiguouslyHashable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         var mutableSelf = self
>         try! Swift.withUnsafeBytes(of: &mutableSelf) {
>             hasher.write(bytes: $0)
>         }
>     }
> }
> 
> extension Bool : ContiguouslyHashable {}
> extension UInt8 : ContiguouslyHashable {}
> extension UInt16 : ContiguouslyHashable {}
> extension UInt32 : ContiguouslyHashable {}
> extension UInt64 : ContiguouslyHashable {}
> extension UInt : ContiguouslyHashable {}
> extension Int8 : ContiguouslyHashable {}
> extension Int16 : ContiguouslyHashable {}
> extension Int32 : ContiguouslyHashable {}
> extension Int64 : ContiguouslyHashable {}
> extension Int : ContiguouslyHashable {}
> The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>, and the reasonably secure SipHash-4-8 <https://en.wikipedia.org/wiki/SipHash>.
> 
> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:
> 
> struct Fnv1aHash {
>     fileprivate var state: UInt
>     init(seed: UInt) {
>         self.state = seed &+ 14695981039346656037
>     }
> }
> 
> extension Fnv1aHash: Hasher {
>     mutating func write(bytes: UnsafeRawBufferPointer) {
>         for byte in bytes {
>             self.state = (self.state ^ UInt(byte)) &* 1099511628211
>         }
>     }
>     mutating func finish() -> Int {
>         return unsafeBitCast(self.state, to: Int.self)
>     }
> }
> Coming back to the sample code present in the Hashable documentation, the new implementation would look like:
> 
> extension GridPoint: HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         self.x.hash(&hasher)
>         self.y.hash(&hasher)
>     }
> }
>  <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#source-compatibility>Source compatibility
> 
> Making use of "extending protocols to conform to protocols <https://github.com/apple/swift-evolution/blob/d33c129f0920af0404f42219db56981411b20e76/proposals/0143-conditional-conformances.md#extending-protocols-to-conform-to-protocols>":
> 
> extension Hashable: HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         self.hashValue.hash(&hasher)
>     }
> }
>  <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-abi-stability>Effect on ABI stability
> 
> n/a
> 
>  <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-api-resilience>Effect on API resilience
> 
> This feature should be possible to add/remove without breaking ABI.
> 
>  <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#alternatives-considered>Alternatives considered
> 
> n/a
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170313/452b9d68/attachment.html>


More information about the swift-evolution mailing list