[swift-evolution] [Draft] Hasher & HashVisitable

Vincent Esche regexident.mailinglists at gmail.com
Mon Mar 13 13:14:29 CDT 2017


That sure looks like a nice addition. 👍

I tried to keep the proposal as small as possible, as it already looks much
more complicated than it is.

On Mon, Mar 13, 2017 at 6:25 PM, Haravikk <swift-evolution at haravikk.me>
wrote:

> I really like the proposal!
>
> Only thing I'm wondering is whether a convenience function on Hasher might
> be nice to have, like so:
>
> extension Hasher {
> mutating func hash<H:Hashable>(_ hashable:H) -> Int {
> hashable.hash(self)
> return self.finish()
> }
> }
>
> This just feels like it would be easier for the most common case where you
> just need to hash a single value right away, as you'd just do:
>
> let hashValue = hasher.hash(foo)
>
> While leaving the inout style available for when you do need to combine
> multiple values together or otherwise feed in data gradually. This way
> we're not losing as much of the convenience of the .hashValue that's being
> deprecated. On that note, should Hashable be updated to do something like
> the above as well, so it still immediately returns a value from a default
> hasher?
>
> Anyway, definitely in favour of this change, as you're right; .hashValue
> is a hard thing to do right, and is often an after-thought, which can make
> it of limited usefulness to types that depend upon it!
>
> On 13 Mar 2017, at 15:38, Vincent Esche via swift-evolution <
> swift-evolution at swift.org> wrote:
>
> Hi there,
>
> I've written up a proposal for an overhaul/replacement of the Hashable
> protocol and would love to hear your feedback on it!
>
> Rendered
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog
> Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>
>
> Cheers,
> Vincent
>
> Ps: I'd like to thank David Hart (@hartbit) for his great editorial
> feedback on this proposal. 👍
>
> HashVisitable
>
>    - Proposal: SE-NNNN
>    <https://gist.github.com/regexident/NNNN-filename.md>
>    - Authors: Vincent Esche <https://github.com/regexident>
>    - Review Manager: TBD
>    - Status: Awaiting review
>
>
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>
> Introduction
>
> Replace the Hashable protocol by two new procotols (Hasher and
> HashVisitable) to improve safety, versatility and learnability.
>
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>
> Motivation
>
> Implementing Hashable is difficult and the consequences if not done well
> have dire performance and safety repercussions.
>
> The documentation of Hashable lists a sample implementation
> <https://developer.apple.com/reference/swift/hashable> of var hashValue:
>
> /// A point in an x-y coordinate system.
> struct GridPoint {
>     var x: Int
>     var y: Int
> }
>
> extension GridPoint: Hashable {
>     var hashValue: Int {
>         return x.hashValue ^ y.hashValue
>     }
>
>     static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>         return lhs.x == rhs.x && lhs.y == rhs.y
>     }
> }
>
> Calculating the hashes of all GridPoints (given the above implementation)
> on a 1000 × 1000 grid …
>
> let (width, height) = (1000, 1000)
> let total = width * height
> var hashes = Set<Int>()
> for x in 0..<width {
>     for y in 0..<height {
>         hashes.insert(GridPoint(x: x, y: y).hashValue)
>     }
> }
> print("\(hashes.count) unique hashes out of a total of \(total).")
>
> … results in just 1024 unique hash values for 1_000_000 unique values.
>
> In other words: The recommended implementation causes 99.9% of values to
> trigger a hash collision.
>
> Out of those 1_000_000 values the median collision count was 976 with min
> and max being 976 and 1000respectively.
>
> The collision rate will have negative impact in algorithms which heavily
> use hashValue like the ones in Dictionaryand Set. Furthermore, it
> increases the vulnerability to DDOS attacks when exposed to the web
> <https://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/>
> .
>
> If even the official Swift documentation gets the implementation of
> hashValue wrong, then who is to expect the average Swift programmer to do
> any better?
>
> In contrast, running the same snippet using HashVisitable and the
> semi-secure Fnv1aHash (see below) results in zero collisions!
>
> Finally, the design of the Hashable protocol forces the use of one
> implementation without the possibility of switching between multiple
> hashing algorithms.
>
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#proposed-solution>Proposed
> solution
>
> Instead of coupling the hashing algorithm with each and every Swift type,
> we should provide a hashing API based on the visitor-pattern. By freeing
> application developers from the burden of having to implement hashing
> algorithms, the Standard Library can provide default ones which fulfill
> collision, performance and security goals. Furthermore, it would allow
> developers to swap to different algorithms based on the use case.
>
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#detailed-design>Detailed
> design
>
> The proposal deprecates the Hashable protocol and introduces the
> following two:
>
> protocol Hasher {
>     mutating func finish() -> Int
>     mutating func write(bytes: UnsafeRawBufferPointer)
> }
> protocol HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H)
> }
>
> Hasher is the protocol which represents a hashing algorithm, and
> HashVisitable replaces Hashable. For types entirely represented by their
> memory layout, the following protocol would provide a default
> implementation:
>
> protocol ContiguouslyHashable: HashVisitable {}
>
> extension ContiguouslyHashable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         var mutableSelf = self
>         try! Swift.withUnsafeBytes(of: &mutableSelf) {
>             hasher.write(bytes: $0)
>         }
>     }
> }
>
> extension Bool : ContiguouslyHashable {}
> extension UInt8 : ContiguouslyHashable {}
> extension UInt16 : ContiguouslyHashable {}
> extension UInt32 : ContiguouslyHashable {}
> extension UInt64 : ContiguouslyHashable {}
> extension UInt : ContiguouslyHashable {}
> extension Int8 : ContiguouslyHashable {}
> extension Int16 : ContiguouslyHashable {}
> extension Int32 : ContiguouslyHashable {}
> extension Int64 : ContiguouslyHashable {}
> extension Int : ContiguouslyHashable {}
>
> The Standard-Library would then provide a set of hashing implementations
> specific to each purpose. A possible choice for hashing algorithms would be
> the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>,
> and the reasonably secure SipHash-4-8
> <https://en.wikipedia.org/wiki/SipHash>.
>
> FNV-1A is another popular semi-secure but blazingly fast hash algorithm,
> which – for the sake of demonstration – could be implemented as follows:
>
> struct Fnv1aHash {
>     fileprivate var state: UInt
>     init(seed: UInt) {
>         self.state = seed &+ 14695981039346656037
>     }
> }
> extension Fnv1aHash: Hasher {
>     mutating func write(bytes: UnsafeRawBufferPointer) {
>         for byte in bytes {
>             self.state = (self.state ^ UInt(byte)) &* 1099511628211
>         }
>     }
>     mutating func finish() -> Int {
>         return unsafeBitCast(self.state, to: Int.self)
>     }
> }
>
> Coming back to the sample code present in the Hashable documentation, the
> new implementation would look like:
>
> extension GridPoint: HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         self.x.hash(&hasher)
>         self.y.hash(&hasher)
>     }
> }
>
>
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#source-compatibility>Source
> compatibility
>
> Making use of "extending protocols to conform to protocols
> <https://github.com/apple/swift-evolution/blob/d33c129f0920af0404f42219db56981411b20e76/proposals/0143-conditional-conformances.md#extending-protocols-to-conform-to-protocols>
> ":
>
> extension Hashable: HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         self.hashValue.hash(&hasher)
>     }
> }
>
>
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-abi-stability>Effect
> on ABI stability
>
> n/a
>
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-api-resilience>Effect
> on API resilience
>
> This feature should be possible to add/remove without breaking ABI.
>
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#alternatives-considered>Alternatives
> considered
> n/a
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170313/3b61e582/attachment.html>


More information about the swift-evolution mailing list