com.google.common.hash
Interface HashFunction


@Beta
public interface HashFunction

A hash function is a collision-averse pure function that maps an arbitrary block of data to a number called a hash code.

Definition

Unpacking this definition:

Summarizing the last two points: "equal yield equal always; unequal yield unequal often." This is the most important characteristic of all hash functions.

Desirable properties

A high-quality hash function strives for some subset of the following virtues:

Providing input to a hash function

The primary way to provide the data that your hash function should act on is via a Hasher. Obtain a new hasher from the hash function using newHasher(), "push" the relevant data into it using methods like Hasher.putBytes(byte[]), and finally ask for the HashCode when finished using Hasher.hash(). (See the example under newHasher().)

If all you want to hash is a single byte array, string or long value, there are convenient shortcut methods defined directly on HashFunction to make this easier.
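The push-then-finish pattern, and the fact that a one-shot shortcut must agree with its longhand form, can be illustrated with the JDK's own MessageDigest. This is only an analogy under stated assumptions: MessageDigest is not part of this API, and the class and method names below (IncrementalDigestSketch, piecewise, oneShot) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class IncrementalDigestSketch {
    // Returns a fresh, stateful MD5 accumulator (every JDK is required to provide MD5).
    static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Pushes the input in two pieces, the way data is "put" into a stateful hasher.
    static byte[] piecewise(String first, String second) {
        MessageDigest md = md5();
        md.update(first.getBytes(StandardCharsets.UTF_8));
        md.update(second.getBytes(StandardCharsets.UTF_8));
        return md.digest();
    }

    // Hashes the whole input in a single call, like a shortcut method.
    static byte[] oneShot(String whole) {
        return md5().digest(whole.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The result depends only on the bytes and their order, not on how they were chunked.
        System.out.println(Arrays.equals(piecewise("hash", "code"), oneShot("hashcode")));
    }
}
```

Because the result depends only on the byte sequence, splitting the input differently (for example "ha" and "shcode") yields the same digest.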

Hasher accepts primitive data types, but can also accept any Object of type T provided that you implement a Funnel to specify how to "feed" data from that object into the function. (The Funnel documentation includes an example.)

Compatibility note: Throughout this API, multibyte values are always interpreted in little-endian order. That is, hashing the byte array {0x01, 0x02, 0x03, 0x04} is equivalent to hashing the int value 0x04030201. If this isn't what you need, methods such as Integer.reverseBytes(int) and Ints.toByteArray(int) will help.
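The byte-order equivalence described in the compatibility note can be checked with plain JDK classes. The sketch below uses java.nio.ByteBuffer rather than this API; the class and method names (LittleEndianSketch, littleEndianInt, littleEndianBytes) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianSketch {
    // Reads four bytes as an int, least significant byte first.
    static int littleEndianInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Writes an int as four bytes, least significant byte first.
    static byte[] littleEndianBytes(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x01, 0x02, 0x03, 0x04};
        // The array {0x01, 0x02, 0x03, 0x04} corresponds to the int 0x04030201.
        System.out.println(Integer.toHexString(littleEndianInt(bytes)));               // 4030201
        // Reversing the bytes gives the value a big-endian reader would see.
        System.out.println(Integer.toHexString(Integer.reverseBytes(0x04030201)));     // 1020304
    }
}
```

If your data is big-endian, applying Integer.reverseBytes(int) before hashing an int is one way to get the byte sequence you intended.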

Relationship to Object.hashCode()

Java's baked-in concept of hash codes is constrained to 32 bits, and provides no separation between hash algorithms and the data they act on, so alternate hash algorithms can't be easily substituted. Also, implementations of hashCode tend to be poor-quality, in part because they end up depending on other existing poor-quality hashCode implementations, including those in many JDK classes.

Object.hashCode implementations tend to be very fast, but have weak collision prevention and no expectation of bit dispersion. This leaves them perfectly suitable for use in hash tables, because extra collisions cause only a slight performance hit, while poor bit dispersion is easily corrected using a secondary hash function (which all reasonable hash table implementations in Java use). For the many uses of hash functions beyond data structures, however, Object.hashCode almost always falls short -- hence this library.
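Both weaknesses are easy to observe with JDK classes alone. In the sketch below, the spread function is an illustrative secondary hash (folding the high 16 bits into the low 16) and is not claimed to be the one any particular hash table uses; the names HashCodeCollisionSketch, spread and bucket are hypothetical.

```java
final class HashCodeCollisionSketch {
    // Illustrative secondary hash: folds the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Picks a bucket in a power-of-two sized table by masking the low bits.
    static int bucket(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        // Weak collision prevention: two distinct short strings share a hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // Poor bit dispersion: Integer.hashCode() is the value itself, so values
        // that differ only in their high bits land in the same bucket...
        int a = Integer.valueOf(0x10000).hashCode();
        int b = Integer.valueOf(0x20000).hashCode();
        System.out.println(bucket(a, 16) == bucket(b, 16)); // true

        // ...until a secondary hash mixes the high bits into the low ones.
        System.out.println(bucket(spread(a), 16) == bucket(spread(b), 16)); // false
    }
}
```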

Since:
11.0
Author:
Kevin Bourrillion

Method Summary
 int bits()
          Returns the number of bits (a multiple of 32) that each hash code produced by this hash function has.
 HashCode hashBytes(byte[] input)
          Shortcut for newHasher().putBytes(input).hash().
 HashCode hashBytes(byte[] input, int off, int len)
          Shortcut for newHasher().putBytes(input, off, len).hash().
 HashCode hashLong(long input)
          Shortcut for newHasher().putLong(input).hash(); returns the hash code for the given long value, interpreted in little-endian byte order.
 HashCode hashString(CharSequence input)
          Shortcut for newHasher().putString(input).hash().
 HashCode hashString(CharSequence input, Charset charset)
          Shortcut for newHasher().putString(input, charset).hash().
 Hasher newHasher()
          Begins a new hash code computation by returning an initialized, stateful Hasher instance that is ready to receive data.
 Hasher newHasher(int expectedInputSize)
          Begins a new hash code computation as newHasher(), but provides a hint of the expected size of the input (in bytes).
 

Method Detail

newHasher

Hasher newHasher()
Begins a new hash code computation by returning an initialized, stateful Hasher instance that is ready to receive data. Example:
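   // Assumes id is a long and name is a CharSequence supplied by the caller.
   HashFunction hf = Hashing.md5();
   HashCode hc = hf.newHasher()
       .putLong(id)       // contributes 8 bytes, little-endian
       .putString(name)   // contributes each char as low byte, then high byte
       .hash();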


newHasher

Hasher newHasher(int expectedInputSize)
Begins a new hash code computation as newHasher(), but provides a hint of the expected size of the input (in bytes). This is only important for non-streaming hash functions (hash functions that need to buffer their whole input before processing any of it).


hashLong

HashCode hashLong(long input)
Shortcut for newHasher().putLong(input).hash(); returns the hash code for the given long value, interpreted in little-endian byte order. The implementation might perform better than its longhand equivalent, but should not perform worse.


hashBytes

HashCode hashBytes(byte[] input)
Shortcut for newHasher().putBytes(input).hash(). The implementation might perform better than its longhand equivalent, but should not perform worse.


hashBytes

HashCode hashBytes(byte[] input,
                   int off,
                   int len)
Shortcut for newHasher().putBytes(input, off, len).hash(). The implementation might perform better than its longhand equivalent, but should not perform worse.

Throws:
IndexOutOfBoundsException - if off < 0 or off + len > input.length or len < 0
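The range conditions listed under Throws can be sketched as a standalone check. This is a minimal illustration, not the library's implementation: checkRange and isValidRange are hypothetical helpers, and the comparison is written as off > input.length - len so that off + len cannot overflow.

```java
final class RangeCheckSketch {
    // Throws if [off, off + len) does not lie within the array.
    static void checkRange(byte[] input, int off, int len) {
        if (off < 0 || len < 0 || off > input.length - len) {
            throw new IndexOutOfBoundsException(
                "off=" + off + ", len=" + len + ", length=" + input.length);
        }
    }

    // Convenience wrapper that reports the outcome as a boolean.
    static boolean isValidRange(byte[] input, int off, int len) {
        try {
            checkRange(input, off, len);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[4];
        System.out.println(isValidRange(data, 1, 3)); // true: covers indices 1..3
        System.out.println(isValidRange(data, 2, 3)); // false: runs past the end
    }
}
```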

hashString

HashCode hashString(CharSequence input)
Shortcut for newHasher().putString(input).hash(). The implementation might perform better than its longhand equivalent, but should not perform worse. Note that no character encoding is performed; the low byte and high byte of each character are hashed directly (in that order). This is equivalent to using hashString(input, Charsets.UTF_16LE).
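The "low byte, then high byte" rule and its equivalence to UTF-16LE can be verified with the JDK alone. The sketch below is an illustration under stated assumptions: Utf16LeSketch and lowThenHighBytes are hypothetical names, and the comparison holds for well-formed strings (a charset encoder may replace unpaired surrogates, whereas the direct byte split would not).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf16LeSketch {
    // Splits each char into its low byte followed by its high byte.
    static byte[] lowThenHighBytes(CharSequence input) {
        byte[] out = new byte[input.length() * 2];
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            out[2 * i] = (byte) c;             // low byte
            out[2 * i + 1] = (byte) (c >>> 8); // high byte
        }
        return out;
    }

    public static void main(String[] args) {
        // Includes non-ASCII characters and a surrogate pair.
        String s = "h\u00e9llo \u20ac \uD83D\uDE00";
        byte[] direct = lowThenHighBytes(s);
        byte[] encoded = s.getBytes(StandardCharsets.UTF_16LE);
        System.out.println(Arrays.equals(direct, encoded)); // true
    }
}
```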


hashString

HashCode hashString(CharSequence input,
                    Charset charset)
Shortcut for newHasher().putString(input, charset).hash(). Characters are encoded using the given Charset. The implementation might perform better than its longhand equivalent, but should not perform worse.


bits

int bits()
Returns the number of bits (a multiple of 32) that each hash code produced by this hash function has.



Copyright © 2010-2012. All Rights Reserved.