|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
@Beta public interface HashFunction
A hash function is a collision-averse pure function that maps an arbitrary block of data to a number called a hash code.
Unpacking this definition:
Hasher
), but this is merely a convenience; these are
always translated into raw byte sequences under the covers.
bits()
). For example, Hashing.sha1()
produces a
160-bit number, while Hashing.murmur3_32()
yields only 32 bits. Because a
long
value is clearly insufficient to hold all hash code values, this API
represents a hash code as an instance of HashCode
.
Summarizing the last two points: "equal yield equal always; unequal yield unequal often." This is the most important characteristic of all hash functions.
A high-quality hash function strives for some subset of the following virtues:
Hashing.sha512()
are
designed to make it as infeasible as possible to reverse-engineer the input that
produced a given hash code, or even to discover any two distinct inputs that
yield the same result. These are called cryptographic hash functions. But,
whenever it is learned that either of these feats has become computationally
feasible, the function is deemed "broken" and should no longer be used for secure
purposes. (This is the likely eventual fate of all cryptographic hashes.)
The primary way to provide the data that your hash function should act on is via a
Hasher
. Obtain a new hasher from the hash function using newHasher()
,
"push" the relevant data into it using methods like Hasher.putBytes(byte[])
,
and finally ask for the HashCode
when finished using Hasher.hash()
. (See
an example of this.)
If all you want to hash is a single byte array, string or long
value, there
are convenient shortcut methods defined directly on HashFunction
to make this
easier.
Hasher accepts primitive data types, but can also accept any Object of type T
provided that you implement a Funnel
to specify how to "feed" data
from that object into the function. (See an example of
this.)
Compatibility note: Throughout this API, multibyte values are always
interpreted in little-endian order. That is, hashing the byte array {0x01, 0x02, 0x03, 0x04}
is equivalent to hashing the int
value 0x04030201
. If this isn't what you need, methods such as Integer.reverseBytes(int)
and Ints.toByteArray(int)
will help.
Object.hashCode()
Java's baked-in concept of hash codes is constrained to 32 bits, and provides no
separation between hash algorithms and the data they act on, so alternate hash
algorithms can't be easily substituted. Also, implementations of hashCode
tend
to be poor-quality, in part because they end up depending on other existing
poor-quality hashCode
implementations, including those in many JDK classes.
Object.hashCode
implementations tend to be very fast, but have weak
collision prevention and no expectation of bit dispersion. This leaves them
perfectly suitable for use in hash tables, because extra collisions cause only a slight
performance hit, while poor bit dispersion is easily corrected using a secondary hash
function (which all reasonable hash table implementations in Java use). For the many
uses of hash functions beyond data structures, however, Object.hashCode
almost
always falls short -- hence this library.
Method Summary | |
---|---|
int |
bits()
Returns the number of bits (a multiple of 32) that each hash code produced by this hash function has. |
HashCode |
hashBytes(byte[] input)
Shortcut for newHasher().putBytes(input).hash() . |
HashCode |
hashBytes(byte[] input,
int off,
int len)
Shortcut for newHasher().putBytes(input, off, len).hash() . |
HashCode |
hashInt(int input)
Shortcut for newHasher().putInt(input).hash() ; returns the hash code for the given
int value, interpreted in little-endian byte order. |
HashCode |
hashLong(long input)
Shortcut for newHasher().putLong(input).hash() ; returns the hash code for the
given long value, interpreted in little-endian byte order. |
HashCode |
hashString(CharSequence input)
Shortcut for newHasher().putString(input).hash() . |
HashCode |
hashString(CharSequence input,
Charset charset)
Shortcut for newHasher().putString(input, charset).hash() . |
Hasher |
newHasher()
Begins a new hash code computation by returning an initialized, stateful Hasher instance that is ready to receive data. |
Hasher |
newHasher(int expectedInputSize)
Begins a new hash code computation as newHasher() , but provides a hint of the
expected size of the input (in bytes). |
Method Detail |
---|
Hasher newHasher()
Hasher
instance that is ready to receive data. Example: HashFunction hf = Hashing.md5();
HashCode hc = hf.newHasher()
.putLong(id)
.putString(name)
.hash();
Hasher newHasher(int expectedInputSize)
newHasher()
, but provides a hint of the
expected size of the input (in bytes). This is only important for non-streaming hash
functions (hash functions that need to buffer their whole input before processing any
of it).
HashCode hashInt(int input)
newHasher().putInt(input).hash()
; returns the hash code for the given
int
value, interpreted in little-endian byte order. The implementation might
perform better than its longhand equivalent, but should not perform worse.
HashCode hashLong(long input)
newHasher().putLong(input).hash()
; returns the hash code for the
given long
value, interpreted in little-endian byte order. The implementation
might perform better than its longhand equivalent, but should not perform worse.
HashCode hashBytes(byte[] input)
newHasher().putBytes(input).hash()
. The implementation
might perform better than its longhand equivalent, but should not perform
worse.
HashCode hashBytes(byte[] input, int off, int len)
newHasher().putBytes(input, off, len).hash()
. The implementation
might perform better than its longhand equivalent, but should not perform
worse.
IndexOutOfBoundsException
- if off < 0
or off + len > bytes.length
or len < 0
HashCode hashString(CharSequence input)
newHasher().putString(input).hash()
. The implementation might
perform better than its longhand equivalent, but should not perform worse. Note that no
character encoding is performed; the low byte and high byte of each character are hashed
directly (in that order). This is equivalent to using
hashString(input, Charsets.UTF_16LE)
.
HashCode hashString(CharSequence input, Charset charset)
newHasher().putString(input, charset).hash()
. Characters are encoded
using the given Charset
. The implementation might perform better than its
longhand equivalent, but should not perform worse.
int bits()
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |