Class BloomFilter<T>

  • Type Parameters:
    T - the type of instances that the BloomFilter accepts
    All Implemented Interfaces:
    Predicate<T>, Serializable, java.util.function.Predicate<T>

    @Beta
    public final class BloomFilter<T>
    extends Object
    implements Predicate<T>, Serializable
    A Bloom filter for instances of T. A Bloom filter offers an approximate containment test with one-sided error: if it claims that an element is contained in it, this might be in error, but if it claims that an element is not contained in it, then this is definitely true.

    If you are unfamiliar with Bloom filters, this nice tutorial may help you understand how they work.

    The false positive probability (FPP) of a Bloom filter is defined as the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.

    Bloom filters are serializable. They also support a more compact serial representation via the writeTo(java.io.OutputStream) and readFrom(java.io.InputStream, com.google.common.hash.Funnel<? super T>) methods. Both serialized forms will continue to be supported by future versions of this library. However, serial forms generated by newer versions of the code may not be readable by older versions of the code (e.g., a serialized Bloom filter generated today may not be readable by a binary that was compiled 6 months ago).
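    The following is a minimal sketch of the first serial form, standard Java serialization. It assumes Guava 23.0 or later on the classpath and uses Funnels.integerFunnel(), which is serializable. The class and method names (JavaSerializationSketch, filled, roundTrip) are hypothetical. The compact writeTo/readFrom form is a separate mechanism.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class; assumes Guava 23.0+ on the classpath.
final class JavaSerializationSketch {

  /** Builds a small filter holding the integers 0..count-1. */
  static BloomFilter<Integer> filled(int count) {
    BloomFilter<Integer> filter = BloomFilter.create(Funnels.integerFunnel(), 1_000, 0.01);
    for (int i = 0; i < count; i++) {
      filter.put(i);
    }
    return filter;
  }

  /** Writes the filter with standard Java serialization and reads it back. */
  @SuppressWarnings("unchecked") // only a BloomFilter<Integer> is ever written to this stream
  static BloomFilter<Integer> roundTrip(BloomFilter<Integer> filter) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(filter); // possible because Funnels.integerFunnel() is serializable
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (BloomFilter<Integer>) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

    The restored filter answers mightContain exactly as the original did, because its bit array, hash-function count, and funnel are all reconstructed.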

    As of Guava 23.0, this class is thread-safe and lock-free. It internally uses atomics and compare-and-swap to ensure correctness when multiple threads are used to access it.
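    A minimal sketch of what lock-free thread safety permits: several threads call put on one shared filter with no external synchronization. It assumes Guava 23.0 or later. The parallel stream is just one convenient way to get concurrent callers, and ConcurrentPutSketch is a hypothetical name.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.util.stream.IntStream;

// Hypothetical helper class; assumes Guava 23.0+ (thread-safe BloomFilter).
final class ConcurrentPutSketch {

  /** Fills one shared filter with 0..count-1 from many threads at once, without locks. */
  static BloomFilter<Integer> fillInParallel(int count) {
    BloomFilter<Integer> filter = BloomFilter.create(Funnels.integerFunnel(), count, 0.01);
    // The parallel stream runs put() on several ForkJoinPool threads concurrently;
    // no synchronized block or external lock is used around the filter.
    IntStream.range(0, count).parallel().forEach(i -> filter.put(i));
    return filter;
  }
}
```

    If concurrent puts could lose bit updates, some inserted keys would later be reported as absent. With the thread-safe implementation, every inserted key is still found.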

    Since:
    11.0 (thread-safe since 23.0)
    Author:
    Dimitris Andreou, Kevin Bourrillion
    See Also:
    Serialized Form
    • Method Detail

      • copy

        public BloomFilter<T> copy()
        Creates a new BloomFilter that's a copy of this instance. The new instance is equal to this instance but shares no mutable state.
        Since:
        12.0
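        A minimal sketch of the "shares no mutable state" guarantee: after copy(), writing to the copy leaves the original untouched. It assumes Guava on the classpath. CopySketch is a hypothetical name. The original is left empty on purpose, so its mightContain answer for the new element is definitely false.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; assumes Guava on the classpath.
final class CopySketch {

  /** Returns [original, copy], where only the copy has been written to after copying. */
  static List<BloomFilter<String>> copyThenWriteToCopy() {
    BloomFilter<String> original =
        BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 500, 0.01);
    BloomFilter<String> copy = original.copy(); // equal to original at this point
    copy.put("only-in-copy"); // no shared mutable state: original's bits are unchanged
    return Arrays.asList(original, copy);
  }
}
```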
      • mightContain

        public boolean mightContain(T object)
        Returns true if the element might have been put in this Bloom filter, false if this is definitely not the case.
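        A minimal sketch of the one-sided error described above. A false answer is always correct, so inserted keys are never reported as absent. A true answer for a never-inserted key is a false positive. It assumes Guava on the classpath. OneSidedErrorSketch and its methods are hypothetical names, and the 0.01 fpp is an arbitrary choice.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Hypothetical helper class; assumes Guava on the classpath.
final class OneSidedErrorSketch {

  /** Builds a filter holding the keys 0..inserted-1. */
  static BloomFilter<Long> filled(int inserted) {
    BloomFilter<Long> filter = BloomFilter.create(Funnels.longFunnel(), inserted, 0.01);
    for (long key = 0; key < inserted; key++) {
      filter.put(key);
    }
    return filter;
  }

  /** Inserted keys reported as absent; the one-sided guarantee makes this 0. */
  static int falseNegatives(BloomFilter<Long> filter, int inserted) {
    int misses = 0;
    for (long key = 0; key < inserted; key++) {
      if (!filter.mightContain(key)) {
        misses++;
      }
    }
    return misses;
  }

  /** Never-inserted keys reported as "might contain": the false positives. */
  static int falsePositives(BloomFilter<Long> filter, int inserted, int probes) {
    int hits = 0;
    for (long key = inserted; key < (long) inserted + probes; key++) {
      if (filter.mightContain(key)) {
        hits++;
      }
    }
    return hits;
  }
}
```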
      • apply

        @Deprecated
        public boolean apply(T input)
        Deprecated.
        Provided only to satisfy the Predicate interface; use mightContain(T) instead.
        Description copied from interface: Predicate
        Returns the result of applying this predicate to input (Java 8 users, see the notes in the Predicate class documentation). This method is generally expected, but not absolutely required, to have the following properties:
        • Its execution does not cause any observable side effects.
        • The computation is consistent with equals; that is, Objects.equal(a, b) implies that predicate.apply(a) == predicate.apply(b).
        Specified by:
        apply in interface Predicate<T>
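        A minimal sketch of the recommended alternative to apply: pass mightContain as a method reference wherever a predicate is expected, such as Stream.filter. It assumes Guava on the classpath. MightContainFilterSketch and its methods are hypothetical names.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; assumes Guava on the classpath.
final class MightContainFilterSketch {

  /** Keeps the candidates the filter might contain, via mightContain rather than apply. */
  static List<String> possiblySeen(BloomFilter<String> seen, List<String> candidates) {
    return candidates.stream()
        .filter(seen::mightContain) // explicit method reference; avoids the deprecated apply
        .collect(Collectors.toList());
  }

  /** Builds a filter containing the given words. */
  static BloomFilter<String> seenOf(List<String> words) {
    BloomFilter<String> seen =
        BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 100, 0.01);
    words.forEach(seen::put); // put's boolean result is ignored here
    return seen;
  }
}
```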
      • put

        @CanIgnoreReturnValue
        public boolean put(T object)
        Puts an element into this BloomFilter. Ensures that subsequent invocations of mightContain(Object) with the same element will always return true.
        Returns:
        true if the Bloom filter's bits changed as a result of this operation. If the bits changed, this is definitely the first time object has been added to the filter. If the bits haven't changed, this might be the first time object has been added to the filter. Note that put(t) always returns the opposite result to what mightContain(t) would have returned at the time it is called.
        Since:
        12.0 (present in 11.0 with void return type)
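        A minimal sketch of the return-value contract. The first put of an element into an empty filter must flip bits, so it returns true. Putting the same element again finds all of its bits already set, so it returns false. It assumes Guava on the classpath. PutReturnValueSketch is a hypothetical name.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; assumes Guava on the classpath.
final class PutReturnValueSketch {

  /** Puts the same element twice into a fresh filter and reports both return values. */
  static boolean[] putTwice(String element) {
    BloomFilter<String> filter =
        BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 100, 0.01);
    boolean first = filter.put(element);  // empty filter, so bits flip: true
    boolean second = filter.put(element); // all of element's bits already set: false
    return new boolean[] {first, second};
  }
}
```

        A true result is therefore a cheap, definite "first time seen" signal. A false result only means "possibly seen before".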
      • expectedFpp

        public double expectedFpp()
        Returns the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.

        Ideally, this number should be close to the fpp parameter passed in create(Funnel, int, double), or smaller. If it is significantly higher, the usual cause is that more elements than expected have been put in the BloomFilter, degrading its accuracy.

        Since:
        14.0 (since 11.0 as expectedFalsePositiveProbability())
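        A minimal sketch of the degradation described above. The filter is sized for 100 insertions at fpp 0.01, and expectedFpp() is read once at capacity and again after 100 times as many insertions. It assumes Guava on the classpath. ExpectedFppSketch is a hypothetical name.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Hypothetical helper class; assumes Guava on the classpath.
final class ExpectedFppSketch {

  /** Returns {expectedFpp at capacity, expectedFpp after overfilling 100x}. */
  static double[] fppBeforeAndAfterOverfill() {
    BloomFilter<Integer> filter = BloomFilter.create(Funnels.integerFunnel(), 100, 0.01);
    for (int i = 0; i < 100; i++) {
      filter.put(i); // exactly the expected number of insertions
    }
    double atCapacity = filter.expectedFpp();
    for (int i = 100; i < 10_000; i++) {
      filter.put(i); // far more than expected: almost every bit ends up set
    }
    double overfilled = filter.expectedFpp();
    return new double[] {atCapacity, overfilled};
  }
}
```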
      • approximateElementCount

        public long approximateElementCount()
        Returns an estimate for the total number of distinct elements that have been added to this Bloom filter. This approximation is reasonably accurate if it does not exceed the value of expectedInsertions that was used when constructing the filter.
        Since:
        22.0
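        The standard estimator for this quantity is sketched below. Here m is the bit size, k the number of hash functions, X the number of bits currently set, and n the number of distinct elements inserted. These symbols are introduced for illustration only, and it is an assumption that approximateElementCount() uses exactly this formula. Each insertion sets k bits, so after n insertions the expected fraction of set bits is roughly 1 - e^{-kn/m}. Solving for n gives the estimate.

```latex
\mathbb{E}\!\left[\frac{X}{m}\right] \approx 1 - e^{-kn/m}
\quad\Longrightarrow\quad
\hat{n} = -\frac{m}{k}\,\ln\!\left(1 - \frac{X}{m}\right)
```

        As X approaches m the logarithm diverges, which is consistent with the estimate being reliable only while the element count stays within expectedInsertions.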
      • isCompatible

        public boolean isCompatible(BloomFilter<T> that)
        Determines whether a given Bloom filter is compatible with this Bloom filter. For two Bloom filters to be compatible, they must:
        • not be the same instance
        • have the same number of hash functions
        • have the same bit size
        • have the same strategy
        • have equal funnels
        Parameters:
        that - The Bloom filter to check for compatibility.
        Since:
        15.0
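        A minimal sketch of the compatibility rules listed above. It assumes Guava on the classpath. Filters built with identical parameters and the singleton Funnels.integerFunnel() share bit size, hash-function count, strategy, and funnel. CompatibilitySketch is a hypothetical name.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Hypothetical helper class; assumes Guava on the classpath.
final class CompatibilitySketch {

  static BloomFilter<Integer> newFilter(int expectedInsertions) {
    return BloomFilter.create(Funnels.integerFunnel(), expectedInsertions, 0.01);
  }

  static boolean[] demo() {
    BloomFilter<Integer> a = newFilter(1_000);
    BloomFilter<Integer> b = newFilter(1_000);    // same parameters as a
    BloomFilter<Integer> big = newFilter(50_000); // larger bit size than a
    return new boolean[] {
      a.isCompatible(b),   // true: same size, hash count, strategy, funnel
      a.isCompatible(a),   // false: same instance
      a.isCompatible(big), // false: bit sizes differ
    };
  }
}
```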
      • putAll

        public void putAll(BloomFilter<T> that)
        Combines this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying data. The mutations happen to this instance. Callers must ensure the Bloom filters are appropriately sized to avoid saturating them.
        Parameters:
        that - The Bloom filter to combine this Bloom filter with. It is not mutated.
        Throws:
        IllegalArgumentException - if isCompatible(that) == false
        Since:
        15.0
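        A minimal sketch of merging per-shard filters. Because putAll mutates its receiver, the sketch merges into a copy so that neither input changes. A single shared funnel instance keeps the funnels trivially equal. It assumes Guava on the classpath. UnionSketch, shard, and union are hypothetical names.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnel;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; assumes Guava on the classpath.
final class UnionSketch {
  // One shared funnel instance, so all shard filters have equal funnels.
  private static final Funnel<CharSequence> FUNNEL = Funnels.stringFunnel(StandardCharsets.UTF_8);

  /** Builds a shard filter sized for expectedInsertions and fills it with elements. */
  static BloomFilter<String> shard(int expectedInsertions, String... elements) {
    BloomFilter<String> filter = BloomFilter.create(FUNNEL, expectedInsertions, 0.01);
    for (String element : elements) {
      filter.put(element);
    }
    return filter;
  }

  /** Returns a new filter that might contain every element of either input. */
  static BloomFilter<String> union(BloomFilter<String> left, BloomFilter<String> right) {
    BloomFilter<String> merged = left.copy(); // putAll mutates its receiver, so copy first
    merged.putAll(right); // IllegalArgumentException unless merged.isCompatible(right)
    return merged;
  }
}
```

        Sizing matters here. Each shard should be created with expectedInsertions large enough for the combined element count, or the merged filter saturates.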
      • equals

        public boolean equals(@Nullable Object object)
        Description copied from class: java.lang.Object
        Indicates whether some other object is "equal to" this one.

        The equals method implements an equivalence relation on non-null object references:

        • It is reflexive: for any non-null reference value x, x.equals(x) should return true.
        • It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
        • It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
        • It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
        • For any non-null reference value x, x.equals(null) should return false.

        The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true).

        Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

        Specified by:
        equals in interface Predicate<T>
        Overrides:
        equals in class Object
        Parameters:
        object - the reference object with which to compare.
        Returns:
        true if this object is the same as the object argument; false otherwise.
        See Also:
        Object.hashCode(), HashMap
      • hashCode

        public int hashCode()
        Description copied from class: java.lang.Object
        Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.

        The general contract of hashCode is:

        • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
        • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
        • It is not required that if two objects are unequal according to the Object.equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

        As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (The hashCode may or may not be implemented as some function of an object's memory address at some point in time.)

        Overrides:
        hashCode in class Object
        Returns:
        a hash code value for this object.
        See Also:
        Object.equals(java.lang.Object), System.identityHashCode(java.lang.Object)
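        BloomFilter overrides equals and hashCode (see "Overrides" above), so the identity-based behavior described in the copied Object text does not apply as written. This minimal sketch assumes that equality is determined by the filters' configuration (hash-function count, funnel, strategy) and their bit contents. It checks that assumption with two filters built the same way. It assumes Guava on the classpath. EqualitySketch is a hypothetical name.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Hypothetical helper class; assumes Guava on the classpath.
final class EqualitySketch {

  /** Builds a filter with fixed parameters holding the given integers. */
  static BloomFilter<Integer> build(int... elements) {
    BloomFilter<Integer> filter = BloomFilter.create(Funnels.integerFunnel(), 1_000, 0.01);
    for (int element : elements) {
      filter.put(element); // bits are only ever set, so insertion order does not matter
    }
    return filter;
  }
}
```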
      • toBloomFilter

        public static <T> Collector<T,?,BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel,
                                                                                  long expectedInsertions)
        Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with a default expected false positive probability of 3%.

        Note that if the Collector receives significantly more elements than specified, the resulting BloomFilter will suffer a sharp deterioration of its false positive probability.

        The constructed BloomFilter will be serializable if the provided Funnel<T> is.

        It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

        Parameters:
        funnel - the funnel of T's that the constructed BloomFilter will use
        expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
        Returns:
        a Collector generating a BloomFilter of the received elements
        Since:
        23.0
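        A minimal sketch that follows the enum-funnel recommendation above and collects a stream into a filter. It assumes Guava 23.0 or later. Utf8StringFunnel and CollectorSketch are hypothetical names. The funnel simply writes each string's UTF-8 encoding into the PrimitiveSink.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnel;
import com.google.common.hash.PrimitiveSink;
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;

/** Hypothetical enum funnel: a single serializable instance with stable identity. */
enum Utf8StringFunnel implements Funnel<String> {
  INSTANCE;

  @Override
  public void funnel(String from, PrimitiveSink into) {
    into.putString(from, StandardCharsets.UTF_8);
  }
}

// Hypothetical helper class; assumes Guava 23.0+ for toBloomFilter.
final class CollectorSketch {

  /** Collects words into a filter with the default 3% expected false positive probability. */
  static BloomFilter<String> collect(Stream<String> words, long expectedInsertions) {
    return words.collect(BloomFilter.toBloomFilter(Utf8StringFunnel.INSTANCE, expectedInsertions));
  }
}
```

        Because an enum constant deserializes to the same instance, filters that use Utf8StringFunnel.INSTANCE keep an identical funnel after a serialization round trip.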
      • toBloomFilter

        public static <T> Collector<T,?,BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel,
                                                                                  long expectedInsertions,
                                                                                  double fpp)
        Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with the specified expected false positive probability.

        Note that if the Collector receives significantly more elements than specified, the resulting BloomFilter will suffer a sharp deterioration of its false positive probability.

        The constructed BloomFilter will be serializable if the provided Funnel<T> is.

        It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

        Parameters:
        funnel - the funnel of T's that the constructed BloomFilter will use
        expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
        fpp - the desired false positive probability (must be positive and less than 1.0)
        Returns:
        a Collector generating a BloomFilter of the received elements
        Since:
        23.0
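        A minimal sketch of the three-argument collector, tuned for a 0.1% false positive probability. It assumes Guava 23.0 or later. CollectorWithFppSketch and squaresBelow are hypothetical names. Keep limit at or below 46340 so that i * i does not overflow an int.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.util.stream.IntStream;

// Hypothetical helper class; assumes Guava 23.0+ for toBloomFilter.
final class CollectorWithFppSketch {

  /** Collects the squares of 0..limit-1 into a filter sized for limit elements at fpp 0.001. */
  static BloomFilter<Integer> squaresBelow(int limit) {
    return IntStream.range(0, limit)
        .map(i -> i * i) // overflows for limit > 46340
        .boxed()
        .collect(BloomFilter.toBloomFilter(Funnels.integerFunnel(), limit, 0.001));
  }
}
```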
      • create

        public static <T> BloomFilter<T> create(Funnel<? super T> funnel,
                                                int expectedInsertions,
                                                double fpp)
        Creates a BloomFilter with the expected number of insertions and expected false positive probability.

        Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation and a sharp deterioration of its false positive probability.

        The constructed BloomFilter will be serializable if the provided Funnel<T> is.

        It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

        Parameters:
        funnel - the funnel of T's that the constructed BloomFilter will use
        expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
        fpp - the desired false positive probability (must be positive and less than 1.0)
        Returns:
        a BloomFilter
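        A minimal sketch of a common use of create(funnel, expectedInsertions, fpp): a pre-filter in front of an expensive membership check. A definite "no" from mightContain skips the slow path entirely. It assumes Guava on the classpath. The slow store is simulated by a counter, and PreFilterSketch and countSlowLookups are hypothetical names.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; assumes Guava on the classpath.
final class PreFilterSketch {

  /** Counts how many queries would still reach the (simulated) slow store. */
  static int countSlowLookups(Set<String> store, List<String> queries) {
    BloomFilter<String> filter = BloomFilter.create(
        Funnels.stringFunnel(StandardCharsets.UTF_8), store.size(), 0.01); // store must be non-empty
    store.forEach(filter::put);
    int slowLookups = 0;
    for (String query : queries) {
      // "Definitely not present" answers are exact, so those queries skip the store.
      if (filter.mightContain(query)) {
        slowLookups++; // stand-in for an expensive store lookup
      }
    }
    return slowLookups;
  }
}
```

        Every present key still reaches the store. Absent keys reach it only at roughly the configured fpp rate.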
      • create

        public static <T> BloomFilter<T> create(Funnel<? super T> funnel,
                                                long expectedInsertions,
                                                double fpp)
        Creates a BloomFilter with the expected number of insertions and expected false positive probability.

        Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation and a sharp deterioration of its false positive probability.

        The constructed BloomFilter will be serializable if the provided Funnel<T> is.

        It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

        Parameters:
        funnel - the funnel of T's that the constructed BloomFilter will use
        expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
        fpp - the desired false positive probability (must be positive and less than 1.0)
        Returns:
        a BloomFilter
        Since:
        19.0
      • create

        public static <T> BloomFilter<T> create(Funnel<? super T> funnel,
                                                int expectedInsertions)
        Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.

        Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation and a sharp deterioration of its false positive probability.

        The constructed BloomFilter will be serializable if the provided Funnel<T> is.

        It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

        Parameters:
        funnel - the funnel of T's that the constructed BloomFilter will use
        expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
        Returns:
        a BloomFilter
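        A minimal sketch showing the 3% default in practice. The filter is created without an fpp argument and filled to exactly expectedInsertions, and then expectedFpp() is read. It assumes Guava on the classpath. DefaultFppSketch is a hypothetical name.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Hypothetical helper class; assumes Guava on the classpath.
final class DefaultFppSketch {

  /** Fills a default-fpp filter to capacity and returns its estimated false positive rate. */
  static double fppAtCapacity(int expectedInsertions) {
    BloomFilter<Integer> filter = BloomFilter.create(Funnels.integerFunnel(), expectedInsertions);
    for (int i = 0; i < expectedInsertions; i++) {
      filter.put(i);
    }
    return filter.expectedFpp(); // expected to land near the 3% default
  }
}
```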
      • create

        public static <T> BloomFilter<T> create(Funnel<? super T> funnel,
                                                long expectedInsertions)
        Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.

        Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation and a sharp deterioration of its false positive probability.

        The constructed BloomFilter will be serializable if the provided Funnel<T> is.

        It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

        Parameters:
        funnel - the funnel of T's that the constructed BloomFilter will use
        expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
        Returns:
        a BloomFilter
        Since:
        19.0
      • readFrom

        public static <T> BloomFilter<T> readFrom(InputStream in,
                                                  Funnel<? super T> funnel)
                                           throws IOException
        Reads a byte stream, which was written by writeTo(OutputStream), into a BloomFilter.

        The Funnel to be used is not encoded in the stream, so it must be provided here. Warning: the funnel provided must behave identically to the one used to populate the original Bloom filter!

        Throws:
        IOException - if the InputStream throws an IOException, or if its data does not appear to be a BloomFilter serialized using the writeTo(OutputStream) method.
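        A minimal sketch of a compact round trip through writeTo and readFrom using in-memory streams. Because the funnel is not in the byte stream, the reader passes its own Funnels.stringFunnel(StandardCharsets.UTF_8), which behaves identically to the writer's. It assumes Guava on the classpath. CompactStreamSketch and its methods are hypothetical names.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; assumes Guava on the classpath.
final class CompactStreamSketch {

  /** Builds a filter containing the given strings. */
  static BloomFilter<String> of(String... elements) {
    BloomFilter<String> filter =
        BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 100, 0.01);
    for (String element : elements) {
      filter.put(element);
    }
    return filter;
  }

  /** Writes the compact form to memory and reads it back with an equivalent funnel. */
  static BloomFilter<String> roundTrip(BloomFilter<String> filter) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      filter.writeTo(out);
      // The funnel is not encoded in the stream; supply one that behaves identically.
      return BloomFilter.readFrom(
          new ByteArrayInputStream(out.toByteArray()),
          Funnels.stringFunnel(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

        Supplying a funnel that hashes differently, for example one using another charset, would not raise an error. It would silently make mightContain answer false for elements that were inserted, which is why the warning above matters.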