com.google.common.math.Quantiles

@GwtIncompatible public final class Quantiles extends Object

Provides a fluent API for calculating quantiles.

Examples

To compute the median:

double myMedian = median().compute(myDataset);

where median() has been statically imported.

To compute the 99th percentile:

double myPercentile99 = percentiles().index(99).compute(myDataset);

where percentiles() has been statically imported.

To compute median and the 90th and 99th percentiles:

Map<Integer, Double> myPercentiles =
    percentiles().indexes(50, 90, 99).compute(myDataset);

where percentiles() has been statically imported. myPercentiles maps the keys 50, 90, and 99 to their corresponding quantile values.

To compute quartiles, use quartiles() instead of percentiles(). To compute arbitrary q-quantiles, use scale(q).

These examples all take a copy of your dataset. If you have a double array, you are okay with it being arbitrarily reordered, and you want to avoid that copy, you can use computeInPlace instead of compute.

Definition and notes on interpolation

The kth q-quantile of N values is defined as follows. Let x = k * (N - 1) / q. If x is an integer, the result is the value that would appear at index x in the sorted dataset (unless there are NaN values; see below). Otherwise, the result is the average of the values that would appear at the indexes floor(x) and ceil(x), weighted by (1 - frac(x)) and frac(x) respectively. This is the same definition as used by Excel and by S; it is the Type 7 definition in R, and Wikipedia describes it as providing "Linear interpolation of the modes for the order statistics for the uniform distribution on [0,1]."
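The definition can be checked against a deliberately naive reference that sorts a copy of the data and interpolates. This is only a sketch of the definition, not the algorithm Quantiles uses: the class and method names are hypothetical, the argument checks are assumptions, and NaN and infinite inputs are not given the special handling described below.

```java
import java.util.Arrays;

/** Reference sketch of the quantile definition: sort, then interpolate. */
final class QuantileDefinitionSketch {

  /** Returns the kth q-quantile of values, following the definition in the text. */
  static double quantile(double[] values, int k, int q) {
    // Assumed preconditions for this sketch: non-empty data, q > 0, 0 <= k <= q.
    if (values.length == 0 || q <= 0 || k < 0 || k > q) {
      throw new IllegalArgumentException("need a non-empty dataset, q > 0, and 0 <= k <= q");
    }
    double[] sorted = values.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);              // O(N log N); chosen for clarity, not speed

    // x = k * (N - 1) / q, computed in floating point to keep the fractional part.
    double x = (double) k * (sorted.length - 1) / q;
    int lower = (int) Math.floor(x);
    int upper = (int) Math.ceil(x);
    double frac = x - lower;

    if (lower == upper) {
      return sorted[lower]; // x is an integer: take the value at index x
    }
    // Weighted average with weights (1 - frac(x)) and frac(x).
    return (1 - frac) * sorted[lower] + frac * sorted[upper];
  }

  public static void main(String[] args) {
    double[] data = {4, 1, 3, 2};
    System.out.println(quantile(data, 1, 2)); // median: prints 2.5
    System.out.println(quantile(data, 1, 4)); // first quartile: prints 1.75
  }
}
```

For the four values above, the median has x = 1 * 3 / 2 = 1.5, so the result is the midpoint of the sorted values at indexes 1 and 2.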

Handling of non-finite values

If any values in the input are NaN then all values returned are NaN. (This is the one occasion when the behaviour is not the same as you'd get from sorting with Arrays.sort(double[]) or Collections.sort(List<Double>) and selecting the required value(s). Those methods sort NaN values as if they were greater than any other value and place them at the end of the dataset, even after POSITIVE_INFINITY.)

Otherwise, NEGATIVE_INFINITY and POSITIVE_INFINITY sort to the beginning and the end of the dataset, as you would expect.

If required to do a weighted average between an infinity and a finite value, or between an infinite value and itself, the infinite value is returned. If required to do a weighted average between NEGATIVE_INFINITY and POSITIVE_INFINITY, NaN is returned (note that this will only happen if the dataset contains no finite values).
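The rules above can be sketched in a few lines. The helper names here are hypothetical and this is not the library's source; it only illustrates the stated behaviour for two adjacent values from a sorted dataset (so lower <= upper), plus the sort order of NaN under Arrays.sort(double[]).

```java
import java.util.Arrays;

/** Sketch of the non-finite-value rules; helper names are hypothetical. */
final class NonFiniteSketch {

  /** Interpolates between adjacent sorted values (lower <= upper) with 0 < frac < 1. */
  static double interpolate(double lower, double upper, double frac) {
    if (lower == Double.NEGATIVE_INFINITY) {
      // -inf paired with +inf gives NaN; -inf paired with anything else gives -inf.
      return upper == Double.POSITIVE_INFINITY ? Double.NaN : Double.NEGATIVE_INFINITY;
    }
    if (upper == Double.POSITIVE_INFINITY) {
      return Double.POSITIVE_INFINITY; // lower is finite or +inf itself
    }
    return lower + (upper - lower) * frac; // ordinary finite case
  }

  /** True if any value is NaN, in which case every quantile is reported as NaN. */
  static boolean containsNaN(double[] values) {
    for (double v : values) {
      if (Double.isNaN(v)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    double[] data = {Double.NaN, 1.0, Double.POSITIVE_INFINITY};
    Arrays.sort(data);
    System.out.println(Arrays.toString(data)); // NaN ends up last, after Infinity
    System.out.println(interpolate(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY, 0.5));
    System.out.println(interpolate(2.0, Double.POSITIVE_INFINITY, 0.5));
  }
}
```

The explicit branches matter because the plain formula would evaluate Infinity - Infinity and yield NaN even when both neighbours are the same infinite value.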

Performance

The average time complexity of the computation is O(N) in the size of the dataset. The worst-case time complexity is O(N^2). You are extremely unlikely to hit this quadratic case on randomly ordered data (the probability decreases faster than exponentially in N), but if you are passing in unsanitized user data then a malicious user could force it. A light shuffle of the data using an unpredictable seed should normally be enough to thwart this attack.
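One way to apply the shuffle suggested above is sketched here. It assumes a full Fisher-Yates shuffle driven by java.security.SecureRandom, which does more work than the "light shuffle" the text considers sufficient; the class and method names are hypothetical.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Random;

/** Sketch: reorder untrusted data unpredictably before computing quantiles. */
final class ShuffleSketch {

  /** Fisher-Yates shuffle of the array, in place. */
  static void shuffle(double[] values, Random random) {
    for (int i = values.length - 1; i > 0; i--) {
      int j = random.nextInt(i + 1); // uniform index in [0, i]
      double tmp = values[i];
      values[i] = values[j];
      values[j] = tmp;
    }
  }

  public static void main(String[] args) {
    double[] untrusted = {5, 4, 3, 2, 1};
    // SecureRandom seeds itself from an unpredictable source.
    shuffle(untrusted, new SecureRandom());
    System.out.println(Arrays.toString(untrusted)); // same values, attacker-independent order
  }
}
```

Because the shuffle already reorders the array, the shuffled data is a natural candidate for computeInPlace rather than compute.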

The time taken to compute multiple quantiles on the same dataset using indexes is generally less than the total time taken to compute each of them separately, and sometimes much less. For example, on a large enough dataset, computing the 90th and 99th percentiles together takes about 55% as long as computing them separately.

When calling Quantiles.ScaleAndIndex.compute(Collection) (in either form), the memory requirement is 8*N bytes for the copy of the dataset plus an overhead which is independent of N (but depends on the quantiles being computed). When calling computeInPlace (in either form), only the overhead is required. The number of object allocations is independent of N in both cases.

Since: 20.0
Author: Pete Gillin

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final class | Quantiles.Scale | Describes the point in a fluent API chain where only the scale (i.e. the q in q-quantiles) has been specified. |
| static final class | Quantiles.ScaleAndIndex | Describes the point in a fluent API chain where the scale and a single quantile index (i.e. the q and the k in the kth q-quantile) have been specified. |
| static final class | Quantiles.ScaleAndIndexes | Describes the point in a fluent API chain where the scale and multiple quantile indexes (i.e. the q and a set of values for the k in the kth q-quantile) have been specified. |

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| Quantiles() | Deprecated. Use the static factory methods of the class. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static Quantiles.ScaleAndIndex | median() | Specifies the computation of a median (i.e. the 1st 2-quantile). |
| static Quantiles.Scale | percentiles() | Specifies the computation of percentiles (i.e. 100-quantiles). |
| static Quantiles.Scale | quartiles() | Specifies the computation of quartiles (i.e. 4-quantiles). |
| static Quantiles.Scale | scale(int scale) | Specifies the computation of q-quantiles. |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- Quantiles
  
  @Deprecated public Quantiles()
  
  Deprecated.
  Use the static factory methods of the class. There is no reason to create an instance of Quantiles.
  
  Constructor for a type that is not meant to be instantiated.

Method Details
- median
  
  public static Quantiles.ScaleAndIndex median()
  
  Specifies the computation of a median (i.e. the 1st 2-quantile).
- quartiles
  
  public static Quantiles.Scale quartiles()
  
  Specifies the computation of quartiles (i.e. 4-quantiles).
- percentiles
  
  public static Quantiles.Scale percentiles()
  
  Specifies the computation of percentiles (i.e. 100-quantiles).
- scale
  
  public static Quantiles.Scale scale(int scale)
  
  Specifies the computation of q-quantiles.
  
  Parameters:
  
  scale - the scale for the quantiles to be calculated, i.e. the q of the q-quantiles, which must be positive
