Package cern.jet.stat.quantile

Scalable algorithms and data structures to compute approximate quantiles over very large data sequences.


Package cern.jet.stat.quantile Description

Scalable algorithms and data structures to compute approximate quantiles over very large data sequences. The approximation guarantees are explicit and hold for arbitrary value distributions and arrival distributions of the dataset. Main-memory requirements are an order of magnitude smaller than for any other known technique.
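To make "explicit approximation guarantee" concrete, the sketch below checks a candidate answer against the usual textbook definition: an element is an epsilon-approximate phi-quantile of N values if one of the ranks it occupies in sorted order lies within epsilon*N of phi*N. That definition is an assumption here, not something stated on this page, and the class and method names (ApproxQuantileCheck, isApproxQuantile) are hypothetical and not part of this package.

```java
public class ApproxQuantileCheck {

    /**
     * Returns true if candidate is an epsilon-approximate phi-quantile of data,
     * i.e. candidate occurs in data and at least one of its 1-based ranks
     * lies in the window [(phi - epsilon) * N, (phi + epsilon) * N].
     * Hypothetical helper for illustration only.
     */
    public static boolean isApproxQuantile(double[] data, double phi,
                                           double epsilon, double candidate) {
        int n = data.length;
        int below = 0;   // number of values strictly smaller than candidate
        int equal = 0;   // number of values equal to candidate
        for (double v : data) {
            if (v < candidate) {
                below++;
            } else if (v == candidate) {
                equal++;
            }
        }
        if (equal == 0) {
            return false; // the answer must be an element of the data
        }
        // Ranks occupied by candidate in sorted order: below+1 .. below+equal.
        int minRank = below + 1;
        int maxRank = below + equal;
        double lo = (phi - epsilon) * n;
        double hi = (phi + epsilon) * n;
        return maxRank >= lo && minRank <= hi;
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1; // values 1..1000, so value == rank
        }
        // Median with 1% tolerance: acceptable ranks are roughly 490..510.
        System.out.println(isApproxQuantile(data, 0.5, 0.01, 505.0));
        System.out.println(isApproxQuantile(data, 0.5, 0.01, 520.0));
    }
}
```

The check itself scans all values and is only meant to pin down what the accuracy parameter promises; the classes in this package are designed to deliver such answers without holding the full sequence in memory.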

The approximate algorithms are primarily intended to help applications scale. When faced with a large data sequence, traditional methods need either very large memories or time-consuming disk-based sorting. In contrast, the approximate algorithms can deal with more than 10^10 values without disk-based sorting.

All classes can be seen from various angles, for example as

1. An algorithm to compute quantiles.
2. A 1-dim equi-depth histogram.
3. A 1-dim histogram arbitrarily rebinnable in real time.
4. A space-efficient MultiSet data structure using lossy compression.
5. A space-efficient, value-preserving bin of a 2-dim or d-dim histogram.

(All subject to an accuracy specified by the user.)

Have a look at the documentation of the class QuantileFinderFactory and the interface DoubleQuantileFinder to learn more. Most users will never need to know more than how to use these. Actual implementations of the DoubleQuantileFinder interface are hidden; they are constructed indirectly via the factory.

See also hep.aida.bin.QuantileBin1D, which demonstrates how this package can be used.
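The equi-depth histogram view from the list above can be illustrated with a minimal sketch: the quantiles at 0, 1/k, 2/k, ..., 1 serve as bin boundaries, so every bin holds roughly the same number of values. The code uses an exact in-memory sort as a stand-in for a quantile finder; it is not this package's algorithm, and the names EquiDepthSketch and boundaries are hypothetical.

```java
import java.util.Arrays;

public class EquiDepthSketch {

    /**
     * Returns bins + 1 boundaries, where boundary i is the element at
     * quantile i / bins. Hypothetical helper for illustration only.
     */
    public static double[] boundaries(double[] data, int bins) {
        if (data.length == 0 || bins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted); // exact stand-in for an approximate quantile finder
        int n = sorted.length;
        double[] result = new double[bins + 1];
        for (int i = 0; i <= bins; i++) {
            // Index of the i/bins quantile, clamped to the last element.
            int index = (int) Math.min(n - 1L, (long) i * n / bins);
            result[i] = sorted[index];
        }
        return result;
    }

    public static void main(String[] args) {
        // Skewed data: squares of 0..999.
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (double) i * i;
        }
        // Four bins of about 250 values each, although bin widths differ widely.
        System.out.println(Arrays.toString(boundaries(data, 4)));
    }
}
```

With an approximate quantile finder in place of the sort, the same boundaries would be obtained to within the user-specified accuracy, and asking for a different set of quantiles corresponds to rebinning the histogram.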

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.