com.bigdata.sparse (Blazegraph Database Platform 2.1.5 API)

Interface Summary
Interface	Description
IAutoIncrementCounter	Marker interface for auto-incremental types.
INameFilter	Filter used to select column names.
IPrecondition	An inteface that may be used to impose a pre-condition on the state of a logical row for an atomic write operation.
IRowStoreConstants	Various constants that may be used with the `SparseRowStore`.
ITPS	A Timestamp Property Set is a property set with `timestamp property values` representing data for a specific `Schema`.
ITPV	a Timestamped Property Value is a single {property, timestamp, value} tuple for some schema as read from the `SparseRowStore`.
SparseRowStore.Options	Options for the `SparseRowStore`.

Class Summary
Class	Description
AbstractAtomicRowReadOrWrite	Abstract class implements the atomic read operation.
AtomicRowDelete	Atomic delete of a logical row.
AtomicRowFilter	Transforms an `ITupleIterator` reading directly on an `IIndex` backing a `SparseRowStore` into an `ITupleIterator` visiting logical `ITPS` rows.
AtomicRowRead	Atomic read of the logical row associated with some `Schema` and primary key.
AtomicRowWriteRead	Atomic write on a logical row.
AutoIncIntegerCounter	A singleton object that causes the associated property value to be assigned the next higher 32-bit integer value when it is written on the `SparseRowStore`.
AutoIncLongCounter	A singleton object that causes the associated property value to be assigned the next higher 64-bit integer value when it is written on the `SparseRowStore`.
EmptyRowPrecondition	`IPrecondition` succeeds iff there are no property values for the logical row (it recognizes a `null`, indicating no property values, and an empty logical row, indicating that an `INameFilter` was applied and that there were no property values which satisified that filter, and a deleted property value for the primary key, which is often used as a shorthand for deleting the logical row).
GlobalRowStoreHelper	Helper class.
GlobalRowStoreSchema	Schema for the `SparseRowStore` used as a global row store for named property sets within the federation.
GlobalRowStoreUtil	Utility method for use with a `SparseRowStore`.
KeyDecoder	A utility class that decodes a key in a `SparseRowStore` into the `KeyType` for the primary key, the column name, and the timestamp.
LogicalRowSplitHandler	This class imposes the constraint that the separator key must not split a logical.
NameChecker	Utility class validates column and schema name constraints.
Precondition	Base class may be used for combining `IPrecondition`.
Schema	A schema for a sparse row store.
SingleColumnFilter	Filter for a specific column name.
SparseRowStore	A client-side class that knows how to use an `IIndex` to provide an efficient data model in which a logical row is stored as one or more entries in the `IIndex`.
TestAll	Test suite for the spare row store facility (aka key-value store).
TestKeyEncodeDecode	Test suite for round trip of keys as encoded by `Schema.fromKey(com.bigdata.btree.keys.IKeyBuilder, Object)`, by `AtomicRowWriteRead`, and as decoded by the `KeyDecoder`.
TestSparseRowStore	Test suite for `SparseRowStore`.
TestTPS	Test of `TPS` (property timestamp set implementation).
TestValueType	Test suite for `ValueType`.
TimestampChooser	Utility class for choosing timestamps for the `SparseRowStore` on the server.
TPS	Default implementation.
TPS.TP	A {property, timestamp} tuple.
TPS.TPV	Helper class models a single property value as of a given timestamp.
TPS.TPVComparator	Imposes ordering based on schema, property name, and timestamp.
TPSTupleSerializer	Helper class for (de-)serializing logical rows for `AtomicRowFilter`.

Enum Summary
Enum	Description
KeyType	A type safe enumeration of key types and the byte values that are used to encode that key type within the encoded `Schema` name.
ValueType	A type safe enumeration of value types.

Package com.bigdata.sparse Description

This package provides support for treating normal B+Trees using a "sparse row store" pattern and can be applied to both local B+Trees and scale-out indices. A sparse row store is a data model in which the B+Tree keys are formed as:


[schemaName][primaryKey][columnName][timestamp]

and the values are the value for a given column for that primary key. People familar with the Google "bigtable" project will recognize this data model.

Sparse row stores are flexible, scaleable and highly efficient data structures. A single logical "row" is made up of a key range spanning all entries within a given schema and having the same primary key - when multiple schemas are used, the facet of the row for each schema must be read explicitly, but this operation can occur in parallel. They allow applications to cluster data by schema and by primary key, and make it easy to store only the non-null values in a given "row". Likewise, on retrieval the application can ask for all columns for a given row or only those satisifying some constraint.

Timestamps are either generated by the application, in which case they define the semantics of a write-write conflict, or on write by the index. In the latter case, write-write conflicts never arise. Regardless of how timestamps are generated, the use of the timestamp in the key requires that applications specify filters that are applied during row scans to limit the data points actually returned as part of the row. For example, only returning the most recent column values no later than a given timestamp for all columns for some primary key.

For example, assuming records with the following columns

Id
Name
Employer
DateOfHire

would be represented as a series of index entries as follows:


[employee][12][DateOfHire][t0] : [4/30/02]
[employee][12][DateOfHire][t1] : [4/30/05]
[employee][12][Employer][t0]   : [SAIC]
[employee][12][Employer][t1]   : [SYSTAP]
[employee][12][Id][t0]         : [12]
[employee][12][Name][t0]       : [Bryan Thompson]

The records have been grouped under a schema named "employee". All records have the same primary key [12], which is also the value of the Id column. The entries with timestamp t0 represent the 1st logical row and specify the date of hire at SAIC. The entries with timestamp t1 are the modified column values (unmodified column values are not rewritten) representing the date of hire at SYSTAP - the Id and Name column values from t1 are unchanged. Note that all entries share a common prefix which would be compressed down when it is stored in the indices.

There are two logical rows in the example above

Name = Bryan Thompson, Id = 12, DateOfHire = 4/30/02, Employer = SAIC
Name = Bryan Thompson, Id = 12, DateOfHire = 4/30/05, Employer = SYSTAP

A query for all columns with the primary key "12" and the "employee" scheme would result in a row scan whose starting key was [employee][12]. If you want only the most current row as of a given timestamp, then you would specify a filter for the row scan that only accepted the most current entries for any column value having a timestamp not greater than the given timestamp for any column name.

The sparse row store is capable of storing historical revisions efficiently, but if too many revisions are stored for a given primary key then retrieval performance for logical rows for that key will eventually degrade. In order to maintain performance for rows that will be updated frequently, a history policy should be periodically applied to expunge old data from the indices. Typical policies will either keep only a limited number of revisions of a row, e.g., the last 3 revisions, or all revisions within a certain number of hours or days. Since only modified column values are stored, care must be taken to expunge only column values that (a) fail the history policy; and (b) have been overwritten. When using partitioned indices, the history policy is applied during compaction (when the segments that comprise a partition are merged).

The sparse row store provides a guarentee of atomic read and writes at the logical "row" level. A logical row corresponds to the (ordered) set of entries in an index having a common schema and primary key. This guarentee of atomic row operations is achieved together with very high concurrent read and write rates by imposing the following constraints:

The client uses a unisolated reads and writes to communicate with the data services.
The client never splits read or write operations across a [schema][primaryKey] prefix. More typically, the client restricts read and write operations to a single logical row at a time. Together with the use of unisolated operations this guarentees row level atomicity of read and write operations on a given index partition.
When using a partitioned index, the data services never split across a [schema][primaryKey] prefix. In order to enforce this constraint (and in order to support auto-timestamps), a partitioned index must be provisioned specifically for a sparse row store. Likewise, per-schema provisioning may also be used to give guidence on the latency or redundency requirements for a given schema.