HTreeHashJoinOp (Blazegraph Database Platform 2.1.5 API)

java.lang.Object
- com.bigdata.bop.CoreBaseBOp
  - com.bigdata.bop.BOpBase
    - com.bigdata.bop.PipelineOp
      - com.bigdata.bop.join.HashJoinOp<E>
        - com.bigdata.bop.join.HTreeHashJoinOp<E>

All Implemented Interfaces:

BOp, IShardwisePipelineOp<E>, ISingleThreadedOp, IPropertySet, Serializable, Cloneable
```
public class HTreeHashJoinOp<E>
extends HashJoinOp<E>
implements ISingleThreadedOp
```
A hash join against an IAccessPath based on the HTree and suitable for very large intermediate result sets. Source solutions are buffered on the HTree on each evaluation pass. When the memory demand of the HTree is not bounded, the hash join will run a single pass over the IAccessPath for the target IPredicate. For some queries, this can be more efficient than probing as-bound instances of the target IPredicate using a nested indexed join, such as PipelineJoin. This can also be more efficient on a cluster, where the key range scan of the target IPredicate will be performed using predominantly sequential IO.
If the PipelineOp.Annotations#MAX_MEMORY annotation is specified then an evaluation pass over the target IAccessPath will be triggered if, after having buffered some chunk of solutions on the HTree, the memory demand of the HTree exceeds the capacity specified by that annotation. This "blocked" evaluation trades off multiple scans of the target IPredicate against the memory demand of the intermediate result set.
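The trade-off between at-once and "blocked" evaluation can be illustrated with a minimal plain-Java sketch. It is not Blazegraph code: the class `BlockedHashJoinSketch`, its `join` method, and the `maxBuffered` parameter are hypothetical, a `HashMap` stands in for the HTree, and the limit is counted in buffered solutions rather than bytes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model of blocked hash-join evaluation; NOT the Blazegraph implementation. */
public class BlockedHashJoinSketch {

    /** Number of scans over the target performed so far. */
    public int scans = 0;

    /**
     * Joins source rows {joinKey, payload} against target rows {joinKey, value}.
     * maxBuffered is an assumed stand-in for MAX_MEMORY, counted in solutions.
     */
    public List<String> join(List<List<String[]>> sourceChunks,
                             List<String[]> target,
                             long maxBuffered) {
        List<String> out = new ArrayList<>();
        Map<String, List<String>> index = new HashMap<>(); // stand-in for the HTree
        long buffered = 0;
        for (List<String[]> chunk : sourceChunks) {
            // Buffer the whole chunk before checking the limit.
            for (String[] s : chunk) {
                index.computeIfAbsent(s[0], k -> new ArrayList<>()).add(s[1]);
                buffered++;
            }
            // Limit exceeded: run one pass over the target, then release the buffer.
            if (buffered > maxBuffered) {
                scan(index, target, out);
                index.clear();
                buffered = 0;
            }
        }
        // Final pass for whatever is still buffered.
        if (buffered > 0) {
            scan(index, target, out);
        }
        return out;
    }

    /** One full scan of the target, probing the hash index for each target row. */
    private void scan(Map<String, List<String>> index, List<String[]> target, List<String> out) {
        scans++;
        for (String[] t : target) {
            List<String> matches = index.get(t[0]);
            if (matches == null) continue;
            for (String payload : matches) {
                out.add(payload + "+" + t[1]);
            }
        }
    }
}
```

With `maxBuffered` set to `Long.MAX_VALUE` the sketch scans the target once; a smaller limit yields the same join results at the cost of additional scans.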
The source solutions presented to a hash join MUST have bindings for the HashJoinAnnotations.JOIN_VARS in order to join (they can still succeed as optionals if the join variables are not bound).
Handling OPTIONAL
An optional join makes life significantly more complex. For each source solution we need to know whether or not it joined at least once with the access path. A join can only occur when the source solution and the access path have the same as-bound values for the join variables. However, the same as-bound values can appear multiple times when scanning an access path, even if the access path does not allow duplicates. For example, an SPO index scan can visit many tuples with the same O. This means that we cannot simply remove source solutions when they join, as they might join more than once.
While it is easy enough to associate a flag or counter with each source solution when running on the JVM heap, updating that flag or counter when the data are on a persistent index is more expensive. Another approach is to build up a second hash index (a "join set") of the solutions that joined and then do a scan over the original hash index, writing out any solution which is not in the joinSet. This is also expensive since we could wind up double buffering the source solutions. Both approaches also require us to scan the total multiset of the source solutions in order to detect and write out any optional solutions. I've gone with the joinSet approach here as it reduces the complexity associated with updating a per-solution counter in the hash index.
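The joinSet approach described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the logic of HTreeHashJoinUtility: the class `OptionalJoinSetSketch` and its method are hypothetical, in-memory collections stand in for the HTree-backed indices, and a solution is modeled as a `{joinKey, payload}` pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative model of OPTIONAL handling with a join set; NOT the Blazegraph implementation. */
public class OptionalJoinSetSketch {

    /**
     * source rows are {joinKey, payload}; target rows are {joinKey, value}.
     * Returns joined rows, followed by the source rows that never joined.
     */
    public static List<String> optionalJoin(List<String[]> source, List<String[]> target) {
        // Hash index: join key -> positions of the source solutions having that key.
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < source.size(); i++) {
            index.computeIfAbsent(source.get(i)[0], k -> new ArrayList<>()).add(i);
        }

        // The "join set": source solutions that joined at least once.
        Set<Integer> joinSet = new HashSet<>();
        List<String> out = new ArrayList<>();

        // Single scan over the target. A source solution is NOT removed when it joins,
        // because later target rows may carry the same join key.
        for (String[] t : target) {
            List<Integer> hits = index.get(t[0]);
            if (hits == null) continue;
            for (int i : hits) {
                out.add(source.get(i)[1] + "+" + t[1]);
                joinSet.add(i);
            }
        }

        // Scan the full multiset of source solutions and write out those that never joined.
        for (int i = 0; i < source.size(); i++) {
            if (!joinSet.contains(i)) {
                out.add(source.get(i)[1] + "+null");
            }
        }
        return out;
    }
}
```

The final loop is the scan over the whole source multiset that the text mentions, and the join set is the second structure whose cost it warns about.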
Finally, note that "blocked" evaluation is not possible with OPTIONAL because we must have ALL solutions on hand in order to decide which solutions did not join. Therefore PipelineOp.Annotations#MAX_MEMORY must be set to Long.MAX_VALUE when the IPredicate is IPredicate.Annotations#OPTIONAL.
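The constraint in the preceding paragraph amounts to a simple precondition. The sketch below is a hypothetical check written for illustration only: the class `OptionalMaxMemoryCheck`, its method, and the choice of exception are assumptions, not part of the Blazegraph API.

```java
/** Illustrative precondition check; NOT part of the Blazegraph API. */
public class OptionalMaxMemoryCheck {

    /**
     * Rejects a bounded memory limit for an optional join, since blocked
     * evaluation cannot decide which source solutions never joined.
     *
     * @param optional  whether the target predicate is marked OPTIONAL
     * @param maxMemory the configured memory limit in bytes
     */
    public static void check(boolean optional, long maxMemory) {
        if (optional && maxMemory != Long.MAX_VALUE) {
            throw new UnsupportedOperationException(
                "MAX_MEMORY must be Long.MAX_VALUE for an OPTIONAL join, was " + maxMemory);
        }
    }
}
```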

Author:

Bryan Thompson

See Also:
HTreeHashJoinUtility, Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static interface`	`HTreeHashJoinOp.Annotations`

Field Summary
- Fields inherited from class com.bigdata.bop.CoreBaseBOp
  DEFAULT_INITIAL_CAPACITY
- Fields inherited from interface com.bigdata.bop.BOp
  NOANNS, NOARGS

Constructor Summary

Constructors
Constructor and Description
`HTreeHashJoinOp(BOp[] args, Map<String,Object> annotations)`
`HTreeHashJoinOp(BOp[] args, NV... annotations)`
`HTreeHashJoinOp(HTreeHashJoinOp<E> op)`

Method Summary

Methods
Modifier and Type	Method and Description
`protected IHashJoinUtility`	`newState(BOpContext<IBindingSet> context, INamedSolutionSetRef namedSetRef, JoinTypeEnum joinType)` Return the instance of the `IHashJoinUtility` to be used by this operator.
`protected boolean`	`runHashJoin(BOpContext<?> context, IHashJoinUtility state)` Return `true` if `ChunkTask#doHashJoin()` should be executed in a given operator `ChunkTask` invocation.

Methods inherited from class com.bigdata.bop.join.HashJoinOp
eval, getPredicate, isOptional, newStats

Methods inherited from class com.bigdata.bop.PipelineOp
assertAtOnceJavaHeapOp, assertMaxParallelOne, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getMaxMemory, getMaxParallel, isAtOnceEvaluation, isBlockedEvaluation, isLastPassRequested, isPipelinedEvaluation, isReorderSolutions, isSharedState

Methods inherited from class com.bigdata.bop.BOpBase
__replaceArg, _clearProperty, _set, _setProperty, annotations, annotationsCopy, annotationsEqual, annotationsRef, argIterator, args, argsCopy, arity, clearAnnotations, clearProperty, deepCopy, deepCopy, get, getProperty, setArg, setProperty, setUnboundProperty, toArray, toArray

Methods inherited from class com.bigdata.bop.CoreBaseBOp
annotationsEqual, annotationsToString, annotationsToString, annotationValueToString, checkArgs, clone, equals, getEvaluationContext, getId, getProperty, getRequiredProperty, hashCode, indent, isController, mutation, shortenName, toShortString, toString, toString

Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - HTreeHashJoinOp
```
public HTreeHashJoinOp(HTreeHashJoinOp<E> op)
```
    Parameters:
    op -
  - HTreeHashJoinOp
```
public HTreeHashJoinOp(BOp[] args,
               NV... annotations)
```
  - HTreeHashJoinOp
```
public HTreeHashJoinOp(BOp[] args,
               Map<String,Object> annotations)
```
    Parameters:
    args -
    annotations -
- Method Detail
  - newState
```
protected IHashJoinUtility newState(BOpContext<IBindingSet> context,
                        INamedSolutionSetRef namedSetRef,
                        JoinTypeEnum joinType)
```
    Description copied from class: HashJoinOp
    
    Return the instance of the IHashJoinUtility to be used by this operator. This method is invoked once, the first time this operator is evaluated. The returned IHashJoinUtility reference is attached to the IQueryAttributes and accessed there on subsequent evaluation passes for this operator.
    
    Specified by:
    
    newState in class HashJoinOp<E>
    
    Parameters:
    context - The BOpEvaluationContext
    namedSetRef - Metadata to identify the named solution set.
    joinType - The type of join.
  - runHashJoin
```
protected boolean runHashJoin(BOpContext<?> context,
                  IHashJoinUtility state)
```
    Return true if ChunkTask#doHashJoin() should be executed in a given operator ChunkTask invocation.
    The HTreeHashJoinOp runs the hash join either exactly once (at-once evaluation) or once a target memory threshold has been exceeded (blocked evaluation).
    
    Specified by:
    
    runHashJoin in class HashJoinOp<E>
    
    Parameters:
    context - The operator evaluation context.
    state - The IHashJoinUtility instance.
