public class HTreeHashJoinOp<E> extends HashJoinOp<E> implements ISingleThreadedOp
A hash join against an IAccessPath based on the HTree and suitable for very large intermediate result sets. Source solutions are buffered on the HTree on each evaluation pass. When the memory demand of the HTree is not bounded, the hash join will run a single pass over the IAccessPath for the target IPredicate. For some queries, this can be more efficient than probing as-bound instances of the target IPredicate using a nested indexed join, such as PipelineJoin. This can also be more efficient on a cluster, where the key range scan of the target IPredicate will be performed using predominantly sequential IO.
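The build-then-probe pattern described above can be sketched as follows. This is a minimal illustration, not the operator's implementation: plain Java collections stand in for the HTree and the IAccessPath, solutions are modeled as maps from variable name to value, and the class and method names (HashJoinSketch, hashJoin) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: in-memory collections stand in for the HTree and IAccessPath. */
final class HashJoinSketch {

    /** Joins source solutions against target tuples on a single join variable. */
    static List<Map<String, String>> hashJoin(
            List<Map<String, String>> sourceSolutions,
            List<Map<String, String>> targetTuples,
            String joinVar) {

        // Build phase: index the buffered source solutions by the join variable.
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> s : sourceSolutions) {
            String key = s.get(joinVar);
            if (key == null) {
                continue; // unbound join variable: this solution cannot join
            }
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }

        // Probe phase: a single sequential pass over the target.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> t : targetTuples) {
            List<Map<String, String>> matches = index.get(t.get(joinVar));
            if (matches == null) {
                continue; // no buffered source solution shares this key
            }
            for (Map<String, String> s : matches) {
                Map<String, String> joined = new HashMap<>(s);
                joined.putAll(t); // merge the bindings of both sides
                out.add(joined);
            }
        }
        return out;
    }
}
```

The target is read exactly once regardless of how many source solutions are buffered, which is the property that distinguishes this approach from probing the target once per source solution.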
If the PipelineOp.Annotations#MAX_MEMORY annotation is specified, then an evaluation pass over the target IAccessPath will be triggered if, after having buffered some chunk of solutions on the HTree, the memory demand of the HTree exceeds the capacity specified by that annotation. This "blocked" evaluation trades off multiple scans of the target IPredicate against the memory demand of the intermediate result set.
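To make the trade-off concrete, the sketch below counts how many scans of the target a given memory limit would cause. It is a simplified model under stated assumptions: each buffered solution costs one unit of memory, the buffer is released after every pass, and the names (BlockedEvaluationSketch, countTargetScans) are hypothetical rather than part of the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative cost model for "blocked" evaluation; one memory unit per solution (assumption). */
final class BlockedEvaluationSketch {

    /** Returns how many passes over the target are needed for the given chunks and limit. */
    static int countTargetScans(List<List<String>> chunks, long maxMemory) {
        List<String> buffer = new ArrayList<>();
        int scans = 0;
        for (List<String> chunk : chunks) {
            buffer.addAll(chunk); // buffer the next chunk of source solutions
            if (buffer.size() > maxMemory) {
                scans++;          // memory limit exceeded: run a pass over the target
                buffer.clear();   // release the buffered block before continuing
            }
        }
        if (!buffer.isEmpty()) {
            scans++;              // final pass for whatever is still buffered
        }
        return scans;
    }
}
```

With an unbounded limit such as Long.MAX_VALUE the model degenerates to a single pass, while a smaller limit lowers peak memory at the price of additional scans.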
The source solutions presented to a hash join MUST have bindings for the
HashJoinAnnotations.JOIN_VARS in order to join (they can still
succeed as optionals if the join variables are not bound).
While it is easy enough to associate a flag or counter with each source solution when running on the JVM heap, updating that flag or counter when the data are on a persistent index is more expensive. Another approach is to build up a second hash index (a "join set") of the solutions which joined and then do a scan over the original hash index, writing out any solution which is not in the joinSet. This is also expensive since we could wind up double buffering the source solutions. Both approaches also require us to scan the total multiset of the source solutions in order to detect and write out any optional solutions. I've gone with the joinSet approach here as it reduces the complexity associated with update of a per-solution counter in the hash index.
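The joinSet approach can be sketched roughly as follows. This is an illustration using in-memory collections, not the HTree-backed implementation; the Solution class, the optionalJoin method, and the string output format are all hypothetical. A null key models a source solution whose join variable is not bound.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative joinSet bookkeeping for an OPTIONAL hash join. */
final class JoinSetSketch {

    /** A source solution with a label and a join key (null when the join variable is unbound). */
    static final class Solution {
        final String id;
        final String key;

        Solution(String id, String key) {
            this.id = id;
            this.key = key;
        }
    }

    static List<String> optionalJoin(List<Solution> source, List<String> targetKeys) {
        // Original hash index over ALL source solutions (HashMap permits a null key).
        Map<String, List<Solution>> index = new HashMap<>();
        for (Solution s : source) {
            index.computeIfAbsent(s.key, k -> new ArrayList<>()).add(s);
        }

        List<String> out = new ArrayList<>();
        // Second index: the solutions that joined. Solution uses identity equality,
        // so duplicate solutions in the multiset are tracked separately.
        Set<Solution> joinSet = new HashSet<>();
        for (String t : targetKeys) {
            if (t == null) {
                continue;
            }
            for (Solution s : index.getOrDefault(t, List.of())) {
                out.add(s.id + "|" + t);
                joinSet.add(s);
            }
        }

        // Full scan of the source solutions: emit those that never joined as optionals.
        for (Solution s : source) {
            if (!joinSet.contains(s)) {
                out.add(s.id + "|unbound");
            }
        }
        return out;
    }
}
```

The final loop is the scan over the total multiset of source solutions that the paragraph above describes, and the joinSet is the second copy of the joined solutions that makes the approach memory-hungry.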
Finally, note that "blocked" evaluation is not possible with OPTIONAL because we must have ALL solutions on hand in order to decide which solutions did not join. Therefore PipelineOp.Annotations#MAX_MEMORY must be set to Long.MAX_VALUE when the join is OPTIONAL.
eval, getPredicate, isOptional, newStats
assertAtOnceJavaHeapOp, assertMaxParallelOne, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getMaxMemory, getMaxParallel, isAtOnceEvaluation, isBlockedEvaluation, isLastPassRequested, isPipelinedEvaluation, isReorderSolutions, isSharedState
__replaceArg, _clearProperty, _set, _setProperty, annotations, annotationsCopy, annotationsEqual, annotationsRef, argIterator, args, argsCopy, arity, clearAnnotations, clearProperty, deepCopy, deepCopy, get, getProperty, setArg, setProperty, setUnboundProperty, toArray, toArray
annotationsEqual, annotationsToString, annotationsToString, annotationValueToString, checkArgs, clone, equals, getEvaluationContext, getId, getProperty, getRequiredProperty, hashCode, indent, isController, mutation, shortenName, toShortString, toString, toString
protected IHashJoinUtility newState(BOpContext<IBindingSet> context, INamedSolutionSetRef namedSetRef, JoinTypeEnum joinType)
Return the instance of the IHashJoinUtility to be used by this operator. This method is invoked once, the first time this operator is evaluated. The returned IHashJoinUtility reference is attached to the IQueryAttributes and accessed there on subsequent evaluation passes for this operator.
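The create-once, reuse-afterwards pattern described here can be illustrated with a concurrent map. This is only a sketch of the idea: QueryAttributesSketch and getOrCreateState are hypothetical names, and the real IQueryAttributes interface is not reproduced here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Illustrative stand-in for per-query attributes that hold operator state. */
final class QueryAttributesSketch {

    private final ConcurrentMap<Object, Object> attributes = new ConcurrentHashMap<>();

    /**
     * Returns the state stored under the key, invoking the factory only on the
     * first call for that key.
     */
    Object getOrCreateState(Object key, Supplier<Object> factory) {
        return attributes.computeIfAbsent(key, k -> factory.get());
    }
}
```

Every later evaluation pass that asks for the same key receives the reference created on the first pass, so buffered solutions survive between passes.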
protected boolean runHashJoin(BOpContext<?> context, IHashJoinUtility state)
Returns true if ChunkTask#doHashJoin() should be executed in a given operator invocation. The HTreeHashJoinOp runs the hash join either exactly once (at-once evaluation) or once a target memory threshold has been exceeded (blocked evaluation).
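One way to picture that decision is the predicate below. It is a sketch under assumptions, not the method body: the parameters (lastInvocation, memoryDemand, maxMemory) are hypothetical stand-ins for whatever the operator reads from its context and state, and the strict comparison follows the word "exceeded" in the description.

```java
/** Illustrative decision rule for when to run the hash join. */
final class RunHashJoinSketch {

    static boolean runHashJoin(boolean lastInvocation, long memoryDemand, long maxMemory) {
        if (lastInvocation) {
            return true; // all source solutions are buffered: the final (or only) pass
        }
        // Blocked evaluation: run early once the buffered data exceeds the limit.
        return memoryDemand > maxMemory;
    }
}
```

When maxMemory is Long.MAX_VALUE the second condition can never hold, so the join runs exactly once, on the last invocation, which corresponds to at-once evaluation.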
Copyright © 2006–2016 SYSTAP, LLC DBA Blazegraph. All rights reserved.