org.apache.crunch.lib.join
Class MapsideJoin
java.lang.Object
org.apache.crunch.lib.join.MapsideJoin
public class MapsideJoin
- extends Object
Utility for doing map side joins on a common key between two PTables.
A map side join is an optimized join which doesn't use a reducer; instead,
the right side of the join is loaded into memory and the join is performed in
a mapper. An important implication of this style of join is that its output
is not sorted, unlike the output of a conventional (reducer-based) join.
Note: This utility is only supported when running with an MRPipeline as the
pipeline.
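The strategy described above amounts to an in-memory hash join. The following plain-Java sketch illustrates the idea outside of Crunch; it is not the library's implementation. The class name MapsideJoinSketch, and the use of lists of Map.Entry in place of PTable and Pair, are assumptions made purely for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Crunch.
final class MapsideJoinSketch {

    // Inner join: the right side is held fully in memory, while the left
    // side is streamed one record at a time, as a mapper would stream its input.
    static <K, U, V> List<Map.Entry<K, Map.Entry<U, V>>> join(
            List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right) {

        // Build phase: index every right-side value by its join key.
        Map<K, List<V>> rightByKey = new HashMap<>();
        for (Map.Entry<K, V> r : right) {
            rightByKey.computeIfAbsent(r.getKey(), k -> new ArrayList<>())
                      .add(r.getValue());
        }

        // Probe phase: look up each left-side record and emit one pair per match.
        List<Map.Entry<K, Map.Entry<U, V>>> out = new ArrayList<>();
        for (Map.Entry<K, U> l : left) {
            List<V> matches = rightByKey.get(l.getKey());
            if (matches == null) {
                continue; // no right-side match: dropped, since this is an inner join
            }
            for (V v : matches) {
                Map.Entry<U, V> pair = new SimpleEntry<>(l.getValue(), v);
                out.add(new SimpleEntry<>(l.getKey(), pair));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left = new ArrayList<>();
        left.add(new SimpleEntry<>("z", "L1"));
        left.add(new SimpleEntry<>("a", "L2"));
        left.add(new SimpleEntry<>("q", "L3")); // has no match on the right

        List<Map.Entry<String, Integer>> right = new ArrayList<>();
        right.add(new SimpleEntry<>("a", 1));
        right.add(new SimpleEntry<>("z", 26));
        right.add(new SimpleEntry<>("a", 2));

        for (Map.Entry<String, Map.Entry<String, Integer>> row : join(left, right)) {
            System.out.println(row.getKey() + " -> ("
                    + row.getValue().getKey() + ", " + row.getValue().getValue() + ")");
        }
    }
}
```

In this sketch the rows come out in the order in which the left side is read ("z" before "a"), not in key order, which mirrors the unsorted output noted above. The left-side key "q" produces no row because it has no counterpart on the right.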
Method Summary
static <K,U,V> PTable<K,Pair<U,V>> | join(PTable<K,U> left, PTable<K,V> right)
Join two tables using a map side join.
MapsideJoin
public MapsideJoin()
join
public static <K,U,V> PTable<K,Pair<U,V>> join(PTable<K,U> left,
PTable<K,V> right)
- Join two tables using a map side join. The right-side table will be loaded
fully into memory, so this method should only be used if the right-side
table's contents can fit in the memory allocated to mappers. The join
performed by this method is an inner join.
- Parameters:
left - The left-side table of the join
right - The right-side table of the join, whose contents will be fully
read into memory
- Returns:
- A table keyed on the join key, containing pairs of joined values
Copyright © 2012 The Apache Software Foundation. All Rights Reserved.