org.apache.crunch.lib
Class SecondarySort
java.lang.Object
org.apache.crunch.lib.SecondarySort
public class SecondarySort
extends Object
Utilities for performing a secondary sort on a PTable<K, Pair<V1, V2>>
collection.
Secondary sorts are commonly used for sessionization: given a collection
of events, we group them by a key (such as a user ID), sort the records in each
group by an auxiliary key (such as a timestamp), and then perform some additional
processing on the sorted records.
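The sessionization pattern described above can be sketched in plain Java, without Crunch, to make the data flow concrete. This is only an in-memory illustration of "group by key, then sort by an auxiliary key"; the Event class, its fields, and the method names are hypothetical and are not part of the Crunch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SessionizeSketch {
    // Hypothetical event type: grouping key, auxiliary sort key, payload.
    static final class Event {
        final String userId;
        final long timestamp;
        final String action;

        Event(String userId, long timestamp, String action) {
            this.userId = userId;
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    // Group events by user ID, then sort each group by timestamp.
    static Map<String, List<Event>> groupAndSort(List<Event> events) {
        Map<String, List<Event>> byUser = new TreeMap<>();
        for (Event e : events) {
            byUser.computeIfAbsent(e.userId, k -> new ArrayList<>()).add(e);
        }
        for (List<Event> group : byUser.values()) {
            group.sort(Comparator.comparingLong((Event e) -> e.timestamp));
        }
        return byUser;
    }

    // Example of "additional processing": the actions of one sorted group, in order.
    static List<String> actionsInOrder(List<Event> sortedGroup) {
        List<String> actions = new ArrayList<>();
        for (Event e : sortedGroup) {
            actions.add(e.action);
        }
        return actions;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("u2", 5L, "view"),
                new Event("u1", 30L, "logout"),
                new Event("u1", 10L, "login"));
        Map<String, List<Event>> sessions = groupAndSort(events);
        for (Map.Entry<String, List<Event>> entry : sessions.entrySet()) {
            System.out.println(entry.getKey() + " " + actionsInOrder(entry.getValue()));
        }
    }
}
```

In a distributed pipeline the grouping and the per-group ordering happen during the shuffle rather than in memory, which is the work this class is meant to handle.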
Method Summary
static <K,V1,V2,U,V> PTable<U,V>
sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V>.
static <K,V1,V2,T> PCollection<T>
sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T>.
SecondarySort
public SecondarySort()
sortAndApply
public static <K,V1,V2,T> PCollection<T> sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T>.
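To illustrate the shape of this contract, here is a minimal plain-Java model of it, not the Crunch implementation. It assumes that records are ordered by the first element (V1) of each pair within a key, and it simplifies the DoFn to a function that returns exactly one output per key, whereas a real DoFn may emit any number of values. Map.Entry stands in for Pair, List for PTable and PCollection, and the row helper is hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

public class SortAndApplySketch {
    // Hypothetical helper: builds one (K, (V1, V2)) row.
    static <K, V1, V2> Map.Entry<K, Map.Entry<V1, V2>> row(K key, V1 first, V2 second) {
        Map.Entry<V1, V2> pair = new AbstractMap.SimpleImmutableEntry<>(first, second);
        return new AbstractMap.SimpleImmutableEntry<>(key, pair);
    }

    // In-memory stand-in for sortAndApply: group rows by K, sort each group's
    // pairs by V1 (an assumption of this sketch), then apply fn once per key.
    static <K extends Comparable<K>, V1 extends Comparable<V1>, V2, T> List<T> sortAndApply(
            List<Map.Entry<K, Map.Entry<V1, V2>>> input,
            BiFunction<K, List<Map.Entry<V1, V2>>, T> fn) {
        Map<K, List<Map.Entry<V1, V2>>> groups = new TreeMap<>();
        for (Map.Entry<K, Map.Entry<V1, V2>> r : input) {
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        List<T> output = new ArrayList<>();
        for (Map.Entry<K, List<Map.Entry<V1, V2>>> group : groups.entrySet()) {
            group.getValue().sort(Map.Entry.comparingByKey());
            output.add(fn.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Map.Entry<Long, String>>> rows = new ArrayList<>();
        rows.add(row("u1", 30L, "logout"));
        rows.add(row("u1", 10L, "login"));
        rows.add(row("u2", 5L, "view"));
        // Report the earliest action seen for each user.
        List<String> firstActions =
                sortAndApply(rows, (user, pairs) -> user + ":" + pairs.get(0).getValue());
        System.out.println(firstActions);
    }
}
```

The function passed in plays the role of doFn: it sees each key together with that key's values already in sorted order.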
sortAndApply
public static <K,V1,V2,U,V> PTable<U,V> sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V>.
Copyright © 2013 The Apache Software Foundation. All Rights Reserved.