org.apache.crunch.io.text
Class CBZip2InputStream
java.lang.Object
java.io.InputStream
org.apache.crunch.io.text.CBZip2InputStream
- All Implemented Interfaces:
- Closeable, org.apache.hadoop.io.compress.bzip2.BZip2Constants
public class CBZip2InputStream
- extends InputStream
- implements org.apache.hadoop.io.compress.bzip2.BZip2Constants
An input stream that decompresses data in the BZip2 format (without the file
header characters) so that it can be read like any other stream.
- Author:
- Keiron Liddle
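The note about the missing file header characters implies that the caller has to deal with the header before handing the stream over. A minimal sketch of that step, assuming the standard 4-byte bzip2 file header ("BZh" followed by a digit from '1' to '9'): `Bzip2HeaderSketch` and `consumeHeader` are hypothetical names, not part of Crunch or Hadoop, and the source does not say how many header bytes this class expects to be consumed or whether the digit is what the constructor's `blockSize` parameter expects.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of Crunch or Hadoop.
final class Bzip2HeaderSketch {

    /**
     * Consumes the standard 4-byte bzip2 file header ("BZh" plus a digit
     * '1'..'9') and returns the digit as an int. The stream is left
     * positioned on the first byte after the header.
     */
    static int consumeHeader(InputStream in) throws IOException {
        int b = in.read();
        int z = in.read();
        int h = in.read();
        int level = in.read();
        if (b != 'B' || z != 'Z' || h != 'h') {
            throw new IOException("Not a bzip2 stream: missing 'BZh' magic");
        }
        if (level < '1' || level > '9') {
            throw new IOException("Invalid bzip2 block-size digit");
        }
        return level - '0';
    }

    public static void main(String[] args) throws IOException {
        // Fake payload: a valid header followed by two arbitrary bytes.
        byte[] data = {'B', 'Z', 'h', '9', 0x31, 0x41};
        InputStream in = new ByteArrayInputStream(data);
        int level = consumeHeader(in);
        System.out.println("level=" + level
                + ", next byte=0x" + Integer.toHexString(in.read()));
    }
}
```

The helper validates the header instead of skipping four bytes blindly, so a non-bzip2 input fails early with an IOException.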
Fields inherited from interface org.apache.hadoop.io.compress.bzip2.BZip2Constants
baseBlockSize, G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, N_GROUPS, N_ITERS, NUM_OVERSHOOT_BYTES, rNums, RUNA, RUNB
Constructor Summary
CBZip2InputStream(org.apache.hadoop.fs.FSDataInputStream zStream,
int blockSize,
long end)
CBZip2InputStream
public CBZip2InputStream(org.apache.hadoop.fs.FSDataInputStream zStream,
int blockSize,
long end)
throws IOException
- Throws:
IOException
getReadLimit
public long getReadLimit()
setReadLimit
public void setReadLimit(long readLimit)
getReadCount
public long getReadCount()
read
public int read()
throws IOException
- Specified by:
read
in class InputStream
- Throws:
IOException
getPos
public long getPos()
throws IOException
- getPos is used by the caller to determine when processing of the current
InputSplit is complete. As each bzip block is read, this method keeps
returning the beginning of the InputSplit until a block is reached that
starts at a position >= the end of the current split. At that point, retpos
is set so that, once a record has been read, subsequent getPos() calls
return a value > the end of the current split. As a result, only one record
is read from that bzip block; the remaining records in that block are read
by the next map task when it processes the next split.
- Returns:
- the position that the caller compares against the end of the current InputSplit
- Throws:
IOException
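The position-reporting rule described for getPos can be sketched as a small simulation. This is not the class's actual implementation: `SplitPositionSketch`, `onBlockStart`, and `readSplit` are hypothetical names, blocks are modelled as arrays of records, records are assumed not to span blocks, and the reader is assumed to stop as soon as the reported position is greater than the split end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the reporting rule; not the real decompressor.
final class SplitPositionSketch {
    private final long splitEnd;
    private long reportedPos;

    SplitPositionSketch(long splitStart, long splitEnd) {
        this.splitEnd = splitEnd;
        this.reportedPos = splitStart; // report the split start by default
    }

    /** Called when reading enters a block that starts at blockStart. */
    void onBlockStart(long blockStart) {
        if (blockStart >= splitEnd) {
            // From now on report a value past the split end, so the
            // reader stops after the record currently being read.
            reportedPos = splitEnd + 1;
        }
    }

    long getPos() {
        return reportedPos;
    }

    /**
     * Reads records block by block, checking getPos() before each record.
     * The blocks are assumed to begin with the first block of the split.
     */
    static List<String> readSplit(long start, long end,
                                  long[] blockStarts, String[][] blockRecords) {
        SplitPositionSketch pos = new SplitPositionSketch(start, end);
        List<String> out = new ArrayList<>();
        for (int b = 0; b < blockStarts.length; b++) {
            for (int r = 0; r < blockRecords[b].length; r++) {
                if (pos.getPos() > end) {
                    return out; // split is complete
                }
                if (r == 0) {
                    pos.onBlockStart(blockStarts[b]);
                }
                out.add(blockRecords[b][r]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] starts = {0L, 60L, 120L, 180L};
        String[][] records = {{"a1", "a2"}, {"b1", "b2"}, {"c1", "c2", "c3"}, {"d1"}};
        // Split [0, 100): the block at offset 120 is the first one at or past the end.
        System.out.println(readSplit(0L, 100L, starts, records));
        // prints [a1, a2, b1, b2, c1]
    }
}
```

In the run above, the blocks at offsets 0 and 60 are read in full while the reported position stays at the split start. Exactly one record (c1) is taken from the block at offset 120, and the remaining records would be left for the task that processes the next split.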
Copyright © 2012 The Apache Software Foundation. All Rights Reserved.