CBZip2InputStream (Apache Crunch 0.3.0-incubating API)

org.apache.crunch.io.text
Class CBZip2InputStream

java.lang.Object
  java.io.InputStream
      org.apache.crunch.io.text.CBZip2InputStream

All Implemented Interfaces: Closeable, org.apache.hadoop.io.compress.bzip2.BZip2Constants

public class CBZip2InputStream
extends InputStream
implements org.apache.hadoop.io.compress.bzip2.BZip2Constants

An input stream that decompresses data in the BZip2 format (without the file header characters) so that it can be read like any other stream.

Author: Keiron Liddle
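In the standard bzip2 format, a file begins with the magic bytes `BZ`, the version byte `h`, and a block-size digit from `1` to `9`; a digit n denotes a block size of n × 100,000 bytes, matching the `baseBlockSize` constant inherited from `BZip2Constants`. The class description says this stream reads data without the file header characters, so a caller presumably consumes some of those bytes first. This page does not state exactly which bytes Crunch expects to be stripped, so the sketch below, which consumes all four, is an assumption. The class `Bzip2HeaderSketch` and its `readHeader` method are hypothetical and not part of Crunch or Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not Crunch or Hadoop API): consumes the standard bzip2
 * stream header ('B', 'Z', 'h', then a digit '1'..'9') and returns the digit.
 * Assumption: the caller strips all four bytes before decompression.
 */
public final class Bzip2HeaderSketch {

  /** bzip2 block sizes are multiples of 100000 bytes (cf. BZip2Constants.baseBlockSize). */
  static final int BASE_BLOCK_SIZE = 100000;

  /** Reads "BZh" plus one digit from {@code in} and returns the digit (1..9). */
  public static int readHeader(InputStream in) throws IOException {
    // read() returns -1 at end of stream, which also fails these checks.
    if (in.read() != 'B' || in.read() != 'Z' || in.read() != 'h') {
      throw new IOException("Not a bzip2 stream: missing \"BZh\" magic");
    }
    int digit = in.read();
    if (digit < '1' || digit > '9') {
      throw new IOException("Invalid bzip2 block-size digit: " + digit);
    }
    return digit - '0';
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "BZh9rest".getBytes(StandardCharsets.US_ASCII);
    InputStream in = new ByteArrayInputStream(data);
    int level = readHeader(in);
    System.out.println(level);                   // 9
    System.out.println(level * BASE_BLOCK_SIZE); // 900000
    System.out.println(in.available());          // 4 (bytes left for the decompressor)
  }
}
```

After `readHeader` returns, the stream is positioned at the first byte following the header, which is the point at which a headerless decompressor would take over.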

Field Summary

Fields inherited from interface org.apache.hadoop.io.compress.bzip2.BZip2Constants
`baseBlockSize, G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, N_GROUPS, N_ITERS, NUM_OVERSHOOT_BYTES, rNums, RUNA, RUNB`

Constructor Summary
`CBZip2InputStream(org.apache.hadoop.fs.FSDataInputStream zStream, int blockSize, long end)`

Method Summary
`long`	`getPos()` getPos is used by the caller to know when the processing of the current `InputSplit` is complete.
`long`	`getReadCount()`
`long`	`getReadLimit()`
`int`	`read()`
`void`	`setReadLimit(long readLimit)`

Methods inherited from class java.io.InputStream
`available, close, mark, markSupported, read, read, reset, skip`

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

CBZip2InputStream

public CBZip2InputStream(org.apache.hadoop.fs.FSDataInputStream zStream,
                         int blockSize,
                         long end)
                  throws IOException

Throws: IOException

Method Detail

getReadLimit

public long getReadLimit()

setReadLimit

public void setReadLimit(long readLimit)

getReadCount

public long getReadCount()

read

public int read()
         throws IOException

Specified by: read in class InputStream

Throws: IOException

getPos

public long getPos()
            throws IOException

getPos is used by the caller to know when the processing of the current InputSplit is complete. As each bzip2 block is read, this method keeps returning the beginning of the InputSplit until it reaches a block that starts at a position >= the end of the current split. At that point retpos is set up so that, once a record has been read, subsequent getPos() calls return a value > the end of the current split. As a result, only one record is read from that bzip2 block; the remaining records in that block are read by the next map task while it processes the next split.

Returns:
Throws: IOException
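The getPos() contract can be illustrated with a small in-memory model. The sketch below is hypothetical and is not the Crunch implementation: `SplitBoundarySketch`, its `Block` type, and `readSplit` are invented for illustration. It assumes that each record lies entirely within one bzip2 block, that the caller keeps reading records while getPos() <= the split end, and that a task whose split does not begin at offset 0 skips the first record it reaches, because the previous task has already read it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, in-memory model of the split-boundary rule described for
 * CBZip2InputStream.getPos(); not the Crunch implementation.
 * Assumptions: each record lies entirely inside one bzip2 block, and a task
 * whose split does not start at offset 0 skips the first record it sees.
 */
public final class SplitBoundarySketch {

  /** A bzip2 block: its byte offset in the compressed file and its records. */
  public static final class Block {
    final long offset;
    final List<String> records;

    public Block(long offset, List<String> records) {
      this.offset = offset;
      this.records = records;
    }
  }

  /** Returns the records one task would read for the split [splitStart, splitEnd]. */
  public static List<String> readSplit(List<Block> blocks, long splitStart, long splitEnd) {
    List<String> out = new ArrayList<>();
    long pos = splitStart;               // what the caller would see from getPos()
    boolean skipFirst = splitStart != 0; // assumed convention: previous task read it
    boolean firstBlock = true;
    for (Block block : blocks) {
      if (block.offset < splitStart) {
        continue;                        // block belongs to an earlier split
      }
      boolean pastEnd = block.offset >= splitEnd;
      if (firstBlock && pastEnd) {
        return out;                      // no block starts inside this split
      }
      firstBlock = false;
      for (String record : block.records) {
        if (pos > splitEnd) {
          return out;                    // caller stops: getPos() is past the split end
        }
        if (skipFirst) {
          skipFirst = false;
          continue;
        }
        out.add(record);
        if (pastEnd) {
          pos = splitEnd + 1;            // like retpos: later getPos() calls exceed the end
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Block> blocks = List.of(
        new Block(0, List.of("a1", "a2")),
        new Block(100, List.of("b1", "b2")),
        new Block(200, List.of("c1", "c2", "c3")));
    System.out.println(readSplit(blocks, 0, 150));   // [a1, a2, b1, b2, c1]
    System.out.println(readSplit(blocks, 150, 300)); // [c2, c3]
  }
}
```

With blocks at offsets 0, 100 and 200, the task for split [0, 150] reads every record of the first two blocks plus the first record of the block at 200, and the task for split [150, 300] reads the remaining records of that block, so under these assumptions no record is read twice or dropped.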