java.lang.Object
  java.io.InputStream
    java.io.FilterInputStream
      org.apache.tika.io.ProxyInputStream
        org.apache.tika.io.TaggedInputStream
          org.apache.tika.io.TikaInputStream
public class TikaInputStream
extends TaggedInputStream
Input stream with extended capabilities. The purpose of this class is to allow files and other resources and information to be associated with the InputStream instance passed through the Parser interface and other similar APIs.
TikaInputStream instances can be created using the various static get() factory methods. Most of these methods take an optional Metadata argument that is then filled with the available input metadata from the given resource. The created TikaInputStream instance keeps track of the original resource used to create it, while otherwise behaving just like a normal, buffered InputStream.
A TikaInputStream instance is also guaranteed to support the mark(int) feature.
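The mark(int) guarantee is what lets a caller inspect the leading bytes of a stream and then rewind. The following is a minimal sketch of that idea using only JDK classes: BufferedInputStream stands in for TikaInputStream, and the helper name peekHeader is hypothetical, not part of the Tika API.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkResetPeek {

    // Hypothetical helper: reads up to buffer.length upcoming bytes, then
    // rewinds with reset() so the caller sees an unchanged stream position.
    static int peekHeader(InputStream stream, byte[] buffer) throws IOException {
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        stream.mark(buffer.length);
        int total = 0;
        try {
            while (total < buffer.length) {
                int n = stream.read(buffer, total, buffer.length - total);
                if (n == -1) {
                    break; // end of stream reached before the buffer was full
                }
                total += n;
            }
        } finally {
            stream.reset();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);
        // BufferedInputStream is used here only as a mark-capable stand-in.
        InputStream stream = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] header = new byte[4];
        int n = peekHeader(stream, header);
        System.out.println(n + " bytes peeked: "
                + new String(header, 0, n, StandardCharsets.US_ASCII));
        System.out.println("next byte after peek: " + (char) stream.read());
    }
}
```

After peekHeader returns, the next read() still yields the first byte of the data, which is the behavior the peek(byte[]) method of this class is documented to provide.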
Code that wants to access the underlying file or other resources associated with a TikaInputStream should first use the get(InputStream) factory method to cast or wrap a given InputStream into a TikaInputStream instance.
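The "cast or wrap" behavior can be illustrated with an analogous JDK-only sketch. It is not Tika's implementation: BufferedInputStream stands in for TikaInputStream, and the helper name castOrWrap is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

class CastOrWrap {

    // Hypothetical helper: reuse the stream if it already has the extended
    // type, otherwise wrap it in a new instance of that type.
    static BufferedInputStream castOrWrap(InputStream stream) {
        if (stream instanceof BufferedInputStream) {
            return (BufferedInputStream) stream;
        }
        return new BufferedInputStream(stream);
    }

    public static void main(String[] args) {
        InputStream plain = new ByteArrayInputStream(new byte[] {1, 2, 3});
        BufferedInputStream wrapped = castOrWrap(plain);
        BufferedInputStream same = castOrWrap(wrapped);
        System.out.println("first call wraps: " + (wrapped != plain));
        System.out.println("second call reuses the instance: " + (same == wrapped));
    }
}
```

The point of the pattern is that callers can apply it unconditionally: a stream that already carries the extra capabilities is not wrapped a second time.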
Field Summary |
---|
Fields inherited from class java.io.FilterInputStream |
---|
in |
Method Summary | |
---|---|
protected void | afterRead(int n): Invoked by the read methods after the proxied call has returned successfully. |
static TikaInputStream | cast(java.io.InputStream stream): Returns the given stream cast to a TikaInputStream, or null if the stream is not a TikaInputStream. |
void | close(): Invokes the delegate's close() method. |
static TikaInputStream | get(java.sql.Blob blob): Creates a TikaInputStream from the given database BLOB. |
static TikaInputStream | get(java.sql.Blob blob, Metadata metadata): Creates a TikaInputStream from the given database BLOB. |
static TikaInputStream | get(byte[] data): Creates a TikaInputStream from the given array of bytes. |
static TikaInputStream | get(byte[] data, Metadata metadata): Creates a TikaInputStream from the given array of bytes. |
static TikaInputStream | get(java.io.File file): Creates a TikaInputStream from the given file. |
static TikaInputStream | get(java.io.File file, Metadata metadata): Creates a TikaInputStream from the given file. |
static TikaInputStream | get(java.io.InputStream stream): Casts or wraps the given stream to a TikaInputStream instance. |
static TikaInputStream | get(java.io.InputStream stream, TemporaryFiles tmp): Deprecated. Use get(InputStream, TemporaryResources) instead. |
static TikaInputStream | get(java.io.InputStream stream, TemporaryResources tmp): Casts or wraps the given stream to a TikaInputStream instance. |
static TikaInputStream | get(java.net.URI uri): Creates a TikaInputStream from the resource at the given URI. |
static TikaInputStream | get(java.net.URI uri, Metadata metadata): Creates a TikaInputStream from the resource at the given URI. |
static TikaInputStream | get(java.net.URL url): Creates a TikaInputStream from the resource at the given URL. |
static TikaInputStream | get(java.net.URL url, Metadata metadata): Creates a TikaInputStream from the resource at the given URL. |
java.io.File | getFile() |
java.nio.channels.FileChannel | getFileChannel() |
long | getLength(): Returns the length (in bytes) of this stream. |
java.lang.Object | getOpenContainer(): Returns the open container object, such as a POIFS FileSystem in the event of an OLE2 document being detected and processed by the OLE2 detector. |
long | getPosition(): Returns the current position within the stream. |
boolean | hasFile() |
boolean | hasLength() |
static boolean | isTikaInputStream(java.io.InputStream stream): Checks whether the given stream is a TikaInputStream instance. |
void | mark(int readlimit): Invokes the delegate's mark(int) method. |
boolean | markSupported(): Invokes the delegate's markSupported() method. |
int | peek(byte[] buffer): Fills the given buffer with upcoming bytes from this stream without advancing the current stream position. |
void | reset(): Invokes the delegate's reset() method. |
void | setOpenContainer(java.lang.Object container): Stores the open container object against the stream, e.g. after a Zip contents detector has loaded the file to decide what it contains. |
long | skip(long ln): Invokes the delegate's skip(long) method. |
java.lang.String | toString() |
Methods inherited from class org.apache.tika.io.TaggedInputStream |
---|
handleIOException, isCauseOf, throwIfCauseOf |
Methods inherited from class org.apache.tika.io.ProxyInputStream |
---|
available, beforeRead, read, read, read |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Method Detail |
---|
public static boolean isTikaInputStream(java.io.InputStream stream)
Checks whether the given stream is a TikaInputStream instance. The given stream can be null, in which case the return value is false.
Parameters:
stream - input stream, possibly null
Returns:
true if the stream is a TikaInputStream instance, false otherwise
public static TikaInputStream get(java.io.InputStream stream, TemporaryResources tmp)
Casts or wraps the given stream to a TikaInputStream instance. The given temporary file provider is used for any temporary files, and should be disposed of when the returned stream is no longer used.
Use this method instead of the get(InputStream) alternative when you don't explicitly close the returned stream. The recommended access pattern is:
    TemporaryResources tmp = new TemporaryResources();
    try {
        TikaInputStream stream = TikaInputStream.get(..., tmp);
        // process stream but don't close it
    } finally {
        tmp.close();
    }
The given stream instance will not be closed when the TemporaryResources.close() method is called. The caller is expected to explicitly close the original stream when it's no longer used.
Parameters:
stream - normal input stream
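The division of responsibility described above (the provider disposes of temporary files, the caller closes the original stream) can be sketched with JDK classes alone. TempTracker and spool are hypothetical stand-ins introduced for illustration; they are not Tika's TemporaryResources or its API.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class TempTrackerSketch {

    // Hypothetical temporary file provider: remembers the files it created
    // and deletes them when close() is called.
    static class TempTracker implements Closeable {
        private final List<Path> files = new ArrayList<>();

        Path createTemporaryFile() throws IOException {
            Path file = Files.createTempFile("sketch-", ".tmp");
            files.add(file);
            return file;
        }

        @Override
        public void close() throws IOException {
            for (Path file : files) {
                Files.deleteIfExists(file);
            }
            files.clear();
        }
    }

    // Copies the stream into a tracked temporary file. The stream itself is
    // left open; closing it remains the caller's job.
    static Path spool(InputStream stream, TempTracker tmp) throws IOException {
        Path file = tmp.createTemporaryFile();
        Files.copy(stream, file, StandardCopyOption.REPLACE_EXISTING);
        return file;
    }

    public static void main(String[] args) throws IOException {
        InputStream original = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        TempTracker tmp = new TempTracker();
        Path spooled;
        try {
            spooled = spool(original, tmp);
            System.out.println("spooled bytes: " + Files.size(spooled));
        } finally {
            tmp.close(); // removes temporary files, does not touch 'original'
        }
        System.out.println("temp file exists after close: " + Files.exists(spooled));
        original.close(); // the caller closes the original stream explicitly
    }
}
```

Tying cleanup to the provider rather than to the stream is useful when the stream is handed to code that must not close it, which is the situation this overload is documented for.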
public static TikaInputStream get(java.io.InputStream stream, TemporaryFiles tmp)
Deprecated. Use get(InputStream, TemporaryResources) instead.
public static TikaInputStream get(java.io.InputStream stream)
Casts or wraps the given stream to a TikaInputStream instance.
Use this method instead of the get(InputStream, TemporaryResources) alternative when you do explicitly close the returned stream. The recommended access pattern is:
    TikaInputStream stream = TikaInputStream.get(...);
    try {
        // process stream
    } finally {
        stream.close();
    }
The given stream instance will be closed along with any other resources associated with the returned TikaInputStream instance when the close() method is called.
Parameters:
stream - normal input stream
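The close propagation described above can be demonstrated with a JDK-only sketch. On Java 7 or later, try-with-resources is an equivalent way to write the try/finally pattern. TrackingStream and Wrapper are hypothetical stand-ins; Wrapper plays the role of the wrapping stream and relies on FilterInputStream.close() closing the stream it wraps.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClosePropagation {

    // Hypothetical test double that records whether close() was called.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical wrapper; the inherited close() closes the wrapped stream.
    static class Wrapper extends FilterInputStream {
        Wrapper(InputStream in) {
            super(in);
        }
    }

    // Reads the stream to its end and returns the number of bytes seen.
    static int countBytes(InputStream stream) throws IOException {
        int count = 0;
        while (stream.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        TrackingStream original = new TrackingStream(new byte[] {1, 2, 3});
        try (InputStream stream = new Wrapper(original)) {
            System.out.println("bytes read: " + countBytes(stream));
        }
        System.out.println("original closed: " + original.closed);
    }
}
```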
public static TikaInputStream cast(java.io.InputStream stream)
Returns the given stream cast to a TikaInputStream, or null if the stream is not a TikaInputStream.
Parameters:
stream - normal input stream
public static TikaInputStream get(byte[] data)
Creates a TikaInputStream from the given array of bytes.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the given data to a temporary file.
Parameters:
data - input data
public static TikaInputStream get(byte[] data, Metadata metadata)
Creates a TikaInputStream from the given array of bytes.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the given data to a temporary file.
Parameters:
data - input data
metadata - metadata instance
Throws:
java.io.IOException
public static TikaInputStream get(java.io.File file) throws java.io.FileNotFoundException
Creates a TikaInputStream from the given file.
Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
file - input file
Throws:
java.io.FileNotFoundException - if the file does not exist
public static TikaInputStream get(java.io.File file, Metadata metadata) throws java.io.FileNotFoundException
Creates a TikaInputStream from the given file.
Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
file - input file
metadata - metadata instance
Throws:
java.io.FileNotFoundException - if the file does not exist
public static TikaInputStream get(java.sql.Blob blob) throws java.sql.SQLException
Creates a TikaInputStream from the given database BLOB.
Note that the result set containing the BLOB may need to be kept open until the returned TikaInputStream has been processed and closed. You must also always explicitly close the returned stream, as in some cases it may end up writing the blob data to a temporary file.
Parameters:
blob - database BLOB
Throws:
java.sql.SQLException - if BLOB data cannot be accessed
public static TikaInputStream get(java.sql.Blob blob, Metadata metadata) throws java.sql.SQLException
Creates a TikaInputStream from the given database BLOB.
Note that the result set containing the BLOB may need to be kept open until the returned TikaInputStream has been processed and closed. You must also always explicitly close the returned stream, as in some cases it may end up writing the blob data to a temporary file.
Parameters:
blob - database BLOB
metadata - metadata instance
Throws:
java.sql.SQLException - if BLOB data cannot be accessed
public static TikaInputStream get(java.net.URI uri) throws java.io.IOException
Creates a TikaInputStream from the resource at the given URI.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
uri - resource URI
Throws:
java.io.IOException - if the resource cannot be accessed
public static TikaInputStream get(java.net.URI uri, Metadata metadata) throws java.io.IOException
Creates a TikaInputStream from the resource at the given URI.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
uri - resource URI
metadata - metadata instance
Throws:
java.io.IOException - if the resource cannot be accessed
public static TikaInputStream get(java.net.URL url) throws java.io.IOException
Creates a TikaInputStream from the resource at the given URL.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
url - resource URL
Throws:
java.io.IOException - if the resource cannot be accessed
public static TikaInputStream get(java.net.URL url, Metadata metadata) throws java.io.IOException
Creates a TikaInputStream from the resource at the given URL.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
url - resource URL
metadata - metadata instance
Throws:
java.io.IOException - if the resource cannot be accessed
public int peek(byte[] buffer) throws java.io.IOException
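Fills the given buffer with upcoming bytes from this stream without advancing the current stream position.
Parameters:
buffer - byte buffer
Throws:
java.io.IOException - if the stream cannot be read
public java.lang.Object getOpenContainer()
Returns the open container object, such as a POIFS FileSystem in the event of an OLE2 document being detected and processed by the OLE2 detector.
public void setOpenContainer(java.lang.Object container)
Stores the open container object against the stream, e.g. after a Zip contents detector has loaded the file to decide what it contains.
public boolean hasFile()
public java.io.File getFile() throws java.io.IOException
Throws:
java.io.IOException
public java.nio.channels.FileChannel getFileChannel() throws java.io.IOException
Throws:
java.io.IOException
public boolean hasLength()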
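public long getLength() throws java.io.IOException
Returns the length (in bytes) of this stream. If the length is not already known, this method uses the getFile() method to buffer the entire stream to a temporary file in order to calculate the stream length. This case will only work if the stream has not yet been consumed.
Throws:
java.io.IOException - if the length cannot be determined
public long getPosition()
Returns the current position within the stream.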
public long skip(long ln) throws java.io.IOException
Description copied from class: ProxyInputStream
Invokes the delegate's skip(long) method.
Overrides:
skip in class ProxyInputStream
Parameters:
ln - the number of bytes to skip
Throws:
java.io.IOException - if an I/O error occurs
public void mark(int readlimit)
Description copied from class: ProxyInputStream
Invokes the delegate's mark(int) method.
Overrides:
mark in class ProxyInputStream
Parameters:
readlimit - read ahead limit
public boolean markSupported()
Description copied from class: ProxyInputStream
Invokes the delegate's markSupported() method.
Overrides:
markSupported in class ProxyInputStream
public void reset() throws java.io.IOException
Description copied from class: ProxyInputStream
Invokes the delegate's reset() method.
Overrides:
reset in class ProxyInputStream
Throws:
java.io.IOException - if an I/O error occurs
public void close() throws java.io.IOException
Description copied from class: ProxyInputStream
Invokes the delegate's close() method.
Specified by:
close in interface java.io.Closeable
Overrides:
close in class ProxyInputStream
Throws:
java.io.IOException - if an I/O error occurs
protected void afterRead(int n)
Description copied from class: ProxyInputStream
Invoked by the read methods after the proxied call has returned successfully. Subclasses can override this method to add common post-processing functionality without having to override all the read methods. The default implementation does nothing.
Note that this method is not called from ProxyInputStream.skip(long) or ProxyInputStream.reset(). You need to explicitly override those methods if you want to add post-processing steps to them as well.
Overrides:
afterRead in class ProxyInputStream
Parameters:
n - number of bytes read, or -1 if the end of stream was reached
public java.lang.String toString()
Overrides:
toString in class TaggedInputStream
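The afterRead(int) hook documented for this class suggests how a read position could be tracked without overriding every read method. The sketch below shows the idea with JDK classes only. HookedStream and CountingStream are hypothetical stand-ins for ProxyInputStream and a subclass of it, and whether TikaInputStream tracks getPosition() this way is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class PositionTracking {

    // Hypothetical proxy stream: every read goes through afterRead(int).
    static class HookedStream extends FilterInputStream {
        HookedStream(InputStream in) {
            super(in);
        }

        // Hook for subclasses; the default implementation does nothing.
        protected void afterRead(int n) {
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            afterRead(b != -1 ? 1 : -1);
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            afterRead(n);
            return n;
        }
    }

    // Adds position tracking by overriding only the hook.
    static class CountingStream extends HookedStream {
        private long position = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        protected void afterRead(int n) {
            if (n != -1) {
                position += n;
            }
        }

        long getPosition() {
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        CountingStream stream =
                new CountingStream(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}));
        stream.read();            // single-byte read
        stream.read(new byte[3]); // bulk read, routed through read(byte[], int, int)
        System.out.println("position: " + stream.getPosition());
    }
}
```

As in the documented contract, skip(long) and reset() do not pass through the hook in this sketch, so a subclass that needs an accurate position across those calls would have to override them as well.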