Constructor and Description |
---|
Tika() - Creates a Tika facade using the default configuration. |
Tika(Detector detector) - Creates a Tika facade using the given detector instance and the default parser configuration. |
Tika(Detector detector, Parser parser) - Creates a Tika facade using the given detector and parser instances. |
Tika(TikaConfig config) - Creates a Tika facade using the given configuration. |
Modifier and Type | Method and Description |
---|---|
String | detect(byte[] prefix) - Detects the media type of the given document. |
String | detect(byte[] prefix, String name) - Detects the media type of the given document. |
String | detect(File file) - Detects the media type of the given file. |
String | detect(InputStream stream) - Detects the media type of the given document. |
String | detect(InputStream stream, Metadata metadata) - Detects the media type of the given document. |
String | detect(InputStream stream, String name) - Detects the media type of the given document. |
String | detect(String name) - Detects the media type of a document with the given file name. |
String | detect(URL url) - Detects the media type of the resource at the given URL. |
Detector | getDetector() - Returns the detector instance used by this facade. |
int | getMaxStringLength() - Returns the maximum length of strings returned by the parseToString methods. |
Parser | getParser() - Returns the parser instance used by this facade. |
Reader | parse(File file) - Parses the given file and returns the extracted text content. |
Reader | parse(InputStream stream) - Parses the given document and returns the extracted text content. |
Reader | parse(InputStream stream, Metadata metadata) - Parses the given document and returns the extracted text content. |
Reader | parse(URL url) - Parses the resource at the given URL and returns the extracted text content. |
String | parseToString(File file) - Parses the given file and returns the extracted text content. |
String | parseToString(InputStream stream) - Parses the given document and returns the extracted text content. |
String | parseToString(InputStream stream, Metadata metadata) - Parses the given document and returns the extracted text content. |
String | parseToString(InputStream stream, Metadata metadata, int maxLength) - Parses the given document and returns the extracted text content. |
String | parseToString(URL url) - Parses the resource at the given URL and returns the extracted text content. |
void | setMaxStringLength(int maxStringLength) - Sets the maximum length of strings returned by the parseToString methods. |
String | toString() |
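The stream-based detect methods on this page state that only a limited number of bytes are read and that a stream supporting the mark feature is reset to its original position afterwards. The sketch below illustrates that contract using only the Java standard library. It is not Tika's implementation: the helper `peekPrefix`, the class name, and the 5-byte limit are hypothetical and chosen purely for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PeekPrefixSketch {

    // Hypothetical helper (not part of Tika): reads at most `limit` bytes,
    // then restores the stream to the position it had on entry.
    public static byte[] peekPrefix(InputStream stream, int limit) throws IOException {
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        stream.mark(limit);
        try {
            byte[] buffer = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = stream.read(buffer, total, limit - total);
                if (n == -1) {
                    break; // end of stream before the limit was reached
                }
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            stream.reset(); // the caller sees the stream as if nothing was read
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] document = "%PDF-1.4 rest of the document".getBytes(StandardCharsets.US_ASCII);
        // BufferedInputStream adds mark/reset support to any underlying stream.
        try (InputStream stream = new BufferedInputStream(new ByteArrayInputStream(document))) {
            byte[] prefix = peekPrefix(stream, 5);
            System.out.println(new String(prefix, StandardCharsets.US_ASCII)); // prints "%PDF-"
            System.out.println((char) stream.read()); // prints "%": the position was restored
        }
    }
}
```

A caller whose stream does not support mark can wrap it in a `BufferedInputStream`, as `main` does, so that the bytes consumed while peeking remain available for later parsing.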
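public Tika(Detector detector, Parser parser)
Creates a Tika facade using the given detector and parser instances.
detector - type detector
parser - document parser

public Tika(TikaConfig config)
Creates a Tika facade using the given configuration.
config - Tika configuration

public Tika()
Creates a Tika facade using the default configuration.

public Tika(Detector detector)
Creates a Tika facade using the given detector instance and the default parser configuration.
detector - type detector

public String detect(InputStream stream, Metadata metadata) throws IOException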
Detects the media type of the given document. The document stream can be null, in which case only the given document metadata is used for type detection.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Unlike the parse(InputStream, Metadata) method, this method does not modify the given document metadata.
stream - the document stream, or null
metadata - document metadata
IOException - if the stream cannot be read

public String detect(InputStream stream, String name) throws IOException
Detects the media type of the given document.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
stream - the document stream
name - document name
IOException - if the stream cannot be read

public String detect(InputStream stream) throws IOException
Detects the media type of the given document.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
stream - the document stream
IOException - if the stream cannot be read

public String detect(byte[] prefix, String name)
Detects the media type of the given document.
For best results, at least a few kilobytes of the document data are needed. See the other detect() methods for better alternatives when you have more than just the document prefix available for type detection.
prefix - first few bytes of the document
name - document name

public String detect(byte[] prefix)
Detects the media type of the given document.
For best results, at least a few kilobytes of the document data are needed. See the other detect() methods for better alternatives when you have more than just the document prefix available for type detection.
prefix - first few bytes of the document

public String detect(File file) throws IOException
Detects the media type of the given file.
Use the detect(String) method when you want to detect the type of the document without actually accessing the file.
file - the file
IOException - if the file cannot be read

public String detect(URL url) throws IOException
Detects the media type of the resource at the given URL.
Use the detect(String) method when you want to detect the type of the document without actually accessing the URL.
url - the URL of the resource
IOException - if the resource cannot be read

public String detect(String name)
Detects the media type of a document with the given file name.
The given name can also be a URL or a full file path. In such cases, only the file name part of the string is used for type detection.
name - the file name of the document

public Reader parse(InputStream stream, Metadata metadata) throws IOException
Parses the given document and returns the extracted text content.
The returned reader is responsible for closing the given stream. The stream and any associated resources are closed at or before the time the Reader.close() method is called.
stream - the document to be parsed
metadata - document metadata
IOException - if the document cannot be read or parsed

public Reader parse(InputStream stream) throws IOException
Parses the given document and returns the extracted text content.
The returned reader is responsible for closing the given stream. The stream and any associated resources are closed at or before the time the Reader.close() method is called.
stream - the document to be parsed
IOException - if the document cannot be read or parsed

public Reader parse(File file) throws IOException
Parses the given file and returns the extracted text content.
file - the file to be parsed
IOException - if the file cannot be read or parsed

public Reader parse(URL url) throws IOException
Parses the resource at the given URL and returns the extracted text content.
url - the URL of the resource to be parsed
IOException - if the resource cannot be read or parsed

public String parseToString(InputStream stream, Metadata metadata) throws IOException, TikaException
Parses the given document and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
NOTE: Unlike most other Tika methods that take an InputStream, this method closes the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
stream - the document to be parsed
metadata - document metadata
IOException - if the document cannot be read
TikaException - if the document cannot be parsed

public String parseToString(InputStream stream, Metadata metadata, int maxLength) throws IOException, TikaException
Parses the given document and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first maxLength characters extracted from the input document, where maxLength is the value of the parameter.
NOTE: Unlike most other Tika methods that take an InputStream, this method closes the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
stream - the document to be parsed
metadata - document metadata
maxLength - maximum length of the returned string
IOException - if the document cannot be read
TikaException - if the document cannot be parsed

public String parseToString(InputStream stream) throws IOException, TikaException
Parses the given document and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
NOTE: Unlike most other Tika methods that take an InputStream, this method closes the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
stream - the document to be parsed
IOException - if the document cannot be read
TikaException - if the document cannot be parsed

public String parseToString(File file) throws IOException, TikaException
Parses the given file and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
file - the file to be parsed
IOException - if the file cannot be read
TikaException - if the file cannot be parsed

public String parseToString(URL url) throws IOException, TikaException
Parses the resource at the given URL and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
url - the URL of the resource to be parsed
IOException - if the resource cannot be read
TikaException - if the resource cannot be parsed

public int getMaxStringLength()
Returns the maximum length of strings returned by the parseToString methods.
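public void setMaxStringLength(int maxStringLength)
Sets the maximum length of strings returned by the parseToString methods.
maxStringLength - maximum string length, or -1 to disable this limit

public Parser getParser()
Returns the parser instance used by this facade.

public Detector getDetector()
Returns the detector instance used by this facade.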
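The parseToString methods cap the returned string at a maximum length, and setMaxStringLength accepts -1 to disable the limit. The following standard-library sketch shows one way such a cap can behave when text is read from a Reader. It is an illustration under stated assumptions, not Tika's code: the helper `readUpTo`, the class name, and the buffer size are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class MaxLengthSketch {

    // Hypothetical helper (not part of Tika): collects at most `maxLength`
    // characters from the reader; a maxLength of -1 means "no limit".
    public static String readUpTo(Reader reader, int maxLength) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        while (maxLength == -1 || text.length() < maxLength) {
            // Never request more characters than the remaining allowance.
            int wanted = (maxLength == -1)
                    ? buffer.length
                    : Math.min(buffer.length, maxLength - text.length());
            int n = reader.read(buffer, 0, wanted);
            if (n == -1) {
                break; // end of input reached before the limit
            }
            text.append(buffer, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readUpTo(new StringReader("extracted text"), 9));  // prints "extracted"
        System.out.println(readUpTo(new StringReader("extracted text"), -1)); // prints "extracted text"
    }
}
```

Stopping the read loop as soon as the limit is reached, rather than truncating a fully built string afterwards, is what keeps memory use bounded for very large documents.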
Copyright © 2007-2014 The Apache Software Foundation. All Rights Reserved.