java.lang.Object org.apache.tika.Tika
public class Tika
Facade class for accessing Tika functionality. This class hides much of the underlying complexity of the lower level Tika classes and provides simple methods for many common parsing and type detection operations.
See Also: Parser, Detector
Constructor Summary | |
---|---|
Tika() | Creates a Tika facade using the default configuration. |
Tika(TikaConfig config) | Creates a Tika facade using the given configuration. |
Method Summary | | |
---|---|---|
String | detect(File file) | Detects the media type of the given file. |
String | detect(InputStream stream) | Detects the media type of the given document. |
String | detect(InputStream stream, Metadata metadata) | Detects the media type of the given document. |
String | detect(String name) | Detects the media type of a document with the given file name. |
String | detect(URL url) | Detects the media type of the resource at the given URL. |
Reader | parse(File file) | Parses the given file and returns the extracted text content. |
Reader | parse(InputStream stream) | Parses the given document and returns the extracted text content. |
Reader | parse(InputStream stream, Metadata metadata) | Parses the given document and returns the extracted text content. |
Reader | parse(URL url) | Parses the resource at the given URL and returns the extracted text content. |
String | parseToString(File file) | Parses the given file and returns the extracted text content. |
String | parseToString(InputStream stream) | Parses the given document and returns the extracted text content. |
String | parseToString(InputStream stream, Metadata metadata) | Parses the given document and returns the extracted text content. |
String | parseToString(URL url) | Parses the resource at the given URL and returns the extracted text content. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Tika(TikaConfig config)
config
- Tika configuration
public Tika()
Method Detail |
---|
public String detect(InputStream stream, Metadata metadata) throws IOException
The document stream can be null, in which case only the given document metadata is used for type detection.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Unlike in the parse(InputStream, Metadata) method, the given document metadata is not modified by this method.
stream
- the document stream, or null
metadata
- document metadata
IOException
- if the stream can not be read
public String detect(InputStream stream) throws IOException
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
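The mark-and-reset contract described above can be illustrated with plain JDK streams. The following is a minimal sketch under stated assumptions, not Tika's implementation: `StreamPeek` and `peekPrefix` are hypothetical names, and rejecting streams without mark support is a choice made for this sketch only.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StreamPeek {

    // Hypothetical helper (not part of Tika): reads at most `limit` bytes
    // and then restores the stream to its original position.
    static byte[] peekPrefix(InputStream stream, int limit) throws IOException {
        if (!stream.markSupported()) {
            // Assumption of this sketch: refuse streams that cannot be reset.
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        stream.mark(limit);
        try {
            byte[] buffer = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = stream.read(buffer, total, limit - total);
                if (n == -1) {
                    break; // end of stream before the limit was reached
                }
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            stream.reset(); // back to the marked position; the stream stays open
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);
        InputStream stream = new BufferedInputStream(new ByteArrayInputStream(data));

        byte[] prefix = peekPrefix(stream, 4);
        System.out.println(new String(prefix, StandardCharsets.US_ASCII));

        // The next read starts at the first byte again.
        System.out.println((char) stream.read());
    }
}
```

A caller that needs this behaviour for a stream without mark support can wrap it in a `BufferedInputStream` first, as the usage in `main` does.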
stream
- the document stream
IOException
- if the stream can not be read
public String detect(File file) throws FileNotFoundException, IOException
Use the detect(String) method when you want to detect the type of the document without actually accessing the file.
file
- the file
FileNotFoundException
- if the file does not exist
IOException
- if the file can not be read
public String detect(URL url) throws IOException
Use the detect(String) method when you want to detect the type of the document without actually accessing the URL.
url
- the URL of the resource
IOException
- if the resource can not be read
public String detect(String name)
The given name can also be a URL or a full file path. In such cases only the file name part of the string is used for type detection.
name
- the file name of the document
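To make "the file name part of the string" concrete, here is a minimal sketch of one way such a part could be isolated from a URL or a full path. `FileNamePart` and `fileNamePart` are hypothetical names, and the rule used (everything after the last `/` or `\`) is an assumption for illustration, not a description of how Tika does it. URL query strings and fragments are not handled.

```java
final class FileNamePart {

    // Hypothetical helper (not part of Tika): returns the text after the
    // last forward slash or backslash, or the whole string if neither occurs.
    static String fileNamePart(String name) {
        int lastSeparator = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        return name.substring(lastSeparator + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileNamePart("http://example.com/docs/report.pdf"));
        System.out.println(fileNamePart("/home/user/notes.txt"));
        System.out.println(fileNamePart("C:\\docs\\letter.doc"));
        System.out.println(fileNamePart("plain.txt"));
    }
}
```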
public Reader parse(InputStream stream, Metadata metadata) throws IOException
stream
- the document to be parsed
IOException
- if the document can not be read or parsed
public Reader parse(InputStream stream) throws IOException
stream
- the document to be parsed
IOException
- if the document can not be read or parsed
public Reader parse(File file) throws FileNotFoundException, IOException
file
- the file to be parsed
FileNotFoundException
- if the given file does not exist
IOException
- if the file can not be read or parsed
public Reader parse(URL url) throws IOException
url
- the URL of the resource to be parsed
IOException
- if the resource can not be read or parsed
public String parseToString(InputStream stream, Metadata metadata) throws IOException, TikaException
stream
- the document to be parsed
metadata
- document metadata
IOException
- if the document can not be read
TikaException
- if the document can not be parsed
public String parseToString(InputStream stream) throws IOException, TikaException
stream
- the document to be parsed
IOException
- if the document can not be read
TikaException
- if the document can not be parsed
public String parseToString(File file) throws FileNotFoundException, IOException, TikaException
file
- the file to be parsed
FileNotFoundException
- if the file does not exist
IOException
- if the file can not be read
TikaException
- if the file can not be parsed
public String parseToString(URL url) throws IOException, TikaException
url
- the URL of the resource to be parsed
IOException
- if the resource can not be read
TikaException
- if the resource can not be parsed
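The parse methods return the extracted text as a Reader, while the parseToString methods return it as a String. For a caller holding a Reader, the following minimal sketch shows one way to collect its content using only the JDK. `ReaderText` and `readAll` are hypothetical names, and the `StringReader` in `main` merely stands in for a Reader obtained from one of the parse methods. This is not a claim about how parseToString works internally.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReaderText {

    // Hypothetical helper (not part of Tika): drains the reader into a
    // String and closes it when done.
    static String readAll(Reader reader) throws IOException {
        try (Reader r = reader) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = r.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a Reader returned by a parse method.
        Reader extracted = new StringReader("extracted text content");
        System.out.println(readAll(extracted));
    }
}
```

Reading in fixed-size chunks keeps the sketch usable for long documents, although the whole text still ends up in memory once it is a String.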