org.apache.any23.extractor
Interface Extractor<Input>

Type Parameters:
Input - the type of the input data to be processed.
All Known Subinterfaces:
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
All Known Implementing Classes:
AdrExtractor, BaseRDFExtractor, CSVExtractor, EntityBasedMicroformatExtractor, GeoExtractor, HCalendarExtractor, HCardExtractor, HeadLinkExtractor, HListingExtractor, HRecipeExtractor, HResumeExtractor, HReviewExtractor, HTMLMetaExtractor, ICBMExtractor, LicenseExtractor, MicrodataExtractor, MicroformatExtractor, NQuadsExtractor, NTriplesExtractor, RDFa11Extractor, RDFaExtractor, RDFXMLExtractor, SpeciesExtractor, TitleExtractor, TriXExtractor, TurtleExtractor, TurtleHTMLExtractor, XFNExtractor, XPathExtractor

public interface Extractor<Input>

Defines the signature of a generic Extractor.


Nested Class Summary
static interface Extractor.BlindExtractor
          Specializes Extractor to accept a URI as input.
static interface Extractor.ContentExtractor
          Specializes Extractor to accept an InputStream as input.
static interface Extractor.TagSoupDOMExtractor
          Specializes Extractor to accept a Document as input.
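
Each nested interface fixes the Input type parameter to one concrete input type. The sketch below illustrates that pattern only. It uses simplified stand-ins rather than the real Any23 types: SimpleExtractor, SimpleBlindExtractor, SimpleContentExtractor and the class name InputSpecializationSketch are hypothetical, the stand-in run method takes a single argument and returns a String, and java.net.URI is assumed as the URI type, which may differ from the type the library actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class InputSpecializationSketch {

    // Simplified stand-in for Extractor<Input>; NOT the real Any23 interface.
    interface SimpleExtractor<Input> {
        String run(Input in) throws IOException;
    }

    // Hypothetical stand-ins that fix the type parameter, in the same way
    // the nested interfaces fix it to URI and InputStream.
    interface SimpleBlindExtractor extends SimpleExtractor<URI> {}
    interface SimpleContentExtractor extends SimpleExtractor<InputStream> {}

    // A "blind" extractor sees only the document URI, never its content.
    public static String describeUri(URI uri) {
        SimpleBlindExtractor extractor = in -> "host=" + in.getHost();
        try {
            return extractor.run(uri);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A content extractor reads the raw bytes of the document.
    public static String firstLine(InputStream stream) {
        SimpleContentExtractor extractor = in -> {
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            int newline = text.indexOf('\n');
            return newline < 0 ? text : text.substring(0, newline);
        };
        try {
            return extractor.run(stream);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeUri(URI.create("http://example.org/page")));
        byte[] csv = "a,b\n1,2".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(new ByteArrayInputStream(csv)));
    }
}
```

Because the type parameter is fixed in the subinterface, an implementation receives its input already typed and needs no casts.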
 
Method Summary
 ExtractorDescription getDescription()
          Returns an ExtractorDescription of this extractor.
 void run(ExtractionParameters extractionParameters, ExtractionContext context, Input in, ExtractionResult out)
          Executes the extractor.
 

Method Detail

run

void run(ExtractionParameters extractionParameters,
         ExtractionContext context,
         Input in,
         ExtractionResult out)
         throws IOException,
                ExtractionException
Executes the extractor. This method is invoked only once per instance; extractors are not reusable.

Parameters:
extractionParameters - the parameters to be applied during the extraction.
context - the document context.
in - the extractor input data.
out - the collector for the extracted data.
Throws:
IOException - On error while reading from the input stream.
ExtractionException - On any other error, such as a parse error.
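
The contract above (four arguments, two checked exceptions, one invocation per instance) might be exercised as in the following sketch. Every type declared here is a hypothetical stand-in that only mirrors the names in the signature, not the real Any23 class: the documentUri() and add(String) methods, LineCountExtractor, and the guard that rejects a second call are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SingleUseExtractorSketch {

    // Hypothetical stand-ins mirroring the names in the run signature.
    static class ExtractionException extends Exception {
        ExtractionException(String message) { super(message); }
    }
    interface ExtractionParameters {}
    interface ExtractionContext { String documentUri(); }   // assumed method
    interface ExtractionResult { void add(String value); }  // assumed method

    interface Extractor<Input> {
        void run(ExtractionParameters extractionParameters, ExtractionContext context,
                 Input in, ExtractionResult out) throws IOException, ExtractionException;
    }

    // Example extractor: reports the number of lines in the input.
    static class LineCountExtractor implements Extractor<InputStream> {
        private boolean used = false;

        @Override
        public void run(ExtractionParameters extractionParameters, ExtractionContext context,
                        InputStream in, ExtractionResult out)
                throws IOException, ExtractionException {
            // Assumed guard: make the single-use rule explicit.
            if (used) {
                throw new IllegalStateException("extractor instances are single-use");
            }
            used = true;
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            if (text.isEmpty()) {
                throw new ExtractionException("empty input");
            }
            out.add(context.documentUri() + " lines=" + text.split("\n").length);
        }
    }

    // Creates a fresh extractor for each document and runs it exactly once.
    public static List<String> extractOnce(String documentUri, String content) {
        List<String> collected = new ArrayList<>();
        Extractor<InputStream> extractor = new LineCountExtractor();
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            extractor.run(new ExtractionParameters() {}, () -> documentUri, in, collected::add);
        } catch (IOException | ExtractionException e) {
            collected.add("error: " + e.getMessage());
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(extractOnce("http://example.org/doc", "a\nb\nc"));
    }
}
```

Creating a new instance inside extractOnce keeps any per-run state from leaking between documents, which is one way to respect the rule that extractors are not reusable.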

getDescription

ExtractorDescription getDescription()
Returns an ExtractorDescription of this extractor.

Returns:
the object representing the extractor description.


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.