org.apache.any23.extractor.html
Class MicroformatExtractor

java.lang.Object
  extended by org.apache.any23.extractor.html.MicroformatExtractor
All Implemented Interfaces:
Extractor<Document>, Extractor.TagSoupDOMExtractor
Direct Known Subclasses:
EntityBasedMicroformatExtractor, HCalendarExtractor

public abstract class MicroformatExtractor
extends Object
implements Extractor.TagSoupDOMExtractor

The abstract base class for any Microformat specification extractor.


Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
 
Field Summary
static String BEGIN_SCRIPT
           
static String END_SCRIPT
           
protected  Any23ValueFactoryWrapper valueFactory
           
 
Constructor Summary
MicroformatExtractor()
           
 
Method Summary
protected  void addBNodeProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.BNode bnode)
          Helper method that adds a BNode property to a node.
protected  void addBNodeProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.BNode bnode)
          Helper method that adds a BNode property to a node.
protected  void addURIProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.URI object)
          Helper method that adds a URI property to a node.
protected  boolean conditionallyAddLiteralProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.Literal literal)
          Helper method that adds a literal property to a node.
protected  boolean conditionallyAddResourceProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.URI uri)
          Helper method that adds a URI property to a node.
protected  boolean conditionallyAddStringProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI p, String value)
          Helper method that adds a literal property to a subject only if the value of the property is a valid string.
protected abstract  boolean extract()
          Performs the extraction of the data and writes the results to the model.
protected  org.openrdf.model.URI fixLink(String link)
           
protected  org.openrdf.model.URI fixLink(String link, String defaultSchema)
           
protected  ExtractionResult getCurrentExtractionResult()
          Returns the ExtractionResult associated with the extraction session.
abstract  ExtractorDescription getDescription()
          Returns the description of this extractor.
 org.openrdf.model.URI getDocumentURI()
           
 ExtractionContext getExtractionContext()
           
 HTMLDocument getHTMLDocument()
           
static boolean includes(Class<? extends MicroformatExtractor> including, Class<? extends MicroformatExtractor> included)
          Checks whether there is a native nesting relationship between two MicroformatExtractor classes.
protected  ExtractionResult openSubResult(ExtractionContext context)
           
 void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out)
          Executes the extractor.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BEGIN_SCRIPT

public static final String BEGIN_SCRIPT
See Also:
Constant Field Values

END_SCRIPT

public static final String END_SCRIPT
See Also:
Constant Field Values

valueFactory

protected final Any23ValueFactoryWrapper valueFactory
Constructor Detail

MicroformatExtractor

public MicroformatExtractor()
Method Detail

getDescription

public abstract ExtractorDescription getDescription()
Returns the description of this extractor.

Specified by:
getDescription in interface Extractor<Document>
Returns:
a human-readable description.

extract

protected abstract boolean extract()
                            throws ExtractionException
Performs the extraction of the data and writes the results to the model. The nodes generated in the model can have any name or implicit label, but where possible they SHOULD have names (either URIs or AnonId) that are uniquely derivable from their position in the DOM tree, so that multiple extractors can merge information.

Throws:
ExtractionException
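
The recommendation above, names derivable from a node's position in the DOM tree, can be illustrated with a minimal sketch that uses only the JDK's org.w3c.dom API. DomPositionNames, pathOf and parse are hypothetical names introduced for this illustration; they are not part of Any23, and the path format shown is an assumption rather than the scheme Any23 actually uses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical helper, not part of Any23: derives a stable path from a node's DOM position. */
class DomPositionNames {

    /** Builds a path such as /html[1]/body[1]/div[2] by walking up to the root element. */
    static String pathOf(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node current = node;
             current != null && current.getNodeType() == Node.ELEMENT_NODE;
             current = current.getParentNode()) {
            // Count preceding element siblings with the same name to get a 1-based index.
            int index = 1;
            for (Node sibling = current.getPreviousSibling(); sibling != null;
                 sibling = sibling.getPreviousSibling()) {
                if (sibling.getNodeType() == Node.ELEMENT_NODE
                        && sibling.getNodeName().equals(current.getNodeName())) {
                    index++;
                }
            }
            path.insert(0, "/" + current.getNodeName() + "[" + index + "]");
        }
        return path.toString();
    }

    /** Parses a small well-formed XHTML snippet; used only to demonstrate pathOf. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div/><div class=\"vcard\"/></body></html>");
        Node second = doc.getElementsByTagName("div").item(1);
        System.out.println(pathOf(second));
    }
}
```

Because the path depends only on the document structure, two independent extractors visiting the same element would compute the same string and could therefore attach their statements to the same node.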

getHTMLDocument

public HTMLDocument getHTMLDocument()

getExtractionContext

public ExtractionContext getExtractionContext()

getDocumentURI

public org.openrdf.model.URI getDocumentURI()

run

public final void run(ExtractionParameters extractionParameters,
                      ExtractionContext extractionContext,
                      Document in,
                      ExtractionResult out)
               throws IOException,
                      ExtractionException
Description copied from interface: Extractor
Executes the extractor. This method is invoked only once per instance; extractors are not reusable.

Specified by:
run in interface Extractor<Document>
Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - The document context.
in - The extractor input data.
out - the collector for the extracted data.
Throws:
IOException - On error while reading from the input stream.
ExtractionException - On other error, such as parse errors.
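
The relationship between the final run method and the abstract extract method resembles a template method. The sketch below shows that pattern with simplified stand-in types (a String for the input document, a StringBuilder for the result). All class names are hypothetical, and the guard that rejects a second invocation is an assumption made for illustration; this page does not say how, or whether, the real class enforces single use.

```java
import java.util.Locale;

/** Hypothetical stand-in for an extractor base class; not the Any23 implementation. */
abstract class SingleUseExtractorSketch {
    private String input;       // stand-in for the parsed Document
    private StringBuilder out;  // stand-in for the ExtractionResult
    private boolean used;

    /** Template method: stores the session state, then delegates to extract(). */
    public final void run(String input, StringBuilder out) {
        if (used) {
            // ASSUMPTION: reuse is rejected explicitly in this sketch.
            throw new IllegalStateException("extractor instances are not reusable");
        }
        used = true;
        this.input = input;
        this.out = out;
        extract();
    }

    protected String getInput() {
        return input;
    }

    protected StringBuilder getCurrentResult() {
        return out;
    }

    /** Subclasses write whatever they find to the current result. */
    protected abstract boolean extract();
}

/** Trivial subclass that "extracts" an upper-cased copy of the input. */
class UpperCaseExtractorSketch extends SingleUseExtractorSketch {
    @Override
    protected boolean extract() {
        getCurrentResult().append(getInput().toUpperCase(Locale.ROOT));
        return true;
    }
}
```

Keeping run final means that subclasses only supply extract and read the session state through accessors, which mirrors the split between run, extract and the getter methods listed on this page.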

getCurrentExtractionResult

protected ExtractionResult getCurrentExtractionResult()
Returns the ExtractionResult associated with the extraction session.

Returns:
a valid extraction result.

openSubResult

protected ExtractionResult openSubResult(ExtractionContext context)

conditionallyAddStringProperty

protected boolean conditionallyAddStringProperty(Node n,
                                                 org.openrdf.model.Resource subject,
                                                 org.openrdf.model.URI p,
                                                 String value)
Helper method that adds a literal property to a subject only if the value of the property is a valid string.

Parameters:
n - the HTML node from which the property value has been extracted.
subject - the property subject.
p - the property URI.
value - the property value.
Returns:
true if the value has been accepted and added, false otherwise.
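
The conditional-add contract (add the statement and return true only when the value is usable) can be sketched as follows. ConditionalAddSketch and its string-based statement list are hypothetical stand-ins, not Any23 or Sesame types. This page does not define what counts as a "valid string", so the sketch assumes it means non-null and non-empty after trimming.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a statement sink; not an Any23 or Sesame type. */
class ConditionalAddSketch {
    final List<String> statements = new ArrayList<>();

    /**
     * Adds subject/property/value only when the value is usable.
     * ASSUMPTION: "valid" here means non-null and non-empty after trimming.
     */
    boolean conditionallyAddStringProperty(String subject, String property, String value) {
        if (value == null) {
            return false;
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        statements.add(subject + " " + property + " \"" + trimmed + "\"");
        return true;
    }
}
```

Returning a boolean lets a caller check whether a required property was actually written, without inspecting the model afterwards.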

conditionallyAddLiteralProperty

protected boolean conditionallyAddLiteralProperty(Node n,
                                                  org.openrdf.model.Resource subject,
                                                  org.openrdf.model.URI property,
                                                  org.openrdf.model.Literal literal)
Helper method that adds a literal property to a node.

Parameters:
n - the HTML node from which the property value has been extracted.
subject - the property subject.
property - the property URI.
literal - the property value.
Returns:
true if the literal has been accepted and added, false otherwise.

conditionallyAddResourceProperty

protected boolean conditionallyAddResourceProperty(org.openrdf.model.Resource subject,
                                                   org.openrdf.model.URI property,
                                                   org.openrdf.model.URI uri)
Helper method that adds a URI property to a node.

Parameters:
subject - the property subject.
property - the property URI.
uri - the property object.
Returns:
true if the resource has been added, false otherwise.

addBNodeProperty

protected void addBNodeProperty(Node n,
                                org.openrdf.model.Resource subject,
                                org.openrdf.model.URI property,
                                org.openrdf.model.BNode bnode)
Helper method that adds a BNode property to a node.

Parameters:
n - the HTML node used for extracting such property.
subject - the property subject.
property - the property URI.
bnode - the property value.

addBNodeProperty

protected void addBNodeProperty(org.openrdf.model.Resource subject,
                                org.openrdf.model.URI property,
                                org.openrdf.model.BNode bnode)
Helper method that adds a BNode property to a node.

Parameters:
subject - the property subject.
property - the property URI.
bnode - the property value.

addURIProperty

protected void addURIProperty(org.openrdf.model.Resource subject,
                              org.openrdf.model.URI property,
                              org.openrdf.model.URI object)
Helper method that adds a URI property to a node.

Parameters:
subject - the property subject.
property - the property URI.
object - the property object.

fixLink

protected org.openrdf.model.URI fixLink(String link)

fixLink

protected org.openrdf.model.URI fixLink(String link,
                                        String defaultSchema)

includes

public static boolean includes(Class<? extends MicroformatExtractor> including,
                               Class<? extends MicroformatExtractor> included)
Checks whether there is a native nesting relationship between two MicroformatExtractor classes.

Parameters:
including - the including MicroformatExtractor
included - the included MicroformatExtractor
Returns:
true if there is a declared nesting relationship
See Also:
org.apache.any23.extractor.html.annotations.Includes
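
A declared nesting relationship of this kind can be checked by reading a runtime annotation from the including class. The sketch below shows one way to do that. IncludesSketch is a hypothetical annotation declared locally for the example; the attributes of the real Includes annotation are not shown on this page and may differ, and the placeholder classes stand in for extractor classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical annotation declared for this sketch; the real Includes annotation may differ. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface IncludesSketch {
    Class<?>[] value();
}

/** Placeholder classes standing in for extractor classes. */
class InnerFormatSketch {}

class UnrelatedFormatSketch {}

@IncludesSketch(InnerFormatSketch.class)
class OuterFormatSketch {}

class NestingCheckSketch {
    /** Returns true if 'including' declares 'included' in its IncludesSketch annotation. */
    static boolean includes(Class<?> including, Class<?> included) {
        IncludesSketch declared = including.getAnnotation(IncludesSketch.class);
        if (declared == null) {
            return false; // no nesting declared at all
        }
        for (Class<?> candidate : declared.value()) {
            if (candidate.equals(included)) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch the relationship is directional: OuterFormatSketch includes InnerFormatSketch, but the reverse check returns false because InnerFormatSketch carries no annotation.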


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.