org.apache.any23.extractor.rdfa
Class RDFa11Parser

java.lang.Object
  extended by org.apache.any23.extractor.rdfa.RDFa11Parser

public class RDFa11Parser
extends Object

This parser is able to extract RDFa 1.0 and RDFa 1.1 statements from any (X)HTML document.

Author:
Michele Mostarda (mostarda@fbk.eu)

Field Summary
static String ABOUT_ATTRIBUTE
static String BODY_TAG
static String CONTENT_ATTRIBUTE
static String CURIE_SEPARATOR
static String DATATYPE_ATTRIBUTE
static String HEAD_TAG
static String HREF_ATTRIBUTE
static String PREFIX_ATTRIBUTE
static String PROFILE_ATTRIBUTE
static String PROPERTY_ATTRIBUTE
static String REL_ATTRIBUTE
static String RESOURCE_ATTRIBUTE
static String REV_ATTRIBUTE
static String SRC_ATTRIBUTE
static String[] SUBJECT_ATTRIBUTES
static String TYPEOF_ATTRIBUTE
static String URI_PATH_SEPARATOR
static char URI_PREFIX_SEPARATOR
static String URI_SCHEMA_SEPARATOR
static String VOCAB_ATTRIBUTE
static String XML_LANG_ATTRIBUTE
static String XML_LITERAL_DATATYPE
static String XMLNS_ATTRIBUTE
static String XMLNS_DEFAULT
 
Constructor Summary
RDFa11Parser()
           
 
Method Summary
protected static String[] extractPrefixSections(String prefixesDeclaration)
          Given a prefix declaration, returns an array of prefixID:prefixURL strings, normalizing blanks where present.
protected static org.openrdf.model.Literal getAsPlainLiteral(Node node, String currentLanguage)
           
protected static org.openrdf.model.Literal getAsXMLLiteral(Node node)
           
protected static URL getDocumentBase(URL documentURL, Document document)
           
protected  org.openrdf.model.URI getMapping(String prefix)
          Returns a URI mapping for a given prefix.
protected static boolean isAbsoluteURI(String uri)
           
protected static boolean isCURIE(String curie)
           
protected static boolean isCURIEBNode(String curie)
           
protected static boolean isRelativeNode(Node node)
           
protected static boolean isXMLNSDeclared(Document document)
           
 void processDocument(URL documentURL, Document document, ExtractionResult extractionResult)
          Processes the document according to the RDFa Syntax Processing Model.
 void reset()
          Resets the parser to the original state.
protected  org.openrdf.model.Resource resolveCURIEOrURI(String curieOrURI, boolean termAllowed)
          Resolves a CURIE or URI string.
protected  org.openrdf.model.URI[] resolveCurieOrURIList(Node n, String curieOrURIList, boolean termAllowed)
          Resolves a whitespace-separated list of CURIEs or URIs.
protected  org.openrdf.model.URI resolveURI(String uriStr)
          Resolves a URI string as URI.
protected  void updateURIMapping(Node node)
          Updates the URI mapping with the XMLNS attributes declared in the current node.
protected  void updateVocabulary(Node currentNode)
          Updates the vocabulary context with possible @vocab declarations.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CURIE_SEPARATOR

public static final String CURIE_SEPARATOR
See Also:
Constant Field Values

URI_PREFIX_SEPARATOR

public static final char URI_PREFIX_SEPARATOR
See Also:
Constant Field Values

URI_SCHEMA_SEPARATOR

public static final String URI_SCHEMA_SEPARATOR
See Also:
Constant Field Values

URI_PATH_SEPARATOR

public static final String URI_PATH_SEPARATOR
See Also:
Constant Field Values

HEAD_TAG

public static final String HEAD_TAG
See Also:
Constant Field Values

BODY_TAG

public static final String BODY_TAG
See Also:
Constant Field Values

XMLNS_ATTRIBUTE

public static final String XMLNS_ATTRIBUTE
See Also:
Constant Field Values

XML_LANG_ATTRIBUTE

public static final String XML_LANG_ATTRIBUTE
See Also:
Constant Field Values

REL_ATTRIBUTE

public static final String REL_ATTRIBUTE
See Also:
Constant Field Values

REV_ATTRIBUTE

public static final String REV_ATTRIBUTE
See Also:
Constant Field Values

ABOUT_ATTRIBUTE

public static final String ABOUT_ATTRIBUTE
See Also:
Constant Field Values

RESOURCE_ATTRIBUTE

public static final String RESOURCE_ATTRIBUTE
See Also:
Constant Field Values

SRC_ATTRIBUTE

public static final String SRC_ATTRIBUTE
See Also:
Constant Field Values

HREF_ATTRIBUTE

public static final String HREF_ATTRIBUTE
See Also:
Constant Field Values

SUBJECT_ATTRIBUTES

public static final String[] SUBJECT_ATTRIBUTES

PREFIX_ATTRIBUTE

public static final String PREFIX_ATTRIBUTE
See Also:
Constant Field Values

TYPEOF_ATTRIBUTE

public static final String TYPEOF_ATTRIBUTE
See Also:
Constant Field Values

PROPERTY_ATTRIBUTE

public static final String PROPERTY_ATTRIBUTE
See Also:
Constant Field Values

DATATYPE_ATTRIBUTE

public static final String DATATYPE_ATTRIBUTE
See Also:
Constant Field Values

CONTENT_ATTRIBUTE

public static final String CONTENT_ATTRIBUTE
See Also:
Constant Field Values

VOCAB_ATTRIBUTE

public static final String VOCAB_ATTRIBUTE
See Also:
Constant Field Values

PROFILE_ATTRIBUTE

public static final String PROFILE_ATTRIBUTE
See Also:
Constant Field Values

XML_LITERAL_DATATYPE

public static final String XML_LITERAL_DATATYPE
See Also:
Constant Field Values

XMLNS_DEFAULT

public static final String XMLNS_DEFAULT
See Also:
Constant Field Values
Constructor Detail

RDFa11Parser

public RDFa11Parser()
Method Detail

getDocumentBase

protected static URL getDocumentBase(URL documentURL,
                                     Document document)
                              throws MalformedURLException
Throws:
MalformedURLException
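The page gives no description of getDocumentBase. A minimal sketch follows, assuming the usual (X)HTML rule that a base element's href overrides the document URL. The class name DocumentBaseSketch and its parse helper are hypothetical, only JDK DOM and java.net APIs are used, and this is not Any23's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not Any23's code: one plausible way to compute the document base.
class DocumentBaseSketch {

    // Returns the href of the first <base> element that carries one, resolved against
    // documentURL; falls back to documentURL when no such element exists.
    static URL getDocumentBase(URL documentURL, Document document) throws MalformedURLException {
        NodeList bases = document.getElementsByTagName("base");
        for (int i = 0; i < bases.getLength(); i++) {
            String href = ((Element) bases.item(i)).getAttribute("href");
            if (!href.isEmpty()) {
                // A relative href is resolved against the document URL.
                return new URL(documentURL, href);
            }
        }
        return documentURL;
    }

    // Parses well-formed (X)HTML into a DOM; illustration only.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed markup", e);
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        Document doc = parse("<html><head><base href=\"http://example.org/docs/\"/></head><body/></html>");
        // prints http://example.org/docs/
        System.out.println(getDocumentBase(new URL("http://example.com/page.html"), doc));
    }
}
```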

extractPrefixSections

protected static String[] extractPrefixSections(String prefixesDeclaration)
Given a prefix declaration, returns an array of prefixID:prefixURL strings, normalizing blanks where present.

Parameters:
prefixesDeclaration - the prefix declaration to split.
Returns:
the extracted prefixID:prefixURL strings.
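A minimal sketch of the documented contract, assuming a declaration in the RDFa 1.1 @prefix style: prefix names ending in a colon, each followed by a URI, separated by arbitrary whitespace. PrefixSectionsSketch is a hypothetical name, and Any23's own parsing may treat malformed input differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring the documented contract of extractPrefixSections:
// "prefix: URI" pairs become "prefix:URI" strings with the blanks removed.
class PrefixSectionsSketch {

    static String[] extractPrefixSections(String prefixesDeclaration) {
        String trimmed = prefixesDeclaration.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        String[] tokens = trimmed.split("\\s+");
        List<String> sections = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            // A token ending in ':' is a prefix name whose URI is the next token.
            if (token.endsWith(":") && i + 1 < tokens.length) {
                sections.add(token + tokens[++i]);
            } else {
                sections.add(token);
            }
        }
        return sections.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String decl = "dc:   http://purl.org/dc/terms/\n  foaf: http://xmlns.com/foaf/0.1/";
        // prints [dc:http://purl.org/dc/terms/, foaf:http://xmlns.com/foaf/0.1/]
        System.out.println(Arrays.toString(extractPrefixSections(decl)));
    }
}
```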

isAbsoluteURI

protected static boolean isAbsoluteURI(String uri)

isCURIE

protected static boolean isCURIE(String curie)

isCURIEBNode

protected static boolean isCURIEBNode(String curie)
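isAbsoluteURI, isCURIE and isCURIEBNode are not documented here. The sketch below shows plausible heuristics based on the RDFa CURIE syntax: an optional [ ] safe-CURIE wrapper, a prefix:reference body, and the reserved _ prefix for blank nodes. The class CurieChecksSketch and its exact rules are assumptions, not the library's logic.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical predicates; the real Any23 heuristics are not documented on this page.
class CurieChecksSketch {

    // Treats a string as an absolute URI when java.net.URI parses it and finds a scheme.
    static boolean isAbsoluteURI(String uri) {
        try {
            return new URI(uri).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Strips the square brackets of a safe CURIE such as "[dc:title]".
    static String unwrapSafeCURIE(String s) {
        return s.length() >= 2 && s.startsWith("[") && s.endsWith("]") ? s.substring(1, s.length() - 1) : s;
    }

    // Assumed heuristic: "prefix:reference" where the reference does not start with "//".
    static boolean isCURIE(String curie) {
        String body = unwrapSafeCURIE(curie);
        int colon = body.indexOf(':');
        return colon >= 0 && !body.startsWith("//", colon + 1);
    }

    // A blank-node CURIE uses the reserved "_" prefix, e.g. "_:b0" or "[_:b0]".
    static boolean isCURIEBNode(String curie) {
        return unwrapSafeCURIE(curie).startsWith("_:");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"dc:title", "[_:b0]", "http://example.org/x", "relative/path"}) {
            System.out.println(s + " absolute=" + isAbsoluteURI(s)
                    + " curie=" + isCURIE(s) + " bnode=" + isCURIEBNode(s));
        }
    }
}
```

A string such as dc:title is syntactically both a CURIE and an absolute URI with scheme dc; that ambiguity is why RDFa resolution consults the prefix mappings rather than relying on syntax alone.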

isRelativeNode

protected static boolean isRelativeNode(Node node)

getAsPlainLiteral

protected static org.openrdf.model.Literal getAsPlainLiteral(Node node,
                                                             String currentLanguage)
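getAsPlainLiteral is not documented either. Assuming it builds a plain literal from the node's text content and the language currently in scope (from xml:lang, cf. XML_LANG_ATTRIBUTE), a sketch looks like this. PlainLiteralSketch is a hypothetical stand-in for org.openrdf.model.Literal so that the example runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

// Hypothetical stand-in for org.openrdf.model.Literal: a label plus an optional language tag.
final class PlainLiteralSketch {
    final String label;
    final String language; // null when no language is in scope

    PlainLiteralSketch(String label, String language) {
        this.label = label;
        this.language = language;
    }

    // Assumed behavior: the label is the concatenated text of the node and all its descendants.
    static PlainLiteralSketch getAsPlainLiteral(Node node, String currentLanguage) {
        return new PlainLiteralSketch(node.getTextContent(), currentLanguage);
    }

    @Override
    public String toString() {
        return "\"" + label + "\"" + (language == null ? "" : "@" + language);
    }

    // Parses a well-formed fragment and returns its root element; illustration only.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed markup", e);
        }
    }

    public static void main(String[] args) {
        Node span = parseRoot("<span property=\"dc:title\">Hello <em>RDFa</em></span>");
        System.out.println(getAsPlainLiteral(span, "en")); // prints "Hello RDFa"@en
    }
}
```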

getAsXMLLiteral

protected static org.openrdf.model.Literal getAsXMLLiteral(Node node)
                                                    throws IOException,
                                                           TransformerException
Throws:
IOException
TransformerException
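The TransformerException in the signature suggests that getAsXMLLiteral serializes the DOM with javax.xml.transform. A sketch under that assumption follows. XmlLiteralSketch and toXmlLiteralString are hypothetical names; the result is a plain String rather than an openrdf Literal (which would presumably be typed with XML_LITERAL_DATATYPE), and whether Any23 serializes the element itself or only its children is not stated on this page. This sketch serializes the element itself.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

// Hypothetical sketch: produces the lexical form an XMLLiteral could carry.
class XmlLiteralSketch {

    // Serializes the node (element included) with an identity transform, omitting the XML declaration.
    static String toXmlLiteralString(Node node) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Parses a well-formed fragment and returns its root element; illustration only.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed markup", e);
        }
    }

    public static void main(String[] args) throws TransformerException {
        Node span = parseRoot("<span property=\"ex:note\">Hi <em>there</em></span>");
        System.out.println(toXmlLiteralString(span));
    }
}
```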

isXMLNSDeclared

protected static boolean isXMLNSDeclared(Document document)
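isXMLNSDeclared is likewise undocumented. One plausible reading, assumed here, is a check for a default namespace declaration (xmlns) on the root element; XmlnsCheckSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch; the real check performed by Any23 is not described on this page.
class XmlnsCheckSketch {

    // Assumed behavior: true when the root element carries an xmlns attribute.
    static boolean isXMLNSDeclared(Document document) {
        return document.getDocumentElement().hasAttribute("xmlns");
    }

    // Parses well-formed markup; namespace awareness is off (the factory default),
    // so xmlns is exposed as an ordinary attribute.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isXMLNSDeclared(parse("<html xmlns=\"http://www.w3.org/1999/xhtml\"/>")));
    }
}
```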

processDocument

public void processDocument(URL documentURL,
                            Document document,
                            ExtractionResult extractionResult)
                     throws RDFa11ParserException
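Processes the document according to the RDFa Syntax Processing Model.

Parameters:
documentURL - the URL of the document being processed.
document - the DOM of the document to process.
extractionResult - the extraction result that receives the extracted statements.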
Throws:
RDFa11ParserException

reset

public void reset()
Resets the parser to the original state.


updateVocabulary

protected void updateVocabulary(Node currentNode)
Updates the vocabulary context with possible @vocab declarations.

Parameters:
currentNode - the current node.

updateURIMapping

protected void updateURIMapping(Node node)
Updates the URI mapping with the XMLNS attributes declared in the current node.

Parameters:
node - input node.

getMapping

protected org.openrdf.model.URI getMapping(String prefix)
Returns a URI mapping for a given prefix.

Parameters:
prefix - input prefix.
Returns:
URI mapping.
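Together, updateURIMapping and getMapping imply a scoped prefix table: declarations made on an element apply to that element and its descendants, and may be shadowed further down. The sketch below models that with a stack of maps. UriMappingSketch and its leaveNode method are hypothetical, and getMapping here returns a String rather than an org.openrdf.model.URI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical model of scoped prefix mappings; not Any23's implementation.
class UriMappingSketch {
    // Innermost scope first: ArrayDeque.push adds at the head and iteration starts there.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    // Opens a scope holding the xmlns:prefix declarations found on the given element.
    void updateURIMapping(Node node) {
        Map<String, String> scope = new HashMap<>();
        if (node instanceof Element) {
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                String name = attribute.getNodeName();
                if (name.startsWith("xmlns:")) {
                    scope.put(name.substring("xmlns:".length()), attribute.getNodeValue());
                }
            }
        }
        scopes.push(scope);
    }

    // Hypothetical counterpart: discards the scope when the walk leaves the element.
    void leaveNode() {
        scopes.pop();
    }

    // Returns the namespace bound to prefix in the innermost scope declaring it, or null.
    String getMapping(String prefix) {
        for (Map<String, String> scope : scopes) {
            String namespace = scope.get(prefix);
            if (namespace != null) {
                return namespace;
            }
        }
        return null;
    }

    // Parses well-formed markup without namespace awareness; illustration only.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed markup", e);
        }
    }

    public static void main(String[] args) {
        Element html = parse("<html xmlns:dc=\"http://purl.org/dc/terms/\"><body/></html>").getDocumentElement();
        UriMappingSketch mapping = new UriMappingSketch();
        mapping.updateURIMapping(html);
        System.out.println(mapping.getMapping("dc")); // prints http://purl.org/dc/terms/
    }
}
```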

resolveCurieOrURIList

protected org.openrdf.model.URI[] resolveCurieOrURIList(Node n,
                                                        String curieOrURIList,
                                                        boolean termAllowed)
                                                 throws URISyntaxException
Resolves a whitespace-separated list of CURIEs or URIs.

Parameters:
n - current node.
curieOrURIList - list of CURIEs/URIs.
termAllowed - if true, the resolution can be a term.
Returns:
the resolved URIs.
Throws:
URISyntaxException
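A sketch of the list resolution, assuming each whitespace-separated token is resolved independently and tokens that cannot be resolved are dropped, modelled on how RDFa processors skip values they cannot resolve in attributes such as @rel and @typeof. CurieListSketch and prefixResolver are hypothetical, the per-token resolver is passed in as a function, and the result is a List<String> rather than a URI[].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of whitespace-separated CURIE/URI list resolution.
class CurieListSketch {

    // Splits the attribute value on runs of whitespace and resolves each token;
    // tokens for which the resolver returns null are skipped.
    static List<String> resolveCurieOrURIList(String curieOrURIList, Function<String, String> resolver) {
        List<String> resolved = new ArrayList<>();
        for (String token : curieOrURIList.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens for a blank attribute value
            }
            String uri = resolver.apply(token);
            if (uri != null) {
                resolved.add(uri);
            }
        }
        return resolved;
    }

    // Hypothetical resolver: expands "prefix:reference" using a fixed prefix table.
    static Function<String, String> prefixResolver(Map<String, String> prefixes) {
        return token -> {
            int colon = token.indexOf(':');
            if (colon < 0) {
                return null;
            }
            String namespace = prefixes.get(token.substring(0, colon));
            return namespace == null ? null : namespace + token.substring(colon + 1);
        };
    }

    public static void main(String[] args) {
        Function<String, String> resolver = prefixResolver(Map.of("foaf", "http://xmlns.com/foaf/0.1/"));
        // prints [http://xmlns.com/foaf/0.1/knows, http://xmlns.com/foaf/0.1/name]
        System.out.println(resolveCurieOrURIList("foaf:knows  foaf:name", resolver));
    }
}
```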

resolveURI

protected org.openrdf.model.URI resolveURI(String uriStr)
Resolves a URI string as URI.

Parameters:
uriStr - the URI string to resolve.
Returns:
the resolved URI.
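resolveURI takes only the string, so the base presumably comes from parser state, for example the value computed by getDocumentBase. Assuming that, resolution of relative references can be sketched with java.net.URI#resolve. ResolveUriSketch and its constructor argument are hypothetical.

```java
import java.net.URI;

// Hypothetical sketch: the base URI is assumed to be held by the parser between calls.
class ResolveUriSketch {
    private final URI base;

    ResolveUriSketch(String documentBase) {
        this.base = URI.create(documentBase);
    }

    // Resolves a possibly relative reference against the base; absolute URIs come back unchanged.
    String resolveURI(String uriStr) {
        return base.resolve(uriStr.trim()).toString();
    }

    public static void main(String[] args) {
        ResolveUriSketch r = new ResolveUriSketch("http://example.org/docs/page.html");
        System.out.println(r.resolveURI("../img/logo.png")); // prints http://example.org/img/logo.png
        System.out.println(r.resolveURI("#section-2"));      // prints http://example.org/docs/page.html#section-2
    }
}
```

URI#resolve throws IllegalArgumentException for strings that are not valid URI references, so a real parser has to decide how to report such attribute values.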

resolveCURIEOrURI

protected org.openrdf.model.Resource resolveCURIEOrURI(String curieOrURI,
                                                       boolean termAllowed)
Resolves a CURIE or URI string.

Parameters:
curieOrURI - the CURIE or URI string to resolve.
termAllowed - if true, the resolution can be a term.
Returns:
the resolved resource.
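A sketch of one possible resolution order, assuming RDFa 1.1 conventions: a [ ] safe CURIE must resolve as a CURIE, the _ prefix denotes a blank node, a declared prefix expands to its namespace, a bare term is expanded against the @vocab in scope (cf. updateVocabulary) when termAllowed is true, and anything else is treated as a URI reference resolved against the base. CurieResolverSketch, its mutable fields and the String return type (blank nodes rendered as "_:label") are hypothetical simplifications; real RDFa term handling also involves term mappings, which are omitted.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified resolver; not Any23's implementation.
class CurieResolverSketch {
    final Map<String, String> prefixes = new HashMap<>(); // prefix -> namespace URI
    String vocabulary;                                     // @vocab in scope, or null
    private final URI base;                                // document base

    CurieResolverSketch(String base) {
        this.base = URI.create(base);
    }

    // Returns a URI string, a "_:label" blank-node id, or null when the value cannot be resolved.
    String resolveCURIEOrURI(String curieOrURI, boolean termAllowed) {
        String value = curieOrURI.trim();
        boolean safe = value.length() >= 2 && value.startsWith("[") && value.endsWith("]");
        if (safe) {
            value = value.substring(1, value.length() - 1);
        }
        int colon = value.indexOf(':');
        if (colon >= 0) {
            String prefix = value.substring(0, colon);
            String reference = value.substring(colon + 1);
            if (prefix.equals("_")) {
                return "_:" + reference; // blank-node CURIE
            }
            String namespace = prefixes.get(prefix);
            if (namespace != null) {
                return namespace + reference; // CURIE with a declared prefix
            }
        } else if (termAllowed && !safe) {
            // A bare term is expanded against the vocabulary set via @vocab, if any.
            return vocabulary == null ? null : vocabulary + value;
        }
        // A safe CURIE that did not resolve is dropped; anything else is a URI reference.
        return safe ? null : base.resolve(value).toString();
    }

    public static void main(String[] args) {
        CurieResolverSketch r = new CurieResolverSketch("http://example.org/doc.html");
        r.prefixes.put("dc", "http://purl.org/dc/terms/");
        r.vocabulary = "http://schema.org/";
        System.out.println(r.resolveCURIEOrURI("dc:title", false)); // prints http://purl.org/dc/terms/title
        System.out.println(r.resolveCURIEOrURI("name", true));      // prints http://schema.org/name
    }
}
```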


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.