 ------
 Apache Any23 - Microformat Extractors
 ------
 The Apache Software Foundation
 ------
 2011-2012

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Microformat Extractors

 This section describes some extraction corner cases and their corresponding RDF representations.
 Its main aim is to describe how <<Apache Any23>> processes some specific cases, showing how the
 extracted RDF triples correspond to the input markup.

{microformat-nesting}

* Nesting different Microformats

 [TODO: add picture about microformat nesting structure.]

 This section describes how <<Apache Any23>> represents, with RDF, the content of an HTML fragment
 containing different nested Microformats.

 <<Apache Any23>> performs the extraction by running a different extractor for every supported
 Microformat on an input HTML page. There are two ways to write extractors that produce a set of
 RDF triples coherently representing this nesting:

   * Embedding the logic explicitly within the {{{./xref/org/apache/any23/extractor/html/package-summary.html}Microformats Extractors}}.

   * Using the default <<Apache Any23>> nesting feature.

 In the first case, the logic for representing the nested values is directly embedded in the
 upper-level extractor.
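 The first approach can be illustrated with a minimal, self-contained sketch of an upper-level
 extractor that links a nested Microformat itself, loosely modelled on the addSubMicroformat method
 of HCardExtractor. It assumes a toy DOM in which every element carries a single class name; Node,
 Triple, getBlankNodeFor and the _:b blank-node labels are hypothetical stand-ins rather than
 Apache Any23 types, and the vCard property IRI used in main is an illustrative assumption.

+----------------------------------------------------------------------------------------------
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: an upper-level extractor that handles a nested
 * Microformat itself. Node, Triple and the blank-node labels are
 * illustrative stand-ins, not Apache Any23 types.
 */
public class ExplicitNestingSketch {

    /** Stand-in for a DOM element: a single class name plus child elements. */
    record Node(String className, List<Node> children) {
        Node(String className, Node... children) {
            this(className, List.of(children));
        }
    }

    /** Stand-in for an RDF statement. */
    record Triple(String subject, String predicate, String object) {}

    private final List<Triple> triples = new ArrayList<>();
    // Identity map: two structurally equal elements must still get distinct blank nodes.
    private final Map<Node, String> blankNodes = new IdentityHashMap<>();

    /** Collects, depth-first, every descendant of root whose class equals className. */
    static void findAllByClassName(Node root, String className, List<Node> out) {
        for (Node child : root.children()) {
            if (child.className().equals(className)) {
                out.add(child);
            }
            findAllByClassName(child, className, out);
        }
    }

    /** Returns a stable blank-node label for the given element. */
    String getBlankNodeFor(Node node) {
        return blankNodes.computeIfAbsent(node, n -> "_:b" + blankNodes.size());
    }

    /**
     * Links resource to every nested element with the given class through
     * property, one blank node per element. Returns false if none is found.
     */
    boolean addSubMicroformat(Node fragment, String className, String resource, String property) {
        List<Node> nodes = new ArrayList<>();
        findAllByClassName(fragment, className, nodes);
        if (nodes.isEmpty()) {
            return false;
        }
        for (Node node : nodes) {
            triples.add(new Triple(resource, property, getBlankNodeFor(node)));
        }
        return true;
    }

    List<Triple> triples() {
        return triples;
    }

    public static void main(String[] args) {
        // A vcard element with a nested adr element (text content omitted).
        Node card = new Node("vcard", new Node("fn"), new Node("adr"));
        ExplicitNestingSketch extractor = new ExplicitNestingSketch();
        // Assumed property IRI, chosen only for illustration.
        extractor.addSubMicroformat(card, "adr", "_:card", "http://www.w3.org/2006/vcard/ns#adr");
        extractor.triples().forEach(System.out::println);
    }
}
+----------------------------------------------------------------------------------------------

 Running main prints one Triple whose object is the blank node _:b0. The identity map is a
 deliberate choice: two structurally identical elements still receive distinct blank nodes.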
 For example, the following HTML fragment shows an hCard that contains an hAddress Microformat.

+----------------------------------------------------------------------------------------------
L'Amourita Pizza Located at 123 Main St, Albequerque, NM. http://pizza.example.com
+----------------------------------------------------------------------------------------------

 Since the {{{./xref/org/apache/any23/extractor/html/HCardExtractor.html}HCardExtractor}} contains
 code to handle nested hAddress Microformats, as shown below,

+------------------------------
// Inside the extraction logic: link the card to every nested "adr" element via vcard:adr.
foundSomething |= addSubMicroformat("adr", card, VCARD.adr);

...

/**
 * Finds every node with the given class name in the current fragment and links
 * each one to the parent resource through the given property, using a blank node
 * per nested node. Returns false if no such node exists.
 */
private boolean addSubMicroformat(String className, Resource resource, URI property) {
    List<Node> nodes = fragment.findAllByClassName(className);
    if (nodes.isEmpty()) return false;
    for (Node node : nodes) {
        addBNodeProperty(
                getDescription().getExtractorName(),
                node,
                resource, property, getBlankNodeFor(node)
        );
    }
    return true;
}
+------------------------------

 it explicitly produces the triples expressing the native nesting relationship:

+----------------------------------------------------------------------------------------------------
123 Main St Albequerque NM
+-----------------------------------------------------------------------------------------------------

 It is highly recommended to decorate the extractors that natively handle the nesting relationship
 with the {{{./xref/org/apache/any23/extractor/html/annotations/Includes.html}@Includes}} annotation.
 This annotation, if present, avoids the production of and RDF statements.

 The following example shows how the {{{./xref/org/apache/any23/extractor/html/annotations/Includes.html}@Includes}}
 annotation could be used to declare that
 {{{./xref/org/apache/any23/extractor/html/HCardExtractor.html}HCardExtractor}} natively embeds the
 {{{./xref/org/apache/any23/extractor/html/AdrExtractor.html}AdrExtractor}}.
+----------------------------------------------------------------------------------------------
@Includes( extractors = AdrExtractor.class )
public class HCardExtractor extends EntityBasedMicroformatExtractor {

    // code omitted for brevity

}
+----------------------------------------------------------------------------------------------

 The second approach, instead, leaves to <<Apache Any23>> the responsibility of identifying nested
 Microformats and producing a set of descriptive RDF triples.

 For example, the following HTML fragment, provided as a reference example on the
 {{{http://www.google.com/support/webmasters/bin/answer.py?answer=146862}Google Webmaster tools blog}},
 shows a vEvent Microformat with a nested hCard.

+----------------------------------------------------------------------------------------------

This event is organized by Tantek Celik Technorati Tantek Celik

+----------------------------------------------------------------------------------------------

 Since the extractors provided by <<Apache Any23>> do not explicitly handle the nesting of these two
 Microformats, <<Apache Any23>> automatically identifies the nesting relationship and represents it
 with the following triples:

+---------------------------------------------------------
+---------------------------------------------------------

 Informally, this means that the vEvent Microformat has a nested hCard through the property
 http://www.w3.org/2002/12/cal/icaltzd#summary, and that each of the two Microformats is
 represented by a blank node.
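 The interplay between the default nesting feature and the @Includes annotation can be sketched as
 follows. This is a minimal illustration under stated assumptions, not the Apache Any23
 implementation. It assumes each Microformat root is identified by the path of child indexes from
 the document root and treats one root as nested in another when its path extends the other's. It
 skips the pair when the outer extractor class carries an @Includes annotation listing the inner
 extractor. Includes, Root, Nesting and the extractor classes below are hypothetical local
 stand-ins. The sketch only detects nesting pairs; it does not model the resulting RDF triples or
 the property through which the nesting occurs.

+----------------------------------------------------------------------------------------------
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of automatic nesting detection with @Includes-based
 * suppression. All types below are local stand-ins, not Apache Any23 classes.
 */
public class NestingDetectionSketch {

    /** Stand-in for the @Includes annotation; RUNTIME retention enables reflection. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Includes {
        Class<?>[] extractors();
    }

    // Stand-in extractor classes: only their annotations matter here.
    static class AdrExtractor {}

    static class HCalendarExtractor {}

    @Includes(extractors = AdrExtractor.class)
    static class HCardExtractor {}

    /** A Microformat root: its extractor, its blank node, and its element path (child indexes). */
    record Root(Class<?> extractor, String blankNode, int[] path) {}

    /** A detected nesting: the outer root's element contains the inner root's element. */
    record Nesting(String outer, String inner) {}

    /** True if ancestor is a proper prefix of path, i.e. that element encloses the other. */
    static boolean encloses(int[] ancestor, int[] path) {
        return ancestor.length < path.length
                && Arrays.equals(ancestor, Arrays.copyOf(path, ancestor.length));
    }

    /** True if the outer extractor declares, via @Includes, that it handles inner natively. */
    static boolean handledNatively(Class<?> outer, Class<?> inner) {
        Includes includes = outer.getAnnotation(Includes.class);
        return includes != null && Arrays.asList(includes.extractors()).contains(inner);
    }

    /** Reports every enclosing/enclosed pair of roots that is not handled natively. */
    static List<Nesting> detectNesting(List<Root> roots) {
        List<Nesting> result = new ArrayList<>();
        for (Root outer : roots) {
            for (Root inner : roots) {
                if (encloses(outer.path(), inner.path())
                        && !handledNatively(outer.extractor(), inner.extractor())) {
                    result.add(new Nesting(outer.blankNode(), inner.blankNode()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Root> roots = List.of(
                // A vevent with a vcard nested inside it: reported.
                new Root(HCalendarExtractor.class, "_:event", new int[] {0, 1, 0}),
                new Root(HCardExtractor.class, "_:organizer", new int[] {0, 1, 0, 2}),
                // A separate vcard with a nested adr: suppressed by @Includes.
                new Root(HCardExtractor.class, "_:card", new int[] {0, 1, 3}),
                new Root(AdrExtractor.class, "_:adr", new int[] {0, 1, 3, 0}));
        System.out.println(detectNesting(roots));
    }
}
+----------------------------------------------------------------------------------------------

 Running main prints a single Nesting between _:event and _:organizer; the vcard/adr pair is
 suppressed by the annotation. For brevity the sketch reports every enclosing/enclosed pair, not
 only the nearest enclosing root, so a vevent containing a vcard with an adr would also yield a
 direct vevent/adr pair.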