Apache Any23 Plugins

Introduction

This section describes the plugin support in Apache Any23.

Apache Any23 comes with a set of predefined plugins, located under the $ANY23_HOME/plugins directory.

A plugin is a standard Maven3 module containing any implementation of an Extractor Plugin and/or a Tool Plugin, as described in the sections below.

How to Register a Plugin

A plugin can be added to the Apache Any23 CLI by:

  • adding its JAR to the Apache Any23 JVM classpath;
  • adding its JAR to the CLASSPATH_PREFIX environment variable as:
    export CLASSPATH_PREFIX=../../../plugins/basic-crawler/target/any23-basic-crawler-VERSION.jar
  • adding its JAR to the $HOME/.any23/plugins directory.
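
The third option can also be scripted. The following is a minimal sketch, using only the Java standard library, that copies a plugin JAR into a plugins directory such as $HOME/.any23/plugins. The class PluginInstaller and its methods are hypothetical helpers introduced for illustration; they are not part of the Any23 API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not part of Any23): copies a plugin JAR into a plugins directory. */
class PluginInstaller {

    /** Per-user plugins directory described above: $HOME/.any23/plugins. */
    static Path defaultPluginsDir() {
        return Paths.get(System.getProperty("user.home"), ".any23", "plugins");
    }

    /**
     * Copies the given JAR into pluginsDir, creating the directory if needed
     * and overwriting an older copy that has the same file name.
     */
    static Path install(Path pluginJar, Path pluginsDir) {
        if (!Files.isRegularFile(pluginJar)) {
            throw new IllegalArgumentException("Not a file: " + pluginJar);
        }
        try {
            Files.createDirectories(pluginsDir);
            Path target = pluginsDir.resolve(pluginJar.getFileName());
            return Files.copy(pluginJar, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as PluginInstaller.install(jarPath, PluginInstaller.defaultPluginsDir()) would then place the JAR where the CLI looks for plugins, assuming the directory is writable.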

    A plugin can be added through the Apache Any23 library API by first obtaining the instance returned by Any23PluginManager#getInstance(). Once this is done, there is a variety of options to configure and register plugins. An example of dynamic plugin loading is the way OpenIE toggling is implemented within the Any23 Webservice:

    if (openie) {
        Any23PluginManager pManager = Any23PluginManager.getInstance();
        // Dynamically adding JARs to the classpath via the following logic
        // is absolutely dependent on the 'apache-any23-openie' directory being
        // present within the webapp /lib directory. This is specified within
        // the maven-dependency-plugin.
        File webappClasspath = new File(getClass().getClassLoader().getResource("").getPath());
        File openIEJarPath = new File(webappClasspath.getParentFile().getPath() + "/lib/apache-any23-openie");
        boolean loadedJars = pManager.loadJARDir(openIEJarPath);
        if (loadedJars) {
            ExtractorRegistry r = ExtractorRegistryImpl.getInstance();
            try {
                pManager.getExtractors().forEachRemaining(r::register);
                // Report success only if registration completed without an exception.
                LOG.info("Successful dynamic classloading of JARs from OpenIE runtime directory {}", openIEJarPath.toString());
            } catch (IOException e) {
                LOG.error("Error during dynamic classloading of JARs from OpenIE runtime directory {}", openIEJarPath.toString(), e);
            }
        }
    }

    Any implementation of ExtractorPlugin will be automatically registered with the ExtractorRegistry.

    Any detected implementation of Tool will be listed by the ToolRunner command-line tool in any23-root/cli/bin/any23.

How to Build a Plugin

Apache Any23 takes care of testing and packaging plugins when they are built from its reactor POM. It is always possible to rebuild a plugin using the command:

<plugin-dir>$ mvn clean assembly:assembly

How to Write an Extractor Plugin

An Extractor Plugin is a class:

  • implementing one of the Extractor subinterfaces;
  • packaged under org.apache.any23.plugin.

    An example plugin is shown below.

    @Author(name="Michele Mostarda (mostarda@fbk.eu)")
    public class HTMLScraperExtractor implements Extractor.ContentExtractor {
    
        private static final Logger logger = LoggerFactory.getLogger(HTMLScraperExtractor.class);
    
        @Override
        public void run(
                ExtractionParameters extractionParameters,
                ExtractionContext extractionContext,
                InputStream inputStream,
                ExtractionResult extractionResult
        ) throws IOException, ExtractionException {
            ...
        }
    
        @Override
        public ExtractorDescription getDescription() {
            return HTMLScraperExtractorFactory.getDescriptionInstance();
        }
    
        @Override
        public void setStopAtFirstError(boolean b) {
            // Ignored.
        }
    
    }

How to Write a Tool Plugin

A Tool Plugin is a Java class that:

  • implements the Tool interface;
  • declares its CLI parameters by annotating class members with JCommander annotations;
  • can be discovered by the ServiceLoader (we usually plug in Kohsuke's generator for this).

    An example plugin is shown below.

    @Parameters(commandNames = { "myexec" }, commandDescription = "Prints out XXX used by Any23.")
    public class MyExecutableTool implements Tool {
    
        // Populated by JCommander from the -u/--urls command-line option.
        @Parameter(names = { "-u", "--urls" }, description = "URLs to process")
        private List<URL> urls;
    
        @Override
        public void run() throws Exception {
            // Tool logic goes here.
        }
    
    }

So when executing any23, the myexec command will be available in the commands list.
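
The ServiceLoader requirement listed above can be illustrated with the Java standard library alone. The sketch below is not Any23 code: Tool is a simplified stand-in for the Any23 interface, ToolDiscoverySketch and discoverToolNames are hypothetical names, and the provider-configuration file is written at run time only to keep the example self-contained. In a real plugin that file would be packaged in the JAR under META-INF/services, typically generated at build time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ToolDiscoverySketch {

    /** Simplified stand-in for the Any23 Tool interface (assumption for this sketch). */
    public interface Tool {
        void run() throws Exception;
    }

    /** A provider must be public and expose a public no-argument constructor. */
    public static class MyExecutableTool implements Tool {
        public MyExecutableTool() { }

        @Override
        public void run() {
            System.out.println("myexec running");
        }
    }

    /** Registers MyExecutableTool in a temporary META-INF/services file and discovers it. */
    static List<String> discoverToolNames() {
        try {
            Path root = Files.createTempDirectory("services-demo");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            // The file is named after the service interface and lists one provider class per line.
            Files.write(services.resolve(Tool.class.getName()),
                    MyExecutableTool.class.getName().getBytes(StandardCharsets.UTF_8));

            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() }, Tool.class.getClassLoader())) {
                List<String> names = new ArrayList<>();
                // ServiceLoader instantiates providers lazily while iterating.
                for (Tool tool : ServiceLoader.load(Tool.class, loader)) {
                    names.add(tool.getClass().getSimpleName());
                }
                return names;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is that the host application never names the implementation class: it asks for every Tool listed on the classpath, which is why a plugin JAR only has to be added to the classpath for its commands to be found.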

Available Extractor Plugins

  • HTML Scraper Plugin

    The HTMLScraperPlugin is able to scrape plain text content from any HTML page and transform it into statement literals.

    This plugin is documented here.

  • Office Scraper Plugins

    The Office Scraper Plugins allow extracting semantic content from several Microsoft Office document formats.

    These plugins are documented here.

  • OpenIE Extractor Plugin

    As of version 2.1, Any23 provides functionality to extract triples using the Open Information Extraction (Open IE) system. The Open IE system runs over input sentences and creates extractions that represent relations in the text; in the case of Any23, these extractions result in triples. See the example in How to Register a Plugin for how the OpenIE Extractor plugin is currently used within the Any23 Webservice.
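
To make the idea of a relation becoming a triple concrete, here is a toy sketch. It is neither Any23 nor Open IE code, and it does not reproduce the vocabulary or IRIs that the actual plugin emits: the class RelationTriple, the example.org namespace and the N-Triples-style output are all assumptions made for illustration.

```java
/** Toy model of a relation extraction: (argument 1, relation, argument 2). Not Any23 code. */
final class RelationTriple {

    // Hypothetical namespace, chosen only for this illustration.
    private static final String BASE = "http://example.org/";

    private final String subject;
    private final String relation;
    private final String object;

    RelationTriple(String subject, String relation, String object) {
        this.subject = subject;
        this.relation = relation;
        this.object = object;
    }

    /** Builds an IRI from free text by replacing whitespace runs with underscores. */
    private static String iri(String text) {
        return "<" + BASE + text.trim().replaceAll("\\s+", "_") + ">";
    }

    /** Renders the extraction as one N-Triples-style line with a literal object. */
    String toStatement() {
        // Escape backslashes and double quotes so the literal stays well-formed.
        String literal = object.replace("\\", "\\\\").replace("\"", "\\\"");
        return iri(subject) + " " + iri(relation) + " \"" + literal + "\" .";
    }
}
```

This sketch only escapes the literal; characters that are illegal in IRIs are not handled, so it should be read as a picture of the data shape rather than a serializer.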

Available CLI Tool Plugins