Document and image analysis

Islandora

Islandora is an open-source software framework designed to help institutions, organizations and their audiences collaboratively manage and discover digital assets using a best-practices framework.

Islandora connects the Drupal and Fedora open software applications, acting as a kind of glue between the content management and presentation capabilities of Drupal and the long-term preservation features of Fedora.

In an interview, Mark Leggott (http://loomware.typepad.com/about.html), the University Librarian at the University of Prince Edward Island, explains how the Islandora framework uses Taverna workflows to call local Python scripts remotely via Web service wrappers. This has given the Islandora ecosystem considerable agility and has allowed a wide range of open source (and, where desirable, proprietary) software systems to be integrated relatively quickly. Read the full interview with Mark Leggott.
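As a rough illustration of the wrapper idea described above, the sketch below exposes a local Python function over HTTP and then calls it the way a remote workflow step might. It uses only the Python standard library and is not Islandora's or Taverna's actual code: the names `count_words`, `WrapperHandler`, `call_service` and `demo`, as well as the JSON message shape, are assumptions made for this example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def count_words(text: str) -> int:
    """Stand-in for a local script (hypothetical, not part of Islandora)."""
    return len(text.split())


class WrapperHandler(BaseHTTPRequestHandler):
    """Thin Web service wrapper: accepts JSON via POST, returns JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"words": count_words(payload["text"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def call_service(url: str, text: str) -> int:
    """Client side: roughly what a workflow step does to invoke the wrapper."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),  # data implies POST
        headers={"Content-Type": "application/json"},
    )
    # An empty ProxyHandler bypasses any system proxy for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["words"]


def demo() -> int:
    """Start the wrapper on a free local port, call it once, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), WrapperHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        return call_service(url, "digital assets need care")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` sends the text to the wrapped function over HTTP and returns its word count. In a real deployment the wrapper would run on the machine that hosts the script, and the workflow engine would play the role of `call_service`.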

DAE

The DAE project (Document Analysis Algorithm Contributions in End-to-End Applications) provides the DAE Platform, which gives access to a collection of resources and applications related to machine perception and document analysis. Applications include binarization, text segmentation, OCR (Optical Character Recognition) and named entity detection, among others.

The DAE Platform is accessible through a Web portal that provides a series of services for document analysis research. The most prominent are access to a wide range of reference datasets, together with their annotations, ground truths or interpretations; a catalogue of state-of-the-art algorithms that can be executed on hosted or otherwise provided data; and the uploading and execution of complex workflows combining those algorithms.

One of the more advanced contributions of the DAE Platform is an open and very flexible framework for adding, running, evaluating and contributing algorithms. These algorithms are provided as Web services and can be invoked from anywhere. Because of its open architecture, users can easily convert their own algorithms to this framework and contribute them, without necessarily disclosing their source code and without needing to port their code to a particular technical environment.

The Platform is integrated with the Taverna Web Service Orchestration to provide the opportunity to design, host and execute complete and complex workflows of combined and distributed algorithms.
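To make the idea of a workflow of combined algorithms more concrete, here is a minimal sketch in which the output of one step feeds the next. It is purely illustrative and does not use the DAE Platform or Taverna: the steps are local Python functions standing in for remote Web services, and `binarize`, `text_rows`, `run_workflow` and the tiny `page` image are all assumptions made for this example.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence

Image = List[List[int]]  # grayscale rows, pixel values 0-255


def binarize(image: Image, threshold: int = 128) -> Image:
    """Global threshold: dark pixels (ink) become 1, light pixels become 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def text_rows(binary: Image) -> List[int]:
    """Crude segmentation: indices of rows holding at least one ink pixel."""
    return [index for index, row in enumerate(binary) if any(row)]


def run_workflow(steps: Sequence[Callable[[Any], Any]], data: Any) -> Any:
    """Apply each step in order, passing each result on to the next step."""
    return reduce(lambda value, step: step(value), steps, data)


# A 4x4 "page": light background with dark marks on rows 1 and 3.
page = [
    [250, 250, 250, 250],
    [250, 30, 40, 250],
    [250, 250, 250, 250],
    [20, 250, 250, 10],
]

result = run_workflow([binarize, text_rows], page)
print(result)  # [1, 3]
```

In an orchestrated setting each function would instead be a call to a hosted algorithm, and the workflow engine would handle the data passing that `run_workflow` does here.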