Apache UIMA (Unstructured Information Management Architecture) v2.1.0 Release Notes

Contents

1. What is UIMA?
2. Major Changes in this Release
3. Migrating from IBM UIMA to Apache UIMA
4. How to Get Involved
5. How to Report Issues
6. List of JIRA Issues Fixed in this Release

1. What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization). We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.

2. Major Changes in this Release

This section describes what has changed between version 2.0 and version 2.1 of UIMA. A migration utility is provided which will make the required updates to your Java code and descriptors. See Section 3, "Migrating from IBM UIMA to Apache UIMA" for instructions on how to run the migration utility.

2.1. Java Package Name Changes

All of the UIMA Java package names have changed in Apache UIMA. They now start with org.apache rather than com.ibm. There have been other changes as well. The package name segment reference_impl has been shortened to impl, and some segments have been reordered. For example com.ibm.uima.reference_impl.analysis_engine has become org.apache.uima.analysis_engine.impl. Tools are now consolidated under org.apache.uima.tools and service adapters under org.apache.uima.adapter.

The migration utility will replace all occurrences of IBM UIMA package names with their Apache UIMA equivalents. It will not replace prefixes of package names, so if your code uses a package called com.ibm.uima.myproject (although that is not recommended), it will not be replaced.
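
The prefix rule described above can be illustrated with a small sketch. This is not the migration utility's actual code: the class name PackageRenameSketch, the two-entry rename table, and the regular expression are all assumptions made for illustration. The sketch treats a package name as matched only when it is not followed by a lower-case sub-package segment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the code of the real migration utility.
class PackageRenameSketch {

    // Hypothetical, deliberately partial rename table. Longer names come first.
    static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("com.ibm.uima.reference_impl.analysis_engine",
                    "org.apache.uima.analysis_engine.impl");
        RENAMES.put("com.ibm.uima", "org.apache.uima");
    }

    static String migrate(String source) {
        String result = source;
        for (Map.Entry<String, String> rename : RENAMES.entrySet()) {
            // Assumed rule: the name must not be part of a longer identifier and
            // must not be followed by ".<lower-case segment>", that is, by a
            // sub-package that is not itself listed in the table.
            Pattern exactPackage = Pattern.compile(
                "(?<![\\w.])" + Pattern.quote(rename.getKey()) + "(?!\\w|\\.[a-z_])");
            result = exactPackage.matcher(result)
                                 .replaceAll(Matcher.quoteReplacement(rename.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // A class directly in a listed package is rewritten.
        System.out.println(migrate("import com.ibm.uima.UIMAFramework;"));
        // An unlisted sub-package such as com.ibm.uima.myproject is left alone.
        System.out.println(migrate("import com.ibm.uima.myproject.Thing;"));
    }
}
```

Because only listed package names are rewritten, a real table would need one entry for every IBM UIMA package, not just the two shown here.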

2.2. XML Descriptor Changes

The XML namespace in UIMA component descriptors has changed from http://uima.watson.ibm.com/resourceSpecifier to http://uima.apache.org/resourceSpecifier. The value of the <frameworkImplementation> element must now be org.apache.uima.java or org.apache.uima.cpp. The migration script will apply these replacements.
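
As a rough way to confirm that a descriptor has picked up both changes, the following sketch parses descriptor text with the JDK's DOM parser and checks the namespace and any <frameworkImplementation> values. The class and method names are hypothetical, and the check is not part of UIMA itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch only; UIMA does not ship this class.
class DescriptorCheckSketch {

    static final String APACHE_NS = "http://uima.apache.org/resourceSpecifier";

    // Returns true when the root element is in the Apache namespace and every
    // <frameworkImplementation> element (if any) holds one of the two new values.
    static boolean looksMigrated(String descriptorXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(descriptorXml)));
            if (!APACHE_NS.equals(doc.getDocumentElement().getNamespaceURI())) {
                return false;
            }
            NodeList impls = doc.getElementsByTagNameNS(APACHE_NS, "frameworkImplementation");
            for (int i = 0; i < impls.getLength(); i++) {
                String value = impls.item(i).getTextContent().trim();
                if (!value.equals("org.apache.uima.java") && !value.equals("org.apache.uima.cpp")) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            // Unparseable input is reported as a usage error in this sketch.
            throw new IllegalArgumentException("Not a well-formed XML descriptor", e);
        }
    }

    public static void main(String[] args) {
        String migrated =
            "<analysisEngineDescription xmlns=\"" + APACHE_NS + "\">"
          + "<frameworkImplementation>org.apache.uima.java</frameworkImplementation>"
          + "</analysisEngineDescription>";
        System.out.println(looksMigrated(migrated));
    }
}
```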

2.3. TCAS replaced by CAS

In Apache UIMA the TCAS interface has been removed. All uses of it must now be replaced by the CAS interface. (All methods that used to be defined on TCAS were moved to CAS in v2.0.) The method CAS.getTCAS() is replaced with CAS.getCurrentView() and CAS.getTCAS(String) is replaced with CAS.getView(String). The following have also been removed and replaced with the equivalent "CAS" variants: TCASException, TCASRuntimeException, TCasPool, and CasCreationUtils.createTCas(...).

The migration script will apply the necessary replacements.
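
The replacements listed in this section can be summarized as an ordered rule table. The sketch below is only an illustration of that table applied to source text; the class name and the regular expressions are assumptions, not the migration script's implementation. The rule for the no-argument getTCAS() must run before the rule for getTCAS(String), since the two map to different methods.

```java
// Illustrative sketch only; not the code of the real migration script.
class TcasRewriteSketch {

    // Ordered {regex, replacement} pairs derived from the replacements listed above.
    static final String[][] RULES = {
        {"\\bgetTCAS\\(\\s*\\)", "getCurrentView()"}, // no-argument form first
        {"\\bgetTCAS\\(", "getView("},                // remaining calls take a String
        {"\\bcreateTCas\\b", "createCas"},
        {"\\bTCasPool\\b", "CasPool"},
        {"\\bTCASRuntimeException\\b", "CASRuntimeException"},
        {"\\bTCASException\\b", "CASException"},
        {"\\bTCAS\\b", "CAS"},
    };

    static String rewrite(String source) {
        String result = source;
        for (String[] rule : RULES) {
            result = result.replaceAll(rule[0], rule[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("TCAS view = cas.getTCAS();"));
        System.out.println(rewrite("TCAS view = cas.getTCAS(viewName);"));
    }
}
```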

2.4. JCas Is Now an Interface

In previous versions, user code accessed the JCas class directly. In Apache UIMA there is now an interface, org.apache.uima.jcas.JCas, which all JCas-based user code must now use. Static methods that were previously on the JCas class (and called from JCas cover classes generated by JCasGen) have been moved to the new org.apache.uima.jcas.JCasRegistry class. The migration script will apply the necessary replacements to your code, including any JCas cover classes that are part of your codebase.

2.5. JAR File Names Have Changed

The UIMA JAR file names have changed slightly. Underscores have been replaced with hyphens to be consistent with Apache naming conventions. For example uima_core.jar is now uima-core.jar. Also uima_jcas_builtin_types.jar has been renamed to uima-document-annotation.jar. Finally, the jVinci.jar file is now in the lib directory rather than the lib/vinci directory as was previously the case. The migration script will apply the necessary replacements, for example to script files or Eclipse launch configurations.
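
If you maintain your own classpath lists, the renaming rules above can be expressed as a small function. This is a sketch under stated assumptions: the class and method names are hypothetical, paths are assumed to use forward slashes, and the underscore rule is applied only to file names starting with "uima_" so that third-party JARs are left alone.

```java
// Illustrative sketch only; the names used here are not part of UIMA.
class JarRenameSketch {

    // Maps an IBM UIMA JAR path (relative to the install directory) to its
    // Apache UIMA equivalent, following the three rules in this section.
    static String migrateJarPath(String oldPath) {
        if (oldPath.equals("lib/vinci/jVinci.jar")) {
            return "lib/jVinci.jar"; // moved up into lib
        }
        int slash = oldPath.lastIndexOf('/');
        String dir = oldPath.substring(0, slash + 1); // empty when there is no directory
        String file = oldPath.substring(slash + 1);
        if (file.equals("uima_jcas_builtin_types.jar")) {
            return dir + "uima-document-annotation.jar"; // renamed outright
        }
        if (file.startsWith("uima_") && file.endsWith(".jar")) {
            return dir + file.replace('_', '-'); // underscores become hyphens
        }
        return oldPath; // anything else is left unchanged
    }

    public static void main(String[] args) {
        System.out.println(migrateJarPath("lib/uima_core.jar"));
    }
}
```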

2.6. Semantic Search Engine Repackaged

The versions of the UIMA SDK prior to the move into Apache came with a semantic search engine. The Apache version does not include this search engine. The search engine has been repackaged and is separately available from http://www.alphaworks.ibm.com/tech/uima. The intent is to hook up (over time) with other open source search engines, such as the Lucene search engine project in Apache.

3. Migrating from IBM UIMA to Apache UIMA

Note: Back up your files before running the migration utility. The utility updates files in place, in the directories where it finds them, so a backup is your only way to recover if you encounter problems.

The migration utility is run by executing the script file apache-uima/bin/ibmUimaToApacheUima.bat (Windows) or apache-uima/bin/ibmUimaToApacheUima.sh (UNIX). You must pass one argument: the directory containing the files that you want to be migrated. Subdirectories will be processed recursively.

The script scans your files and applies the necessary updates, for example replacing the com.ibm package names with the new org.apache package names.

The script will only attempt to modify files with the extensions: java, xml, xmi, wsdd, properties, launch, bat, cmd, sh, ksh, or csh; and files with no extension. Also, files with size greater than 1,000,000 bytes will be skipped. (If you want the script to modify files with other extensions, you can edit the script file and change the -ext argument appropriately.)
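
To predict which of your files the script will touch, the selection rule above can be written as a predicate. This is a sketch, not the script's own logic: the class and method names are hypothetical, and it assumes that extensions are compared case-sensitively and that the extension is whatever follows the last dot in the file name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; not the code of the real migration script.
class MigrationFileFilterSketch {

    // Extensions listed in this section.
    static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
        "java", "xml", "xmi", "wsdd", "properties", "launch",
        "bat", "cmd", "sh", "ksh", "csh"));

    static final long MAX_SIZE_BYTES = 1_000_000L;

    static boolean wouldBeProcessed(String fileName, long sizeInBytes) {
        if (sizeInBytes > MAX_SIZE_BYTES) {
            return false; // files larger than 1,000,000 bytes are skipped
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return true; // files with no extension are processed
        }
        return EXTENSIONS.contains(fileName.substring(dot + 1));
    }

    public static void main(String[] args) {
        System.out.println(wouldBeProcessed("MyAnnotator.java", 2048L));
        System.out.println(wouldBeProcessed("model.bin", 2048L));
    }
}
```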

If the migration tool reports warnings, there may be a few additional steps to take. The following two sections explain some simple manual changes that you might need to make to your code.

3.1. JCas Cover Classes for DocumentAnnotation

If you have run JCasGen it is likely that you have the classes com.ibm.uima.jcas.tcas.DocumentAnnotation and com.ibm.uima.jcas.tcas.DocumentAnnotation_Type as part of your code. This package name is no longer valid, and the migration utility does not move your files between directories so it is unable to fix this.

If you have not made manual modifications to these classes, the best solution is usually to just delete these two classes (and their containing package). There is a default version in the uima-document-annotation.jar file that is included in Apache UIMA. If you have made custom changes, then you should not delete the file but instead move it to the correct package org.apache.uima.jcas.tcas. For more information about JCas and DocumentAnnotation please see Section 5.5.4, "Adding Features to DocumentAnnotation" in the UIMA References manual.

3.2. JCas.getDocumentAnnotation

The deprecated method JCas.getDocumentAnnotation has been removed. Its use must be replaced with JCas.getDocumentAnnotationFs. The method JCas.getDocumentAnnotationFs() returns type TOP, so your code must cast this to type DocumentAnnotation. The reasons for this are described in Section 5.5.4, "Adding Features to DocumentAnnotation" in the UIMA References manual.
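
The change at a call site amounts to adding a cast. The sketch below shows that shape only. So that it compiles without the UIMA JARs, it declares minimal stand-in types named JCas, TOP and DocumentAnnotation; these stand-ins are assumptions for illustration, and real code would import org.apache.uima.jcas.JCas, org.apache.uima.jcas.cas.TOP and org.apache.uima.jcas.tcas.DocumentAnnotation instead.

```java
// Illustrative sketch only. The nested types are stand-ins for the real UIMA
// types of the same names and exist solely so this file compiles on its own.
class DocumentAnnotationCastSketch {

    static class TOP { }                                 // stand-in for org.apache.uima.jcas.cas.TOP
    static class DocumentAnnotation extends TOP { }      // stand-in for the JCas cover class
    interface JCas { TOP getDocumentAnnotationFs(); }    // stand-in for org.apache.uima.jcas.JCas

    static DocumentAnnotation documentAnnotationOf(JCas jcas) {
        // Before migration: DocumentAnnotation doc = jcas.getDocumentAnnotation();
        // After migration: the method returns TOP, so an explicit cast is required.
        return (DocumentAnnotation) jcas.getDocumentAnnotationFs();
    }

    public static void main(String[] args) {
        JCas jcas = new JCas() {
            public TOP getDocumentAnnotationFs() {
                return new DocumentAnnotation();
            }
        };
        System.out.println(documentAnnotationOf(jcas) != null);
    }
}
```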

3.3. Rare Cases Where Additional Manual Migration is Necessary

For most users there should not be any additional migration steps necessary. However, if the migration tool reported an additional warning or if you are having trouble getting your code to compile or run after running the migration, please see Section 1.4.2, "Rare Cases Where Additional Manual Migration is Necessary" in the Overview and Setup manual.

4. How to Get Involved

The Apache UIMA project really needs and appreciates any contributions, including documentation help, source code and feedback. If you are interested in contributing, please visit http://incubator.apache.org/uima/get-involved.html.

5. How to Report Issues

The Apache UIMA project uses JIRA for issue tracking. Please report any issues you find at http://issues.apache.org/jira/browse/uima

6. List of JIRA Issues Fixed in this Release

Bug

Improvement

New Feature

Task