Apache UIMA (Unstructured Information Management Architecture) v2.1.0 Release Notes

1. What is UIMA?
2. Major Changes in this Release
3. Migrating from IBM UIMA to Apache UIMA
4. How to Get Involved
5. How to Report Issues
6. List of JIRA Issues Fixed in this Release

1. What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization). We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.

2. Major Changes in this Release

This section describes what has changed between version 2.0 and version 2.1 of UIMA. A migration utility is provided which will make the required updates to your Java code and descriptors. See Section 3, "Migrating from IBM UIMA to Apache UIMA" for instructions on how to run the migration utility.

2.1. Java Package Name Changes

All of the UIMA Java package names have changed in Apache UIMA. They now start with org.apache rather than com.ibm. There have been other changes as well. The package name segment reference_impl has been shortened to impl, and some segments have been reordered. For example com.ibm.uima.reference_impl.analysis_engine has become org.apache.uima.analysis_engine.impl. Tools are now consolidated under org.apache.uima.tools and service adapters under org.apache.uima.adapter.

The migration utility will replace all occurrences of IBM UIMA package names with their Apache UIMA equivalents. It will not replace prefixes of package names, so if your code uses a package called com.ibm.uima.myproject (although that is not recommended), it will not be replaced.
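
The replacement rule described above can be sketched in plain Java. This is only an illustration, not the migration utility's actual code: the class name, the two-entry rename table, and the regular expression used to avoid matching a mere package-name prefix are all assumptions made for this example. The first table entry is the example given in this section; the second follows from the general com.ibm to org.apache change.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a whole-package-name rename; not the real migration tool.
class PackageRenameSketch {
    // Illustrative rename table (old package name -> new package name).
    static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("com.ibm.uima.reference_impl.analysis_engine",
                    "org.apache.uima.analysis_engine.impl");
        RENAMES.put("com.ibm.uima", "org.apache.uima");
    }

    static String migrate(String text) {
        String result = text;
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            // Match the package name only when it is not the prefix of a longer
            // package name: the next character must not continue the identifier,
            // and must not start another lower-case package segment.
            Pattern p = Pattern.compile(
                "(?<![\\w.])" + Pattern.quote(e.getKey()) + "(?!\\w|\\.[a-z_])");
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }
}
```

With this sketch, a reference to com.ibm.uima.UIMAFramework is rewritten, while a user package such as com.ibm.uima.myproject is left untouched because com.ibm.uima is only a prefix of it.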

2.2. XML Descriptor Changes

The XML namespace in UIMA component descriptors has changed from http://uima.watson.ibm.com/resourceSpecifier to http://uima.apache.org/resourceSpecifier. The value of the <frameworkImplementation> element must now be org.apache.uima.java or org.apache.uima.cpp. The migration script will apply these replacements.
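
To check whether a descriptor already follows the two rules above, something like the following sketch could be used. It relies only on the JDK's XML parser. The class and method names are hypothetical, and the check is deliberately minimal (it does not validate the descriptor against the UIMA schema).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; checks only the namespace and <frameworkImplementation> value.
class DescriptorCheckSketch {
    static final String APACHE_NS = "http://uima.apache.org/resourceSpecifier";

    static boolean looksMigrated(String descriptorXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(descriptorXml)));
            // The root element must be in the Apache UIMA namespace.
            if (!APACHE_NS.equals(doc.getDocumentElement().getNamespaceURI())) {
                return false;
            }
            // Every <frameworkImplementation> must carry one of the two allowed values.
            NodeList impls = doc.getElementsByTagNameNS(APACHE_NS, "frameworkImplementation");
            for (int i = 0; i < impls.getLength(); i++) {
                String value = impls.item(i).getTextContent().trim();
                if (!value.equals("org.apache.uima.java") && !value.equals("org.apache.uima.cpp")) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse descriptor", e);
        }
    }
}
```

A descriptor that still declares the http://uima.watson.ibm.com/resourceSpecifier namespace makes looksMigrated return false, which suggests the migration script has not yet been run on it.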

2.3. TCAS replaced by CAS

In Apache UIMA the TCAS interface has been removed. All uses of it must now be replaced by the CAS interface. (All methods that used to be defined on TCAS were moved to CAS in v2.0.) The method CAS.getTCAS() is replaced with CAS.getCurrentView() and CAS.getTCAS(String) is replaced with CAS.getView(String). The following have also been removed and replaced with the equivalent "CAS" variants: TCASException, TCASRuntimeException, TCasPool, and CasCreationUtils.createTCas(...).

The migration script will apply the necessary replacements.
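
As a rough illustration of the identifier changes listed in this section, the sketch below applies them to a piece of source text. It is not the migration script itself. The class name and the regular expressions are assumptions chosen for this example, and a textual rewrite like this cannot tell a UIMA identifier from an unrelated one that happens to have the same name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the TCAS -> CAS source rewrite; not the real migration tool.
class TcasRenameSketch {
    // Regex -> replacement, applied in order (more specific patterns first).
    static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("\\bTCASRuntimeException\\b", "CASRuntimeException");
        RENAMES.put("\\bTCASException\\b", "CASException");
        RENAMES.put("\\bTCasPool\\b", "CasPool");
        RENAMES.put("\\bcreateTCas\\(", "createCas(");
        RENAMES.put("\\bgetTCAS\\(\\s*\\)", "getCurrentView()"); // no-argument form
        RENAMES.put("\\bgetTCAS\\(", "getView(");                // getTCAS(String) form
        RENAMES.put("\\bTCAS\\b", "CAS");
    }

    static String migrate(String source) {
        String out = source;
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            out = out.replaceAll(e.getKey(), e.getValue());
        }
        return out;
    }
}
```

For example, the line `TCAS view = cas.getTCAS();` becomes `CAS view = cas.getCurrentView();` under these rules.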

2.4. JCas Is Now an Interface

In previous versions, user code accessed the JCas class directly. In Apache UIMA there is now an interface, org.apache.uima.jcas.JCas, which all JCas-based user code must now use. Static methods that were previously on the JCas class (and called from JCas cover classes generated by JCasGen) have been moved to the new org.apache.uima.jcas.JCasRegistry class. The migration script will apply the necessary replacements to your code, including any JCas cover classes that are part of your codebase.

2.5. JAR File Names Have Changed

The UIMA JAR file names have changed slightly. Underscores have been replaced with hyphens to be consistent with Apache naming conventions. For example uima_core.jar is now uima-core.jar. Also uima_jcas_builtin_types.jar has been renamed to uima-document-annotation.jar. Finally, the jVinci.jar file is now in the lib directory rather than the lib/vinci directory as was previously the case. The migration script will apply the necessary replacements, for example to script files or Eclipse launch configurations.
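
The three renaming rules in this section can be summarised in a short sketch. The class and method names are hypothetical. The sketch assumes forward-slash path separators and assumes that the underscore-to-hyphen rule applies only to JAR files whose names start with uima_; neither assumption is stated in these notes.

```java
// Hypothetical sketch of the JAR path changes; not the real migration tool.
class JarRenameSketch {
    static String migrateJarPath(String path) {
        int slash = path.lastIndexOf('/');
        String dir = path.substring(0, slash + 1);   // "" when there is no directory part
        String name = path.substring(slash + 1);

        // Special case: this JAR was renamed, not just re-punctuated.
        if (name.equals("uima_jcas_builtin_types.jar")) {
            return dir + "uima-document-annotation.jar";
        }
        // jVinci.jar moved from lib/vinci up to lib.
        if (name.equals("jVinci.jar") && dir.endsWith("lib/vinci/")) {
            return dir.substring(0, dir.length() - "vinci/".length()) + name;
        }
        // General rule: underscores become hyphens (assumed to apply to uima_*.jar only).
        if (name.startsWith("uima_") && name.endsWith(".jar")) {
            return dir + name.replace('_', '-');
        }
        return path;
    }
}
```

A classpath entry such as lib/uima_core.jar maps to lib/uima-core.jar, while an unrelated JAR such as lib/my_lib.jar is returned unchanged.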

2.6. Semantic Search Engine Repackaged

Versions of the UIMA SDK released before the move to Apache came with a semantic search engine. The Apache version does not include this search engine. The search engine has been repackaged and is separately available from http://www.alphaworks.ibm.com/tech/uima. The intent is to integrate, over time, with other open source search engines, such as the Lucene search engine project in Apache.

3. Migrating from IBM UIMA to Apache UIMA

Note: The migration tool updates files in place, in the directories where it finds them. Be sure to back up your files before running it, in case you encounter any problems.

The migration utility is run by executing the script file apache-uima/bin/ibmUimaToApacheUima.bat (Windows) or apache-uima/bin/ibmUimaToApacheUima.sh (UNIX). You must pass one argument: the directory containing the files that you want to be migrated. Subdirectories will be processed recursively.

The script scans your files and applies the necessary updates, for example replacing the com.ibm package names with the new org.apache package names.

The script will only attempt to modify files with the extensions: java, xml, xmi, wsdd, properties, launch, bat, cmd, sh, ksh, or csh; and files with no extension. Also, files with size greater than 1,000,000 bytes will be skipped. (If you want the script to modify files with other extensions, you can edit the script file and change the -ext argument appropriately.)
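
To predict which files the script will touch, the selection rule above can be expressed as a small predicate. This is a sketch of the rule as described here, not the script's actual code. The class and method names are hypothetical, and exact (case-sensitive) matching of extensions is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the file-selection rule; not the real migration tool.
class MigrationFileFilterSketch {
    static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
        "java", "xml", "xmi", "wsdd", "properties", "launch",
        "bat", "cmd", "sh", "ksh", "csh"));
    static final long MAX_SIZE_BYTES = 1_000_000L;

    static boolean wouldBeProcessed(String fileName, long sizeBytes) {
        // Files larger than 1,000,000 bytes are skipped.
        if (sizeBytes > MAX_SIZE_BYTES) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        // Files with no extension are processed.
        if (dot < 0) {
            return true;
        }
        return EXTENSIONS.contains(fileName.substring(dot + 1));
    }
}
```

Under this rule a 10-byte file named README is processed, while data.bin is not, and neither is an XML file of 1,000,001 bytes.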

If the migration tool reports warnings, there may be a few additional steps to take. The following two sections explain some simple manual changes that you might need to make to your code.

3.1. JCas Cover Classes for DocumentAnnotation

If you have run JCasGen, it is likely that you have the classes com.ibm.uima.jcas.tcas.DocumentAnnotation and com.ibm.uima.jcas.tcas.DocumentAnnotation_Type as part of your code. This package name is no longer valid, and because the migration utility does not move files between directories, it cannot fix this for you.

If you have not made manual modifications to these classes, the best solution is usually to delete both classes (and their containing package). A default version is provided in the uima-document-annotation.jar file that is included in Apache UIMA. If you have made custom changes, do not delete the files; instead, move them to the correct package, org.apache.uima.jcas.tcas. For more information about JCas and DocumentAnnotation please see Section 5.5.4, "Adding Features to DocumentAnnotation" in the UIMA References manual.

3.2. JCas.getDocumentAnnotation

The deprecated method JCas.getDocumentAnnotation has been removed. Its use must be replaced with JCas.getDocumentAnnotationFs. The method JCas.getDocumentAnnotationFs() returns type TOP, so your code must cast this to type DocumentAnnotation. The reasons for this are described in Section 5.5.4, "Adding Features to DocumentAnnotation" in the UIMA References manual.
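
The cast described above looks roughly like the following. To keep the snippet compilable without the UIMA JARs, it declares small stand-in types named TOP, DocumentAnnotation, and JCas. These stand-ins are assumptions made for illustration only; in real code you would use org.apache.uima.jcas.JCas, org.apache.uima.jcas.cas.TOP, and org.apache.uima.jcas.tcas.DocumentAnnotation instead.

```java
// Sketch with stand-in types; replace them with the real UIMA classes in your code.
class DocumentAnnotationCastSketch {
    // Stand-in for org.apache.uima.jcas.cas.TOP.
    static class TOP {}

    // Stand-in for org.apache.uima.jcas.tcas.DocumentAnnotation.
    static class DocumentAnnotation extends TOP {
        private final String language;
        DocumentAnnotation(String language) { this.language = language; }
        String getLanguage() { return language; }
    }

    // Stand-in for the one JCas method this example needs.
    interface JCas {
        TOP getDocumentAnnotationFs();
    }

    static String documentLanguage(JCas jcas) {
        // getDocumentAnnotationFs() is declared to return TOP, so an explicit cast is needed.
        DocumentAnnotation doc = (DocumentAnnotation) jcas.getDocumentAnnotationFs();
        return doc.getLanguage();
    }
}
```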

3.3. Rare Cases Where Additional Manual Migration is Necessary

For most users, no additional migration steps should be necessary. However, if the migration tool reported an additional warning, or if you are having trouble getting your code to compile or run after the migration, please see Section 1.4.2, "Rare Cases Where Additional Manual Migration is Necessary" in the Overview and Setup manual.

6. List of JIRA Issues Fixed in this Release

Bug

[UIMA-14] - Plugin manifests still list IBM as the vendor
[UIMA-16] - NullPointerException in UIMAFramework.newCollectionProcessingManager
[UIMA-20] - PearMerger unit test failure
[UIMA-22] - Tools still use IBM splashscreen
[UIMA-23] - setUimaClassPath and adjustExamplePaths scripts have incorrect jar names
[UIMA-26] - Incorrect paths in example descriptors and the adjustExamplePaths scripts
[UIMA-27] - org.apache.uima.cas.test.SofaTest creates file that is not deleted afterwards; moreover, this file has been checked into SVN.
[UIMA-29] - Can't call process twice on CPE
[UIMA-31] - Document Analyzer doesn't save character encoding in preferences
[UIMA-34] - Vinci service deployment descriptor timeoutPeriod parameter does not work.
[UIMA-40] - CasAnnotationViewer doesn't support new primitive types
[UIMA-41] - LowLevelCAS.ll_getTypeClass() needs to be updated for the new 2.0 types
[UIMA-43] - The ';' path separator char is not replaced with OS dependent char in installed PEAR
[UIMA-44] - rename IBMResultPrinter class
[UIMA-46] - Duplicate feature name on supertype and subtype does not work if subtype definition comes first in descriptor
[UIMA-57] - UimacppAnalysisEngine should be named UimacppAnalysisComponent
[UIMA-58] - Resources directory missing from examples project
[UIMA-61] - CasCreationUtils.createCas(Collection) silently ignores TypeSystemDescription objects,
[UIMA-65] - CAS.setSofaDataString on Initial View throws ArrayIndexOutOfBoundsException
[UIMA-75] - In VinciService wrapper, "serializerClassName" parameter is unused
[UIMA-77] - CasToInlineXml fails for new primitive types
[UIMA-79] - Document Analyzer progress message sometimes says something like "Processed 8 of 7 Documents"
[UIMA-80] - CVD document text is deleted after an analysis engine is loaded
[UIMA-81] - feature values containing "<>" are not displayed correctly in CVD
[UIMA-83] - CDE Parameter Type drop-down is not properly sized on Mac.
[UIMA-84] - CDE's "Find AE" Dialog does not work on Mac.
[UIMA-85] - CDE "Open in new window..." feature does not work on Mac.
[UIMA-86] - Aggregate descriptor that imports itself causes hang
[UIMA-94] - Pear merger default output name uses "tae" - should use "ae"
[UIMA-99] - CDE incorrect tooltip for configuration parameter value field
[UIMA-100] - Flow Controller logger is not given the correct name
[UIMA-101] - PEAR Installer prints warning messages on startup
[UIMA-102] - Incubator icon on web page should link to incubator site, not UIMA site
[UIMA-105] - Circular Imports Leave Duplicates
[UIMA-113] - socket timeout exception is (sometimes?) embedded in sax exception, resulting in a retry
[UIMA-115] - The TCAS class should be dropped
[UIMA-117] - CVD Help->Manual menu item gives "invalid url" error dialog.
[UIMA-122] - Sofa mapping should be removed from Vinci services
[UIMA-128] - ll_setStringValue not checking if feature range is subtype of String with Allowed Values, not doing Allowed Value check
[UIMA-129] - $main_root replacement does not work for components of a merged pear file.
[UIMA-130] - ResourceCreationSpecifier.validate() provides no way to pass datapath information
[UIMA-133] - PEAR Merger adds only 1 JAR file from the delegate 'lib' folder to the generated aggregate CLASSPATH
[UIMA-136] - String subtype test case break Maven build
[UIMA-137] - Import_implTest fails on mvn package
[UIMA-138] - Example MeetingFinderCPE_withXmlDetagging.xml doesn't work
[UIMA-141] - JCasGen - bring over missing templates and template build tool jet_expander
[UIMA-142] - JCas version of getLocalFSData() returning wrong type
[UIMA-144] - Wrong impl in JCasImpl for getView(local-view-name)
[UIMA-148] - Calls to URL.equals and URL.hashCode should be removed
[UIMA-150] - DebugFSLogicalStructure - fix cache setting for Unexpanded Feature Structures (found by Findbugs)
[UIMA-153] - DocBook formatting fixes - style sheets and CSS wrong for table centering, plus fixes for CVD
[UIMA-163] - CpeCasProcessors.removeCasProcessor always throws "invalid index" exception
[UIMA-165] - Need JCAS fixes for DocumentAnnotation and type merging
[UIMA-186] - AnalysisEngine.setResultSpecification doesn't clear out previously cached result specification information
[UIMA-187] - getSofaDataString() is documented in the manual but does not exist in the code
[UIMA-188] - AnalysisEngine_implTest incorrectly failing on PowerMac platform
[UIMA-189] - Maven building on *nix platforms including Mac OS X broken in several ways
[UIMA-190] - Type priority test case failing with IBM JDK 1.5
[UIMA-191] - CDE: adding feature value type in default namespace does not work correctly
[UIMA-192] - Schema validation doesn't work with Sun Java 1.4, causes fatal error.
[UIMA-193] - PEAR Encoding Test gives NullPointerException under Sun Java 1.4.2
[UIMA-195] - Logging test fails when logger properties file and log file are specified for unit tests
[UIMA-196] - IteratorTest.testIterator() fails with Sun Java 1.6
[UIMA-197] - TypePriorityTest.testMain() fails with Sun Java 1.6
[UIMA-198] - CPE Test Cases fail when run with "mvn test"
[UIMA-199] - JMX Support has problems with AE names containing special characters (e.g. commas)
[UIMA-200] - Excessive releases of CAS on error in Aggregate CasMultiplier
[UIMA-202] - AnalysisEngineDescription.getDelegateAnalysisEngineSpecifiers() should only resolve delegate imports, not other imports
[UIMA-203] - maven build for eclipse plugins is inconsistent - the runtime plugin may be missing the .jar packaging
[UIMA-204] - Plugin builds have zip files with extra top level directories
[UIMA-205] - CDE fails to add/rmv flow constraints when user-defined flow is specified without any flow constraints
[UIMA-206] - setSofaDataXXX(xxx, mime) methods do not set sofa mime feature
[UIMA-207] - Documentation errors
[UIMA-209] - FeatureStructure.equals returns false for same FS obtained through different views
[UIMA-210] - faulty use of .read(buffer...) in several places - not checking for fewer than expected bytes/chars read
[UIMA-213] - DocumentAnalyzer/RunAE tools don't support XML detagging and Remote Vinci AEs
[UIMA-214] - DocumentAnalyzer shouldn't have to re-contact service to get the typesystem
[UIMA-217] - actions creating new instances which are subtypes of AnnotationBase should set the sofa ref
[UIMA-220] - Failure in XCasToCasDataSaxHandlerTest on Sun Java 1.4.2
[UIMA-221] - adjustExamplePaths.sh has incorrect jar file names in classpath
[UIMA-222] - JavaDoc is not being built
[UIMA-223] - JAR file name changes not mentioned in documentation or handled by migration script
[UIMA-225] - doFullValidation fails for C++ Annotator Descriptor
[UIMA-226] - In uimaj-examples some resource files are under src/main/java instead of src/main/resources
[UIMA-227] - Distribution docs directory includes XXX_pdf_src.xml files
[UIMA-229] - Bad error message if aggregate descriptor flow contains undefined key
[UIMA-230] - CPE GUI sometimes won't start if JList widgets are in use
[UIMA-231] - CPEGUI clearAll doesn't reset file chooser directory consistently
[UIMA-233] - CAS View caching works incorrectly
[UIMA-234] - CAS Multiplier "internal" CASes should have identical type system to CPE CAS Pool
[UIMA-244] - CPE GUI Intermittent Failure on Startup
[UIMA-245] - CPE GUI on exit says settings have been changed even when they haven't
[UIMA-247] - Drop "XCAS" from name of Annotation Viewer GUI
[UIMA-250] - ClassNotFoundException for org.apache.uima.adapter.ServiceDataCargo when SOAP service should be deployed
[UIMA-252] - bold italics highlighting does not work in code sections of the html documentation
[UIMA-253] - default path for CAS annotation viewer does not exist
[UIMA-256] - CVD manual not displayed in distribution
[UIMA-257] - Document Analyzer sometimes names style map file incorrectly
[UIMA-261] - In Glossary section of docs, linked glossary terms are not rendered.
[UIMA-262] - CAS Visual Debugger command line parameters does not work
[UIMA-263] - CAS Visual Debugger shows an error message when a user tries to open the log file but no log file was written.
[UIMA-264] - CAS Visual Debugger does not support CAS Multiplier components
[UIMA-266] - DocumentAnalyzer also use wrong default directory docs/examples/data
[UIMA-272] - CVD manual & help missing?
[UIMA-279] - toXML method of ServiceAlias has an apparent infinite loop
[UIMA-281] - org.apache.uima.pear.util.UIMAUtil.identifyUimaComponentCategory method may throw NullPointerException
[UIMA-283] - CAS Merger example component does not work with CVD/old API call
[UIMA-284] - Synchronization issues
[UIMA-285] - shell scripts test if JAVA_HOME is not set, and if so, set it to an invalid value
[UIMA-286] - Math.abs returns a negative number very occasionally, causing NameClient in jVinci to fail
[UIMA-288] - NPE possible in PEAR Util XMLUtil printError method
[UIMA-289] - CasProcessorDeploymentParamsImpl has incorrect equals test argument
[UIMA-290] - Wrong logical connector || when && was intended, CPMEngine
[UIMA-291] - "bad interpreter" when using pearMerger.sh script on linux
[UIMA-292] - UIMA incompatible with Java Version 6
[UIMA-293] - If CasCopier's destination CAS is set to a base CAS, annotations cannot be copied.
[UIMA-294] - Sofa mapping failure in mixed CPE pipeline
[UIMA-297] - Automatic bag index creates duplicate copies of FSs under some circumstances
[UIMA-298] - XMI CAS Serializer can map two namespaces to the same XML NS prefix
[UIMA-300] - DocumentAnalyzer cant be run by multiple users on the same machine
[UIMA-302] - adjustExamplePaths scripts replace "C:/Temp" with "temp"
[UIMA-304] - Document Analyzer HTML View Not Working
[UIMA-308] - org.apache.uima.cas.impl.AnnotationBaseImpl.toString() bombs for base annotations without sofa.
[UIMA-310] - CasCopier fails on null array elements
[UIMA-312] - assembly of docs/html missing the css dirs and file
[UIMA-313] - Flow Controller example Analysis Engine does not work properly
[UIMA-314] - MeetingFinderCPE_Managed.xml doesn't run on Linux
[UIMA-315] - CDE - hover in aggregate page supposed to show description - but isn't
[UIMA-317] - MeetingFinderCPE_Managed.xml is duplicate of MeetingFinderCPE_Managed_Windows.xml
[UIMA-320] - add LICENSE and NOTICE files to META-INF folder of a jar

Improvement

[UIMA-9] - Remove support for xi:include
[UIMA-10] - Split JCas into interface and implementation
[UIMA-11] - org.apache.itu package should be renamed
[UIMA-24] - Uninformative error message when trying to create an AE from a descriptor whose frameworkImplementation is incorrect
[UIMA-28] - 2.0 examples use deprecated methods
[UIMA-33] - Do not use same timeout on GetMetadata as is used for Process
[UIMA-51] - Add version number to XCAS (or maybe to CAS built-in typesystem?)
[UIMA-70] - JavaDoc doesn't match impl
[UIMA-76] - add new pear installer API to install a pear file easily out of an application
[UIMA-78] - CPE descriptor should support URLs to reference components
[UIMA-90] - Create eclipse run configurations for startVNS and startVinciService
[UIMA-91] - Remove/hide CAS Initializer Panel in CPE GUI?
[UIMA-92] - Add Logging to WhiteboardFlowController
[UIMA-93] - Mailing list subscription instructions are not obvious
[UIMA-96] - XML descriptor capitalization is inconsistent
[UIMA-97] - OpenNLP wrapper examples should use new annotator interfaces
[UIMA-104] - PackageInstaller and PackageInstallerException need class javadoc comments
[UIMA-110] - TCAS.getAnnotationIndex(Type type) should throw exception if type is not a subtype of annotation.
[UIMA-112] - Assembly should unpack into a directory
[UIMA-116] - Always deliver the base CAS to the process method
[UIMA-119] - Fix docs around result spec to reflect changed design
[UIMA-120] - Logical Structure view of CAS: show view name in unexpanded form of CAS
[UIMA-121] - Documentation Formatting improvements
[UIMA-126] - add news section to the website
[UIMA-132] - Provide better support for filenames with spaces in resource URL
[UIMA-134] - Extend CasCopier to support multiple views
[UIMA-145] - Port CVD documentation to DocBook
[UIMA-149] - Cloning may fail for subclasses of ResultSpecification
[UIMA-156] - CVD uses deprecated API for setting log file.
[UIMA-157] - CAS / CasView API refactoring
[UIMA-160] - Logger properties files live in root directory of distribution
[UIMA-161] - adding documentation for PEAR API
[UIMA-171] - Make CVD look-and-feel configurable
[UIMA-173] - Create Default Bag Indexes when addFsToIndexes is called but no index has been defined for that type
[UIMA-177] - JCasGen should notify user when "type merging" has occurred
[UIMA-178] - CVD can not display long string values
[UIMA-179] - Need method JFSIndexRepository.getAllIndexedFS(type)
[UIMA-183] - (CAS/JCAS).getAnnotationIndex should declare return type AnnotationIndex
[UIMA-184] - Add getAnnotationIndex() to JCas API, impl via forwarding to CAS
[UIMA-185] - Make CAS use same Exception concept as the rest of UIMA
[UIMA-208] - Merge TaeSpecifierSchema.xsd with resourceSpecifierSchema.xsd
[UIMA-212] - Turn on socket keepAlive in jVinci
[UIMA-215] - CasCopier constructor should take source CAS as argument
[UIMA-216] - Add getSupportedXCasVersions to Vinci Services
[UIMA-218] - Creating an subtype of an AnnotationBase in a "base" CAS gives wrong / misleading error message "Can't create FS of type xxx with this method"
[UIMA-235] - improve example tutorial code to use Matcher / find() in default mode
[UIMA-237] - Change source build to include only the docbook system zip files
[UIMA-241] - Migration Tool improvement
[UIMA-243] - Update EMF installation instructions
[UIMA-246] - Add documentation summarizing timeouts and how to set them
[UIMA-249] - PDF, html and javadocs documentation should be in separate subdirectories
[UIMA-251] - CVD should not use a banner
[UIMA-259] - add an overview HTML document with links to the different HTML book
[UIMA-265] - improve CAS Multiplier documentation that CAS Multipliers do not work in CVD and CPE
[UIMA-274] - CDE add new feature - not making visible the additional input fields for element types when range type changed
[UIMA-305] - Move FileUtils to non-internal package
[UIMA-311] - Docbook build script usability issue
[UIMA-318] - Broken hyperlinks to html docbooks in Release_Notes

New Feature

[UIMA-49] - Migration tools from IBM UIMA to Apache UIMA
[UIMA-62] - Provide an example CasMultiplier that merges CASes
[UIMA-152] - add component test utilities project
[UIMA-164] - Add source distribution
[UIMA-224] - Add release signing and verification info

Task

[UIMA-1] - Reorganize SVN
[UIMA-2] - Fix licensing issues with Eclipse plugins
[UIMA-3] - Split big book into 4, rewrite to be appropriate for Apache UIMA, redo in DocBook, get it generate high-quality PDF
[UIMA-5] - re-organize docbook project and update ant build scripts
[UIMA-6] - Include documentation in assembly
[UIMA-7] - Add javadocs to build
[UIMA-8] - Add examples to assembly
[UIMA-15] - Semantic Search repackaging
[UIMA-21] - Update Version Number
[UIMA-36] - Change descriptor XML namespaces from uima.watson.ibm.com to uima.apache.org
[UIMA-37] - Track down and replace remaining occurrences of IBM
[UIMA-45] - Review and clean up unit tests
[UIMA-48] - Reformat all source code to match adopted conventions
[UIMA-64] - Remove the package org.apache.uima.tttypesystem
[UIMA-103] - Add license headers to batch files and shell scripts
[UIMA-123] - Remove VinciCasObjectProcessorService_impl
[UIMA-135] - Remove Entity View mode from DocumentAnalyzer
[UIMA-159] - Add license/notices etc. files for distribution
[UIMA-167] - Find and verify Docbook license(s)
[UIMA-168] - undo CommonCas change
[UIMA-172] - Status page needs updating
[UIMA-174] - Remove @author tags from Java source
[UIMA-176] - put apache board statuses into a page linked from our uima website and update uima project status page to reflect this
[UIMA-180] - Update "What's new in Apache UIMA 2.0" section of documentation
[UIMA-211] - Restructure directory uimaj-tools/src/main/org.apache.uima.jcasgen
[UIMA-232] - Documentation screenshots must be redone
[UIMA-240] - Fixup Readme - needs to have other material in it, and should have a file extension I think
[UIMA-254] - Create Release Notes
[UIMA-260] - Set env vars in setUimaClasspath and the Eclipse run configs
[UIMA-295] - Restore microscope icon as the window icon for our tools
[UIMA-296] - Remove DocBook build files from source distribution
[UIMA-299] - Remove SNAPSHOT from version numbers prior to release
[UIMA-332] - Fix copyrights, notice, status page for 2.1 release
[UIMA-333] - Fix issues for Maven artifact packaging
[UIMA-334] - Eliminate redundant LICENSE, NOTICE, DISCLAIMER files from source distribution
[UIMA-335] - Remove extra license file in src distr