Class RegexNERecogniser

java.lang.Object
org.apache.tika.parser.ner.regex.RegexNERecogniser
All Implemented Interfaces:
NERecogniser

public class RegexNERecogniser extends Object implements NERecogniser
This class offers an implementation of NERecogniser based on regular expressions.

The default configuration file "ner-regex.txt" is used when the no-argument constructor is used to instantiate this class. The regex file is loaded via Class.getResourceAsStream(String), so the file should be placed in the same package path as this class.

The regex configuration uses the following format:
 ENTITY_TYPE1=REGEX1
 ENTITY_TYPE2=REGEX2
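As a rough illustration of this format, the sketch below parses configuration text into a map from entity type to a compiled java.util.regex.Pattern. It is not Tika's parser: the class and method names are hypothetical, splitting each line on its first "=" is an assumption (it lets a regex itself contain "="), and skipping blank lines and "#" comment lines is likewise assumed rather than documented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of reading ENTITY_TYPE=REGEX lines; not Tika's parser.
public class RegexConfigParser {

    // Parses configuration text into entity type -> compiled pattern, keeping file order.
    static Map<String, Pattern> parse(String config) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        for (String raw : config.split("\\R")) {
            String line = raw.trim();
            // Assumption: blank lines and '#' comment lines are ignored.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split on the first '=' only, so the regex itself may contain '='.
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // no entity type before '=': skip the line
            }
            String type = line.substring(0, eq).trim();
            String regex = line.substring(eq + 1).trim();
            patterns.put(type, Pattern.compile(regex));
        }
        return patterns;
    }

    public static void main(String[] args) {
        // Hypothetical entity types, used only for illustration.
        String config = "YEAR=\\b(19|20)\\d{2}\\b\n"
                + "TIME=\\b\\d{1,2}:\\d{2}\\b\n";
        System.out.println(parse(config).keySet()); // prints [YEAR, TIME]
    }
}
```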
 
For example, to extract week days from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
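The standalone sketch below (hypothetical class WeekDayDemo) applies this WEEK_DAY pattern with Matcher.find() to show what it picks out. Two behaviours of the example regex are worth knowing: it has no alternative for Wednesday, so "Wednesday" is not matched, and it has no word boundaries, so it also matches inside words such as "monitor".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WeekDayDemo {
    // The WEEK_DAY regex from the example configuration line, as a Java string literal.
    static final Pattern WEEK_DAY =
            Pattern.compile("(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?");

    // Returns every substring of text matched by WEEK_DAY, in order of appearance.
    static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WEEK_DAY.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches("Meet on Monday or SAT, not Wednesday")); // prints [Monday, SAT]
        System.out.println(matches("monitor")); // prints [mon]
    }
}
```

Wrapping the expression in \b...\b and adding a (wednes) alternative would address both points, if that is the intended behaviour.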
 
Since:
Nov. 7, 2015
  • Constructor Details

    • RegexNERecogniser

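      public RegexNERecogniser()
      Creates a recogniser configured from the default file "ner-regex.txt", which is loaded from the package of this class via Class.getResourceAsStream(String).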
    • RegexNERecogniser

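      public RegexNERecogniser(InputStream stream)
      Creates a recogniser configured from the given stream, which is expected to use the ENTITY_TYPE=REGEX format described above.
      Parameters:
      stream - input stream containing the regex configuration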
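A hypothetical stand-in for the two constructors is sketched below; it is not Tika's source. The no-argument form resolves "ner-regex.txt" with Class.getResourceAsStream(String), which treats a name without a leading "/" as relative to the class's own package, matching the placement rule in the class description. Treating a missing resource as an empty configuration and closing the caller's stream after reading are assumptions of this sketch.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the two constructors; not Tika's source.
public class StreamConfiguredRecogniser {
    static final String DEFAULT_CONFIG = "ner-regex.txt";

    final Map<String, Pattern> patterns = new LinkedHashMap<>();

    // No-arg form: resolves DEFAULT_CONFIG relative to this class's package.
    // Assumption: a missing resource (null stream) yields an empty configuration.
    public StreamConfiguredRecogniser() {
        this(StreamConfiguredRecogniser.class.getResourceAsStream(DEFAULT_CONFIG));
    }

    // Stream form: reads ENTITY_TYPE=REGEX lines from the caller's stream, then closes it.
    public StreamConfiguredRecogniser(InputStream stream) {
        if (stream == null) {
            return;
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq > 0) { // lines without an entity type are skipped
                    patterns.put(line.substring(0, eq).trim(),
                            Pattern.compile(line.substring(eq + 1).trim()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "YEAR=\\b(19|20)\\d{2}\\b\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(new StreamConfiguredRecogniser(in).patterns.keySet()); // prints [YEAR]
        // Prints [] when no ner-regex.txt sits beside this class on the classpath.
        System.out.println(new StreamConfiguredRecogniser().patterns.keySet());
    }
}
```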
  • Method Details

    • getInstance

      public static RegexNERecogniser getInstance()
    • isAvailable

      public boolean isAvailable()
      Description copied from interface: NERecogniser
      Checks whether this named entity recogniser is available for service.
      Specified by:
      isAvailable in interface NERecogniser
      Returns:
      true if this recogniser is ready to recognise, false otherwise
    • getEntityTypes

      public Set<String> getEntityTypes()
      Description copied from interface: NERecogniser
      Gets the set of entity types whose names are recognisable by this recogniser.
      Specified by:
      getEntityTypes in interface NERecogniser
      Returns:
      set of entity types/classes
    • findMatches

      public Set<String> findMatches(String text, Pattern pattern)
      Finds the substrings of the text that match the given pattern.
      Parameters:
      text - text that may contain substrings of interest
      pattern - pattern used to find the substrings
      Returns:
      set of matching substrings, or null if none are found
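A minimal sketch of this contract, using a hypothetical FindMatchesSketch class: every Matcher.find() hit is collected into a Set, so repeated matches collapse into one entry, and null rather than an empty set is returned when nothing matches, so callers need a null check. Collecting the whole match (Matcher.group()) rather than a particular capturing group is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindMatchesSketch {
    // Collects every distinct substring of text matched by pattern.
    // Follows the documented contract: returns null (not an empty set) when nothing matches.
    static Set<String> findMatches(String text, Pattern pattern) {
        Set<String> results = new HashSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group()); // assumption: whole match, not a capturing group
        }
        return results.isEmpty() ? null : results;
    }

    public static void main(String[] args) {
        Pattern year = Pattern.compile("\\b(19|20)\\d{2}\\b");
        Set<String> found = findMatches("Released in 2015, revised in 2016 and 2015.", year);
        System.out.println(found.size()); // prints 2: the repeated 2015 is stored once
        System.out.println(findMatches("no years here", year)); // prints null
    }
}
```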
    • recognise

      public Map<String,Set<String>> recognise(String text)
      Description copied from interface: NERecogniser
      Recognises names in the given text.
      Specified by:
      recognise in interface NERecogniser
      Parameters:
      text - text that possibly contains names
      Returns:
      map of entityType -> set of names
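The recognition step can be pictured as running each configured pattern over the text and grouping the hits by entity type. The sketch below does that with plain java.util.regex; the class name, the patterns parameter, and the choice to omit entity types that have no matches are assumptions, not Tika's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecogniseSketch {
    // Runs each configured pattern over the text and groups the matches by entity type.
    // Assumption: entity types with no matches are left out of the result map.
    static Map<String, Set<String>> recognise(String text, Map<String, Pattern> patterns) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            while (m.find()) {
                result.computeIfAbsent(entry.getKey(), k -> new HashSet<>()).add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical entity types, used only for illustration.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("YEAR", Pattern.compile("\\b(19|20)\\d{2}\\b"));
        patterns.put("TIME", Pattern.compile("\\b\\d{1,2}:\\d{2}\\b"));
        System.out.println(recognise("Meet at 10:30 in 2015", patterns));
    }
}
```

Because the result is a HashMap of HashSets, neither the order of entity types nor the order of names within a set is defined.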