Class URLEmailNormalizingFilterFactory

java.lang.Object
org.apache.lucene.analysis.util.AbstractAnalysisFactory
org.apache.lucene.analysis.util.TokenFilterFactory
org.apache.tika.eval.core.tokens.URLEmailNormalizingFilterFactory

public class URLEmailNormalizingFilterFactory extends org.apache.lucene.analysis.util.TokenFilterFactory
Factory for a filter that normalizes URLs and emails to __url__ and __email__, respectively. WARNING: This will not work correctly unless the UAX29URLEmailTokenizer is used! This must be run _before_ the AlphaIdeographFilterFactory, or else the emails/URLs will already have been removed!
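The warning about UAX29URLEmailTokenizer presumably reflects that the filter recognizes URLs and emails by the token type that tokenizer assigns (`<URL>` and `<EMAIL>`) rather than by inspecting the text itself. The dependency-free Java sketch that follows illustrates that idea under this assumption. `UrlEmailNormalizationSketch`, its `Token` class, and `normalize` are hypothetical names, not Tika's implementation, and the placeholder strings are copied from the class description rather than from the factory's actual field values.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, dependency-free sketch (not Tika's code) of type-based
 * URL/email normalization. Assumes tokens carry the type labels that
 * Lucene's UAX29URLEmailTokenizer emits ("<URL>", "<EMAIL>").
 */
public class UrlEmailNormalizationSketch {

    /** Minimal stand-in for a token: its text plus its tokenizer-assigned type. */
    static final class Token {
        final String text;
        final String type;

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Placeholders as written in the class description; the real values are
    // held in the factory's static final String fields.
    static final String URL_PLACEHOLDER = "__url__";
    static final String EMAIL_PLACEHOLDER = "__email__";

    /** Replaces URL- and email-typed tokens with placeholders; others pass through. */
    static List<String> normalize(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            switch (t.type) {
                case "<URL>":
                    out.add(URL_PLACEHOLDER);
                    break;
                case "<EMAIL>":
                    out.add(EMAIL_PLACEHOLDER);
                    break;
                default:
                    out.add(t.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Types as a UAX29URLEmailTokenizer would label these tokens.
        List<Token> uax29 = List.of(
                new Token("mail", "<ALPHANUM>"),
                new Token("bob@example.com", "<EMAIL>"),
                new Token("https://tika.apache.org", "<URL>"));
        System.out.println(normalize(uax29)); // [mail, __email__, __url__]

        // A tokenizer that assigns no URL/email types leaves Lucene's default
        // type "word", so nothing is normalized.
        List<Token> generic = List.of(
                new Token("bob@example.com", "word"),
                new Token("https://tika.apache.org", "word"));
        System.out.println(normalize(generic)); // [bob@example.com, https://tika.apache.org]
    }
}
```

The second call shows the failure mode the warning describes: with a tokenizer such as a plain whitespace splitter, URL and email tokens keep the generic type and pass through unchanged.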
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
     
    static final String
     

    Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory

    LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
  • Constructor Summary

    Constructors
    Constructor
    Description
    URLEmailNormalizingFilterFactory(Map<String,String> args)
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.lucene.analysis.TokenStream
    create(org.apache.lucene.analysis.TokenStream tokenStream)
     

    Methods inherited from class org.apache.lucene.analysis.util.TokenFilterFactory

    availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters

    Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory

    get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • URLEmailNormalizingFilterFactory

      public URLEmailNormalizingFilterFactory(Map<String,String> args)
  • Method Details

    • create

      public org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
      Specified by:
      create in class org.apache.lucene.analysis.util.TokenFilterFactory
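Because `create(TokenStream)` wraps its input stream, the order in which filters are chained decides what each one sees. The hypothetical, dependency-free Java analogy that follows models each filter as a function over a token list. URLs and emails are detected by crude string checks instead of tokenizer types, and `DROP_NON_WORDS` is an invented stand-in for a filter that discards non-word tokens, not the actual AlphaIdeographFilterFactory rule. It illustrates why the normalizing filter has to run first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical analogy (not Lucene or Tika code): each filter maps an upstream
 * token list to a filtered one, mirroring how create(TokenStream) wraps its input.
 */
public class FilterOrderSketch {

    // Stand-in for the normalizing filter; uses string checks instead of
    // tokenizer types purely to keep the example self-contained.
    static final UnaryOperator<List<String>> NORMALIZE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.contains("://")) {
                out.add("__url__");
            } else if (t.contains("@")) {
                out.add("__email__");
            } else {
                out.add(t);
            }
        }
        return out;
    };

    // Invented stand-in for a word-only filter (NOT AlphaIdeographFilter's rule):
    // keeps a token only if every character is a letter or '_'.
    static final UnaryOperator<List<String>> DROP_NON_WORDS = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.chars().allMatch(c -> Character.isLetter(c) || c == '_')) {
                out.add(t);
            }
        }
        return out;
    };

    /** Applies filters in order, like chaining successive create(...) calls. */
    @SafeVarargs
    static List<String> pipeline(List<String> tokens, UnaryOperator<List<String>>... filters) {
        List<String> current = tokens;
        for (UnaryOperator<List<String>> f : filters) {
            current = f.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("mail", "bob@example.com", "https://tika.apache.org");
        // Normalize first: the placeholders survive the word-only filter.
        System.out.println(pipeline(tokens, NORMALIZE, DROP_NON_WORDS)); // [mail, __email__, __url__]
        // Word-only filter first: URLs and emails are gone before normalization runs.
        System.out.println(pipeline(tokens, DROP_NON_WORDS, NORMALIZE)); // [mail]
    }
}
```

Real Lucene token streams are lazy and attribute-based rather than lists; the list model only captures the ordering effect.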