Package org.apache.tika.eval.core.tokens
Class URLEmailNormalizingFilterFactory
java.lang.Object
org.apache.lucene.analysis.util.AbstractAnalysisFactory
org.apache.lucene.analysis.util.TokenFilterFactory
org.apache.tika.eval.core.tokens.URLEmailNormalizingFilterFactory
public class URLEmailNormalizingFilterFactory
extends org.apache.lucene.analysis.util.TokenFilterFactory
Factory for a filter that normalizes URLs and emails to __url__ and __email__,
respectively. WARNING: This will not work correctly unless the
UAX29URLEmailTokenizer is used! This must be run _before_ the
AlphaIdeographFilterFactory, or else the emails/URLs will already
be removed!
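The dependency on the tokenizer can be illustrated with a small, self-contained model. This is a sketch of the idea only, not the Tika implementation: it assumes the filter decides what to replace by looking at the token type assigned by the tokenizer, and that UAX29URLEmailTokenizer labels such tokens "<URL>" and "<EMAIL>". The names UrlEmailNormalizationSketch, TypedToken, and normalize are hypothetical and exist only for this illustration; no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of URL/email normalization; not the Tika or Lucene code. */
final class UrlEmailNormalizationSketch {

    /** Hypothetical stand-in for a Lucene token: a term plus its tokenizer-assigned type. */
    static final class TypedToken {
        final String term;
        final String type;

        TypedToken(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    // Type labels assumed to match those assigned by UAX29URLEmailTokenizer.
    static final String URL_TYPE = "<URL>";
    static final String EMAIL_TYPE = "<EMAIL>";
    static final String ALPHANUM_TYPE = "<ALPHANUM>";

    // Placeholder terms as given in the class description above.
    static final String URL = "__url__";
    static final String EMAIL = "__email__";

    /** Replaces terms typed as URL or email with placeholders; other terms pass through. */
    static List<String> normalize(List<TypedToken> tokens) {
        List<String> terms = new ArrayList<>();
        for (TypedToken token : tokens) {
            if (URL_TYPE.equals(token.type)) {
                terms.add(URL);
            } else if (EMAIL_TYPE.equals(token.type)) {
                terms.add(EMAIL);
            } else {
                terms.add(token.term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Tokens typed the way a URL/email-aware tokenizer would type them.
        List<TypedToken> typed = Arrays.asList(
                new TypedToken("contact", ALPHANUM_TYPE),
                new TypedToken("jane@example.org", EMAIL_TYPE),
                new TypedToken("https://example.org/docs", URL_TYPE));
        System.out.println(normalize(typed)); // prints [contact, __email__, __url__]

        // The same text from a tokenizer that never assigns URL/email types:
        // nothing is recognized, so nothing is normalized.
        List<TypedToken> untyped = Arrays.asList(
                new TypedToken("contact", ALPHANUM_TYPE),
                new TypedToken("jane@example.org", ALPHANUM_TYPE),
                new TypedToken("https://example.org/docs", ALPHANUM_TYPE));
        System.out.println(normalize(untyped));
    }
}
```

In this model, the second call returns its input terms unchanged, which mirrors the warning above: without a tokenizer that marks URL and email tokens, the filter has nothing to act on.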
Field Summary
Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors
Method Summary
Modifier and Type: org.apache.lucene.analysis.TokenStream
Method: create(org.apache.lucene.analysis.TokenStream tokenStream)
Methods inherited from class org.apache.lucene.analysis.util.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Details
URL
- See Also:
EMAIL
- See Also:
Constructor Details
URLEmailNormalizingFilterFactory
Method Details
create
public org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
Specified by:
create in class org.apache.lucene.analysis.util.TokenFilterFactory