Package org.apache.tika.eval.core.tokens
Class CJKBigramAwareLengthFilterFactory
- java.lang.Object
-
- org.apache.lucene.analysis.AbstractAnalysisFactory
-
- org.apache.lucene.analysis.TokenFilterFactory
-
- org.apache.tika.eval.core.tokens.CJKBigramAwareLengthFilterFactory
-
public class CJKBigramAwareLengthFilterFactory extends org.apache.lucene.analysis.TokenFilterFactory
Creates a very narrowly focused TokenFilter that limits tokens based on length _unless_ they've been identified as <DOUBLE> or <SINGLE> by the CJKBigramFilter.This class is intended to be used when generating "common tokens" files.
-
-
Constructor Summary
Constructors Constructor Description CJKBigramAwareLengthFilterFactory()
CJKBigramAwareLengthFilterFactory(Map<String,String> args)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description org.apache.lucene.analysis.TokenStream
create(org.apache.lucene.analysis.TokenStream tokenStream)
-
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
-
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
-
-
-
Field Detail
-
NAME
public static final String NAME
- See Also:
- Constant Field Values
-
-