public final class NGramTokenFilter extends TokenFilter
If you were using this TokenFilter to perform partial highlighting, this won't work anymore since this filter doesn't update offsets. You should modify your analysis chain to use NGramTokenizer, and potentially override NGramTokenizer.isTokenChar(int) to perform pre-tokenization.
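To illustrate why offsets matter for partial highlighting: a highlighter needs each gram's own start and end position in the text, which a filter that leaves offsets untouched cannot supply. The sketch below is plain Java, not Lucene code. `GramOffsetSketch`, `Gram`, `gramsWithOffsets`, and the `isTokenChar` predicate parameter are hypothetical names that only mimic the idea of pre-tokenizing on token characters and recording per-gram offsets; the emission order and the char-based (not code-point-based) indexing are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class GramOffsetSketch {
    // Hypothetical value type: a gram plus its [start, end) offsets in the source text.
    static final class Gram {
        final String text;
        final int start;
        final int end;

        Gram(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    // Splits the text into runs of token characters, then lists the grams of
    // each run together with their offsets. Grams never span a non-token char.
    static List<Gram> gramsWithOffsets(String text, int minGram, int maxGram,
                                       IntPredicate isTokenChar) {
        List<Gram> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            if (!isTokenChar.test(text.charAt(pos))) {
                pos++; // skip separators
                continue;
            }
            int runEnd = pos;
            while (runEnd < text.length() && isTokenChar.test(text.charAt(runEnd))) {
                runEnd++;
            }
            for (int s = pos; s < runEnd; s++) {
                for (int size = minGram; size <= maxGram && s + size <= runEnd; size++) {
                    out.add(new Gram(text.substring(s, s + size), s, s + size));
                }
            }
            pos = runEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed predicate for this example: letters and digits are token chars.
        System.out.println(gramsWithOffsets("ab cd", 2, 2, Character::isLetterOrDigit));
    }
}
```

With per-gram offsets like these, a match on `cd` can be mapped back to characters 3 to 5 of the original text rather than to the whole input.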
Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_NGRAM_SIZE Deprecated. since 7.4 - this value will be required. |
static int | DEFAULT_MIN_NGRAM_SIZE Deprecated. since 7.4 - this value will be required. |
static boolean | DEFAULT_PRESERVE_ORIGINAL |
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor and Description |
---|
NGramTokenFilter(TokenStream input) Deprecated. since 7.4. Use NGramTokenFilter(TokenStream, int, int, boolean) instead. |
NGramTokenFilter(TokenStream input, int gramSize) Creates an NGramTokenFilter that produces n-grams of the indicated size. |
NGramTokenFilter(TokenStream input, int minGram, int maxGram) Deprecated. since 7.4. Use NGramTokenFilter(TokenStream, int, int, boolean) instead. |
NGramTokenFilter(TokenStream input, int minGram, int maxGram, boolean preserveOriginal) Creates an NGramTokenFilter that, for a given input term, produces all contained n-grams with lengths >= minGram and <= maxGram. |
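As a rough illustration of what "all contained n-grams with lengths >= minGram and <= maxGram" means, and of how preserveOriginal affects terms that fall outside that range, here is a minimal plain-Java sketch. It does not use Lucene: `NGramSketch` and `grams` are hypothetical names, the emission order (by start position, then by increasing length) is an assumption of this sketch, and it counts `char` units rather than code points.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Hypothetical helper, not part of Lucene: lists the grams for a single term.
    static List<String> grams(String term, int minGram, int maxGram, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        int len = term.length();

        // Term shorter than minGram: no gram fits, keep the term only if asked to.
        if (len < minGram) {
            if (preserveOriginal) {
                out.add(term);
            }
            return out;
        }

        // Every substring whose length lies in [minGram, maxGram].
        for (int start = 0; start < len; start++) {
            for (int size = minGram; size <= maxGram && start + size <= len; size++) {
                out.add(term.substring(start, start + size));
            }
        }

        // Term longer than maxGram: the whole term is not among the grams,
        // so add it only when preserveOriginal is set.
        if (preserveOriginal && len > maxGram) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("abcd", 1, 2, false)); // [a, ab, b, bc, c, cd, d]
        System.out.println(grams("abcd", 2, 3, true));  // [ab, abc, bc, bcd, cd, abcd]
        System.out.println(grams("a", 2, 3, true));     // [a]
    }
}
```

The first call uses the same bounds as the deprecated single-argument constructor (minGram 1, maxGram 2, preserveOriginal false).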
Modifier and Type | Method and Description |
---|---|
void | end() |
boolean | incrementToken() |
void | reset() |
Methods inherited from class TokenFilter: close
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
@Deprecated public static final int DEFAULT_MIN_NGRAM_SIZE
@Deprecated public static final int DEFAULT_MAX_NGRAM_SIZE
public static final boolean DEFAULT_PRESERVE_ORIGINAL
public NGramTokenFilter(TokenStream input, int minGram, int maxGram, boolean preserveOriginal)
Parameters:
input - TokenStream holding the input to be tokenized
minGram - the minimum length of the generated n-grams
maxGram - the maximum length of the generated n-grams
preserveOriginal - whether or not to keep the original term when it is shorter than minGram or longer than maxGram
public NGramTokenFilter(TokenStream input, int gramSize)
Parameters:
input - TokenStream holding the input to be tokenized
gramSize - the size of n-grams to generate.
@Deprecated public NGramTokenFilter(TokenStream input, int minGram, int maxGram)
Deprecated. since 7.4. Use NGramTokenFilter(TokenStream, int, int, boolean) instead.
Behaves the same as NGramTokenFilter(input, minGram, maxGram, false)
Parameters:
input - TokenStream holding the input to be tokenized
minGram - the minimum length of the generated n-grams
maxGram - the maximum length of the generated n-grams
@Deprecated public NGramTokenFilter(TokenStream input)
Deprecated. since 7.4. Use NGramTokenFilter(TokenStream, int, int, boolean) instead.
Behaves the same as NGramTokenFilter(input, 1, 2, false)
Parameters:
input - TokenStream holding the input to be tokenized
public final boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
public void reset() throws IOException
Overrides: reset in class TokenFilter
Throws: IOException
public void end() throws IOException
Overrides: end in class TokenFilter
Throws: IOException
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.