Class TextProfileSignature

All Implemented Interfaces:
TextStatsCalculator, TokenCountStatsCalculator<String>

public class TextProfileSignature extends Object implements TokenCountStatsCalculator<String>
Copied nearly directly from Apache Nutch:

See documentation:

This returns the Base32-encoded SHA-256 digest of the text profile.
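The following is a rough sketch of that final step. It hashes a string with SHA-256 and encodes the 32-byte digest as RFC 4648 Base32 with `=` padding, which yields a 56-character string. The class name `Base32Sha256Sketch`, the UTF-8 encoding of the input, and the padded alphabet are assumptions. The JDK has no built-in Base32 codec, so a small encoder is written out by hand here, whereas the actual class may use a library codec.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not the library's API: SHA-256 digest, Base32-encoded.
public class Base32Sha256Sketch {

    private static final char[] ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    // RFC 4648 Base32 (standard alphabet), padded with '=' to a multiple of 8 chars.
    static String base32(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0; // accumulated bits; only the low `bits` bits are meaningful
        int bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                out.append(ALPHABET[(buffer >>> (bits - 5)) & 0x1F]);
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Right-pad the leftover 1..4 bits with zeros to form a last symbol.
            out.append(ALPHABET[(buffer << (5 - bits)) & 0x1F]);
        }
        while (out.length() % 8 != 0) {
            out.append('=');
        }
        return out.toString();
    }

    // Assumption: the profile text is hashed as UTF-8 bytes.
    static String sha256Base32(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return base32(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(base32("foobar".getBytes(StandardCharsets.UTF_8))); // MZXW6YTBOI======
        System.out.println(sha256Base32("signature\nprofile").length());      // 56
    }
}
```

Because the digest is deterministic, identical profile strings always produce identical signatures.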

  • Constructor Details

    • TextProfileSignature

      public TextProfileSignature()
  • Method Details

    • calculate

      public String calculate(TokenCounts tokenCounts)
      Specified by:
        calculate in interface TokenCountStatsCalculator<String>
    • setMinTokenLength

      public void setMinTokenLength(int minTokenLength)
      Be careful: for CJK languages, the default analyzer uses character bigrams, so setting minTokenLength > 2 will effectively ignore all CJK tokens.
      Parameters:
        minTokenLength - include tokens of this length or greater.
    • setQuantRate

      public void setQuantRate(float quantRate)
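This page does not describe quantRate. The sketch below assumes it plays the same role as the quantization rate in Nutch's TextProfileSignature, where the default is 0.01. Under that assumption, counts are rounded down to a multiple of `quant = round(maxFreq * quantRate)`. If that value falls below 2, it is bumped to 2 when `maxFreq > 1` and to 1 otherwise. Tokens whose rounded count drops below `quant` are discarded.

The sketch first applies the documented minTokenLength rule, which also shows the CJK caveat: a two-character bigram such as 東京 survives `minTokenLength = 2` but is dropped at 3. `ProfileFilterSketch` and `profile` are hypothetical names, and token counts are passed as a plain `Map` rather than the real `TokenCounts` type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the quantization step is an assumed Nutch-style scheme.
public class ProfileFilterSketch {

    static Map<String, Integer> profile(Map<String, Integer> counts,
                                        int minTokenLength, float quantRate) {
        // 1. Keep tokens of length >= minTokenLength. String.length() counts
        //    UTF-16 code units, so a bigram of two BMP CJK characters has length 2.
        Map<String, Integer> kept = new LinkedHashMap<>();
        int maxFreq = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().length() >= minTokenLength) {
                kept.put(e.getKey(), e.getValue());
                maxFreq = Math.max(maxFreq, e.getValue());
            }
        }
        // 2. Assumed quantization: quant = round(maxFreq * quantRate), at least 2 if maxFreq > 1.
        int quant = Math.round(maxFreq * quantRate);
        if (quant < 2) {
            quant = (maxFreq > 1) ? 2 : 1;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : kept.entrySet()) {
            int rounded = (e.getValue() / quant) * quant; // round down to a multiple of quant
            if (rounded >= quant) {
                result.put(e.getKey(), rounded);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("signature", 300);
        counts.put("profile", 7);
        counts.put("nutch", 1);
        counts.put("\u6771\u4EAC", 50); // CJK character bigram, length 2

        // maxFreq = 300 and quantRate = 0.01f give quant = 3.
        System.out.println(profile(counts, 2, 0.01f).containsKey("\u6771\u4EAC")); // true
        System.out.println(profile(counts, 3, 0.01f)); // {signature=300, profile=6}
    }
}
```

With a minimum length of 3, the bigram is filtered out before quantization. The singleton token "nutch" is dropped in both runs because its count of 1 rounds down to 0.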