SimpleNaiveBayesDocumentClassifier (Lucene 6.5.1 API)

java.lang.Object
- org.apache.lucene.classification.SimpleNaiveBayesClassifier
- - org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier

All Implemented Interfaces:

Classifier<BytesRef>, DocumentClassifier<BytesRef>
```
public class SimpleNaiveBayesDocumentClassifier
extends SimpleNaiveBayesClassifier
implements DocumentClassifier<BytesRef>
```
A simplistic Lucene-based Naive Bayes classifier; see http://en.wikipedia.org/wiki/Naive_Bayes_classifier

WARNING: This API is experimental and might change in incompatible ways in the next release.
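
For readers unfamiliar with the technique named above, the following is a minimal sketch of multinomial Naive Bayes scoring over in-memory token lists. It is not Lucene's implementation: the class `ToyNaiveBayes`, its methods, and the add-one smoothing are assumptions chosen for illustration, and it does not read from an index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy classifier for illustration only; not part of Lucene. */
final class ToyNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Records one training document (already tokenized) for the given class label. */
    void train(String label, List<String> tokens) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Computes log P(class) + sum of log P(token | class), with add-one smoothing (an assumption). */
    double logScore(String label, List<String> tokens) {
        double score = Math.log((double) docsPerClass.get(label) / totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        int classTokens = tokensPerClass.getOrDefault(label, 0);
        for (String token : tokens) {
            int count = counts.getOrDefault(token, 0);
            score += Math.log((count + 1.0) / (classTokens + vocabulary.size()));
        }
        return score;
    }

    /** Returns the label with the highest log score, or null if nothing was trained. */
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyNaiveBayes nb = new ToyNaiveBayes();
        nb.train("sports", List.of("ball", "goal", "team"));
        nb.train("tech", List.of("code", "bug", "team"));
        System.out.println(nb.classify(List.of("goal", "ball")));
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is why the sketch sums logarithms instead of multiplying probabilities.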

Field Summary

Fields
Modifier and Type	Field and Description
`protected Map<String,Analyzer>`	`field2analyzer` `Analyzer` to be used for tokenizing document fields

Fields inherited from class org.apache.lucene.classification.SimpleNaiveBayesClassifier
analyzer, classFieldName, indexReader, indexSearcher, query, textFieldNames

Constructor Summary

Constructors
Constructor and Description
`SimpleNaiveBayesDocumentClassifier(IndexReader indexReader, Query query, String classFieldName, Map<String,Analyzer> field2analyzer, String... textFieldNames)` Creates a new NaiveBayes classifier.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`ClassificationResult<BytesRef>`	`assignClass(Document document)` Assign a class (with score) to the given `Document`
`List<ClassificationResult<BytesRef>>`	`getClasses(Document document)` Get all the classes (sorted by score, descending) assigned to the given `Document`.
`List<ClassificationResult<BytesRef>>`	`getClasses(Document document, int max)` Get the first `max` classes (sorted by score, descending) assigned to the given `Document`.
`protected String[]`	`getTokenArray(TokenStream tokenizedText)` Returns a token array from the input `TokenStream`

Methods inherited from class org.apache.lucene.classification.SimpleNaiveBayesClassifier
assignClass, assignClassNormalizedList, countDocsWithClass, getClasses, getClasses, normClassificationResults, tokenize

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - field2analyzer
```
protected Map<String,Analyzer> field2analyzer
```
    Analyzer to be used for tokenizing document fields
- Constructor Detail
  - SimpleNaiveBayesDocumentClassifier
```
public SimpleNaiveBayesDocumentClassifier(IndexReader indexReader,
                                          Query query,
                                          String classFieldName,
                                          Map<String,Analyzer> field2analyzer,
                                          String... textFieldNames)
```
    Creates a new NaiveBayes classifier.
    
    Parameters:
    
    indexReader - the reader on the index to be used for classification
    
    query - a Query used to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
    
    classFieldName - the name of the field used as the output for the classifier. NOTE: this field must not be heavily analyzed, as the returned class will be a token indexed for this field
    
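    field2analyzer - a map from field name to the Analyzer to be used for tokenizing that document field
    
    textFieldNames - the names of the fields used as the inputs for the classifier; they can contain a boosting indication, e.g. title^10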
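
The boosting indication on a text field name (e.g. title^10) can be read as a field name followed by a numeric weight. The sketch below shows one way such a string might be split; `BoostedField` and `parse` are hypothetical names introduced here, and the default boost of 1.0 for names without a caret is an assumption, not documented behavior of this class.

```java
/** Hypothetical holder for a field name and its boost; not a Lucene type. */
final class BoostedField {
    final String name;
    final float boost;

    BoostedField(String name, float boost) {
        this.name = name;
        this.boost = boost;
    }

    /**
     * Splits a spec such as "title^10" into the name "title" and the boost 10.0.
     * A spec without '^' is returned unchanged with an assumed default boost of 1.0.
     */
    static BoostedField parse(String spec) {
        int caret = spec.lastIndexOf('^');
        if (caret < 0) {
            return new BoostedField(spec, 1.0f);
        }
        String name = spec.substring(0, caret);
        float boost = Float.parseFloat(spec.substring(caret + 1));
        return new BoostedField(name, boost);
    }

    public static void main(String[] args) {
        BoostedField field = BoostedField.parse("title^10");
        System.out.println(field.name + " " + field.boost);
    }
}
```

A malformed boost (for example "title^abc") makes `Float.parseFloat` throw `NumberFormatException` in this sketch; how the real classifier reacts to such input is not stated in this page.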
- Method Detail
  - assignClass
```
public ClassificationResult<BytesRef> assignClass(Document document)
                                           throws IOException
```
    Assign a class (with score) to the given Document
    
    Specified by:
    
    assignClass in interface DocumentClassifier<BytesRef>
    
    Parameters:
    
    document - a Document to be classified. Fields are considered features for the classification.
    
    Returns:
    
    a ClassificationResult holding the assigned class (of type BytesRef) and its score
    
    Throws:
    
    IOException - If there is a low-level I/O error.
  - getClasses
```
public List<ClassificationResult<BytesRef>> getClasses(Document document)
                                                throws IOException
```
    Get all the classes (sorted by score, descending) assigned to the given Document.
    
    Specified by:
    
    getClasses in interface DocumentClassifier<BytesRef>
    
    Parameters:
    
    document - a Document to be classified. Fields are considered features for the classification.
    
    Returns:
    
    the full list of ClassificationResult objects, each holding a class and its score. Returns null if the classifier cannot produce a list.
    
    Throws:
    
    IOException - If there is a low-level I/O error.
  - getClasses
```
public List<ClassificationResult<BytesRef>> getClasses(Document document,
                                                       int max)
                                                throws IOException
```
    Get the first max classes (sorted by score, descending) assigned to the given Document.
    
    Specified by:
    
    getClasses in interface DocumentClassifier<BytesRef>
    
    Parameters:
    
    document - a Document to be classified. Fields are considered features for the classification.
    
    max - the maximum number of elements in the returned list
    
    Returns:
    
    the list of ClassificationResult objects, each holding a class and its score, truncated to at most max elements. Returns null if the classifier cannot produce a list.
    
    Throws:
    
    IOException - If there is a low-level I/O error.
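
The "sorted by score, descending" and "first max" behavior described for the two getClasses variants can be sketched with plain Java collections. `ScoredClass` and `TopClasses.top` are hypothetical stand-ins (not `ClassificationResult` or any Lucene method), and clamping `max` to the list size is an assumption made so the sketch never throws.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a class label with its score; not a Lucene type. */
final class ScoredClass {
    final String label;
    final double score;

    ScoredClass(String label, double score) {
        this.label = label;
        this.score = score;
    }
}

final class TopClasses {
    /** Returns a new list sorted by score, descending, holding at most max elements. */
    static List<ScoredClass> top(List<ScoredClass> results, int max) {
        List<ScoredClass> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((ScoredClass r) -> r.score).reversed());
        // Clamp max into [0, size] so out-of-range values do not throw (an assumption).
        int limit = Math.min(Math.max(max, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<ScoredClass> results = List.of(
                new ScoredClass("sports", 0.2),
                new ScoredClass("tech", 0.7),
                new ScoredClass("politics", 0.1));
        for (ScoredClass r : top(results, 2)) {
            System.out.println(r.label + " " + r.score);
        }
    }
}
```

Copying the input before sorting leaves the caller's list untouched, which matters when the same results are reused for both the full and the truncated view.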
  - getTokenArray
```
protected String[] getTokenArray(TokenStream tokenizedText)
                          throws IOException
```
    Returns a token array from the input TokenStream
    
    Parameters:
    
    tokenizedText - the tokenized content of a field
    
    Returns:
    
    a String array of the resulting tokens
    
    Throws:
    
    IOException - If tokenization fails because there is a low-level I/O error
