public class DatasetSplitter extends Object
Constructor and Description |
---|
DatasetSplitter(double testRatio, double crossValidationRatio): Create a DatasetSplitter by giving the sizes of the test and cross validation indexes |
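As a rough illustration of what the two constructor ratios mean, the sketch below turns them into document counts for an index of a given size. SplitSizes is a hypothetical helper, not a Lucene class. Several rules are assumptions of this sketch and are not documented behaviour of DatasetSplitter: counts are rounded down, the training index receives the remainder, and the two ratios together may not exceed 1.0.

```java
/**
 * Hypothetical helper, not part of Lucene: turns DatasetSplitter-style
 * test and cross validation ratios into document counts.
 */
public class SplitSizes {

    final int test;
    final int crossValidation;
    final int training;

    private SplitSizes(int test, int crossValidation, int training) {
        this.test = test;
        this.crossValidation = crossValidation;
        this.training = training;
    }

    static SplitSizes of(int totalDocs, double testRatio, double crossValidationRatio) {
        checkRatio("testRatio", testRatio);
        checkRatio("crossValidationRatio", crossValidationRatio);
        // Assumption: the held-out sets together may not exceed the whole index
        // (a small tolerance absorbs floating-point error in the sum).
        if (testRatio + crossValidationRatio > 1.0 + 1e-9) {
            throw new IllegalArgumentException("testRatio + crossValidationRatio must not exceed 1.0");
        }
        // Assumption: counts are rounded down and training gets the remainder.
        int test = (int) Math.floor(totalDocs * testRatio);
        // Clamp so that rounding can never give the training set a negative size.
        int crossValidation = Math.min((int) Math.floor(totalDocs * crossValidationRatio), totalDocs - test);
        return new SplitSizes(test, crossValidation, totalDocs - test - crossValidation);
    }

    private static void checkRatio(String name, double ratio) {
        // Documented contract: each ratio is a double between 0.0 and 1.0.
        // The negated comparison also rejects NaN.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException(name + " must be between 0.0 and 1.0, got " + ratio);
        }
    }

    public static void main(String[] args) {
        SplitSizes sizes = SplitSizes.of(1000, 0.2, 0.1);
        System.out.println("test=" + sizes.test + " cv=" + sizes.crossValidation + " training=" + sizes.training);
    }
}
```

For 1000 documents with testRatio 0.2 and crossValidationRatio 0.1, this prints test=200 cv=100 training=700. Rounding down keeps the three counts from ever summing to more than the original index.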
Modifier and Type | Method and Description |
---|---|
void | split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames): Split a given index into three indexes for training, test and cross validation tasks, respectively |
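The parameters of split point to a class-aware split: classFieldName names the label field, and fieldNames (or null) selects which fields are copied into the new indexes. This page does not describe how documents are assigned to each index. The following is therefore only a hypothetical, in-memory sketch of one common approach, a per-class (stratified) split. Plain Java maps stand in for Lucene documents, and lists stand in for the three Directory instances. StratifiedSplitSketch and its project helper are not Lucene APIs. Several choices are assumptions of the sketch: documents are taken in index order within each class, the label field is always kept, and an empty fieldNames array is treated like null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of a class-stratified training/test/cross validation split. */
public class StratifiedSplitSketch {

    final List<Map<String, String>> training = new ArrayList<>();
    final List<Map<String, String>> test = new ArrayList<>();
    final List<Map<String, String>> crossValidation = new ArrayList<>();

    StratifiedSplitSketch(List<Map<String, String>> docs, double testRatio, double crossValidationRatio,
                          String classFieldName, String... fieldNames) {
        // Group documents by label, keeping the order in which classes first appear.
        Map<String, List<Map<String, String>>> byClass = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            byClass.computeIfAbsent(doc.get(classFieldName), k -> new ArrayList<>()).add(doc);
        }
        for (List<Map<String, String>> group : byClass.values()) {
            // Ratios are applied per class so that every label appears in each split proportionally.
            int testSize = (int) Math.floor(group.size() * testRatio);
            int cvSize = Math.min((int) Math.floor(group.size() * crossValidationRatio), group.size() - testSize);
            for (int i = 0; i < group.size(); i++) {
                Map<String, String> copy = project(group.get(i), classFieldName, fieldNames);
                if (i < testSize) {
                    test.add(copy);
                } else if (i < testSize + cvSize) {
                    crossValidation.add(copy);
                } else {
                    training.add(copy);
                }
            }
        }
    }

    /** Copies only the requested fields; null (or, as an assumption here, empty) means every field. */
    private static Map<String, String> project(Map<String, String> doc, String classFieldName, String... fieldNames) {
        if (fieldNames == null || fieldNames.length == 0) {
            return new LinkedHashMap<>(doc);
        }
        Map<String, String> copy = new LinkedHashMap<>();
        for (String name : fieldNames) {
            if (doc.containsKey(name)) {
                copy.put(name, doc.get(name));
            }
        }
        // Assumption of this sketch: the label is always carried over so the split stays usable.
        copy.put(classFieldName, doc.get(classFieldName));
        return copy;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("title", "doc " + i);
            doc.put("body", "text of doc " + i);
            doc.put("label", i < 4 ? "spam" : "ham");
            docs.add(doc);
        }
        StratifiedSplitSketch split = new StratifiedSplitSketch(docs, 0.25, 0.25, "label", "title");
        System.out.println("training=" + split.training.size() + " test=" + split.test.size()
                + " crossValidation=" + split.crossValidation.size());
    }
}
```

With 4 "spam" and 6 "ham" documents and both ratios at 0.25, each class contributes one document to test and one to cross validation, so this prints training=6 test=2 crossValidation=2. Stratifying by label keeps rare classes from vanishing from the smaller test and cross validation sets.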
public DatasetSplitter(double testRatio, double crossValidationRatio)
Create a DatasetSplitter by giving the sizes of the test and cross validation indexes.
Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross validation index, as a double between 0.0 and 1.0

public void split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames) throws IOException
Split a given index into three indexes for training, test and cross validation tasks, respectively.
Parameters:
originalIndex - an IndexReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - the Analyzer used to create the new documents
termVectors - true if term vectors should be kept
classFieldName - name of the field used as the label for classification; this field must be indexed with sorted doc values
fieldNames - names of the fields to copy into the new indexes, or null if all fields should be used
Throws:
IOException - if any write operation fails on any of the indexes

Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.