We start showing how simple is to run a first large scale machine learning task in SAMOA. We will evaluate a bagging ensemble method using decision trees on the Forest Covertype dataset.

git clone http://git.apache.org/incubator-samoa.git
cd incubator-samoa
mvn package      #Local mode
wget "http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"
unzip covtypeNorm.arff.zip 

Forest Covertype contains the forest cover type for 30 x 30 meter cells obtained from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several articles on data stream classification.

bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging 
    -s (ArffFileStream -f covtypeNorm.arff) -f 100000"

The output will be a list of the evaluation results, plotted each 100,000 instances.