Apache SAMOA is a platform for mining big data streams.

It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks, such as classification, clustering, and regression, as well as programming abstractions for developing new algorithms that run on top of distributed stream processing engines (DSPEs). Its pluggable architecture allows it to run on several DSPEs, including Apache Storm, Apache S4, Apache Samza, and Apache Flink. SAMOA is similar to Mahout in spirit, but specifically designed for stream mining.

Apache SAMOA is simple and fun to use! This documentation introduces the different ways to use SAMOA. As a user, you can run SAMOA algorithms in local mode or on several stream processing engines: Storm, S4, Samza, and Flink. As a developer, you can write a new algorithm once and test it on all of these distributed stream processing engines.
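
To make the "write once, run on any engine" idea concrete, here is a minimal Java sketch of a pluggable design. It is not SAMOA's actual API: the names StreamAlgorithm, ExecutionEngine, LocalEngine, and RunningMean are hypothetical and chosen only for illustration. The algorithm depends only on a small interface, so an adapter for another engine could be swapped in without changing the algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction (not SAMOA's API): an algorithm is written once
// against this interface and knows nothing about the engine that runs it.
interface StreamAlgorithm<I, O> {
    O process(I item);
}

// Hypothetical engine adapter: each engine decides how items reach the algorithm.
interface ExecutionEngine {
    <I, O> List<O> run(StreamAlgorithm<I, O> algorithm, List<I> stream);
}

// Sequential, in-process engine standing in for a "local mode".
final class LocalEngine implements ExecutionEngine {
    @Override
    public <I, O> List<O> run(StreamAlgorithm<I, O> algorithm, List<I> stream) {
        List<O> results = new ArrayList<>();
        for (I item : stream) {
            results.add(algorithm.process(item));
        }
        return results;
    }
}

// Toy streaming algorithm: an incrementally updated mean that sees each item once.
final class RunningMean implements StreamAlgorithm<Double, Double> {
    private long count = 0;
    private double mean = 0.0;

    @Override
    public Double process(Double x) {
        count++;
        mean += (x - mean) / count; // incremental mean update
        return mean;
    }
}

final class PluggableSketch {
    public static void main(String[] args) {
        ExecutionEngine engine = new LocalEngine();
        List<Double> means = engine.run(new RunningMean(), Arrays.asList(2.0, 4.0, 6.0));
        System.out.println(means); // prints [2.0, 3.0, 4.0]
    }
}
```

In this sketch, supporting another engine would mean adding one more ExecutionEngine implementation while RunningMean stays untouched. A real distributed engine also has to deal with partitioning, state, and serialization, which this sketch leaves out.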

Getting Started

Users

Developers

Getting Help

Discussion about SAMOA happens on the Apache development mailing list dev@samoa.incubator.org

[ subscribe | unsubscribe | archives ]