MLlib (RDD-based)
Classification
  | 
Classification model trained using Multinomial/Binary Logistic Regression.  | 
Train a classification model for Binary Logistic Regression using Stochastic Gradient Descent.  | 
|
Train a classification model for Multinomial/Binary Logistic Regression using Limited-memory BFGS.  | 
|
  | 
Model for Support Vector Machines (SVMs).  | 
Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.  | 
|
  | 
Model for Naive Bayes classifiers.  | 
Train a Multinomial Naive Bayes model.  | 
|
Train or predict a logistic regression model on streaming data.  | 
Clustering
  | 
A clustering model derived from the bisecting k-means method.  | 
A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modification to fit Spark.  | 
|
  | 
A clustering model derived from the k-means method.  | 
  | 
K-means clustering.  | 
  | 
A clustering model derived from the Gaussian Mixture Model method.  | 
Learning algorithm for Gaussian Mixtures using the expectation-maximization algorithm.  | 
|
  | 
Model produced by   | 
Power Iteration Clustering (PIC), a scalable graph clustering algorithm.  | 
|
  | 
Provides methods to set k, decayFactor, and timeUnit, which configure the KMeans algorithm for fitting and predicting on incoming DStreams.  | 
  | 
Clustering model which can perform an online update of the centroids.  | 
  | 
Train Latent Dirichlet Allocation (LDA) model.  | 
  | 
A clustering model derived from the LDA method.  | 
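The k-means entries above rest on an assign-then-update loop. A minimal single-machine sketch of that loop (Lloyd's algorithm) follows. It is plain Python over in-memory tuples, not MLlib's distributed implementation, and the names `kmeans` and `predict` are hypothetical helpers chosen for illustration.

```python
from math import dist  # available in Python 3.8+


def kmeans(points, centers, iterations=10):
    """Plain Lloyd's algorithm on in-memory tuples (illustrative only)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: group each point under its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        # A center whose group is empty stays where it is.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def predict(point, centers):
    """Return the index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


points = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centers = kmeans(points, centers=[(0.0, 0.0), (9.0, 9.0)])
print(centers)
print(predict((0.5, 0.5), centers))
```

The initial centers are passed in explicitly so the result is deterministic; a real run would also need an initialization strategy and a convergence check.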
Evaluation
  | 
Evaluator for binary classification.  | 
  | 
Evaluator for regression.  | 
  | 
Evaluator for multiclass classification.  | 
  | 
Evaluator for ranking algorithms.  | 
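As a rough illustration of what a regression evaluator computes, the sketch below derives mean squared error, root mean squared error, and mean absolute error from (prediction, observation) pairs. The function name `regression_metrics` and the pair ordering are assumptions made for this example, and the code works on a local list rather than an RDD.

```python
from math import sqrt


def regression_metrics(prediction_and_observations):
    """Compute basic error metrics from (prediction, observation) pairs."""
    errors = [p - o for p, o in prediction_and_observations]
    n = len(errors)
    mse = sum(e * e for e in errors) / n   # mean squared error
    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae}


pairs = [(2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)]
metrics = regression_metrics(pairs)
print(metrics)
```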
Feature
  | 
Normalizes samples individually to unit Lp norm.  | 
  | 
Represents a StandardScaler model that can transform vectors.  | 
  | 
Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set.  | 
  | 
Maps a sequence of terms to their term frequencies using the hashing trick.  | 
  | 
Represents an IDF model that can transform term frequency vectors.  | 
  | 
Inverse document frequency (IDF).  | 
  | 
Word2Vec creates vector representation of words in a text corpus.  | 
  | 
Class for the Word2Vec model.  | 
  | 
Creates a ChiSquared feature selector.  | 
  | 
Represents a Chi Squared selector model.  | 
  | 
Scales each column of the vector, with the supplied weight vector.  | 
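Two of the transformations listed above, the hashing trick for term frequencies and Lp normalization, are small enough to sketch in plain Python. The helper names `hashing_tf` and `normalize`, the tiny `num_features` value, and the use of CRC32 as the hash function are all assumptions made for illustration; they are not taken from MLlib, which uses its own hash function and a much larger feature space.

```python
import zlib


def hashing_tf(terms, num_features=16):
    """Map terms to a fixed-length term-frequency vector via hashing."""
    vector = [0.0] * num_features
    for term in terms:
        # Hash the term to a bucket; distinct terms may collide.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        vector[index] += 1.0
    return vector


def normalize(vector, p=2.0):
    """Scale a vector to unit Lp norm (all-zero vectors are returned unchanged)."""
    norm = sum(abs(x) ** p for x in vector) ** (1.0 / p)
    return [x / norm for x in vector] if norm else list(vector)


tf = hashing_tf("spark mllib spark rdd".split())
unit = normalize(tf)
print(tf)
print(unit)
```

Because the vector length is fixed up front, no vocabulary has to be built or stored; the cost is that colliding terms share a bucket.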
Frequency Pattern Mining
  | 
A Parallel FP-growth algorithm to mine frequent itemsets.  | 
  | 
An FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.  | 
A parallel PrefixSpan algorithm to mine frequent sequential patterns.  | 
|
  | 
Model fitted by PrefixSpan.  | 
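To make "frequent itemsets" concrete, the sketch below counts every itemset by brute-force enumeration and keeps those that meet a minimum support. This is deliberately not FP-growth: it enumerates all subsets of each transaction, which is only feasible for tiny inputs, whereas FP-growth avoids that enumeration. The name `frequent_itemsets` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets in at least min_support of transactions."""
    min_count = min_support * len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates within a transaction
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
frequent = frequent_itemsets(transactions, min_support=0.6)
print(frequent)
```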
Vector and Matrix
  | 
|
  | 
A dense vector represented by a value array.  | 
  | 
A simple sparse vector class for passing data to MLlib.  | 
  | 
Factory methods for working with vectors.  | 
  | 
|
  | 
Column-major dense matrix.  | 
  | 
Sparse Matrix stored in CSC format.  | 
  | 
|
  | 
Represents QR factors.  | 
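The two local matrix layouts named above, column-major dense and CSC (compressed sparse column), can be related with a short sketch. The helpers `csc_to_dense` and `column_major` are hypothetical and operate on plain Python lists; the argument names mirror the usual CSC components (column pointers, row indices, values).

```python
def csc_to_dense(num_rows, num_cols, col_ptrs, row_indices, values):
    """Expand a CSC-encoded matrix into a list of rows."""
    dense = [[0.0] * num_cols for _ in range(num_rows)]
    for col in range(num_cols):
        # Entries of column `col` live at positions col_ptrs[col]..col_ptrs[col + 1] - 1.
        for k in range(col_ptrs[col], col_ptrs[col + 1]):
            dense[row_indices[k]][col] = values[k]
    return dense


def column_major(dense):
    """Flatten a list of rows into a column-major value array."""
    num_rows, num_cols = len(dense), len(dense[0])
    return [dense[r][c] for c in range(num_cols) for r in range(num_rows)]


# A 3x2 matrix with three non-zero entries.
dense = csc_to_dense(3, 2, col_ptrs=[0, 2, 3], row_indices=[0, 2, 1], values=[1.0, 3.0, 2.0])
print(dense)
print(column_major(dense))
```

CSC stores only the non-zero values plus one pointer per column, so it pays off when most entries are zero.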
Distributed Representation
  | 
Represents a distributed matrix in blocks of local matrices.  | 
  | 
Represents a matrix in coordinate format.  | 
Represents a distributively stored matrix backed by one or more RDDs.  | 
|
  | 
Represents a row of an IndexedRowMatrix.  | 
  | 
Represents a row-oriented distributed Matrix with indexed rows.  | 
  | 
Represents an entry of a CoordinateMatrix.  | 
  | 
Represents a row-oriented distributed Matrix with no meaningful row indices.  | 
  | 
Represents singular value decomposition (SVD) factors.  | 
Random
Generator methods for creating RDDs composed of i.i.d. samples from some distribution.  | 
Recommendation
  | 
A matrix factorization model trained by regularized alternating least squares.  | 
  | 
Alternating Least Squares matrix factorization.  | 
  | 
Represents a (user, product, rating) tuple.  | 
Regression
  | 
Class that represents the features and labels of a data point.  | 
  | 
A linear model that has a vector of coefficients and an intercept.  | 
  | 
A linear regression model derived from a least-squares fit.  | 
Train a linear regression model with no regularization using Stochastic Gradient Descent.  | 
|
  | 
A linear regression model derived from a least-squares fit with an L2 penalty term.  | 
Train a regression model with L2-regularization using Stochastic Gradient Descent.  | 
|
  | 
A linear regression model derived from a least-squares fit with an L1 penalty term.  | 
Train a regression model with L1-regularization using Stochastic Gradient Descent.  | 
|
  | 
Regression model for isotonic regression.  | 
Isotonic regression.  | 
|
  | 
Base class that has to be inherited by any StreamingLinearAlgorithm.  | 
  | 
Train or predict a linear regression model on streaming data.  | 
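The difference between an unregularized least-squares fit and one with an L2 penalty can be seen in a small sketch. For simplicity it uses full-batch gradient descent on a single feature rather than the stochastic gradient descent named above, and it penalizes only the weight, not the intercept. The function `fit_linear` and its parameters (`step`, `iterations`, `reg_param`) are hypothetical names for this example.

```python
def fit_linear(points, step=0.05, iterations=2000, reg_param=0.0):
    """Fit y ~ w * x + b by gradient descent on mean squared error / 2.

    With reg_param > 0, an L2 penalty (reg_param / 2) * w**2 is added.
    """
    n = len(points)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradients of the (penalized) objective with respect to w and b.
        grad_w = sum((w * x + b - y) * x for x, y in points) / n + reg_param * w
        grad_b = sum((w * x + b - y) for x, y in points) / n
        w -= step * grad_w
        b -= step * grad_b
    return w, b


# Noise-free data generated from y = 2x + 1.
points = [(float(x), 2.0 * x + 1.0) for x in range(5)]
w_plain, b_plain = fit_linear(points)
w_ridge, b_ridge = fit_linear(points, reg_param=1.0)
print(w_plain, b_plain)
print(w_ridge, b_ridge)
```

On this data the unpenalized fit recovers the generating line, while the L2 penalty shrinks the weight toward zero and the intercept compensates.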
Statistics
  | 
Trait for multivariate statistical summary of a data matrix.  | 
  | 
Contains test results for the chi-squared hypothesis test.  | 
  | 
Represents a (mu, sigma) tuple.  | 
Estimate probability density at required points given an RDD of samples from the population.  | 
|
  | 
Contains test results for the Kolmogorov-Smirnov test.  | 
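Estimating a probability density at required points from a set of samples can be sketched as below, assuming a Gaussian kernel with a fixed bandwidth. The helper `kernel_density` is a hypothetical name and works on local lists rather than an RDD of samples.

```python
from math import exp, pi, sqrt


def kernel_density(samples, points, bandwidth=1.0):
    """Estimate the density at each point using a Gaussian kernel."""
    # Normalizing constant: 1 / (n * h * sqrt(2 * pi)).
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))
    return [
        norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in points
    ]


densities = kernel_density(samples=[-1.0, 1.0], points=[0.0, 1.0])
print(densities)
```

A smaller bandwidth follows the samples more closely; a larger one gives a smoother estimate.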
Tree
  | 
A decision tree model for classification or regression.  | 
Learning algorithm for a decision tree model for classification or regression.  | 
|
  | 
Represents a random forest model.  | 
Learning algorithm for a random forest model for classification or regression.  | 
|
  | 
Represents a gradient-boosted tree model.  | 
Learning algorithm for a gradient boosted trees model for classification or regression.  | 
Utilities
Mixin for classes which can load saved models using their Scala implementation.  | 
|
Mixin for models that provide save() through their Scala implementation.  | 
|
Utils for generating linear data.  | 
|
  | 
Mixin for classes which can load saved models from files.  | 
  | 
Helper methods to load, save and pre-process data used in MLlib.  | 
  | 
Mixin for models and transformers which may be saved as files.  |