org.apache.spark.mllib.regression

StreamingLinearAlgorithm

abstract class StreamingLinearAlgorithm[M <: GeneralizedLinearModel, A <: GeneralizedLinearAlgorithm[M]] extends Logging

StreamingLinearAlgorithm implements methods for continuously training a generalized linear model on streaming data, and using it for prediction on (possibly different) streaming data.

This class takes a GeneralizedLinearModel and a GeneralizedLinearAlgorithm as type parameters, which makes it easy to extend into a streaming version of any GLM-based analysis. Initial weights must be set before calling trainOn or predictOn. Only the weights are updated, not the intercept. If the model needs an intercept, it should be manually appended to the input data as a constant feature.
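One way to follow the intercept advice above is to append a constant 1.0 feature to every vector, so that the last weight plays the role of the intercept. The sketch below assumes that MLUtils.appendBias from org.apache.spark.mllib.util is suitable for this purpose; the object name InterceptAsFeature and the helper withBias are hypothetical names used only for illustration.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper object, not part of Spark.
object InterceptAsFeature {

  // Append a constant 1.0 to a single feature vector.
  def withBias(features: Vector): Vector =
    MLUtils.appendBias(features)

  // Apply the same transformation to every labeled point in a stream,
  // leaving the labels untouched.
  def withBias(points: DStream[LabeledPoint]): DStream[LabeledPoint] =
    points.map(lp => LabeledPoint(lp.label, MLUtils.appendBias(lp.features)))

  // The initial weights need one extra entry for the appended feature.
  def initialWeights(numFeatures: Int): Vector =
    Vectors.zeros(numFeatures + 1)
}
```

With this approach, the stream passed to predictOn needs the same appended feature, otherwise the vector lengths will not match the weights.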

For example usage, see StreamingLinearRegressionWithSGD.

NOTE: In some use cases, the order in which trainOn and predictOn are called in an application affects the results. When both are called on the same DStream and trainOn is called before predictOn, the model is updated when new data arrive and the prediction is based on the updated model. If predictOn is called first, the prediction uses the model from the previous update.

NOTE: It is ok to call predictOn repeatedly on multiple streams; this will generate predictions for each one all using the current model. It is also ok to call trainOn on different streams; this will update the model using each of the different sources, in sequence.
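The ordering described in the notes above can be sketched with the concrete subclass StreamingLinearRegressionWithSGD. This is a minimal illustration, not a reference implementation: the object name TrainThenPredict, the helpers newModel and wire, and the step size and iteration count are all assumptions chosen for the example.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical wiring object, not part of Spark.
object TrainThenPredict {

  // Initial weights must be set before trainOn or predictOn is called.
  // numFeatures, the step size and the iteration count are illustrative.
  def newModel(numFeatures: Int): StreamingLinearRegressionWithSGD =
    new StreamingLinearRegressionWithSGD()
      .setStepSize(0.1)
      .setNumIterations(50)
      .setInitialWeights(Vectors.zeros(numFeatures))

  // Register training first, then prediction, on the same stream.
  // Per the note above, each batch should then be scored with the
  // model that was just updated on that batch.
  def wire(model: StreamingLinearRegressionWithSGD,
           points: DStream[LabeledPoint]): DStream[(Double, Double)] = {
    model.trainOn(points)
    // Use the true label as the key so it travels with the prediction.
    model.predictOnValues(points.map(lp => (lp.label, lp.features)))
  }
}
```

Swapping the two calls inside wire would, per the same note, score each batch with the model from the previous update instead.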

Annotations
@Since( "1.1.0" )
Source
StreamingLinearAlgorithm.scala
Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new StreamingLinearAlgorithm()

Concrete Value Members

  1. def latestModel(): M

    Return the latest model.

    Annotations
    @Since( "1.1.0" )
  2. def predictOn(data: JavaDStream[Vector]): JavaDStream[Double]

    Java-friendly version of predictOn.

    Annotations
    @Since( "1.3.0" )
  3. def predictOn(data: DStream[Vector]): DStream[Double]

    Use the model to make predictions on batches of data from a DStream.

    data

    DStream containing feature vectors

    returns

    DStream containing predictions

    Annotations
    @Since( "1.1.0" )
  4. def predictOnValues[K](data: JavaPairDStream[K, Vector]): JavaPairDStream[K, Double]

    Java-friendly version of predictOnValues.

    Annotations
    @Since( "1.3.0" )
  5. def predictOnValues[K](data: DStream[(K, Vector)])(implicit arg0: ClassTag[K]): DStream[(K, Double)]

    Use the model to make predictions on the values of a DStream and carry over its keys.

    K

    key type

    data

    DStream containing (key, feature vector) pairs

    returns

    DStream containing the input keys and the predictions as values

    Annotations
    @Since( "1.1.0" )
  6. def trainOn(data: JavaDStream[LabeledPoint]): Unit

    Java-friendly version of trainOn.

    Annotations
    @Since( "1.3.0" )
  7. def trainOn(data: DStream[LabeledPoint]): Unit

    Update the model by training on batches of data from a DStream. This operation registers a DStream for training the model, and updates the model based on every subsequent batch of data from the stream.

    data

    DStream containing labeled data

    Annotations
    @Since( "1.1.0" )
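
The members listed above can also be combined across different streams: one stream for training and another, keyed stream for prediction. The following is a rough sketch under stated assumptions. SeparateStreams, seeded and wire are hypothetical names, String keys are an arbitrary choice, and the per-batch printing is only there to show latestModel in use.

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical wiring object, not part of Spark.
object SeparateStreams {

  // Build a streaming regression whose model starts from the given weights.
  def seeded(weights: Vector): StreamingLinearRegressionWithSGD =
    new StreamingLinearRegressionWithSGD().setInitialWeights(weights)

  // Train on one stream and predict on a different, keyed stream.
  def wire(model: StreamingLinearRegressionWithSGD,
           training: DStream[LabeledPoint],
           queries: DStream[(String, Vector)]): DStream[(String, Double)] = {
    model.trainOn(training)

    // Inspect the current weights on the driver for each non-empty batch.
    training.foreachRDD { rdd =>
      if (!rdd.isEmpty()) println(s"weights: ${model.latestModel().weights}")
    }

    // Keys are carried over unchanged; only the values are replaced
    // by predictions from the current model.
    model.predictOnValues(queries)
  }
}
```

Nothing runs until the owning StreamingContext is started, so wire only registers the operations; the returned stream still needs an output operation such as print() to produce results.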