BigDL module to be optimized
BigDL criterion method
The size (Tensor dimensions) of the feature data. For example, for an image with width * height = 28 * 28, featureSize = Array(28, 28).
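As a rough illustration of what featureSize implies, the sketch below computes how many scalar values a single flat sample would need in order to fill a tensor of the declared shape. The object and helper names (FeatureSizeSketch, elementCount, fits) are hypothetical and not part of BigDL; the check is an assumption about how a flat Array or Vector maps onto the declared dimensions.

```scala
object FeatureSizeSketch {
  // Number of scalar values needed to fill a tensor of the given shape.
  def elementCount(size: Array[Int]): Int = size.product

  // Hypothetical check: does a flat feature array match the declared shape?
  def fits(flat: Array[Float], size: Array[Int]): Boolean =
    flat.length == elementCount(size)

  def main(args: Array[String]): Unit = {
    val featureSize = Array(28, 28)          // one 28 * 28 image
    val image = Array.fill(28 * 28)(0.0f)    // flat representation of that image
    println(s"elements per sample: ${elementCount(featureSize)}")
    println(s"flat image fits featureSize: ${fits(image, featureSize)}")
  }
}
```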
The size (Tensor dimensions) of the label data.
When to stop the training, passed in as a Trigger. E.g. Trigger.maxIterations
Gets the conversion function used to extract data from the original DataFrame. Default: 0
Statistics (LearningRate, Loss, Throughput, Parameters) collected during training for the validation data, if validation data is set. They can be visualized with TensorBoard. Use setValidationSummary to enable the validation logger; the log is then saved to logDir/appName/ as specified by the parameters of validationSummary.
Default: None
Learning rate for the optimizer in the DLEstimator. Default: 0.001
Learning rate decay for each iteration. Default: 0
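To make the interaction between the learning rate and its per-iteration decay concrete, here is a minimal sketch. It assumes the common schedule lr / (1 + iteration * decay), which mirrors the default schedule of BigDL's SGD as far as I know; treat the formula and the names (LrDecaySketch, effectiveRate) as assumptions rather than the estimator's exact behaviour.

```scala
object LrDecaySketch {
  // Assumed schedule: the rate shrinks as lr / (1 + iteration * decay).
  def effectiveRate(lr: Double, decay: Double, iteration: Int): Double =
    lr / (1.0 + iteration * decay)

  def main(args: Array[String]): Unit = {
    val lr = 0.001 // documented default learning rate
    // With the documented default decay of 0 the rate never changes.
    println(effectiveRate(lr, 0.0, 100))
    // With a hypothetical decay of 0.01 the rate shrinks as iterations accumulate.
    Seq(0, 10, 100, 1000).foreach { i =>
      println(s"iteration $i -> ${effectiveRate(lr, 0.01, i)}")
    }
  }
}
```

Under this assumption, a decay of 0 keeps the rate constant, which is consistent with the documented default.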
Maximum number of epochs for the training; an epoch is one full pass over the training data. Default: 50
Optimization method to be used. BigDL supports many optimization methods, such as Adam, SGD and LBFGS. Refer to package com.intel.analytics.bigdl.optim for all the options. Default: SGD
Statistics (LearningRate, Loss, Throughput, Parameters) collected during training for the training data. They can be visualized with TensorBoard. Use setTrainSummary to enable the train logger; the log is then saved to logDir/appName/train as specified by the parameters of TrainSummary.
Default: Not enabled
Sets up validation to be run during training.
how often to evaluate the validation set
validation data set
a set of validation methods (ValidationMethod)
batch size for validation
this optimizer
Enable validation Summary
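The validation and summary setters described above might be combined as in the following sketch. It assumes the estimator lives in com.intel.analytics.bigdl.dlframes, that setValidation takes its arguments in the documented order (trigger, validation DataFrame, validation methods, batch size), and that TrainSummary and ValidationSummary are constructed from a log directory and an application name. The log directory, application name, batch size and the choice of Top1Accuracy are placeholders chosen for illustration.

```scala
import com.intel.analytics.bigdl.dlframes.DLEstimator
import com.intel.analytics.bigdl.optim.{Top1Accuracy, Trigger, ValidationMethod}
import com.intel.analytics.bigdl.visualization.{TrainSummary, ValidationSummary}
import org.apache.spark.sql.DataFrame

object ValidationSketch {
  // Attaches per-epoch validation and TensorBoard-style logging to an estimator.
  def withValidation(
      estimator: DLEstimator[Float],
      validationDF: DataFrame): DLEstimator[Float] = {
    val logDir = "/tmp/bigdl_summaries" // hypothetical log location
    val appName = "dlestimator-demo"    // hypothetical application name
    // Explicit element type, since Scala arrays are invariant.
    val methods = Array[ValidationMethod[Float]](new Top1Accuracy[Float]())

    estimator
      .setValidation(Trigger.everyEpoch, validationDF, methods, 4)
      .setTrainSummary(new TrainSummary(logDir, appName))
      .setValidationSummary(new ValidationSummary(logDir, appName))
  }
}
```

Top1Accuracy only makes sense for classification; for another task, a different ValidationMethod from com.intel.analytics.bigdl.optim would be the natural substitute.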
Validates whether the feature and label columns are of supported data types. Default: 0
Subclasses can override this method to return the required model for different transform tasks.
DLEstimator helps to train a BigDL Model with the Spark ML Estimator/Transformer pattern, so that Spark users can conveniently fit BigDL into a Spark ML pipeline.
DLEstimator supports feature and label data in the format of Array[Double], Array[Float], org.apache.spark.mllib.linalg.{Vector, VectorUDT}, org.apache.spark.ml.linalg.{Vector, VectorUDT}, Double and Float.
Users should specify the feature data dimensions and label data dimensions via the constructor parameters featureSize and labelSize respectively. Internally, the feature and label data are converted to BigDL tensors so that the BigDL model can be trained efficiently.
For detailed usage, please refer to the examples in package com.intel.analytics.bigdl.example.MLPipeline
(Since version 0.10.0)
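A minimal end-to-end sketch of the pattern described above, modelled on the multi-label linear regression example shipped with BigDL. It assumes the estimator is available as com.intel.analytics.bigdl.dlframes.DLEstimator (the package may differ between BigDL versions), a local Spark master, and toy data with two-element feature and label arrays; the object name and application name are placeholders.

```scala
import com.intel.analytics.bigdl.dlframes.DLEstimator
import com.intel.analytics.bigdl.nn.{Linear, MSECriterion, Sequential}
import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.utils.Engine
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object DLEstimatorSketch {
  def main(args: Array[String]): Unit = {
    // BigDL expects a Spark configuration prepared by Engine.
    val conf = Engine.createSparkConf()
      .setAppName("DLEstimatorSketch")
      .setMaster("local[1]")
    val sc = new SparkContext(conf)
    val sqlContext = SQLContext.getOrCreate(sc)
    Engine.init

    // A single linear layer mapping 2 features to 2 labels.
    val model = Sequential().add(Linear(2, 2))
    val criterion = MSECriterion()

    // featureSize = Array(2) and labelSize = Array(2) match the toy data below.
    val estimator = new DLEstimator(model, criterion, Array(2), Array(2))
      .setBatchSize(4)
      .setMaxEpoch(10)

    val data = sc.parallelize(Seq(
      (Array(2.0, 1.0), Array(1.0, 2.0)),
      (Array(1.0, 2.0), Array(2.0, 1.0)),
      (Array(2.0, 1.0), Array(1.0, 2.0)),
      (Array(1.0, 2.0), Array(2.0, 1.0))))
    val df = sqlContext.createDataFrame(data).toDF("features", "label")

    // fit returns a Transformer that adds a prediction column.
    val dlModel = estimator.fit(df)
    dlModel.transform(df).show(false)

    sc.stop()
  }
}
```

The default column names "features" and "label" are used here, so no column setters are needed; the batch size is chosen to divide evenly across the single local core.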