bigdl.optim package
Submodules
bigdl.optim.optimizer module
class bigdl.optim.optimizer.ActivityRegularization(l1, l2, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Apply both L1 and L2 regularization.

:param l1: L1 regularization rate
:param l2: L2 regularization rate
class bigdl.optim.optimizer.Adadelta(decayrate=0.9, epsilon=1e-10, bigdl_type='float')
Bases: bigdl.optim.optimizer.OptimMethod

Adadelta implementation for SGD: http://arxiv.org/abs/1212.5701

:param decayrate: interpolation parameter rho
:param epsilon: for numerical stability

>>> adadelta = Adadelta()
creating: createAdadelta
class bigdl.optim.optimizer.Adagrad(learningrate=0.001, learningrate_decay=0.0, weightdecay=0.0, bigdl_type='float')
Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Adagrad. See the original paper: http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

:param learningrate: learning rate
:param learningrate_decay: learning rate decay
:param weightdecay: weight decay

>>> adagrad = Adagrad()
creating: createAdagrad
class bigdl.optim.optimizer.Adam(learningrate=0.001, learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-08, bigdl_type='float')
Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Adam: http://arxiv.org/pdf/1412.6980.pdf

:param learningrate: learning rate
:param learningrate_decay: learning rate decay
:param beta1: first moment coefficient
:param beta2: second moment coefficient
:param epsilon: for numerical stability

>>> adam = Adam()
creating: createAdam
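A minimal construction sketch (illustrative, not part of the original docstring; the hyperparameter values are arbitrary):

from bigdl.optim.optimizer import Adam

# Adam with non-default hyperparameters (values chosen only for illustration).
adam = Adam(learningrate=3e-4, beta1=0.9, beta2=0.999, epsilon=1e-8)
# The resulting OptimMethod can be passed to an Optimizer or LocalOptimizer
# through the optim_method argument (see Optimizer.create further below).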
class bigdl.optim.optimizer.Adamax(learningrate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-38, bigdl_type='float')
Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Adamax: http://arxiv.org/pdf/1412.6980.pdf

:param learningrate: learning rate
:param beta1: first moment coefficient
:param beta2: second moment coefficient
:param epsilon: for numerical stability

>>> adamax = Adamax()
creating: createAdamax
class bigdl.optim.optimizer.BaseOptimizer(jvalue, bigdl_type, *args)
Bases: bigdl.util.common.JavaValue
prepare_input()
Load the input data. Notebook users can call this method to separate data loading from optimizer creation.
set_checkpoint(checkpoint_trigger, checkpoint_path, isOverWrite=True)
Configure checkpoint settings.

Parameters:
- checkpoint_trigger – the interval to write snapshots
- checkpoint_path – the path to write snapshots into
- isOverWrite – whether to overwrite existing snapshots in the path. Default is True.
set_criterion(criterion)
Set a new criterion, for optimizer reuse.

Parameters:
- criterion – the new criterion
set_gradclip_const(min_value, max_value)
Configure constant gradient-clipping settings.

Parameters:
- min_value – the minimum value to clip by
- max_value – the maximum value to clip by
set_gradclip_l2norm(clip_norm)
Configure L2-norm gradient-clipping settings.

Parameters:
- clip_norm – gradient L2-norm threshold
set_train_summary(summary)
Set the train summary. A TrainSummary object contains the information necessary for the optimizer to know how often the logs are recorded, where to store the logs and how to retrieve them, etc. For details, refer to the docs of TrainSummary.

Parameters:
- summary – a TrainSummary object
set_val_summary(summary)
Set the validation summary. A ValidationSummary object contains the information necessary for the optimizer to know how often the logs are recorded, where to store the logs and how to retrieve them, etc. For details, refer to the docs of ValidationSummary.

Parameters:
- summary – a ValidationSummary object
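A minimal configuration sketch for the methods above (illustrative, not from the original docs); it assumes an already-created optimizer instance named `optimizer` and a writable directory for checkpoints and logs:

from bigdl.optim.optimizer import EveryEpoch, TrainSummary, ValidationSummary

# `optimizer` is assumed to be an existing Optimizer/LocalOptimizer instance.
optimizer.set_checkpoint(EveryEpoch(), "/tmp/bigdl_checkpoints", isOverWrite=True)
optimizer.set_gradclip_l2norm(5.0)  # clip gradients to an L2 norm of 5.0

train_summary = TrainSummary(log_dir="/tmp/bigdl_summaries", app_name="demo_app")
val_summary = ValidationSummary(log_dir="/tmp/bigdl_summaries", app_name="demo_app")
optimizer.set_train_summary(train_summary)
optimizer.set_val_summary(val_summary)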
class bigdl.optim.optimizer.Default(bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A learning rate decay policy, where the effective learning rate is calculated as base_lr * gamma ^ (floor(iter / step_size)).

>>> step = Default()
creating: createDefault
class bigdl.optim.optimizer.DistriOptimizer(model, training_rdd, criterion, end_trigger, batch_size, optim_method=None, bigdl_type='float')
class bigdl.optim.optimizer.EveryEpoch(bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A trigger specifies a timespot or several timespots during training, and a corresponding action will be taken when the timespot(s) is reached. EveryEpoch is a trigger that triggers an action when each epoch finishes. Could be used as the trigger in set_validation and set_checkpoint in Optimizer, and also in TrainSummary.set_summary_trigger.

>>> everyEpoch = EveryEpoch()
creating: createEveryEpoch
class bigdl.optim.optimizer.Exponential(decay_step, decay_rate, stair_case=False, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Exponential is a learning rate schedule, which rescales the learning rate by lr_{n + 1} = lr * decayRate ^ (iter / decayStep).

:param decay_step: the interval for lr decay
:param decay_rate: decay rate
:param stair_case: if true, iter / decayStep is an integer division and the decayed learning rate follows a staircase function

>>> exponential = Exponential(100, 0.1)
creating: createExponential
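A minimal sketch of attaching this schedule to SGD (illustrative; the values are arbitrary, and the `leaningrate_schedule` keyword spelling is taken from the SGD signature documented below):

from bigdl.optim.optimizer import SGD, Exponential

# Multiply the base learning rate by 0.1 every 100 iterations (staircase mode).
schedule = Exponential(100, 0.1, stair_case=True)
sgd = SGD(learningrate=0.05, leaningrate_schedule=schedule)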
class bigdl.optim.optimizer.Ftrl(learningrate=0.001, learningrate_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, l2_shrinkage_regularization_strength=0.0, bigdl_type='float')
Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Ftrl: https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf. Supports an L1 penalty, an L2 penalty and a shrinkage-type L2 penalty.

:param learningrate: learning rate
:param learningrate_power: double, must be less than or equal to zero. Default is -0.5.
:param initial_accumulator_value: double, the starting value for accumulators; requires zero or positive values.
:param l1_regularization_strength: double, must be greater than or equal to zero. Default is zero.
:param l2_regularization_strength: double, must be greater than or equal to zero. Default is zero.
:param l2_shrinkage_regularization_strength: double, must be greater than or equal to zero. Default is zero. This differs from l2_regularization_strength above: L2 above is a stabilization penalty, whereas this one is a magnitude penalty.

>>> ftrl = Ftrl()
creating: createFtrl
>>> ftrl2 = Ftrl(1e-2, -0.1, 0.2, 0.3, 0.4, 0.5)
creating: createFtrl
class bigdl.optim.optimizer.L1L2Regularizer(l1, l2, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Apply both L1 and L2 regularization.

:param l1: L1 regularization rate
:param l2: L2 regularization rate
class bigdl.optim.optimizer.L1Regularizer(l1, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Apply L1 regularization.

:param l1: L1 regularization rate
class bigdl.optim.optimizer.L2Regularizer(l2, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Apply L2 regularization.

:param l2: L2 regularization rate
class bigdl.optim.optimizer.LBFGS(max_iter=20, max_eval=1.7976931348623157e+308, tolfun=1e-05, tolx=1e-09, ncorrection=100, learningrate=1.0, verbose=False, linesearch=None, linesearch_options=None, bigdl_type='float')
Bases: bigdl.optim.optimizer.OptimMethod

This implementation of L-BFGS relies on a user-provided line search function (state.lineSearch). If this function is not provided, a simple learning rate is used to produce fixed-size steps. Fixed-size steps are much less costly than line searches, and can be useful for stochastic problems. The learning rate is used even when a line search is provided. This is also useful for large-scale stochastic problems, where opfunc is a noisy approximation of f(x). In that case, the learning rate allows a reduction of confidence in the step size.

:param max_iter: maximum number of iterations allowed
:param max_eval: maximum number of function evaluations
:param tolfun: termination tolerance on the first-order optimality
:param tolx: termination tolerance on progress in terms of func/param changes
:param ncorrection:
:param learningrate:
:param verbose:
:param linesearch: a line search function
:param linesearch_options: if no line search is provided, then a fixed step size is used

>>> lbfgs = LBFGS()
creating: createLBFGS
class bigdl.optim.optimizer.LocalOptimizer(X, Y, model, criterion, end_trigger, batch_size, optim_method=None, cores=None, bigdl_type='float')
Bases: bigdl.optim.optimizer.BaseOptimizer

Create an optimizer.

Parameters:
- model – the neural net model
- X – the training features, an ndarray or list of ndarrays
- Y – the training labels, an ndarray
- criterion – the loss function
- optim_method – the algorithm to use for optimization, e.g. SGD, Adagrad, etc. If optim_method is None, the default algorithm is SGD.
- end_trigger – when to end the optimization
- batch_size – training batch size
- cores – the number of cores to use; by default, the total number of physical cores
set_validation(batch_size, X_val, Y_val, trigger, val_method=None)
Configure validation settings.

Parameters:
- batch_size – validation batch size
- X_val – features of the validation dataset
- Y_val – labels of the validation dataset
- trigger – validation interval
- val_method – the ValidationMethod to use, e.g. "Top1Accuracy", "Top5Accuracy", "Loss"
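A minimal end-to-end sketch for the local, single-node case (illustrative only, with a toy model and random data); it assumes BigDL and pyspark are installed and that the BigDL engine can be initialized in the current environment:

import numpy as np
from bigdl.util.common import init_engine
from bigdl.nn.layer import Sequential, Linear, LogSoftMax
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.optim.optimizer import LocalOptimizer, SGD, MaxEpoch, EveryEpoch, Top1Accuracy

init_engine()  # initialize the BigDL engine before creating layers and optimizers

# Toy two-class classifier and random data, purely for illustration.
model = Sequential()
model.add(Linear(4, 2))
model.add(LogSoftMax())

X = np.random.rand(64, 4).astype("float32")
Y = np.random.randint(1, 3, size=(64,)).astype("float32")  # ClassNLLCriterion expects 1-based labels

optimizer = LocalOptimizer(X=X, Y=Y, model=model,
                           criterion=ClassNLLCriterion(),
                           end_trigger=MaxEpoch(2),
                           batch_size=16,
                           optim_method=SGD(learningrate=0.01),
                           cores=2)  # batch_size should be a multiple of cores
optimizer.set_validation(batch_size=16, X_val=X, Y_val=Y,
                         trigger=EveryEpoch(), val_method=[Top1Accuracy()])
trained_model = optimizer.optimize()  # optimize() starts training and returns the trained model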
class bigdl.optim.optimizer.Loss(cri=None, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

This evaluation method calculates the loss of the output with respect to the target.

>>> from bigdl.nn.criterion import ClassNLLCriterion
>>> loss = Loss()
creating: createClassNLLCriterion
creating: createLoss
>>> loss = Loss(ClassNLLCriterion())
creating: createClassNLLCriterion
creating: createLoss
class bigdl.optim.optimizer.MAE(bigdl_type='float')
Bases: bigdl.util.common.JavaValue

This evaluation method calculates the mean absolute error of the output with respect to the target.

>>> mae = MAE()
creating: createMAE
class bigdl.optim.optimizer.MaxEpoch(max_epoch, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A trigger specifies a timespot or several timespots during training, and a corresponding action will be taken when the timespot(s) is reached. MaxEpoch is a trigger that triggers an action when training reaches the number of epochs specified by "max_epoch". Usually used as the end_trigger when creating an Optimizer.

>>> maxEpoch = MaxEpoch(2)
creating: createMaxEpoch
class bigdl.optim.optimizer.MaxIteration(max, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A trigger specifies a timespot or several timespots during training, and a corresponding action will be taken when the timespot(s) is reached. MaxIteration is a trigger that triggers an action when training reaches the number of iterations specified by "max". Usually used as the end_trigger when creating an Optimizer.

>>> maxIteration = MaxIteration(20)
creating: createMaxIteration
class bigdl.optim.optimizer.MaxScore(max, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A trigger that triggers an action when the validation score is larger than "max".

>>> maxScore = MaxScore(0.4)
creating: createMaxScore
class bigdl.optim.optimizer.MinLoss(min, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A trigger that triggers an action when the training loss is less than "min".

>>> minLoss = MinLoss(0.1)
creating: createMinLoss
class bigdl.optim.optimizer.MultiStep(step_sizes, gamma, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Similar to Step, but allows non-uniform steps defined by step_sizes.

Parameters:
- step_sizes – the series of step sizes used for lr decay
- gamma – coefficient of decay

>>> step = MultiStep([2, 5], 0.3)
creating: createMultiStep
class bigdl.optim.optimizer.OptimMethod(jvalue, bigdl_type, *args)
Bases: bigdl.util.common.JavaValue
class bigdl.optim.optimizer.Optimizer(model, training_rdd, criterion, end_trigger, batch_size, optim_method=None, bigdl_type='float')
Bases: bigdl.optim.optimizer.BaseOptimizer
static create(model, training_set, criterion, end_trigger=None, batch_size=32, optim_method=None, cores=None, bigdl_type='float')
Create an optimizer. Depending on the input type, the returned optimizer can be a local optimizer or a distributed optimizer.

Parameters:
- model – the neural net model
- training_set – (features, label) for local mode; RDD[Sample] for distributed mode
- criterion – the loss function
- optim_method – the algorithm to use for optimization, e.g. SGD, Adagrad, etc. If optim_method is None, the default algorithm is SGD.
- end_trigger – when to end the optimization; the default value is MaxEpoch(1)
- batch_size – training batch size
- cores – for the local optimizer only; the total number of physical cores is used as the default value
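A minimal distributed-mode sketch (illustrative, not from the original docs); it assumes a SparkContext `sc` created with BigDL's Spark configuration, an already-initialized BigDL engine, and an existing BigDL `model` whose output matches ClassNLLCriterion (e.g. ending in LogSoftMax):

import numpy as np
from bigdl.util.common import Sample
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.optim.optimizer import Optimizer, Adam, MaxEpoch, EveryEpoch

# `sc` (SparkContext) and `model` are assumed to already exist.
# Build a tiny RDD[Sample] from random data, purely for illustration.
samples = [Sample.from_ndarray(np.random.rand(4).astype("float32"),
                               np.array([float(np.random.randint(1, 3))]))
           for _ in range(128)]
train_rdd = sc.parallelize(samples)

optimizer = Optimizer.create(model=model,
                             training_set=train_rdd,
                             criterion=ClassNLLCriterion(),
                             end_trigger=MaxEpoch(2),
                             batch_size=32,
                             optim_method=Adam(learningrate=1e-3))
optimizer.set_checkpoint(EveryEpoch(), "/tmp/bigdl_checkpoints")
trained_model = optimizer.optimize()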
class
bigdl.optim.optimizer.
Plateau
(monitor, factor=0.1, patience=10, mode='min', epsilon=0.0001, cooldown=0, min_lr=0.0, bigdl_type='float')[source]¶ Bases:
bigdl.util.common.JavaValue
Plateau is the learning rate schedule when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. It monitors a quantity and if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.
:param monitor quantity to be monitored, can be Loss or score :param factor factor by which the learning rate will be reduced. new_lr = lr * factor :param patience number of epochs with no improvement after which learning rate will be reduced. :param mode one of {min, max}. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing :param epsilon threshold for measuring the new optimum, to only focus on significant changes. :param cooldown number of epochs to wait before resuming normal operation after lr has been reduced. :param min_lr lower bound on the learning rate.
>>> plateau = Plateau("score") creating: createPlateau
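A minimal sketch of using Plateau as an SGD learning rate schedule (illustrative; the values are arbitrary, and it assumes Plateau is passed through SGD's `leaningrate_schedule` argument like the other schedules in this module):

from bigdl.optim.optimizer import SGD, Plateau

# Halve the learning rate when the monitored validation score has not improved
# for 3 consecutive epochs (values chosen only for illustration).
plateau = Plateau("score", factor=0.5, patience=3, mode="max", min_lr=1e-5)
sgd = SGD(learningrate=0.01, leaningrate_schedule=plateau)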
class bigdl.optim.optimizer.Poly(power, max_iteration, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A learning rate decay policy, where the effective learning rate follows a polynomial decay, reaching zero at max_iteration. Calculation: base_lr * (1 - iter/max_iteration) ^ power

Parameters:
- power – coefficient of decay; refer to the calculation formula
- max_iteration – the iteration at which the lr becomes zero

>>> poly = Poly(0.5, 2)
creating: createPoly
class bigdl.optim.optimizer.RMSprop(learningrate=0.01, learningrate_decay=0.0, decayrate=0.99, epsilon=1e-08, bigdl_type='float')
Bases: bigdl.optim.optimizer.OptimMethod

An implementation of RMSprop.

:param learningrate: learning rate
:param learningrate_decay: learning rate decay
:param decayrate: decay rate, also called rho
:param epsilon: for numerical stability

>>> rmsprop = RMSprop()
creating: createRMSprop
class bigdl.optim.optimizer.SGD(learningrate=0.001, learningrate_decay=0.0, weightdecay=0.0, momentum=0.0, dampening=1.7976931348623157e+308, nesterov=False, leaningrate_schedule=None, learningrates=None, weightdecays=None, bigdl_type='float')
Bases: bigdl.optim.optimizer.OptimMethod

A plain implementation of SGD.

:param learningrate: learning rate
:param learningrate_decay: learning rate decay
:param weightdecay: weight decay
:param momentum: momentum
:param dampening: dampening for momentum
:param nesterov: enables Nesterov momentum
:param leaningrate_schedule: learning rate schedule
:param learningrates: 1D tensor of individual learning rates
:param weightdecays: 1D tensor of individual weight decays

>>> sgd = SGD()
creating: createDefault
creating: createSGD
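A minimal construction sketch (illustrative; the values are arbitrary):

from bigdl.optim.optimizer import SGD, Step

# SGD with Nesterov momentum, weight decay and a step-wise schedule that
# multiplies the learning rate by 0.1 every 1000 iterations.
sgd = SGD(learningrate=0.1, weightdecay=1e-4, momentum=0.9,
          dampening=0.0, nesterov=True,
          leaningrate_schedule=Step(1000, 0.1))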
class bigdl.optim.optimizer.SequentialSchedule(iteration_per_epoch, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Stack several learning rate schedulers.

Parameters:
- iteration_per_epoch – the number of iterations per epoch

>>> sequentialSchedule = SequentialSchedule(5)
creating: createSequentialSchedule
>>> poly = Poly(0.5, 2)
creating: createPoly
>>> test = sequentialSchedule.add(poly, 5)
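A minimal sketch of stacking a warm-up phase followed by polynomial decay (illustrative; it assumes add(schedule, max_iteration) runs each schedule for the given number of iterations, as in the doctest above):

from bigdl.optim.optimizer import SequentialSchedule, Warmup, Poly, SGD

schedule = SequentialSchedule(100)   # 100 iterations per epoch (illustrative)
schedule.add(Warmup(1e-4), 200)      # linear warm-up over the first 200 iterations
schedule.add(Poly(0.5, 1800), 1800)  # then polynomial decay for the remaining iterations
sgd = SGD(learningrate=0.01, leaningrate_schedule=schedule)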
class bigdl.optim.optimizer.SeveralIteration(interval, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A trigger specifies a timespot or several timespots during training, and a corresponding action will be taken when the timespot(s) is reached. SeveralIteration is a trigger that triggers an action every "n" iterations. Could be used as the trigger in set_validation and set_checkpoint in Optimizer, and also in TrainSummary.set_summary_trigger.

>>> severalIteration = SeveralIteration(2)
creating: createSeveralIteration
class bigdl.optim.optimizer.Step(step_size, gamma, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A learning rate decay policy, where the effective learning rate is calculated as base_lr * gamma ^ (floor(iter / step_size)).

Parameters:
- step_size – the interval, in iterations, between lr decays (see the formula above)
- gamma – coefficient of decay

>>> step = Step(2, 0.3)
creating: createStep
class bigdl.optim.optimizer.Top1Accuracy(bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Calculate the percentage where the output's max probability index equals the target.

>>> top1 = Top1Accuracy()
creating: createTop1Accuracy
class bigdl.optim.optimizer.Top5Accuracy(bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Calculate the percentage where the target is among the output's top 5 probability indices.

>>> top5 = Top5Accuracy()
creating: createTop5Accuracy
class bigdl.optim.optimizer.TrainSummary(log_dir, app_name, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A logging facility which allows the user to trace how indicators (e.g. learning rate, training loss, throughput, etc.) change with iterations/time during an optimization process. TrainSummary is for training indicators only (see ValidationSummary for validation indicators). It contains the information necessary for the optimizer to know where to store the logs, how to retrieve them, and so on. The logs are written in a TensorFlow-compatible format so that they can be visualized directly with TensorBoard. The logs can also be retrieved as ndarrays and visualized with Python libraries such as matplotlib (in a notebook, etc.).

Use optimizer.set_train_summary to enable the train logger.
read_scalar(tag)
Retrieve train logs by type. Return an array of records in the format (step, value, wallClockTime). "Step" is the iteration count by default.

Parameters:
- tag – the type of the logs; supported tags are "LearningRate", "Loss", "Throughput"
set_summary_trigger(name, trigger)
Set the interval of recording for each indicator.

Parameters:
- name – tag name. Supported tag names are "LearningRate", "Loss", "Throughput", "Parameters". "Parameters" is an umbrella tag that includes weight, bias, gradWeight, gradBias, and some running status (e.g. runningMean and runningVar in BatchNormalization). If you didn't set any triggers, we will by default record Loss and Throughput in each iteration, while NOT recording LearningRate and Parameters, as recording parameters may introduce substantial overhead when the model is very big, and LearningRate is not a public attribute for all OptimMethod.
- trigger – trigger
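A minimal logging sketch (illustrative, not from the original docs); it assumes an existing `optimizer`, a writable log directory, and that training has produced some records before read_scalar is called:

import numpy as np
from bigdl.optim.optimizer import TrainSummary, SeveralIteration

train_summary = TrainSummary(log_dir="/tmp/bigdl_summaries", app_name="demo_app")
# Record the (potentially expensive) "Parameters" tag only every 50 iterations.
train_summary.set_summary_trigger("Parameters", SeveralIteration(50))
optimizer.set_train_summary(train_summary)  # `optimizer` is assumed to exist

# ... after (or during) optimizer.optimize() ...
loss_records = np.array(train_summary.read_scalar("Loss"))  # rows of (step, value, wallClockTime)
print(loss_records[:5])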
class bigdl.optim.optimizer.TreeNNAccuracy(bigdl_type='float')
Bases: bigdl.util.common.JavaValue

Calculate the percentage where the output's max probability index equals the target.

>>> top1 = TreeNNAccuracy()
creating: createTreeNNAccuracy
class bigdl.optim.optimizer.ValidationSummary(log_dir, app_name, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A logging facility which allows the user to trace how indicators (e.g. validation loss, top1 accuracy, top5 accuracy, etc.) change with iterations/time during an optimization process. ValidationSummary is for validation indicators only (see TrainSummary for training indicators). It contains the information necessary for the optimizer to know where to store the logs, how to retrieve them, and so on. The logs are written in a TensorFlow-compatible format so that they can be visualized directly with TensorBoard. The logs can also be retrieved as ndarrays and visualized with Python libraries such as matplotlib (in a notebook, etc.).

Use optimizer.set_val_summary to enable the validation logger.
read_scalar(tag)
Retrieve validation logs by type. Return an array of records in the format (step, value, wallClockTime). "Step" is the iteration count by default.

Parameters:
- tag – the type of the logs. The tag should match the name of the ValidationMethod set into the optimizer, e.g. "Loss", "Top1Accuracy" or "Top5Accuracy".
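A minimal sketch of reading back validation records (illustrative; it assumes validation with Top1Accuracy was configured on the optimizer and that the same log_dir/app_name pair is used as when training was run):

import numpy as np
from bigdl.optim.optimizer import ValidationSummary

val_summary = ValidationSummary(log_dir="/tmp/bigdl_summaries", app_name="demo_app")
optimizer.set_val_summary(val_summary)  # `optimizer` is assumed to exist

# ... after optimizer.optimize() has run with validation enabled ...
acc_records = np.array(val_summary.read_scalar("Top1Accuracy"))
print(acc_records)  # each row: (step, value, wallClockTime)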
class bigdl.optim.optimizer.Warmup(delta, bigdl_type='float')
Bases: bigdl.util.common.JavaValue

A learning rate gradual-increase policy, where the effective learning rate increases by delta after each iteration. Calculation: base_lr + delta * iteration

Parameters:
- delta – the increase per iteration

>>> warmup = Warmup(0.05)
creating: createWarmup