bigdl.optim package

Submodules

bigdl.optim.optimizer module

class bigdl.optim.optimizer.ActivityRegularization(l1, l2, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Apply both L1 and L2 regularization

Parameters:
  • l1 – L1 regularization rate
  • l2 – L2 regularization rate

class bigdl.optim.optimizer.Adadelta(decayrate=0.9, epsilon=1e-10, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

Adadelta implementation for SGD: http://arxiv.org/abs/1212.5701

Parameters:
  • decayrate – interpolation parameter rho
  • epsilon – for numerical stability

>>> adadelta = Adadelta()
creating: createAdadelta

class bigdl.optim.optimizer.Adagrad(learningrate=0.001, learningrate_decay=0.0, weightdecay=0.0, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Adagrad. See the original paper: http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

Parameters:
  • learningrate – learning rate
  • learningrate_decay – learning rate decay
  • weightdecay – weight decay

>>> adagrad = Adagrad()
creating: createAdagrad

class bigdl.optim.optimizer.Adam(learningrate=0.001, learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-08, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Adam: http://arxiv.org/pdf/1412.6980.pdf

Parameters:
  • learningrate – learning rate
  • learningrate_decay – learning rate decay
  • beta1 – first moment coefficient
  • beta2 – second moment coefficient
  • epsilon – for numerical stability

>>> adam = Adam()
creating: createAdam
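
For illustration only, a minimal sketch of plugging a configured Adam into an Optimizer; `model`, `criterion` and `training_rdd` (an RDD of Sample) are assumed to be defined elsewhere:

from bigdl.optim.optimizer import Optimizer, Adam, MaxEpoch

# Assumed to exist already: model, criterion, training_rdd (RDD[Sample]).
adam = Adam(learningrate=0.001, beta1=0.9, beta2=0.999)
optimizer = Optimizer(model=model,
                      training_rdd=training_rdd,
                      criterion=criterion,
                      end_trigger=MaxEpoch(5),
                      batch_size=32,
                      optim_method=adam)
trained_model = optimizer.optimize()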

class bigdl.optim.optimizer.Adamax(learningrate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-38, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Adamax: http://arxiv.org/pdf/1412.6980.pdf

Parameters:
  • learningrate – learning rate
  • beta1 – first moment coefficient
  • beta2 – second moment coefficient
  • epsilon – for numerical stability

>>> adamax = Adamax()
creating: createAdamax

class bigdl.optim.optimizer.BaseOptimizer(jvalue, bigdl_type, *args)[source]

Bases: bigdl.util.common.JavaValue

disable_gradclip()[source]

disable clipping.

optimize()[source]

Do an optimization.

prepare_input()[source]

Load input. Notebook users can call this method to separate data loading from optimizer creation.

set_checkpoint(checkpoint_trigger, checkpoint_path, isOverWrite=True)[source]

Configure checkpoint settings.

Parameters:
  • checkpoint_trigger – the interval to write snapshots
  • checkpoint_path – the path to write snapshots into
  • isOverWrite – whether to overwrite existing snapshots in the path. Default is True.
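
For illustration, a minimal sketch of configuring checkpoints on an existing optimizer (the `optimizer` object and the checkpoint directory are assumptions):

from bigdl.optim.optimizer import EveryEpoch

# Write a snapshot at the end of every epoch, overwriting older snapshots in the directory.
optimizer.set_checkpoint(EveryEpoch(), "/tmp/bigdl_checkpoints", isOverWrite=True)
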
set_criterion(criterion)[source]

set new criterion, for optimizer reuse

Parameters:criterion – new criterion
Returns:
set_end_when(end_when)[source]

When to stop, passed in a [[Trigger]]

set_gradclip_const(min_value, max_value)[source]

Configure constant clipping settings.

Parameters:
  • min_value – the minimum value to clip by
  • max_value – the maximum value to clip by
set_gradclip_l2norm(clip_norm)[source]

Configure L2 norm clipping settings.

Parameters:clip_norm – gradient L2-Norm threshold
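
For illustration, a hedged sketch of the two clipping modes on an assumed `optimizer` instance (typically only one of them is used at a time):

# Clip each gradient element into a constant range ...
optimizer.set_gradclip_const(-1.0, 1.0)
# ... or rescale gradients so that their global L2 norm does not exceed a threshold.
optimizer.set_gradclip_l2norm(2.0)
# Clipping can be switched off again before a later optimize() call.
optimizer.disable_gradclip()
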
set_model(model)[source]

Set model.

Parameters:model – new model
set_train_summary(summary)[source]

Set train summary. A TrainSummary object contains information necessary for the optimizer to know how often the logs are recorded, where to store the logs and how to retrieve them, etc. For details, refer to the docs of TrainSummary.

Parameters:summary – a TrainSummary object
set_val_summary(summary)[source]

Set validation summary. A ValidationSummary object contains information necessary for the optimizer to know how often the logs are recorded, where to store the logs and how to retrieve them, etc. For details, refer to the docs of ValidationSummary.

Parameters:summary – a ValidationSummary object
class bigdl.optim.optimizer.Default(bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A learning rate decay policy, where the effective learning rate is calculated as base_lr * gamma ^ (floor(iter / step_size))

Parameters:
  • step_size
  • gamma

>>> step = Default()
creating: createDefault
class bigdl.optim.optimizer.DistriOptimizer(model, training_rdd, criterion, end_trigger, batch_size, optim_method=None, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.Optimizer

class bigdl.optim.optimizer.EveryEpoch(bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A trigger specifies a timespot or several timespots during training, and a corresponding action will be taken when the timespot(s) is reached. EveryEpoch is a trigger that triggers an action when each epoch finishes. It can be used as the trigger in set_validation and set_checkpoint in Optimizer, and also in TrainSummary.set_summary_trigger.

>>> everyEpoch = EveryEpoch()
creating: createEveryEpoch
class bigdl.optim.optimizer.Exponential(decay_step, decay_rate, stair_case=False, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

[[Exponential]] is a learning rate schedule which rescales the learning rate by lr_{n + 1} = lr * decay_rate ^ (iter / decay_step).

Parameters:
  • decay_step – the interval for lr decay
  • decay_rate – decay rate
  • stair_case – if True, iter / decay_step is an integer division and the decayed learning rate follows a staircase function

>>> exponential = Exponential(100, 0.1)
creating: createExponential
class bigdl.optim.optimizer.Ftrl(learningrate=0.001, learningrate_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, l2_shrinkage_regularization_strength=0.0, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Ftrl: https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf. It supports L1 penalty, L2 penalty and shrinkage-type L2 penalty.

Parameters:
  • learningrate – learning rate
  • learningrate_power – double, must be less than or equal to zero. Default is -0.5.
  • initial_accumulator_value – double, the starting value for accumulators; requires zero or positive values.
  • l1_regularization_strength – double, must be greater than or equal to zero. Default is zero.
  • l2_regularization_strength – double, must be greater than or equal to zero. Default is zero.
  • l2_shrinkage_regularization_strength – double, must be greater than or equal to zero. Default is zero. This differs from l2_regularization_strength above: L2 above is a stabilization penalty, whereas this one is a magnitude penalty.

>>> ftrl = Ftrl()
creating: createFtrl
>>> ftrl2 = Ftrl(1e-2, -0.1, 0.2, 0.3, 0.4, 0.5)
creating: createFtrl

class bigdl.optim.optimizer.HitRatio(k=10, neg_num=100, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Hit Ratio (HR) is used in recommendation applications. HR intuitively measures whether the test item is present on the top-k list.

>>> hr10 = HitRatio(k = 10)
creating: createHitRatio
class bigdl.optim.optimizer.L1L2Regularizer(l1, l2, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Apply both L1 and L2 regularization

Parameters:
  • l1 – L1 regularization rate
  • l2 – L2 regularization rate

class bigdl.optim.optimizer.L1Regularizer(l1, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Apply L1 regularization

Parameters:
  • l1 – L1 regularization rate

class bigdl.optim.optimizer.L2Regularizer(l2, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Apply L2 regularization

Parameters:
  • l2 – L2 regularization rate

class bigdl.optim.optimizer.LBFGS(max_iter=20, max_eval=1.7976931348623157e+308, tolfun=1e-05, tolx=1e-09, ncorrection=100, learningrate=1.0, verbose=False, linesearch=None, linesearch_options=None, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

This implementation of L-BFGS relies on a user-provided line search function (state.lineSearch). If this function is not provided, then a simple learningRate is used to produce fixed size steps. Fixed size steps are much less costly than line searches, and can be useful for stochastic problems. The learning rate is used even when a line search is provided. This is also useful for large-scale stochastic problems, where opfunc is a noisy approximation of f(x). In that case, the learning rate allows a reduction of confidence in the step size.

Parameters:
  • max_iter – maximum number of iterations allowed
  • max_eval – maximum number of function evaluations
  • tolfun – termination tolerance on the first-order optimality
  • tolx – termination tolerance on progress in terms of func/param changes
  • ncorrection
  • learningrate
  • verbose
  • linesearch – a line search function
  • linesearch_options – if no line search is provided, then a fixed step size is used

>>> lbfgs = LBFGS()
creating: createLBFGS

class bigdl.optim.optimizer.LocalOptimizer(X, Y, model, criterion, end_trigger, batch_size, optim_method=None, cores=None, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.BaseOptimizer

Create an optimizer.

Parameters:
  • model – the neural net model
  • X – the training features which is an ndarray or list of ndarray
  • Y – the training label which is an ndarray
  • criterion – the loss function
  • optim_method – the algorithm to use for optimization, e.g. SGD, Adagrad, etc. If optim_method is None, the default algorithm is SGD.
  • end_trigger – when to end the optimization
  • batch_size – training batch size
  • cores – by default, the total number of physical cores
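
For illustration, a minimal sketch of a LocalOptimizer driven by in-memory ndarrays; the toy regression model and data below are assumptions, and the BigDL engine/SparkContext is assumed to be initialized already:

import numpy as np
from bigdl.nn.layer import Sequential, Linear
from bigdl.nn.criterion import MSECriterion
from bigdl.optim.optimizer import LocalOptimizer, MaxEpoch, SGD

# Toy data: 128 samples with 10 features and a scalar regression target.
X = np.random.randn(128, 10).astype("float32")
Y = np.random.randn(128, 1).astype("float32")
model = Sequential().add(Linear(10, 1))
optimizer = LocalOptimizer(X=X, Y=Y,
                           model=model,
                           criterion=MSECriterion(),
                           end_trigger=MaxEpoch(2),
                           batch_size=32,
                           optim_method=SGD(learningrate=0.01))
trained_model = optimizer.optimize()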

set_validation(batch_size, X_val, Y_val, trigger, val_method=None)[source]

Configure validation settings.

Parameters:
  • batch_size – validation batch size
  • X_val – features of validation dataset
  • Y_val – label of validation dataset
  • trigger – validation interval
  • val_method – the ValidationMethod to use, e.g. “Top1Accuracy”, “Top5Accuracy”, “Loss”
class bigdl.optim.optimizer.Loss(cri=None, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

This evaluation method calculates the loss of the output with respect to the target.

>>> from bigdl.nn.criterion import ClassNLLCriterion
>>> loss = Loss()
creating: createClassNLLCriterion
creating: createLoss

>>> loss = Loss(ClassNLLCriterion())
creating: createClassNLLCriterion
creating: createLoss
class bigdl.optim.optimizer.MAE(bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

This evaluation method calculates the mean absolute error of output with respect to target.

>>> mae = MAE()
creating: createMAE
class bigdl.optim.optimizer.MaxEpoch(max_epoch, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A trigger specifies a timespot or several timespots during training, and a corresponding action will be taken when the timespot(s) is reached. MaxEpoch is a trigger that triggers an action when training reaches the number of epochs specified by “max_epoch”. Usually used as end_trigger when creating an Optimizer.

>>> maxEpoch = MaxEpoch(2)
creating: createMaxEpoch
class bigdl.optim.optimizer.MaxIteration(max, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A trigger specifies a timespot or several timespots during training, and a corresponding action will be taken when the timespot(s) is reached. MaxIteration is a trigger that triggers an action when training reaches the number of iterations specified by “max”. Usually used as end_trigger when creating an Optimizer.

>>> maxIteration = MaxIteration(20)
creating: createMaxIteration
class bigdl.optim.optimizer.MaxScore(max, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A trigger that triggers an action when the validation score is larger than the “max” score

>>> maxScore = MaxScore(0.4)
creating: createMaxScore
class bigdl.optim.optimizer.MinLoss(min, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A trigger that triggers an action when the training loss is less than the “min” loss

>>> minLoss = MinLoss(0.1)
creating: createMinLoss
class bigdl.optim.optimizer.MultiStep(step_sizes, gamma, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Similar to Step, but it allows non-uniform steps defined by step_sizes.

Parameters:
  • step_sizes – the series of step sizes used for lr decay
  • gamma – coefficient of decay
>>> step = MultiStep([2, 5], 0.3)
creating: createMultiStep
class bigdl.optim.optimizer.NDCG(k=10, neg_num=100, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Normalized Discounted Cumulative Gain (NDCG). NDCG accounts for the position of the hit by assigning higher scores to hits at top ranks.

>>> ndcg = NDCG(k = 10)
creating: createNDCG
class bigdl.optim.optimizer.OptimMethod(jvalue, bigdl_type, *args)[source]

Bases: bigdl.util.common.JavaValue

static load(path, bigdl_type='float')[source]

Load an OptimMethod.

Parameters:path – file path

save(path, overWrite)[source]

Save the OptimMethod.

Parameters:
  • path – file path
  • overWrite – whether to overwrite

class bigdl.optim.optimizer.Optimizer(model, training_rdd, criterion, end_trigger, batch_size, optim_method=None, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.BaseOptimizer

static create(model, training_set, criterion, end_trigger=None, batch_size=32, optim_method=None, cores=None, bigdl_type='float')[source]

Create an optimizer. Depending on the input type, the returned optimizer can be a local optimizer or a distributed optimizer.

Parameters:
  • model – the neural net model
  • training_set – (features, label) for local mode. RDD[Sample] for distributed mode.
  • criterion – the loss function
  • optim_method – the algorithm to use for optimization, e.g. SGD, Adagrad, etc. If optim_method is None, the default algorithm is SGD.
  • end_trigger – when to end the optimization. Default value is MaxEpoch(1)
  • batch_size – training batch size
  • cores – for the local optimizer only; the total number of physical cores is used as the default value
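
For illustration, a hedged sketch of Optimizer.create in distributed mode; `sc` (SparkContext), `model`, `criterion` and the numpy arrays `features`/`labels` are assumptions:

from bigdl.optim.optimizer import Optimizer, Adagrad, MaxEpoch
from bigdl.util.common import Sample

# Build an RDD[Sample] from assumed lists of numpy feature/label arrays.
training_set = sc.parallelize(
    [Sample.from_ndarray(f, l) for f, l in zip(features, labels)])
optimizer = Optimizer.create(model=model,
                             training_set=training_set,
                             criterion=criterion,
                             end_trigger=MaxEpoch(10),
                             batch_size=64,
                             optim_method=Adagrad(learningrate=0.01))
trained_model = optimizer.optimize()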

set_traindata(training_rdd, batch_size)[source]

Set new training dataset, for optimizer reuse

Parameters:
  • training_rdd – the training dataset
  • batch_size – training batch size
Returns:

set_validation(batch_size, val_rdd, trigger, val_method=None)[source]

Configure validation settings.

Parameters:
  • batch_size – validation batch size
  • val_rdd – validation dataset
  • trigger – validation interval
  • val_method – the ValidationMethod to use, e.g. “Top1Accuracy”, “Top5Accuracy”, “Loss”
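
For illustration, a minimal sketch of configuring validation on an assumed `optimizer`, with `val_rdd` standing in for an RDD[Sample] validation set:

from bigdl.optim.optimizer import EveryEpoch, Top1Accuracy, Loss

# Evaluate on the validation set at the end of every epoch.
optimizer.set_validation(batch_size=64,
                         val_rdd=val_rdd,
                         trigger=EveryEpoch(),
                         val_method=[Top1Accuracy(), Loss()])
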
class bigdl.optim.optimizer.ParallelAdam(learningrate=0.001, learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-08, parallel_num=-1, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

An implementation of Adam: http://arxiv.org/pdf/1412.6980.pdf

Parameters:
  • learningrate – learning rate
  • learningrate_decay – learning rate decay
  • beta1 – first moment coefficient
  • beta2 – second moment coefficient
  • epsilon – for numerical stability

>>> init_engine()
>>> pAdam = ParallelAdam()
creating: createParallelAdam

class bigdl.optim.optimizer.Plateau(monitor, factor=0.1, patience=10, mode='min', epsilon=0.0001, cooldown=0, min_lr=0.0, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Plateau is a learning rate schedule that applies when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This schedule monitors a quantity and, if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.

Parameters:
  • monitor – quantity to be monitored; can be Loss or score
  • factor – factor by which the learning rate will be reduced. new_lr = lr * factor
  • patience – number of epochs with no improvement after which the learning rate will be reduced
  • mode – one of {min, max}. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing
  • epsilon – threshold for measuring the new optimum, to only focus on significant changes
  • cooldown – number of epochs to wait before resuming normal operation after lr has been reduced
  • min_lr – lower bound on the learning rate

>>> plateau = Plateau("score")
creating: createPlateau
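
For illustration (an assumption, not stated in the original docs), a Plateau schedule is typically attached to SGD through its leaningrate_schedule argument; here the monitored quantity is assumed to be a validation score:

from bigdl.optim.optimizer import SGD, Plateau

# Halve the learning rate if the monitored score has not improved for 3 epochs.
plateau = Plateau("score", factor=0.5, patience=3, mode="max", min_lr=1e-5)
sgd = SGD(learningrate=0.1, leaningrate_schedule=plateau)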
class bigdl.optim.optimizer.Poly(power, max_iteration, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A learning rate decay policy, where the effective learning rate follows a polynomial decay, reaching zero at max_iteration. Calculation: base_lr * (1 - iter/max_iteration) ^ power

Parameters:
  • power – coefficient of decay, refer to the calculation formula
  • max_iteration – max iteration when lr becomes zero
>>> poly = Poly(0.5, 2)
creating: createPoly
class bigdl.optim.optimizer.RMSprop(learningrate=0.01, learningrate_decay=0.0, decayrate=0.99, epsilon=1e-08, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

An implementation of RMSprop

Parameters:
  • learningrate – learning rate
  • learningrate_decay – learning rate decay
  • decayrate – decay rate, also called rho
  • epsilon – for numerical stability

>>> rmsprop = RMSprop()
creating: createRMSprop

class bigdl.optim.optimizer.SGD(learningrate=0.001, learningrate_decay=0.0, weightdecay=0.0, momentum=0.0, dampening=1.7976931348623157e+308, nesterov=False, leaningrate_schedule=None, learningrates=None, weightdecays=None, bigdl_type='float')[source]

Bases: bigdl.optim.optimizer.OptimMethod

A plain implementation of SGD

Parameters:
  • learningrate – learning rate
  • learningrate_decay – learning rate decay
  • weightdecay – weight decay
  • momentum – momentum
  • dampening – dampening for momentum
  • nesterov – enables Nesterov momentum
  • learningrates – 1D tensor of individual learning rates
  • weightdecays – 1D tensor of individual weight decays

>>> sgd = SGD()
creating: createDefault
creating: createSGD
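
For illustration, a hedged sketch of SGD with Nesterov momentum and a polynomial decay schedule (note the leaningrate_schedule spelling used by the constructor above; the hyperparameter values are assumptions):

from bigdl.optim.optimizer import SGD, Poly

# Nesterov momentum requires zero dampening; Poly decays the lr to zero over 10000 iterations.
sgd = SGD(learningrate=0.1,
          momentum=0.9,
          dampening=0.0,
          nesterov=True,
          leaningrate_schedule=Poly(0.5, 10000))

The resulting object can then be passed as optim_method when constructing an Optimizer.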

class bigdl.optim.optimizer.SequentialSchedule(iteration_per_epoch, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Stack several learning rate schedulers.

Parameters:iteration_per_epoch – iteration numbers per epoch
>>> sequentialSchedule = SequentialSchedule(5)
creating: createSequentialSchedule
>>> poly = Poly(0.5, 2)
creating: createPoly
>>> test = sequentialSchedule.add(poly, 5)
add(scheduler, max_iteration, bigdl_type='float')[source]

Add a learning rate scheduler to the contained schedules

Parameters:
  • scheduler – learning rate scheduler to be added
  • max_iteration – iteration numbers this scheduler will run
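
For illustration, a minimal sketch of stacking a Warmup phase and a Poly decay; the iteration counts are assumptions:

from bigdl.optim.optimizer import SequentialSchedule, Warmup, Poly, SGD

# 100 iterations per epoch (illustrative); warm up for 200 iterations, then decay.
schedule = SequentialSchedule(100)
schedule.add(Warmup(0.001), 200)
schedule.add(Poly(0.5, 5000), 5000)
sgd = SGD(learningrate=0.01, leaningrate_schedule=schedule)
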
class bigdl.optim.optimizer.SeveralIteration(interval, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A trigger specifies a timespot or several timespots during training, and a corresponding action will be taken when the timespot(s) is reached. SeveralIteration is a trigger that triggers an action every “n” iterations. It can be used as the trigger in set_validation and set_checkpoint in Optimizer, and also in TrainSummary.set_summary_trigger.

>>> severalIteration = SeveralIteration(2)
creating: createSeveralIteration
class bigdl.optim.optimizer.Step(step_size, gamma, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A learning rate decay policy, where the effective learning rate is calculated as base_lr * gamma ^ (floor(iter / step_size))

Parameters:
  • step_size
  • gamma
>>> step = Step(2, 0.3)
creating: createStep
class bigdl.optim.optimizer.Top1Accuracy(bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Calculate the percentage that the output’s max probability index equals the target.

>>> top1 = Top1Accuracy()
creating: createTop1Accuracy
class bigdl.optim.optimizer.Top5Accuracy(bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Calculate the percentage that the target is among the indices of the output’s top 5 probabilities.

>>> top5 = Top5Accuracy()
creating: createTop5Accuracy
class bigdl.optim.optimizer.TrainSummary(log_dir, app_name, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A logging facility which allows the user to trace how indicators (e.g. learning rate, training loss, throughput, etc.) change with iterations/time during an optimization process. TrainSummary is for training indicators only (see ValidationSummary for validation indicators). It contains the information the optimizer needs to know where to store the logs, how to retrieve them, and so on. The logs are written in a tensorflow-compatible format so that they can be visualized directly using tensorboard. The logs can also be retrieved as ndarrays and visualized with Python libraries such as matplotlib (in a notebook, etc.).

Use optimizer.set_train_summary to enable the train logger.

read_scalar(tag)[source]

Retrieve train logs by type. Return an array of records in the format (step, value, wallClockTime). “Step” is the iteration count by default.

Parameters:tag – the type of the logs. Supported tags are: “LearningRate”, “Loss”, “Throughput”
set_summary_trigger(name, trigger)[source]

Set the interval of recording for each indicator.

Parameters:
  • tag – tag name. Supported tag names are “LearningRate”, “Loss”, “Throughput”, “Parameters”. “Parameters” is an umbrella tag that includes weight, bias, gradWeight, gradBias, and some running status (e.g. runningMean and runningVar in BatchNormalization). If you didn’t set any triggers, we will by default record Loss and Throughput in each iteration, while NOT recording LearningRate and Parameters, as recording parameters may introduce substantial overhead when the model is very big, and LearningRate is not a public attribute for all OptimMethod.
  • trigger – trigger
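
For illustration, a hedged sketch of wiring a TrainSummary into an assumed `optimizer` and reading back the loss curve; the log directory and app name are assumptions:

import numpy as np
from bigdl.optim.optimizer import TrainSummary, SeveralIteration

train_summary = TrainSummary(log_dir="/tmp/bigdl_summaries", app_name="my_job")
# Record parameter statistics only every 50 iterations to limit overhead.
train_summary.set_summary_trigger("Parameters", SeveralIteration(50))
optimizer.set_train_summary(train_summary)
trained_model = optimizer.optimize()

# Each record is (step, value, wallClockTime).
loss = np.array(train_summary.read_scalar("Loss"))
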
class bigdl.optim.optimizer.TreeNNAccuracy(bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

Calculate the percentage that the output’s max probability index equals the target.

>>> top1 = TreeNNAccuracy()
creating: createTreeNNAccuracy
class bigdl.optim.optimizer.TriggerAnd(first, *other)[source]

Bases: bigdl.util.common.JavaValue

A trigger that contains other triggers and fires when all of them trigger (logical AND)

>>> a = TriggerAnd(MinLoss(0.1), MaxEpoch(2))
creating: createMinLoss
creating: createMaxEpoch
creating: createTriggerAnd
class bigdl.optim.optimizer.TriggerOr(first, *other)[source]

Bases: bigdl.util.common.JavaValue

A trigger that contains other triggers and fires when any of them triggers (logical OR)

>>> o = TriggerOr(MinLoss(0.1), MaxEpoch(2))
creating: createMinLoss
creating: createMaxEpoch
creating: createTriggerOr
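
For illustration, a minimal sketch of using TriggerOr as an end_trigger; `model`, `criterion` and `training_rdd` are assumptions:

from bigdl.optim.optimizer import Optimizer, TriggerOr, MaxEpoch, MinLoss

# Stop as soon as either 20 epochs have run or the training loss drops below 0.05.
optimizer = Optimizer(model=model,
                      training_rdd=training_rdd,
                      criterion=criterion,
                      end_trigger=TriggerOr(MaxEpoch(20), MinLoss(0.05)),
                      batch_size=32)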
class bigdl.optim.optimizer.ValidationSummary(log_dir, app_name, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A logging facility which allows the user to trace how indicators (e.g. validation loss, top-1 accuracy, top-5 accuracy, etc.) change with iterations/time during an optimization process. ValidationSummary is for validation indicators only (see TrainSummary for training indicators). It contains the information the optimizer needs to know where to store the logs, how to retrieve them, and so on. The logs are written in a tensorflow-compatible format so that they can be visualized directly using tensorboard. The logs can also be retrieved as ndarrays and visualized with Python libraries such as matplotlib (in a notebook, etc.).

Use optimizer.set_val_summary to enable the validation logger.

read_scalar(tag)[source]

Retrieve validation logs by type. Return an array of records in the format (step, value, wallClockTime). “Step” is the iteration count by default.

Parameters:tag – the type of the logs. The tag should match the name of the ValidationMethod set into the optimizer, e.g. “Top1AccuracyLoss”, “Top1Accuracy” or “Top5Accuracy”.
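
For illustration, a hedged sketch of reading validation logs from an assumed `optimizer` whose set_validation was configured with Top1Accuracy; paths and names are assumptions:

import numpy as np
from bigdl.optim.optimizer import ValidationSummary

val_summary = ValidationSummary(log_dir="/tmp/bigdl_summaries", app_name="my_job")
optimizer.set_val_summary(val_summary)
trained_model = optimizer.optimize()

# The tag must match a ValidationMethod passed to set_validation, e.g. Top1Accuracy.
top1 = np.array(val_summary.read_scalar("Top1Accuracy"))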
class bigdl.optim.optimizer.Warmup(delta, bigdl_type='float')[source]

Bases: bigdl.util.common.JavaValue

A gradual learning rate increase policy, where the effective learning rate increases by delta after each iteration. Calculation: base_lr + delta * iteration

Parameters:delta – increase amount after each iteration
>>> warmup = Warmup(0.05)
creating: createWarmup

Module contents