It is the default learning rate schedule. For each iteration, the learning rate is updated with the following formula:
l_{n + 1} = l / (1 + n * learning_rate_decay)
where l is the initial learning rate
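A minimal Scala sketch of this formula (the helper below is illustrative, not the library implementation):

    // l_{n + 1} = l / (1 + n * learning_rate_decay)
    def defaultLr(initialLr: Double, learningRateDecay: Double, iteration: Int): Double =
      initialLr / (1 + iteration * learningRateDecay)

    // e.g. initialLr = 0.1 and learningRateDecay = 0.01 give 0.1 at iteration 0
    // and 0.05 at iteration 100.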
It is an epoch decay learning rate schedule. The learning rate decays through a function, passed as an argument, of the number of epochs that have run:
l_{n + 1} = l_{n} * 0.1 ^ decayType(epoch)
where decayType is a function with the number of run epochs as its argument
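A minimal sketch, assuming the 0.1 ^ decayType(epoch) factor is applied to the base learning rate (the helper name is illustrative):

    // Sketch only: scale the base rate by a power of 0.1 chosen by decayType.
    def epochDecayLr(baseLr: Double, decayType: Int => Int, epoch: Int): Double =
      baseLr * math.pow(0.1, decayType(epoch))

    // e.g. a decayType of `epoch => epoch / 30` divides the rate by 10 every 30 epochs.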
Learning rate schedule based on warm-up iterations
Warm-up iteration number
Warm-up delta value applied at each warm-up iteration
A function to calculate decay on epochs
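One possible reading, sketched with assumed parameter names (warmupIteration, warmupDelta, decayOnEpoch): ramp the rate up during the warm-up iterations, then hand over to the epoch-based decay function.

    // Hedged sketch of the idea, not the library behaviour.
    def warmupThenDecay(baseLr: Double, warmupIteration: Int, warmupDelta: Double,
                        decayOnEpoch: Int => Double, iteration: Int, epoch: Int): Double =
      if (iteration < warmupIteration) baseLr + warmupDelta * iteration
      else decayOnEpoch(epoch)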
EpochSchedule is a learning rate schedule which configures the learning rate according to some pre-defined regimes. If the running epoch is within the interval [r.startEpoch, r.endEpoch] of a regime r, then the learning rate will take the "learningRate" in r.config.
an array of pre-defined Regimes.
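A simplified sketch of the regime lookup; the Regime shape here mirrors the structure described further below, with a config map carrying the hyper parameters (field and helper names are illustrative):

    // Illustration only: pick the learning rate of the regime covering the current epoch.
    case class Regime(startEpoch: Int, endEpoch: Int, config: Map[String, Double])

    def epochScheduleLr(regimes: Array[Regime], epoch: Int, currentLr: Double): Double =
      regimes.find(r => epoch >= r.startEpoch && epoch <= r.endEpoch)
        .flatMap(_.config.get("learningRate"))
        .getOrElse(currentLr)

    // e.g. Array(Regime(1, 3, Map("learningRate" -> 1e-2)),
    //            Regime(4, 7, Map("learningRate" -> 5e-3)),
    //            Regime(8, 10, Map("learningRate" -> 1e-3)))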
EpochStep is a learning rate schedule which rescales the learning rate by gamma once every stepSize epochs.
the number of epochs between learning rate updates
the rescale factor
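Sketched below (illustration only); integer division marks out the epoch steps:

    // Rescale by gamma once every stepSize epochs.
    def epochStepLr(baseLr: Double, gamma: Double, stepSize: Int, epoch: Int): Double =
      baseLr * math.pow(gamma, epoch / stepSize)   // Int division: 0, 0, ..., 1, 1, ...

    // e.g. baseLr = 0.1, gamma = 0.5, stepSize = 10: epochs 0-9 give 0.1, epochs 10-19 give 0.05.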
Exponential is a learning rate schedule which rescales the learning rate by
lr_{n + 1} = lr * decayRate ^ (iter / decayStep)
the interval for lr decay
decay rate
if true, iter / decayStep is an integer division and the decayed learning rate follows a staircase function.
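A sketch covering both the smooth and the staircase variants (illustration only):

    def exponentialLr(baseLr: Double, decayRate: Double, decayStep: Int,
                      iteration: Int, stairCase: Boolean): Double = {
      val exponent =
        if (stairCase) (iteration / decayStep).toDouble   // integer division -> step-wise decay
        else iteration.toDouble / decayStep               // smooth decay
      baseLr * math.pow(decayRate, exponent)
    }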
Hyper parameter schedule for SGD
Similar to Step, but allows non-uniform steps defined by stepSizes.
the series of step sizes used for lr decay
coefficient of decay
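Sketched under the assumption that stepSizes holds the iteration boundaries at which the rate is multiplied by gamma once more (the helper name is illustrative):

    // Illustration only: count how many boundaries have been passed.
    def multiStepLr(baseLr: Double, gamma: Double, stepSizes: Array[Int], iteration: Int): Double = {
      val passed = stepSizes.count(boundary => iteration >= boundary)
      baseLr * math.pow(gamma, passed)
    }

    // e.g. stepSizes = Array(30000, 60000), gamma = 0.1: the rate drops to 0.1x of the base
    // after 30000 iterations and to 0.01x after 60000.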
NaturalExp is a learning rate schedule which rescales the learning rate by exp(-decayRate * iter / decayStep), referring to TensorFlow's natural_exp_decay.
how often to apply decay
the decay rate. e.g. 0.96
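A sketch of the formula; whether iter / decayStep is truncated is not stated above, so the sketch keeps it continuous:

    // Illustration only: lr * exp(-decayRate * iter / decayStep)
    def naturalExpLr(baseLr: Double, decayRate: Double, decayStep: Int, iteration: Int): Double =
      baseLr * math.exp(-decayRate * iteration / decayStep)

    // e.g. decayRate = 0.96 and decayStep = 1000 shrink the rate by roughly 0.38x
    // (exp(-0.96)) every 1000 iterations.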
Plateau is a learning rate schedule used when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. It monitors a quantity, and if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced.
quantity to be monitored; can be Loss or score
factor by which the learning rate will be reduced. new_lr = lr * factor
number of epochs with no improvement after which learning rate will be reduced.
one of {min, max}. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing
threshold for measuring the new optimum, to only focus on significant changes.
number of epochs to wait before resuming normal operation after lr has been reduced.
lower bound on the learning rate.
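A stripped-down sketch of the idea for a 'min'-mode metric such as a loss (it ignores the threshold, cooldown, and max mode described above; class and method names are illustrative):

    class PlateauSketch(factor: Double, patience: Int, minLr: Double) {
      private var best = Double.MaxValue   // best (lowest) value of the monitored quantity
      private var wait = 0                 // epochs since the last improvement
      def onEpochEnd(metric: Double, lr: Double): Double =
        if (metric < best) { best = metric; wait = 0; lr }
        else {
          wait += 1
          if (wait >= patience) { wait = 0; math.max(lr * factor, minLr) } else lr
        }
    }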
A learning rate decay policy, where the effective learning rate follows a polynomial decay, reaching zero at maxIteration.
Calculation: base_lr * (1 - iter / maxIteration) ^ power
coefficient of decay, refer to calculation formula
max iteration when lr becomes zero
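Sketch of the polynomial decay (illustration only):

    // Reaches zero at maxIteration.
    def polyLr(baseLr: Double, power: Double, maxIteration: Int, iteration: Int): Double =
      if (iteration >= maxIteration) 0.0
      else baseLr * math.pow(1.0 - iteration.toDouble / maxIteration, power)

    // e.g. power = 1.0 decays linearly from baseLr to 0 over maxIteration iterations.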
A structure to specify hyper parameters by start epoch and end epoch. Usually works with EpochSchedule.
start epoch
end epoch
config table containing the hyper parameters
Stack several learning rate schedulers.
number of iterations per epoch
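A rough sketch of the stacking idea, where each stage is given a fixed iteration budget before the next stage takes over (this illustrates the concept, not the library API):

    // Illustration only: each stage is (schedule, number of iterations it is active for).
    def stackedLr(stages: List[(Int => Double, Int)], iteration: Int): Double = stages match {
      case (stageLr, span) :: rest if iteration < span || rest.isEmpty => stageLr(iteration)
      case (_, span) :: rest => stackedLr(rest, iteration - span)
      case Nil => 0.0
    }

    // e.g. stackedLr(List(((i: Int) => 5e-4 * i, 200), ((_: Int) => 0.1, 10000)), iter)
    // warms up linearly for 200 iterations, then holds 0.1.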
A learning rate decay policy, where the effective learning rate is calculated as
base_lr * gamma ^ (floor(iter / stepSize))
the interval for lr decay
coefficient of decay, refer to calculation formula
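Sketched below; integer division provides the floor (illustration only):

    // base_lr * gamma ^ floor(iter / stepSize)
    def stepLr(baseLr: Double, gamma: Double, stepSize: Int, iteration: Int): Double =
      baseLr * math.pow(gamma, iteration / stepSize)

    // e.g. baseLr = 0.1, gamma = 0.1, stepSize = 1000: 0.1 for iterations 0-999,
    // 0.01 for 1000-1999, and so on.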
A learning rate gradual increase policy, where the effective learning rate increases by delta after each iteration. Calculation: base_lr + delta * iteration
increase amount after each iteration
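Sketched as a one-liner (illustration only):

    // base_lr + delta * iteration
    def warmupLr(baseLr: Double, delta: Double, iteration: Int): Double =
      baseLr + delta * iteration

    // e.g. to ramp from 0.0 to 0.1 over 200 iterations, pick delta = 0.1 / 200 = 5e-4.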