Check whether there is a LarsSGD in optimMethods.
Check whether there is a LarsSGD in optimMethods. If so, return the weight decay of the first LarsSGD found; otherwise, return None.
The weight decay of the first LarsSGD found in optimMethods, or None if there is none.
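The check described above can be sketched as follows. This is a hedged usage sketch: the method name (assumed here to be `LarsSGD.containsLarsSGD`, matching the description) and the `Float` type parameter are assumptions, not verified against a specific BigDL version.

```scala
import com.intel.analytics.bigdl.optim.{LarsSGD, OptimMethod}

// optims: Map[String, OptimMethod[Float]] built elsewhere, e.g. via setOptimMethods
def reportLars(optims: Map[String, OptimMethod[Float]]): Unit =
  LarsSGD.containsLarsSGD(optims) match {
    case Some(wd) => println(s"LARS in use, weight decay of first LarsSGD: $wd")
    case None     => println("no LarsSGD among the optim methods")
  }
```

Returning an `Option[Double]` lets callers distinguish "no LarsSGD present" from any particular weight-decay value without resorting to sentinel numbers.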
Create a Map(String, OptimMethod) for a container.
Create a Map(String, OptimMethod) for a container. For each submodule in the container, put a (module.getName(), new Lars[T]) pair into the returned map. The resulting map can be passed to setOptimMethods. Note: every Lars optim method shares the same LearningRateSchedule.
the container to build LARS optim methods for
the trust coefficient that scales the layer-wise learning rate; should be between 0 and 1
learning rate
learning rate decay
weight decay
momentum
the learning rate scheduler
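A hedged sketch of building one Lars optim method per submodule and registering the resulting map. The companion-object method name `createOptimForModule`, the use of named defaults, and the `Poly` schedule are assumptions about the BigDL API surface; layer names and dimensions are illustrative only.

```scala
import com.intel.analytics.bigdl.nn.{Linear, Sequential}
import com.intel.analytics.bigdl.optim.LarsSGD
import com.intel.analytics.bigdl.optim.SGD.Poly

val model = Sequential[Float]()
  .add(Linear[Float](10, 20).setName("fc1"))
  .add(Linear[Float](20, 2).setName("fc2"))

// One Lars optim method per submodule, keyed by module name.
// All of them share this single Poly learning-rate schedule.
val optims = LarsSGD.createOptimForModule(
  model,
  learningRateSchedule = Poly(0.5, 1000) // assumed default-overridable parameter
)

// optimizer.setOptimMethods(optims)  // where optimizer is a BigDL Optimizer
```

Sharing one schedule keeps all layers on the same learning-rate curve while LARS still applies its per-layer trust scaling on top.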
Create a Map(String, OptimMethod) for a container.
Create a Map(String, OptimMethod) for a container. For each submodule in the container, put a (module.getName(), new Lars[T]) pair into the returned map. The resulting map can be passed to setOptimMethods. Unlike createOptimForModule, this function can assign a different LearningRateSchedule to each submodule.
the container to build LARS optim methods for
the learning rate schedule generator for each submodule. The generator accepts the submodule that the schedule is linked to and returns a tuple (learningRateSchedule, isOwner), where isOwner indicates whether the corresponding LARS optim method is responsible for reporting the learning rate in getHyperParameter (multiple LARS optim methods may share one learning rate schedule)
the trust coefficient that scales the layer-wise learning rate; should be between 0 and 1
learning rate
learning rate decay
weight decay
momentum
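The per-submodule generator can be sketched as below. This is an assumption-laden example: the method name `createOptimLRSchedulerForModule`, the generator's `AbstractModule[Activity, Activity, Float] => (LearningRateSchedule, Boolean)` shape, and the layer names are all hypothetical, inferred from the parameter description above.

```scala
import com.intel.analytics.bigdl.nn.abstractnn.{AbstractModule, Activity}
import com.intel.analytics.bigdl.optim.LarsSGD
import com.intel.analytics.bigdl.optim.SGD.Poly

// Give the first fully-connected layer a gentler decay curve than the rest,
// and make its LARS optim method the one that reports the learning rate
// (isOwner = true); the others return isOwner = false.
val optims = LarsSGD.createOptimLRSchedulerForModule(
  model, // a BigDL Container built elsewhere, submodules named via setName
  (mod: AbstractModule[Activity, Activity, Float]) =>
    if (mod.getName() == "fc1") (Poly(0.5, 1000), true)
    else (Poly(0.9, 1000), false)
)
```

The isOwner flag matters because several LARS optim methods may share one schedule object; exactly one of them should surface the current learning rate through getHyperParameter to avoid duplicate reporting.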