bigdl.nn package

Submodules

bigdl.nn.criterion module

class bigdl.nn.criterion.AbsCriterion(size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Measures the mean absolute value of the element-wise difference between input and target.

>>> absCriterion = AbsCriterion(True)
creating: createAbsCriterion
class bigdl.nn.criterion.BCECriterion(weights=None, size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that measures the Binary Cross Entropy between the target and the output

Parameters:
  • weights – weights for each class
  • sizeAverage – whether to average the loss or not
>>> np.random.seed(123)
>>> weights = np.random.uniform(0, 1, (2,)).astype("float32")
>>> bCECriterion = BCECriterion(weights)
creating: createBCECriterion
>>> bCECriterion = BCECriterion()
creating: createBCECriterion
class bigdl.nn.criterion.CategoricalCrossEntropy(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

This criterion is the same as the cross entropy criterion, except it takes a one-hot format target tensor.

>>> cce = CategoricalCrossEntropy()
creating: createCategoricalCrossEntropy

class bigdl.nn.criterion.ClassNLLCriterion(weights=None, size_average=True, logProbAsInput=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

The negative log likelihood criterion. It is useful to train a classification problem with n classes. If provided, the optional argument weights should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input given through a forward() is expected to contain log-probabilities/probabilities of each class: input has to be a 1D Tensor of size n. Obtaining log-probabilities/probabilities in a neural network is easily achieved by adding a LogSoftMax/SoftMax layer as the last layer of your neural network. You may use CrossEntropyCriterion instead, if you prefer not to add an extra layer to your network. This criterion expects a class index (1 to the number of classes) as the target when calling forward(input, target) and backward(input, target).

In the log-probabilities case, the loss can be described as: loss(x, class) = -x[class], or, if the weights argument is specified: loss(x, class) = -weights[class] * x[class]. Due to the behaviour of the backend code, it is necessary to set sizeAverage to false when calculating losses in non-batch mode.

Note that if the target is -1, the training process will skip this sample. In other words, the forward process will return zero output and the backward process will also return zero gradInput.

By default, the losses are averaged over observations for each minibatch. However, if the field sizeAverage is set to false, the losses are instead summed for each minibatch.

In particular, when weights=None, size_average=True and logProbAsInput=False, this is the same as the sparse_categorical_crossentropy loss in keras.

Parameters:
  • weights – weights of each class
  • size_average – whether to average or not
  • logProbAsInput – indicating whether to accept log-probabilities or probabilities as input.
>>> np.random.seed(123)
>>> weights = np.random.uniform(0, 1, (2,)).astype("float32")
>>> classNLLCriterion = ClassNLLCriterion(weights, True, True)
creating: createClassNLLCriterion
>>> classNLLCriterion = ClassNLLCriterion()
creating: createClassNLLCriterion
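
The documented formula can be checked with a small pure-NumPy sketch. The helper below is illustrative only (it assumes Torch-style averaging over the total weight of the targets) and is not part of the BigDL API:

>>> import numpy as np
>>> def class_nll(log_probs, targets, weights=None):
...     idx = targets - 1                       # targets are 1-based class indices
...     w = np.ones(log_probs.shape[1]) if weights is None else weights
...     losses = -w[idx] * log_probs[np.arange(len(idx)), idx]
...     return losses.sum() / w[idx].sum()      # size-average over the total weight
>>> log_probs = np.log(np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]))
>>> targets = np.array([1, 2])
>>> round(float(class_nll(log_probs, targets)), 4)
0.2899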
class bigdl.nn.criterion.ClassSimplexCriterion(n_classes, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

ClassSimplexCriterion implements a criterion for classification. It learns an embedding per class, where each class’ embedding is a point on an (N-1)-dimensional simplex, where N is the number of classes.

Parameters:nClasses – the number of classes.
>>> classSimplexCriterion = ClassSimplexCriterion(2)
creating: createClassSimplexCriterion
class bigdl.nn.criterion.CosineDistanceCriterion(size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that measures the loss given an input and target, Loss = 1 - cos(x, y)

>>> cosineDistanceCriterion = CosineDistanceCriterion(True)
creating: createCosineDistanceCriterion
>>> cosineDistanceCriterion.forward(np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
...                                   np.array([5.0, 4.0, 3.0, 2.0, 1.0]))
0.07272728
class bigdl.nn.criterion.CosineEmbeddingCriterion(margin=0.0, size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that measures the loss given an input x = {x1, x2}, a table of two Tensors, and a Tensor label y with values 1 or -1.

Parameters:margin – a number from -1 to 1, 0 to 0.5 is suggested
>>> cosineEmbeddingCriterion = CosineEmbeddingCriterion(1e-5, True)
creating: createCosineEmbeddingCriterion
>>> cosineEmbeddingCriterion.forward([np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
...                                   np.array([5.0, 4.0, 3.0, 2.0, 1.0])],
...                                 [np.ones(5)])
0.0
class bigdl.nn.criterion.CosineProximityCriterion(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Computes the negative of the mean cosine proximity between predictions and targets.

x'(i) = x(i) / sqrt(max(sum(x(i)^2), 1e-12))
y'(i) = y(i) / sqrt(max(sum(y(i)^2), 1e-12))
cosine_proximity(x, y) = sum_i(-1 * x'(i) * y'(i))
>>> cosineProximityCriterion = CosineProximityCriterion()
creating: createCosineProximityCriterion
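
A minimal NumPy sketch of the formula above; the function name is an illustrative assumption, not part of the BigDL API:

>>> import numpy as np
>>> def cosine_proximity(x, y):
...     x_n = x / np.sqrt(max(np.sum(x ** 2), 1e-12))   # x'(i)
...     y_n = y / np.sqrt(max(np.sum(y ** 2), 1e-12))   # y'(i)
...     return np.sum(-x_n * y_n)
>>> round(float(cosine_proximity(np.array([1.0, 0.0]), np.array([1.0, 0.0]))), 1)
-1.0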
class bigdl.nn.criterion.Criterion(jvalue, bigdl_type, *args)[source]

Bases: bigdl.util.common.JavaValue

Criterion is helpful to train a neural network. Given an input and a target, it computes a gradient according to a given loss function.

backward(input, target)[source]

NB: It’s for debug only, please use optimizer.optimize() in production. Performs a back-propagation step through the criterion, with respect to the given input.

Parameters:
  • input – ndarray or list of ndarray
  • target – ndarray or list of ndarray
Returns:

ndarray

forward(input, target)[source]

NB: It's for debug only, please use optimizer.optimize() in production. Takes an input object and computes the corresponding loss of the criterion, compared with the target.

Parameters:
  • input – ndarray or list of ndarray
  • target – ndarray or list of ndarray
Returns:

value of loss

classmethod of(jcriterion, bigdl_type='float')[source]

Create a Python Criterion from a Java criterion object

Parameters:jcriterion – A Java criterion object created by Py4j
Returns:a criterion.
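
A hedged usage sketch of the debug-only forward/backward API above, using the MSECriterion documented later in this module; it assumes the BigDL engine has already been initialized:

>>> import numpy as np
>>> mse = MSECriterion()
creating: createMSECriterion
>>> pred = np.array([1.0, 2.0, 3.0])
>>> target = np.array([2.0, 2.0, 2.0])
>>> loss = mse.forward(pred, target)       # scalar loss value
>>> grad = mse.backward(pred, target)      # ndarray gradient w.r.t. pred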
class bigdl.nn.criterion.CrossEntropyCriterion(weights=None, size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

This criterion combines LogSoftMax and ClassNLLCriterion in one single class.

Parameters:weights – A tensor assigning weight to each of the classes
>>> np.random.seed(123)
>>> weights = np.random.uniform(0, 1, (2,)).astype("float32")
>>> cec = CrossEntropyCriterion(weights)
creating: createCrossEntropyCriterion
>>> cec = CrossEntropyCriterion()
creating: createCrossEntropyCriterion
class bigdl.nn.criterion.DiceCoefficientCriterion(size_average=True, epsilon=1.0, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

The Dice-Coefficient criterion. input: Tensor, target: Tensor

return:      2 * (input intersection target)
        1 - ----------------------------------
                input union target
>>> diceCoefficientCriterion = DiceCoefficientCriterion(size_average = True, epsilon = 1.0)
creating: createDiceCoefficientCriterion
>>> diceCoefficientCriterion = DiceCoefficientCriterion()
creating: createDiceCoefficientCriterion
class bigdl.nn.criterion.DistKLDivCriterion(size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

The Kullback-Leibler divergence criterion

Parameters:sizeAverage
>>> distKLDivCriterion = DistKLDivCriterion(True)
creating: createDistKLDivCriterion
class bigdl.nn.criterion.DotProductCriterion(size_average=False, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Computes the dot product of the input and target tensors. Input and target are required to have the same size.

Parameters:size_average – whether to average over each observation in the same batch

>>> dp =DotProductCriterion(False)
creating: createDotProductCriterion
class bigdl.nn.criterion.GaussianCriterion(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Computes the log-likelihood of a sample x given a Gaussian distribution p.

>>> GaussianCriterion = GaussianCriterion()
creating: createGaussianCriterion

class bigdl.nn.criterion.HingeEmbeddingCriterion(margin=1.0, size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that measures the loss given an input x which is a 1-dimensional vector and a label y (1 or -1). This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance, and is typically used for learning nonlinear embeddings or semi-supervised learning.

If x and y are n-dimensional Tensors, the sum operation still operates over all the elements, and divides by n (this can be avoided if one sets the internal variable sizeAverage to false). The margin has a default value of 1, or can be set in the constructor.

>>> hingeEmbeddingCriterion = HingeEmbeddingCriterion(1e-5, True)
creating: createHingeEmbeddingCriterion
class bigdl.nn.criterion.KLDCriterion(size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Computes the KL-divergence of the input normal distribution to a standard normal distribution. The input has to be a table. The first element of input is the mean of the distribution, the second element of input is the log_variance of the distribution. The input distribution is assumed to be diagonal.

>>> KLDCriterion = KLDCriterion(True)
creating: createKLDCriterion

class bigdl.nn.criterion.KullbackLeiblerDivergenceCriterion(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Computes the Kullback-Leibler divergence error for input and target. This method is the same as the kullback_leibler_divergence loss in keras. The loss is calculated as: y_true = K.clip(input, K.epsilon(), 1), y_pred = K.clip(target, K.epsilon(), 1), and the output is K.sum(y_true * K.log(y_true / y_pred), axis=-1)

>>> error = KullbackLeiblerDivergenceCriterion()
creating: createKullbackLeiblerDivergenceCriterion
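
A minimal NumPy sketch following the documented formula, assuming the usual K.epsilon() value of 1e-7; the helper name is illustrative, not part of the BigDL API:

>>> import numpy as np
>>> def kld(input, target, epsilon=1e-07):
...     y_true = np.clip(input, epsilon, 1.0)
...     y_pred = np.clip(target, epsilon, 1.0)
...     return np.sum(y_true * np.log(y_true / y_pred), axis=-1)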
class bigdl.nn.criterion.L1Cost(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Computes the L1 norm of the input, and the sign of the input.

>>> l1Cost = L1Cost()
creating: createL1Cost
class bigdl.nn.criterion.L1HingeEmbeddingCriterion(margin=1.0, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that measures the loss given an input x = {x1, x2}, a table of two Tensors, and a label y (1 or -1):

Parameters:margin
>>> l1HingeEmbeddingCriterion = L1HingeEmbeddingCriterion(1e-5)
creating: createL1HingeEmbeddingCriterion
>>> l1HingeEmbeddingCriterion = L1HingeEmbeddingCriterion()
creating: createL1HingeEmbeddingCriterion
>>> input1 = np.array([2.1, -2.2])
>>> input2 = np.array([-0.55, 0.298])
>>> input = [input1, input2]
>>> target = np.array([1.0])
>>> result = l1HingeEmbeddingCriterion.forward(input, target)
>>> (result == 5.148)
True
class bigdl.nn.criterion.MSECriterion(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that measures the mean squared error between n elements in the input x and output y:

loss(x, y) = 1/n \sum |x_i - y_i|^2

If x and y are d-dimensional Tensors with a total of n elements, the sum operation still operates over all the elements, and divides by n. The two Tensors must have the same number of elements (but their sizes might be different). The division by n can be avoided if one sets the internal variable sizeAverage to false. By default, the losses are averaged over observations for each minibatch. However, if the field sizeAverage is set to false, the losses are instead summed.

>>> mSECriterion = MSECriterion()
creating: createMSECriterion
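
The formula can be reproduced directly in NumPy; this is a sketch of the math, not a call into BigDL:

>>> import numpy as np
>>> x = np.array([0.0, 0.0, 0.0])
>>> y = np.array([1.0, 2.0, 2.0])
>>> float(np.mean((x - y) ** 2))   # loss(x, y) = 1/n * sum |x_i - y_i|^2, sizeAverage = True
3.0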
class bigdl.nn.criterion.MarginCriterion(margin=1.0, size_average=True, squared=False, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that optimizes a two-class classification hinge loss (margin-based loss) between input x (a Tensor of dimension 1) and output y.

When margin = 1, size_average = True and squared = False, this is the same as hinge loss in keras; When margin = 1, size_average = False and squared = True, this is the same as squared_hinge loss in keras.

Parameters:
  • margin – if unspecified, is by default 1.
  • size_average – size average in a mini-batch
  • squared – whether to calculate the squared hinge loss
>>> marginCriterion = MarginCriterion(1e-5, True, False)
creating: createMarginCriterion
class bigdl.nn.criterion.MarginRankingCriterion(margin=1.0, size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that measures the loss given an input x = {x1, x2}, a table of two Tensors of size 1 (they contain only scalars), and a label y (1 or -1). In batch mode, x is a table of two Tensors of size batchsize, and y is a Tensor of size batchsize containing 1 or -1 for each corresponding pair of elements in the input Tensor. If y == 1 then it assumed the first input should be ranked higher (have a larger value) than the second input, and vice-versa for y == -1.

Parameters:margin
>>> marginRankingCriterion = MarginRankingCriterion(1e-5, True)
creating: createMarginRankingCriterion
class bigdl.nn.criterion.MeanAbsolutePercentageCriterion(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

This method is the same as the mean_absolute_percentage_error loss in keras. It calculates diff = K.abs((y - x) / K.clip(K.abs(y), K.epsilon(), Double.MaxValue)) and returns 100 * K.mean(diff) as output. Here, x and y may or may not have a batch dimension.

>>> error = MeanAbsolutePercentageCriterion()
creating: createMeanAbsolutePercentageCriterion

class bigdl.nn.criterion.MeanSquaredLogarithmicCriterion(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

This method is the same as the mean_squared_logarithmic_error loss in keras. It calculates: first_log = K.log(K.clip(y, K.epsilon(), Double.MaxValue) + 1.), second_log = K.log(K.clip(x, K.epsilon(), Double.MaxValue) + 1.), and outputs K.mean(K.square(first_log - second_log)). Here, x and y may or may not have a batch dimension.

>>> error = MeanSquaredLogarithmicCriterion()
creating: createMeanSquaredLogarithmicCriterion

class bigdl.nn.criterion.MultiCriterion(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

A weighted sum of other criterions, each applied to the same input and target.

>>> multiCriterion = MultiCriterion()
creating: createMultiCriterion
>>> mSECriterion = MSECriterion()
creating: createMSECriterion
>>> multiCriterion = multiCriterion.add(mSECriterion)
>>> multiCriterion = multiCriterion.add(mSECriterion)
add(criterion, weight=1.0)[source]
class bigdl.nn.criterion.MultiLabelMarginCriterion(size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that optimizes a multi-class multi-classification hinge loss ( margin-based loss) between input x and output y (which is a Tensor of target class indices)

Parameters:size_average – size average in a mini-batch
>>> multiLabelMarginCriterion = MultiLabelMarginCriterion(True)
creating: createMultiLabelMarginCriterion
class bigdl.nn.criterion.MultiLabelSoftMarginCriterion(weights=None, size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

A MultiLabel multiclass criterion based on sigmoid: the loss is:

l(x,y) = - sum_i y[i] * log(p[i]) + (1 - y[i]) * log (1 - p[i])

where p[i] = exp(x[i]) / (1 + exp(x[i])) and with weights:

l(x,y) = - sum_i weights[i] (y[i] * log(p[i]) + (1 - y[i]) * log (1 - p[i]))
>>> np.random.seed(123)
>>> weights = np.random.uniform(0, 1, (2,)).astype("float32")
>>> multiLabelSoftMarginCriterion = MultiLabelSoftMarginCriterion(weights)
creating: createMultiLabelSoftMarginCriterion
>>> multiLabelSoftMarginCriterion = MultiLabelSoftMarginCriterion()
creating: createMultiLabelSoftMarginCriterion
class bigdl.nn.criterion.MultiMarginCriterion(p=1, weights=None, margin=1.0, size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input x and output y (which is a target class index).

Parameters:
  • p
  • weights
  • margin
  • size_average
>>> np.random.seed(123)
>>> weights = np.random.uniform(0, 1, (2,)).astype("float32")
>>> multiMarginCriterion = MultiMarginCriterion(1,weights)
creating: createMultiMarginCriterion
>>> multiMarginCriterion = MultiMarginCriterion()
creating: createMultiMarginCriterion
class bigdl.nn.criterion.PGCriterion(sizeAverage=False, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

The Criterion to compute the negative policy gradient given a multinomial distribution and the sampled action and reward.

The input to this criterion should be a 2-D tensor representing a batch of multinomial distributions; the target should also be a 2-D tensor with the same size as the input, representing the sampled action and reward/advantage: the index of a non-zero element in the vector represents the sampled action, and the non-zero element itself represents the reward. If the action space is large, you should consider using a SparseTensor for the target.

The loss computed is simply the standard policy gradient,

loss = - 1/n * sum(R_{n} dot_product log(P_{n}))

where R_{n} is the reward vector, and P_{n} is the input distribution.

Parameters:size_average – whether to average over each observation in the same batch

>>> pg = PGCriterion()
creating: createPGCriterion
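
A pure-NumPy sketch of the documented loss; the helper is an illustration (it assumes strictly positive probabilities), not the BigDL implementation:

>>> import numpy as np
>>> def pg_loss(distributions, targets):
...     # distributions: batch of multinomial distributions P_n (2-D, rows sum to 1)
...     # targets: same shape, non-zero only at the sampled action, holding the reward R_n
...     n = distributions.shape[0]
...     return -np.sum(targets * np.log(distributions)) / n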
class bigdl.nn.criterion.ParallelCriterion(repeat_target=False, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

ParallelCriterion is a weighted sum of other criterions each applied to a different input and target. Set repeatTarget = true to share the target for criterions.

Use add(criterion[, weight]) method to add criterion. Where weight is a scalar(default 1).

Parameters:repeat_target – Whether to share the target for all criterions.
>>> parallelCriterion = ParallelCriterion(True)
creating: createParallelCriterion
>>> mSECriterion = MSECriterion()
creating: createMSECriterion
>>> parallelCriterion = parallelCriterion.add(mSECriterion)
>>> parallelCriterion = parallelCriterion.add(mSECriterion)
add(criterion, weight=1.0)[source]
class bigdl.nn.criterion.PoissonCriterion(bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Computes the Poisson error for input and target. The loss is calculated as: mean(input - target * K.log(input + K.epsilon()), axis=-1)

>>> error = PoissonCriterion()
creating: createPoissonCriterion

class bigdl.nn.criterion.SmoothL1Criterion(size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that can be thought of as a smooth version of the AbsCriterion. It uses a squared term if the absolute element-wise error falls below 1. It is less sensitive to outliers than the MSECriterion and in some cases prevents exploding gradients (e.g. see “Fast R-CNN” paper by Ross Girshick).

                      | 0.5 * (x_i - y_i)^2, if |x_i - y_i| < 1
loss(x, y) = 1/n \sum |
                      | |x_i - y_i| - 0.5,   otherwise

If x and y are d-dimensional Tensors with a total of n elements, the sum operation still operates over all the elements, and divides by n. The division by n can be avoided if one sets the internal variable sizeAverage to false

Parameters:size_average – whether to average the loss
>>> smoothL1Criterion = SmoothL1Criterion(True)
creating: createSmoothL1Criterion
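
A NumPy sketch of the piecewise formula above, averaging over the n elements (sizeAverage = True); illustrative only:

>>> import numpy as np
>>> def smooth_l1(x, y):
...     d = np.abs(x - y)
...     per_elem = np.where(d < 1, 0.5 * d ** 2, d - 0.5)
...     return per_elem.mean()
>>> float(smooth_l1(np.array([0.0, 3.0]), np.array([0.5, 0.0])))
1.3125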
class bigdl.nn.criterion.SmoothL1CriterionWithWeights(sigma, num=0, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

A smooth version of the AbsCriterion. It uses a squared term if the absolute element-wise error falls below 1. It is less sensitive to outliers than the MSECriterion and in some cases prevents exploding gradients (e.g. see “Fast R-CNN” paper by Ross Girshick).

d = (x - y) * w_in
loss(x, y, w_in, w_out)
           | 0.5 * (sigma * d_i)^2 * w_out          if |d_i| < 1 / sigma / sigma
= 1/n \sum |
           | (|d_i| - 0.5 / sigma / sigma) * w_out   otherwise
>>> smoothL1CriterionWithWeights = SmoothL1CriterionWithWeights(1e-5, 1)
creating: createSmoothL1CriterionWithWeights
class bigdl.nn.criterion.SoftMarginCriterion(size_average=True, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Creates a criterion that optimizes a two-class classification logistic loss between input x (a Tensor of dimension 1) and output y (which is a tensor containing either 1s or -1s).

loss(x, y) = sum_i (log(1 + exp(-y[i]*x[i]))) / x:nElement()
Parameters:size_average – The normalization by the number of elements in the input can be disabled by setting it to false
>>> softMarginCriterion = SoftMarginCriterion(False)
creating: createSoftMarginCriterion
>>> softMarginCriterion = SoftMarginCriterion()
creating: createSoftMarginCriterion
class bigdl.nn.criterion.SoftmaxWithCriterion(ignore_label=None, normalize_mode='VALID', bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

Computes the multinomial logistic loss for a one-of-many classification task, passing real-valued predictions through a softmax to get a probability distribution over classes. It should be preferred over separate SoftmaxLayer + MultinomialLogisticLossLayer as its gradient computation is more numerically stable.

Parameters:
  • ignoreLabel – (optional) Specify a label value that should be ignored when computing the loss.
  • normalizeMode – How to normalize the output loss.
>>> softmaxWithCriterion = SoftmaxWithCriterion()
creating: createSoftmaxWithCriterion
>>> softmaxWithCriterion = SoftmaxWithCriterion(1, "FULL")
creating: createSoftmaxWithCriterion
class bigdl.nn.criterion.TimeDistributedCriterion(criterion, size_average=False, dimension=2, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

This class is intended to support inputs with 3 or more dimensions. It applies the provided criterion to every temporal slice of the input.

Parameters:
  • criterion – embedded criterion
  • size_average – whether to divide by the sequence length
>>> td = TimeDistributedCriterion(ClassNLLCriterion())
creating: createClassNLLCriterion
creating: createTimeDistributedCriterion
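
A conceptual NumPy sketch of the behaviour described above: apply a per-slice criterion along the time dimension and optionally divide by the sequence length. The helper, per-slice criterion, and shapes are illustrative assumptions, not the BigDL implementation:

>>> import numpy as np
>>> def time_distributed(criterion, preds, targets, size_average=False):
...     # preds: (batch, time, ...), targets: (batch, time, ...); slice along dimension 2
...     losses = [criterion(preds[:, t], targets[:, t]) for t in range(preds.shape[1])]
...     total = sum(losses)
...     return total / len(losses) if size_average else total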
class bigdl.nn.criterion.TimeDistributedMaskCriterion(criterion, padding_value=0, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

This class is intended to support inputs with 3 or more dimensions. It applies the provided criterion to every temporal slice of the input. In addition, it supports a padding mask.

e.g. if the target is [ [-1, 1, 2, 3, -1], [5, 4, 3, -1, -1] ] and the paddingValue property is set to -1, then the loss of the -1 entries will not be accumulated, and the loss is only divided by 6 (not including the number of -1 entries; in this case, we are only interested in 1, 2, 3, 5, 4, 3)

Parameters:
  • criterion – embedded criterion
  • padding_value – padding value
>>> td = TimeDistributedMaskCriterion(ClassNLLCriterion())
creating: createClassNLLCriterion
creating: createTimeDistributedMaskCriterion
class bigdl.nn.criterion.TransformerCriterion(criterion, input_transformer=None, target_transformer=None, bigdl_type='float')[source]

Bases: bigdl.nn.criterion.Criterion

The criterion that takes two modules to transform input and target, and takes one criterion to compute the loss with the transformed input and target.

This criterion can be used to construct complex criterions. For example, the inputTransformer and targetTransformer can be pre-trained CNN networks, and we can use the networks’ output to compute the high-level feature reconstruction loss, which is commonly used in areas like neural style transfer (https://arxiv.org/abs/1508.06576), texture synthesis (https://arxiv.org/abs/1505.07376), etc.

>>> trans = TransformerCriterion(MSECriterion())
creating: createMSECriterion
creating: createTransformerCriterion

bigdl.nn.initialization_method module

class bigdl.nn.initialization_method.BilinearFiller(bigdl_type='float')[source]

Bases: bigdl.nn.initialization_method.InitializationMethod

Initialize the weight with coefficients for bilinear interpolation.

A common use case is with the DeconvolutionLayer acting as upsampling. The variable tensor passed in the init function should have 5 dimensions of format [nGroup, nInput, nOutput, kH, kW], and kH should be equal to kW

class bigdl.nn.initialization_method.ConstInitMethod(value, bigdl_type='float')[source]

Bases: bigdl.nn.initialization_method.InitializationMethod

Initializer that generates tensors filled with a certain constant value.

class bigdl.nn.initialization_method.InitializationMethod(jvalue, bigdl_type, *args)[source]

Bases: bigdl.util.common.JavaValue

Initialization method to initialize bias and weight. The init method will be called in Module.reset()

class bigdl.nn.initialization_method.MsraFiller(varianceNormAverage=True, bigdl_type='float')[source]

Bases: bigdl.nn.initialization_method.InitializationMethod

MsraFiller Initializer. See https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf

class bigdl.nn.initialization_method.Ones(bigdl_type='float')[source]

Bases: bigdl.nn.initialization_method.InitializationMethod

Initializer that generates tensors with ones.

class bigdl.nn.initialization_method.RandomNormal(mean, stdv, bigdl_type='float')[source]

Bases: bigdl.nn.initialization_method.InitializationMethod

Initializer that generates tensors with a normal distribution.

class bigdl.nn.initialization_method.RandomUniform(upper=None, lower=None, bigdl_type='float')[source]

Bases: bigdl.nn.initialization_method.InitializationMethod

Initializer that generates tensors with a uniform distribution. It draws samples from a uniform distribution within [lower, upper]. If lower and upper are not specified, it draws samples from a uniform distribution within [-limit, limit], where “limit” is “1/sqrt(fan_in)”

class bigdl.nn.initialization_method.Xavier(bigdl_type='float')[source]

Bases: bigdl.nn.initialization_method.InitializationMethod

Xavier Initializer. See http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf

class bigdl.nn.initialization_method.Zeros(bigdl_type='float')[source]

Bases: bigdl.nn.initialization_method.InitializationMethod

Initializer that generates tensors with zeros.
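
A hedged usage sketch: initialization methods are typically passed to a layer's set_init_method (documented on layers such as Add and BatchNormalization in the next module). It assumes the BigDL engine has already been initialized; the echoed creating: lines follow the pattern of the other examples:

>>> from bigdl.nn.layer import Linear
>>> from bigdl.nn.initialization_method import Xavier, Zeros
>>> linear = Linear(10, 5)
creating: createLinear
>>> _ = linear.set_init_method(weight_init_method=Xavier(), bias_init_method=Zeros())
creating: createXavier
creating: createZeros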

bigdl.nn.layer module

class bigdl.nn.layer.Abs(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

an element-wise abs operation

>>> abs = Abs()
creating: createAbs
class bigdl.nn.layer.ActivityRegularization(l1=0.0, l2=0.0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Layer that applies an update to the cost function based on input activity.

Parameters:
  • l1 – L1 regularization factor (positive float).
  • l2 – L2 regularization factor (positive float).
>>> ar = ActivityRegularization(0.1, 0.02)
creating: createActivityRegularization
class bigdl.nn.layer.Add(input_size, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Adds a bias term to the input data.

Parameters:input_size – size of input data
>>> add = Add(1)
creating: createAdd
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.AddConstant(constant_scalar, inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

adding a constant

Parameters:
  • constant_scalar – constant value
  • inplace – Can optionally do its operation in-place without using extra state memory
>>> addConstant = AddConstant(1e-5, True)
creating: createAddConstant
class bigdl.nn.layer.Attention(hidden_size, num_heads, attention_dropout, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Implementation of multiheaded attention and self-attention layers.

>>> attention = Attention(8, 4, 1.0)
creating: createAttention
class bigdl.nn.layer.BatchNormalization(n_output, eps=1e-05, momentum=0.1, affine=True, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This layer implements Batch Normalization as described in the paper: “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” by Sergey Ioffe, Christian Szegedy https://arxiv.org/abs/1502.03167

This implementation is useful for inputs NOT coming from convolution layers. For convolution layers, use nn.SpatialBatchNormalization.

The operation implemented is:

       ( x - mean(x) )
y = -------------------- * gamma + beta
    standard-deviation(x)

where gamma and beta are learnable parameters.The learning of gamma and beta is optional.

Parameters:
  • n_output – output feature map number
  • eps – avoid divide zero
  • momentum – momentum for weight update
  • affine – affine operation on output or not
>>> batchNormalization = BatchNormalization(1, 1e-5, 1e-5, True)
creating: createBatchNormalization
>>> import numpy as np
>>> init_weight = np.random.randn(2)
>>> init_grad_weight = np.zeros([2])
>>> init_bias = np.zeros([2])
>>> init_grad_bias = np.zeros([2])
>>> batchNormalization = BatchNormalization(2, 1e-5, 1e-5, True, init_weight, init_bias, init_grad_weight, init_grad_bias)
creating: createBatchNormalization
set_init_method(weight_init_method=None, bias_init_method=None)[source]
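
The normalization formula above can be sketched in pure NumPy, computing per-feature statistics over the batch, with eps playing the documented divide-by-zero role. This illustrates the math only, not the BigDL implementation:

>>> import numpy as np
>>> def batch_norm(x, gamma, beta, eps=1e-5):
...     mean = x.mean(axis=0)                  # mean(x) per feature
...     std = np.sqrt(x.var(axis=0) + eps)     # standard-deviation(x)
...     return (x - mean) / std * gamma + beta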
class bigdl.nn.layer.BiRecurrent(merge=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Container

Create a Bidirectional recurrent layer

Parameters:merge – merge layer
>>> biRecurrent = BiRecurrent(CAddTable())
creating: createCAddTable
creating: createBiRecurrent
>>> biRecurrent = BiRecurrent()
creating: createBiRecurrent
class bigdl.nn.layer.BifurcateSplitTable(dimension, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Creates a module that takes a Tensor as input and outputs two tables, splitting the Tensor along the specified dimension.

The input to this layer is expected to be a tensor, or a batch of tensors;

Parameters:
  • dimension – to be split along this dimension
  • T – Numeric type. Only support float/double now

>>> bifurcateSplitTable = BifurcateSplitTable(1)
creating: createBifurcateSplitTable
class bigdl.nn.layer.Bilinear(input_size1, input_size2, output_size, bias_res=True, wRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

A bilinear transformation with sparse inputs. The input tensor given in forward(input) is a table containing both inputs x_1 and x_2, which are tensors of size N x inputDimension1 and N x inputDimension2, respectively.

Parameters:
  • input_size1 – input dimension of x_1
  • input_size2 – input dimension of x_2
  • output_size – output dimension
  • bias_res – whether to use bias
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]] applied to the bias.

>>> bilinear = Bilinear(1, 1, 1, True, L1Regularizer(0.5))
creating: createL1Regularizer
creating: createBilinear
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.BinaryThreshold(th=1e-06, ip=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Binary threshold, 1 if value > th, 0 otherwise.

>>> layer = BinaryThreshold(0.1, False)
creating: createBinaryThreshold

class bigdl.nn.layer.BinaryTreeLSTM(input_size, hidden_size, gate_output=True, with_graph=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This class is an implementation of Binary TreeLSTM (Constituency Tree LSTM).

Parameters:
  • inputSize – input units size
  • hiddenSize – hidden units size
  • gateOutput – whether to gate the output
  • withGraph – whether to create the lstms with [[Graph]], the default value is true.
>>> treeLSTM = BinaryTreeLSTM(100, 200)
creating: createBinaryTreeLSTM

class bigdl.nn.layer.Bottle(module, n_input_dim=2, n_output_dim1=2147483647, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Container

Bottle allows varying dimensionality input to be forwarded through any module that accepts input of nInputDim dimensions, and generates output of nOutputDim dimensions.

Parameters:
  • module – transform module
  • n_input_dim – nInputDim dimensions of module
  • n_output_dim1 – output of nOutputDim dimensions
>>> bottle = Bottle(Linear(100,10), 1, 1)
creating: createLinear
creating: createBottle
class bigdl.nn.layer.CAdd(size, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This layer has a bias tensor with a given size. The bias will be added element-wise to the input tensor. If the element number of the bias tensor matches the input tensor, a simple element-wise addition will be done. Otherwise the bias will be expanded to the same size as the input. The expand means repeating on the unmatched singleton dimensions (if some unmatched dimension isn’t a singleton dimension, it will report an error). If the input is a batch, a singleton dimension will be added to the first dimension before the expand.

Parameters:
  • size – the size of the bias
  • bRegularizer – instance of [[Regularizer]]applied to the bias.
>>> cAdd = CAdd([1,2])
creating: createCAdd
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.CAddTable(inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Merge the input tensors in the input table by adding them together element-wise. The input table is actually an array of tensors with the same size.

Parameters:inplace – reuse the input memory
>>> cAddTable = CAddTable(True)
creating: createCAddTable
class bigdl.nn.layer.CAveTable(inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Merge the input tensors in the input table by taking the element-wise average. The input table is actually an array of tensors with the same size.

Parameters:inplace – reuse the input memory
>>> cAveTable = CAveTable(True)
creating: createCAveTable
class bigdl.nn.layer.CDivTable(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Takes a table with two Tensors and returns the component-wise division between them.

>>> cDivTable = CDivTable()
creating: createCDivTable
class bigdl.nn.layer.CMaxTable(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Takes a table of Tensors and outputs the max of all of them.

>>> cMaxTable = CMaxTable()
creating: createCMaxTable
class bigdl.nn.layer.CMinTable(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Takes a table of Tensors and outputs the min of all of them.

>>> cMinTable = CMinTable()
creating: createCMinTable
class bigdl.nn.layer.CMul(size, wRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies a component-wise multiplication to the incoming data

Parameters:size – size of the data
>>> cMul = CMul([1,2])
creating: createCMul
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.CMulTable(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Takes a table of Tensors and outputs the multiplication of all of them.

>>> cMulTable = CMulTable()
creating: createCMulTable
class bigdl.nn.layer.CSubTable(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Takes a table with two Tensors and returns the component-wise subtraction between them.

>>> cSubTable = CSubTable()
creating: createCSubTable
class bigdl.nn.layer.Clamp(min, max, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Clamps all elements into the range [min_value, max_value]. Output is identical to input in the range, otherwise elements less than min_value (or greater than max_value) are saturated to min_value (or max_value).

Parameters:
  • min
  • max
>>> clamp = Clamp(1, 3)
creating: createClamp
class bigdl.nn.layer.Concat(dimension, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Container

Concat concatenates the output of one layer of “parallel” modules along the provided {@code dimension}: they take the same inputs, and their output is concatenated.

                +-----------+
           +---->  module1  -----+
           |    |           |    |
input -----+---->  module2  -----+----> output
           |    |           |    |
           +---->  module3  -----+
                +-----------+
Parameters:dimension – dimension
>>> concat = Concat(2)
creating: createConcat
class bigdl.nn.layer.ConcatTable(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Container

ConcatTable is a container module like Concat. Applies an input to each member module; the input can be a tensor or a table.

ConcatTable usually works with CAddTable and CMulTable to implement element-wise add/multiply on the outputs of two modules.

>>> concatTable = ConcatTable()
creating: createConcatTable
class bigdl.nn.layer.Container(jvalue, bigdl_type, *args)[source]

Bases: bigdl.nn.layer.Layer

[[Container]] is a sub-class of Model that declares methods defined in all containers. A container usually contains some other modules which can be added through the “add” method.

add(model)[source]
flattened_layers(include_container=False)[source]
layers
class bigdl.nn.layer.Contiguous(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

used to make input, grad_output both contiguous

>>> contiguous = Contiguous()
creating: createContiguous
class bigdl.nn.layer.ConvLSTMPeephole(input_size, output_size, kernel_i, kernel_c, stride=1, padding=-1, activation=None, inner_activation=None, wRegularizer=None, uRegularizer=None, bRegularizer=None, cRegularizer=None, with_peephole=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Convolution Long Short Term Memory architecture with peephole.
Ref. A.: https://arxiv.org/abs/1506.04214 (blueprint for this module)
Parameters:
  • input_size – number of input planes in the image given into forward()
  • output_size – number of output planes the convolution layer will produce
  • kernel_i – Convolutional filter size to convolve input
  • kernel_c – Convolutional filter size to convolve cell
  • stride – The step of the convolution, default is 1
  • padding – The additional zeros added, default is -1
  • activation – activation function, by default to be Tanh if not specified. It can also be the name of an existing activation as a string.
  • inner_activation – activation function for the inner cells, by default to be Sigmoid if not specified. It can also be the name of an existing activation as a string.
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices
  • uRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the recurrent weights matrices
  • bRegularizer – instance of [[Regularizer]] applied to the bias.
  • cRegularizer – instance of [[Regularizer]] applied to the peephole.
  • with_peephole – whether to use the last cell status to control a gate.

>>> convlstm = ConvLSTMPeephole(4, 3, 3, 3, 1, -1, Tanh(), HardSigmoid(), L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5))
creating: createTanh
creating: createHardSigmoid
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createConvLSTMPeephole
class bigdl.nn.layer.ConvLSTMPeephole3D(input_size, output_size, kernel_i, kernel_c, stride=1, padding=-1, wRegularizer=None, uRegularizer=None, bRegularizer=None, cRegularizer=None, with_peephole=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Parameters:
  • input_size – number of input planes in the image given into forward()
  • output_size – number of output planes the convolution layer will produce
  • kernel_i – Convolutional filter size to convolve input
  • kernel_c – Convolutional filter size to convolve cell
  • stride – The step of the convolution
  • padding – The additional zeros added
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices
  • uRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the recurrent weights matrices
  • bRegularizer – instance of [[Regularizer]] applied to the bias.
  • cRegularizer – instance of [[Regularizer]] applied to the peephole.
  • with_peephole – whether to use the last cell status to control a gate.

>>> convlstm = ConvLSTMPeephole3D(4, 3, 3, 3, 1, -1, L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5))
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createConvLSTMPeephole3D
class bigdl.nn.layer.Cosine(input_size, output_size, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Cosine calculates the cosine similarity of the input to k mean centers. The input given in forward(input) must be either a vector (1D tensor) or matrix (2D tensor). If the input is a vector, it must have the size of inputSize. If it is a matrix, then each row is assumed to be an input sample of given batch (the number of rows means the batch size and the number of columns should be equal to the inputSize).

Parameters:
  • input_size – the size of each input sample
  • output_size – the size of the module output of each sample
>>> cosine = Cosine(2,3)
creating: createCosine
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.CosineDistance(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Outputs the cosine distance between inputs

>>> cosineDistance = CosineDistance()
creating: createCosineDistance
class bigdl.nn.layer.Cropping2D(heightCrop, widthCrop, data_format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Cropping layer for 2D input (e.g. picture). It crops along spatial dimensions, i.e. width and height.

# Input shape 4D tensor with shape: (batchSize, channels, first_axis_to_crop, second_axis_to_crop)

# Output shape 4D tensor with shape: (batchSize, channels, first_cropped_axis, second_cropped_axis)

Parameters:
  • heightCrop – Array of length 2. How many units should be trimmed off at the beginning and end of the height dimension.
  • widthCrop – Array of length 2. How many units should be trimmed off at the beginning and end of the width dimension.
  • data_format – a string value (or DataFormat Object in Scala) of “NHWC” or “NCHW” to specify the input data format of this layer. In “NHWC” format data is stored in the order of [batch_size, height, width, channels]; in “NCHW” format data is stored in the order of [batch_size, channels, height, width].
>>> cropping2D = Cropping2D([1, 1], [2, 2])
creating: createCropping2D

class bigdl.nn.layer.Cropping3D(dim1Crop, dim2Crop, dim3Crop, data_format='channel_first', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Cropping layer for 3D data (e.g. spatial or spatio-temporal).

# Input shape 5D tensor with shape: (batchSize, channels, first_axis_to_crop, second_axis_to_crop, third_axis_to_crop)

# Output shape 5D tensor with shape: (batchSize, channels, first_cropped_axis, second_cropped_axis, third_cropped_axis)

Parameters:
  • dim1Crop – Array of length 2. How many units should be trimmed off at the beginning and end of the first dimension.
  • dim2Crop – Array of length 2. How many units should be trimmed off at the beginning and end of the second dimension.
  • dim3Crop – Array of length 2. How many units should be trimmed off at the beginning and end of the third dimension.
  • data_format – a string value, either “channel_first” or “channel_last”
>>> cropping3D = Cropping3D([1, 1], [2, 2], [1, 1])
creating: createCropping3D

class bigdl.nn.layer.CrossProduct(numTensor=0, embeddingSize=0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

A layer which takes a table of multiple tensors (n >= 2) as input and calculates the dot product for all combinations of pairs among the input tensors.

Dot-product outputs are ordered according to orders of pairs in input Table. For instance, input (Table) is T(A, B, C), output (Tensor) will be [A.*B, A.*C, B.*C].

The dimensions of the input Tensors can be one or two; if two, the first dimension is batchSize. For convenience, the output is a 2-dim Tensor regardless of the input dims.

Table size checking and Tensor size checking will be executed before each forward, when [[numTensor]] and [[embeddingSize]] are set to values greater than zero.

Parameters:
  • numTensor – (for checking) the number of Tensors the input Table contains, default 0 (won’t check)
  • embeddingSize – (for checking) the vector length of the dot product, default 0 (won’t check)

>>> crossProduct = CrossProduct()
creating: createCrossProduct
class bigdl.nn.layer.DenseToSparse(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Convert DenseTensor to SparseTensor.

>>> DenseToSparse = DenseToSparse()
creating: createDenseToSparse
class bigdl.nn.layer.DetectionOutputFrcnn(n_classes, bbox_vote, nms_thresh=0.3, max_per_image=100, thresh=0.05, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Post process Faster-RCNN models.

Parameters:
  • nms_thresh – nms threshold
  • n_classes – number of classes
  • bbox_vote – whether to vote for detections
  • max_per_image – limit max number of detections per image
  • thresh – score threshold
>>> layer = DetectionOutputFrcnn(21, True)
creating: createDetectionOutputFrcnn

class bigdl.nn.layer.DetectionOutputSSD(n_classes=21, share_location=True, bg_label=0, nms_thresh=0.45, nms_topk=400, keep_top_k=200, conf_thresh=0.01, variance_encoded_in_target=False, conf_post_process=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Layer to post-process SSD output.

Parameters:
  • n_classes – number of classes
  • share_location – whether to share location, default is true
  • bg_label – background label
  • nms_thresh – nms threshold
  • nms_topk – nms topk
  • keep_top_k – result topk
  • conf_thresh – confidence threshold
  • variance_encoded_in_target – if variance is encoded in target, we simply need to restore the offset predictions; else if variance is encoded in bbox, we need to scale the offset accordingly.
  • conf_post_process – whether to add some additional post processing to the confidence prediction
>>> layer = DetectionOutputSSD()
creating: createDetectionOutputSSD

class bigdl.nn.layer.DotProduct(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This is a simple table layer which takes a table of two tensors as input and calculates the dot product between them as the output.

>>> dotProduct = DotProduct()
creating: createDotProduct
class bigdl.nn.layer.Dropout(init_p=0.5, inplace=False, scale=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Dropout masks (sets to zero) parts of the input using a Bernoulli distribution. Each input element has a probability initP of being dropped. If scale is set, the outputs are scaled by a factor of 1/(1-initP) during training. During evaluation, the output is the same as the input.

Parameters:
  • initP – probability to be dropped
  • inplace – inplace model
  • scale – if scale by a factor of 1/(1-initP)
>>> dropout = Dropout(0.4)
creating: createDropout
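
A NumPy sketch of the documented behaviour (Bernoulli masking plus 1/(1-initP) scaling during training, identity during evaluation); illustrative only:

>>> import numpy as np
>>> def dropout(x, init_p=0.5, training=True):
...     if not training:
...         return x                                   # evaluation: output equals input
...     mask = np.random.binomial(1, 1.0 - init_p, size=x.shape)
...     return x * mask / (1.0 - init_p)               # scale kept units by 1/(1 - initP)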
class bigdl.nn.layer.ELU(alpha=1.0, inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

D-A Clevert, Thomas Unterthiner, Sepp Hochreiter Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) [http://arxiv.org/pdf/1511.07289.pdf]

>>> eLU = ELU(1e-5, True)
creating: createELU
class bigdl.nn.layer.Echo(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This module is for debugging purposes; it can print the activation and gradient in your model topology.

>>> echo = Echo()
creating: createEcho
class bigdl.nn.layer.Euclidean(input_size, output_size, fast_backward=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Outputs the Euclidean distance of the input to outputSize centers

Parameters:
  • inputSize – inputSize
  • outputSize – outputSize
  • T – Numeric type. Only support float/double now
>>> euclidean = Euclidean(1, 1, True)
creating: createEuclidean
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.Exp(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies element-wise exp to input tensor.

>>> exp = Exp()
creating: createExp
class bigdl.nn.layer.ExpandSize(sizes, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Expand tensor to configured size

>>> expand = ExpandSize([2, 3, 4])
creating: createExpandSize
class bigdl.nn.layer.FPN(in_channels_list, out_channels, top_blocks=0, in_channels_of_p6p7=0, out_channels_of_p6p7=0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Feature Pyramid Network (FPN) for Mask-RCNN

Parameters:
  • in_channels_list – number of channels of feature maps
  • out_channels – number of channels of FPN output
  • top_blocks – top blocks option: extra operation to be performed on the smallest resolution FPN output, whose result is appended to the result list. 0 for null, 1 for using max pooling on the last level, 2 for extra layers P6 and P7 in RetinaNet
  • in_channels_of_p6p7 – number of input channels of P6 P7
  • out_channels_of_p6p7 – number of output channels of P6 P7

>>> import numpy as np
>>> feature1 = np.random.rand(1,1,8,8)
>>> feature2 = np.random.rand(1,2,4,4)
>>> feature3 = np.random.rand(1,4,2,2)
>>> m = FPN([1,2,4],2,2,4,2)
creating: createFPN
>>> out = m.forward([feature1, feature2, feature3])
class bigdl.nn.layer.FeedForwardNetwork(hidden_size, filter_size, relu_dropout, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Implementation of FeedForwardNetwork, constructed with a fully connected network. Input with shape (batch_size, length, hidden_size); output with shape (batch_size, length, hidden_size).

>>> ffn = FeedForwardNetwork(8, 4, 1.0)
creating: createFeedForwardNetwork
class bigdl.nn.layer.FlattenTable(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This is a table layer which takes an arbitrarily deep table of Tensors (potentially nested) as input and produces a table of Tensors without any nested table.

>>> flattenTable = FlattenTable()
creating: createFlattenTable
class bigdl.nn.layer.GRU(input_size, hidden_size, p=0.0, activation=None, inner_activation=None, wRegularizer=None, uRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Gated Recurrent Units architecture. The first input in sequence uses zero value for cell and hidden state

Parameters:
  • input_size – the size of each input vector
  • hidden_size – hidden unit size in GRU
  • p – is used for [[Dropout]] probability
  • activation – activation function. It can also be the name of an existing activation as a string.
  • inner_activation – activation function for the inner cells, by default to be Sigmoid if not specified. It can also be the name of an existing activation as a string.
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • uRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
  • bRegularizer – instance of [[Regularizer]] applied to the bias.

>>> gru = GRU(4, 3, 0.5, Tanh(), Sigmoid(), L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5))
creating: createTanh
creating: createSigmoid
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createGRU
class bigdl.nn.layer.GaussianDropout(rate, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply multiplicative 1-centered Gaussian noise. The multiplicative noise will have standard deviation sqrt(rate / (1 - rate)).

As it is a regularization layer, it is only active at training time.

Parameters:rate – drop probability (as with Dropout).
>>> GaussianDropout = GaussianDropout(0.5)
creating: createGaussianDropout
class bigdl.nn.layer.GaussianNoise(stddev, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply additive zero-centered Gaussian noise. This is useful to mitigate overfitting (you could see it as a form of random data augmentation). Gaussian Noise (GS) is a natural choice as corruption process for real valued inputs.

As it is a regularization layer, it is only active at training time.

Parameters:stddev – standard deviation of the noise distribution
>>> GaussianNoise = GaussianNoise(0.5)
creating: createGaussianNoise
class bigdl.nn.layer.GaussianSampler(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Takes {mean, log_variance} as input and samples from the Gaussian distribution.

>>> sampler = GaussianSampler()
creating: createGaussianSampler

class bigdl.nn.layer.Gemm(alpha=1.0, beta=1.0, trans_a=False, trans_b=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

class bigdl.nn.layer.GradientReversal(the_lambda=1.0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

It is a simple module that preserves the input, but takes the gradient from the subsequent layer, multiplies it by -lambda, and passes it to the preceding layer. This can be used to maximise an objective function whilst using gradient descent, as described in [“Domain-Adversarial Training of Neural Networks” (http://arxiv.org/abs/1505.07818)]

Parameters:lambda – hyper-parameter lambda can be set dynamically during training
>>> gradientReversal = GradientReversal(1e-5)
creating: createGradientReversal
>>> gradientReversal = GradientReversal()
creating: createGradientReversal
class bigdl.nn.layer.HardShrink(the_lambda=0.5, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This is a transfer layer which applies the hard shrinkage function element-wise to the input Tensor. The parameter lambda is set to 0.5 by default

        x, if x >  lambda
f(x) =  x, if x < -lambda
        0, otherwise
Parameters:the_lambda – a threshold value whose default value is 0.5
>>> hardShrink = HardShrink(1e-5)
creating: createHardShrink
class bigdl.nn.layer.HardSigmoid(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply Hard-sigmoid function

       |  0, if x < -2.5
f(x) = |  1, if x > 2.5
       |  0.2 * x + 0.5, otherwise
>>> hardSigmoid = HardSigmoid()
creating: createHardSigmoid
class bigdl.nn.layer.HardTanh(min_value=-1.0, max_value=1.0, inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies HardTanh to each element of input, HardTanh is defined:

       |  maxValue, if x > maxValue
f(x) = |  minValue, if x < minValue
       |  x, otherwise
Parameters:
  • min_value – minValue in f(x), default is -1.
  • max_value – maxValue in f(x), default is 1.
  • inplace – whether enable inplace model.
>>> hardTanh = HardTanh(1e-5, 1e5, True)
creating: createHardTanh
>>> hardTanh = HardTanh()
creating: createHardTanh
class bigdl.nn.layer.Highway(size, with_bias=True, activation=None, wRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Densely connected highway network. Highway layers are a natural extension of LSTMs to feedforward networks.

Parameters:
  • size – input size
  • with_bias – whether to include a bias
  • activation – activation function. It can also be the name of an existing activation as a string.
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]], applied to the bias.

>>> highway = Highway(2)
creating: createHighway
class bigdl.nn.layer.Identity(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Identity just returns the input as output. It’s useful in some parallel containers to get the original input.

>>> identity = Identity()
creating: createIdentity
class bigdl.nn.layer.Index(dimension, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the Tensor index operation along the given dimension.

Parameters:dimension – the dimension to be indexed
>>> index = Index(1)
creating: createIndex
class bigdl.nn.layer.InferReshape(size, batch_mode=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Reshape the input tensor with automatic size inference support. Positive numbers in the size argument are used to reshape the input to the corresponding dimension size. There are also two special values allowed in size: a. 0 means keep the corresponding dimension size of the input unchanged. i.e., if the 1st dimension size of the input is 2, the 1st dimension size of output will be set as 2 as well. b. -1 means infer this dimension size from other dimensions. This dimension size is calculated by keeping the amount of output elements consistent with the input. Only one -1 is allowable in size.

For example, an input tensor with size (4, 5, 6, 7) passed to InferReshape(Array(4, 0, 3, -1)) gives an output tensor with size (4, 5, 3, 14): the 1st and 3rd dims are set to the given sizes, the 2nd dim is kept unchanged, and the last dim is inferred as 14.

Parameters:
  • size – the target tensor size
  • batch_mode – whether in batch mode
>>> inferReshape = InferReshape([4, 0, 3, -1], False)
creating: createInferReshape
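A sketch of the size-inference rules described above (illustrative only; the shapes follow the example in the text):

import numpy as np
from bigdl.nn.layer import InferReshape

reshape = InferReshape([4, 0, 3, -1])             # 0 keeps the 2nd dim, -1 is inferred
x = np.random.rand(4, 5, 6, 7).astype("float32")
y = reshape.forward(x)                            # expected shape: (4, 5, 3, 14)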
class bigdl.nn.layer.Input(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Node

The Input layer does nothing to the input tensors, just passing them through. It is used as an input to the Graph container when the first layer of the graph container accepts multiple tensors as inputs.

Each input node of the graph container should accept one tensor as input. If you want a module accepting multiple tensors as input, you should add some Input module before it and connect the outputs of the Input nodes to it.

Please note that the return is not a layer but a Node containing input layer.

>>> input = Input()
creating: createInput
class bigdl.nn.layer.JoinTable(dimension, n_input_dims, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

It is a table module which takes a table of Tensors as input and outputs a Tensor by joining them together along the given dimension.

The input to this layer is expected to be a tensor, or a batch of tensors; when using mini-batch, a batch of sample tensors will be passed to the layer and the user needs to specify the number of dimensions of each sample tensor in the batch using nInputDims.

Parameters:
  • dimension – to be join in this dimension
  • nInputDims – specify the number of dimensions that this module will receive. If it is more than the dimension of the input tensors, the first dimension would be considered as batch size
>>> joinTable = JoinTable(1, 1)
creating: createJoinTable
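A hedged sketch of joining two tensors along the first dimension (assuming forward() accepts a list of numpy arrays, as other doctests in this module do):

import numpy as np
from bigdl.nn.layer import JoinTable

join = JoinTable(1, 1)                 # join along dimension 1, each sample is 1-D
a = np.array([1.0, 2.0], dtype="float32")
b = np.array([3.0, 4.0, 5.0], dtype="float32")
out = join.forward([a, b])             # expected roughly [1., 2., 3., 4., 5.]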
class bigdl.nn.layer.L1Penalty(l1weight, size_average=False, provide_output=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

adds an L1 penalty to an input (for sparsity). L1Penalty is an inline module that in its forward propagation copies the input Tensor directly to the output, and computes an L1 loss of the latent state (input) and stores it in the module’s loss field. During backward propagation: gradInput = gradOutput + gradLoss.

Parameters:
  • l1weight
  • sizeAverage
  • provideOutput
>>> l1Penalty = L1Penalty(1, True, True)
creating: createL1Penalty
class bigdl.nn.layer.LSTM(input_size, hidden_size, p=0.0, activation=None, inner_activation=None, wRegularizer=None, uRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Parameters:
  • input_size – the size of each input vector
  • hidden_size – Hidden unit size in the LSTM
  • p – is used for [[Dropout]] probability. For more details about RNN dropouts, please refer to [RnnDrop: A Novel Dropout for RNNs in ASR](http://www.stat.berkeley.edu/~tsmoon/files/Conference/asru2015.pdf) and [A Theoretically Grounded Application of Dropout in Recurrent Neural Networks](https://arxiv.org/pdf/1512.05287.pdf)
  • activation – activation function, by default to be Tanh if not specified. It can also be the name of an existing activation as a string.
  • inner_activation – activation function for the inner cells, by default to be Sigmoid if not specified. It can also be the name of an existing activation as a string.
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • uRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
  • bRegularizer – instance of [[Regularizer]], applied to the bias.

>>> lstm = LSTM(4, 3, 0.5, 'tanh', Sigmoid(), L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5))
creating: createSigmoid
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createTanh
creating: createLSTM
class bigdl.nn.layer.LSTMPeephole(input_size=4, hidden_size=3, p=0.0, wRegularizer=None, uRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Parameters:
  • input_size – the size of each input vector
  • hidden_size – Hidden unit size in the LSTM
  • p – is used for [[Dropout]] probability. For more details about RNN dropouts, please refer to [RnnDrop: A Novel Dropout for RNNs in ASR](http://www.stat.berkeley.edu/~tsmoon/files/Conference/asru2015.pdf) and [A Theoretically Grounded Application of Dropout in Recurrent Neural Networks](https://arxiv.org/pdf/1512.05287.pdf)
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • uRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the recurrent weights matrices.
  • bRegularizer – instance of [[Regularizer]], applied to the bias.
>>> lstm = LSTMPeephole(4, 3, 0.5, L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5))
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createLSTMPeephole
class bigdl.nn.layer.Layer(jvalue, bigdl_type, *args)[source]

Bases: bigdl.util.common.JavaValue, bigdl.nn.layer.SharedStaticUtils

Layer is the basic component of a neural network and it’s also the base class of layers. Layer can connect to others to construct a complex neural network.

backward(input, grad_output)[source]

NB: It’s for debug only, please use optimizer.optimize() in production. Performs a back-propagation step through the module, with respect to the given input. In general this method makes the assumption forward(input) has been called before, with the same input. This is necessary for optimization reasons. If you do not respect this rule, backward() will compute incorrect gradients.

Parameters:
  • input – ndarray or list of ndarray or JTensor or list of JTensor.
  • grad_output – ndarray or list of ndarray or JTensor or list of JTensor.
Returns:

ndarray or list of ndarray

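A debug-only sketch of the forward/backward contract described above (the Linear layer and random numpy inputs are illustrative assumptions):

import numpy as np
from bigdl.nn.layer import Linear

layer = Linear(3, 2)
inp = np.random.rand(4, 3).astype("float32")     # batch of 4 samples
out = layer.forward(inp)                         # must be called first, with the same input
grad_out = np.ones_like(out)
grad_in = layer.backward(inp, grad_out)          # gradient w.r.t. the input, shape (4, 3)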
static check_input(input)[source]
Parameters:input – ndarray or list of ndarray or JTensor or list of JTensor.
Returns:(list of JTensor, isTable)
static convert_output(output)[source]
evaluate(*args)[source]

No argument passed in: evaluate the model to set train = false, useful when doing test/forward. :return: layer itself

Three arguments passed in: A method to benchmark the model quality.

Parameters:
  • dataset – the input data
  • batch_size – batch size
  • val_methods – a list of validation methods, i.e. Top1Accuracy, Top5Accuracy and Loss.
Returns:

a list of the metrics result

forward(input)[source]

NB: It’s for debug only, please use optimizer.optimize() in production. Takes an input object, and computes the corresponding output of the module

Parameters:
  • input – ndarray or list of ndarray or JTensor or list of JTensor.
Returns:

ndarray or list of ndarray

freeze(names=None)[source]

Freeze the module. If names is not None, only the layers that match the given names are frozen. :param names: an array of layer names :return:

static from_jvalue(jvalue, bigdl_type='float')[source]

Create a Python Model based on the given java value. :param jvalue: Java object created by Py4j :return: A Python Model

get_dtype()[source]
get_weights()[source]

Get weights for this layer

Returns:list of numpy arrays which represent weight and bias
is_training()[source]
Returns:Whether this layer is in the training mode
>>> layer = Dropout()
creating: createDropout
>>> layer = layer.evaluate()
>>> layer.is_training()
False
>>> layer = layer.training()
>>> layer.is_training()
True
is_with_weights()[source]
name()[source]

Name of this layer

parameters()[source]

Get the model parameters, containing: weight, bias, gradBias, gradWeight

Returns:dict(layername -> dict(parametername -> ndarray))
predict(features, batch_size=-1)[source]

Model inference based on the given data. :param features: it can be a ndarray or list of ndarray for local inference, or an RDD[Sample] for running in a distributed fashion :param batch_size: total batch size of prediction. :return: ndarray or RDD[Sample] depending on the type of features.

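A hedged local-inference sketch (the model and data here are illustrative assumptions, not from the original docs):

import numpy as np
from bigdl.nn.layer import Sequential, Linear, SoftMax

model = Sequential().add(Linear(4, 2)).add(SoftMax())
features = np.random.rand(8, 4).astype("float32")    # 8 local samples, 4 features each
preds = model.predict(features)                      # local inference returns an ndarray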
predict_class(features)[source]

Model inference based on the given data, returning the predicted labels. :param features: it can be a ndarray or list of ndarray for local inference, or an RDD[Sample] for running in a distributed fashion :return: ndarray or RDD[Sample] depending on the type of features.

predict_class_distributed(data_rdd)[source]

Module predict, returning the predicted labels.

Parameters:data_rdd – the data to be predicted.
Returns:An RDD representing the predicted labels.
predict_class_local(X)[source]
Parameters:X – X can be a ndarray or list of ndarray if the model has multiple inputs.

The first dimension of X should be batch. :return: a ndarray as the prediction result.

predict_distributed(data_rdd, batch_size=-1)[source]

Model inference based on the given data. You need to invoke collect() to trigger the action, as the returned result is an RDD.

Parameters:
  • data_rdd – the data to be predicted.
  • batch_size – total batch size of prediction.
Returns:

An RDD representing the prediction result.

predict_image(image_frame, output_layer=None, share_buffer=False, batch_per_partition=4, predict_key='predict')[source]

Model predict for images, returning an imageFrame with the predicted tensor. :param image_frame: imageFrame that contains images :param output_layer: if output_layer is not null, the output of the layer that matches output_layer will be used as the predicted output :param share_buffer: whether to share the same memory for each batch of predict results :param batch_per_partition: batch size per partition, default is 4 :param predict_key: key to store predicted results

predict_local(X, batch_size=-1)[source]
Parameters:X – X can be a ndarray or list of ndarray if the model has multiple inputs.

The first dimension of X should be batch. :param batch_size: total batch size of prediction. :return: a ndarray as the prediction result.

quantize()[source]

Clone self and quantize it, finally returning a new quantized model. :return: A new quantized model.

>>> fc = Linear(4, 2)
creating: createLinear
>>> fc.set_weights([np.ones((2, 4)), np.ones((2,))])
>>> input = np.ones((2, 4))
>>> output = fc.forward(input)
>>> expected_output = np.array([[5., 5.], [5., 5.]])
>>> np.testing.assert_allclose(output, expected_output)
>>> quantized_fc = fc.quantize()
>>> quantized_output = quantized_fc.forward(input)
>>> expected_quantized_output = np.array([[5., 5.], [5., 5.]])
>>> np.testing.assert_allclose(quantized_output, expected_quantized_output)
>>> assert("quantized.Linear" in quantized_fc.__str__())
>>> conv = SpatialConvolution(1, 2, 3, 3)
creating: createSpatialConvolution
>>> conv.set_weights([np.ones((2, 1, 3, 3)), np.zeros((2,))])
>>> input = np.ones((2, 1, 4, 4))
>>> output = conv.forward(input)
>>> expected_output = np.array([[[[9., 9.], [9., 9.]], [[9., 9.], [9., 9.]]], [[[9., 9.], [9., 9.]], [[9., 9.], [9., 9.]]]])
>>> np.testing.assert_allclose(output, expected_output)
>>> quantized_conv = conv.quantize()
>>> quantized_output = quantized_conv.forward(input)
>>> expected_quantized_output = np.array([[[[9., 9.], [9., 9.]], [[9., 9.], [9., 9.]]], [[[9., 9.], [9., 9.]], [[9., 9.], [9., 9.]]]])
>>> np.testing.assert_allclose(quantized_output, expected_quantized_output)
>>> assert("quantized.SpatialConvolution" in quantized_conv.__str__())
>>> seq = Sequential()
creating: createSequential
>>> seq = seq.add(conv)
>>> seq = seq.add(Reshape([8, 4], False))
creating: createReshape
>>> seq = seq.add(fc)
>>> input = np.ones([1, 1, 6, 6])
>>> output = seq.forward(input)
>>> expected_output = np.array([[37., 37.], [37., 37.], [37., 37.], [37., 37.], [37., 37.], [37., 37.], [37., 37.], [37., 37.]])
>>> np.testing.assert_allclose(output, expected_output)
>>> quantized_seq = seq.quantize()
>>> quantized_output = quantized_seq.forward(input)
>>> expected_quantized_output = np.array([[37., 37.], [37., 37.], [37., 37.], [37., 37.], [37., 37.], [37., 37.], [37., 37.], [37., 37.]])
>>> np.testing.assert_allclose(quantized_output, expected_quantized_output)
>>> assert("quantized.Linear" in quantized_seq.__str__())
>>> assert("quantized.SpatialConvolution" in quantized_seq.__str__())
reset()[source]

Initialize the model weights.

save(path, over_write=False)[source]
saveModel(modelPath, weightPath=None, over_write=False)[source]
save_caffe(prototxt_path, model_path, use_v2=True, overwrite=False)[source]
save_tensorflow(inputs, path, byte_order='little_endian', data_format='nhwc')[source]

Save a model to protobuf files so that it can be used in tensorflow inference.

When saving the model, placeholders will be added to the tf model as input nodes, so you need to pass in the names and shapes of the placeholders (the BigDL model doesn’t have such information). The order of the placeholder information should be the same as the inputs of the graph model. :param inputs: placeholder information, should be an array of tuples (input_name, shape) where ‘input_name’ is a string and shape is an array of integers :param path: the path to be saved to :param byte_order: model byte order :param data_format: model data format, should be “nhwc” or “nchw”

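For example (a sketch; the placeholder name and the output path are hypothetical, and `model` is assumed to be a BigDL graph model with a single 4-D image input):

model.save_tensorflow([("input_node", [1, 3, 224, 224])], "/tmp/bigdl_model.pb")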
setBRegularizer(bRegularizer)[source]

Set the bias regularizer. :param bRegularizer: bias regularizer :return:

setWRegularizer(wRegularizer)[source]

Set the weight regularizer. :param wRegularizer: weight regularizer :return:

set_name(name)[source]

Give this model a name. A generated name consisting of the class name and a UUID will be used if the user doesn’t set one.

set_running_mean(running_mean)[source]

Set the running mean of the layer. Only use this method for a BatchNormalization layer. :param running_mean: a Numpy array.

set_running_std(running_std)[source]

Set the running variance of the layer. Only use this method for a BatchNormalization layer. :param running_std: a Numpy array.

set_seed(seed=123)[source]

You can control the random seed which is used to initialize the weights for this model.

Parameters:seed – random seed
Returns:Model itself.
set_weights(weights)[source]

Set weights for this layer

Parameters:weights – a list of numpy arrays which represent weight and bias
Returns:
>>> linear = Linear(3,2)
creating: createLinear
>>> linear.set_weights([np.array([[1,2,3],[4,5,6]]), np.array([7,8])])
>>> weights = linear.get_weights()
>>> weights[0].shape == (2,3)
True
>>> np.testing.assert_allclose(weights[0][0], np.array([1., 2., 3.]))
>>> np.testing.assert_allclose(weights[1], np.array([7., 8.]))
>>> relu = ReLU()
creating: createReLU
>>> from py4j.protocol import Py4JJavaError
>>> try:
...     relu.set_weights([np.array([[1,2,3],[4,5,6]]), np.array([7,8])])
... except Py4JJavaError as err:
...     print(err.java_exception)
...
java.lang.IllegalArgumentException: requirement failed: this layer does not have weight/bias
>>> relu.get_weights()
The layer does not have weight/bias
>>> add = Add(2)
creating: createAdd
>>> try:
...     add.set_weights([np.array([7,8]), np.array([1,2])])
... except Py4JJavaError as err:
...     print(err.java_exception)
...
java.lang.IllegalArgumentException: requirement failed: the number of input weight/bias is not consistant with number of weight/bias of this layer, number of input 1, number of output 2
>>> cAdd = CAdd([4, 1])
creating: createCAdd
>>> cAdd.set_weights(np.ones([4, 1]))
>>> (cAdd.get_weights()[0] == np.ones([4, 1])).all()
True
training(is_training=True)[source]

Set this layer in the training mode, or in prediction mode if is_training=False.

unfreeze(names=None)[source]

Unfreeze the module. If names is not None, only the layers that match the given names are unfrozen. :param names: an array of layer names :return:

update_parameters(learning_rate)[source]

NB: It’s for debug only, please use optimizer.optimize() in production.

zero_grad_parameters()[source]

NB: It’s for debug only, please use optimizer.optimize() in production. If the module has parameters, this will zero the accumulation of the gradients with respect to these parameters. Otherwise, it does nothing.

class bigdl.nn.layer.LayerNormalization(hidden_size, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies layer normalization.

>>> norm = LayerNormalization(8)
creating: createLayerNormalization
class bigdl.nn.layer.LeakyReLU(negval=0.01, inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

It is a transfer module that applies LeakyReLU, in which the parameter negval sets the slope of the negative part. LeakyReLU is defined as: f(x) = max(0, x) + negval * min(0, x)

Parameters:
  • negval – sets the slope of the negative part
  • inplace – if it is true, doing the operation in-place without using extra state memory
>>> leakyReLU = LeakyReLU(1e-5, True)
creating: createLeakyReLU
class bigdl.nn.layer.Linear(input_size, output_size, with_bias=True, wRegularizer=None, bRegularizer=None, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

The [[Linear]] module applies a linear transformation to the input data, i.e. y = Wx + b. The input given in forward(input) must be either a vector (1D tensor) or matrix (2D tensor). If the input is a vector, it must have the size of inputSize. If it is a matrix, then each row is assumed to be an input sample of given batch (the number of rows means the batch size and the number of columns should be equal to the inputSize).

Parameters:
  • input_size – the size of each input sample
  • output_size – the size of the module output of each sample
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]], applied to the bias.
  • init_weight – the optional initial value for the weight
  • init_bias – the optional initial value for the bias
  • init_grad_weight – the optional initial value for the grad_weight
  • init_grad_bias – the optional initial value for the grad_bias

>>> linear = Linear(100, 10, True, L1Regularizer(0.5), L1Regularizer(0.5))
creating: createL1Regularizer
creating: createL1Regularizer
creating: createLinear
>>> import numpy as np
>>> init_weight = np.random.randn(10, 100)
>>> init_bias = np.random.randn(10)
>>> init_grad_weight = np.zeros([10, 100])
>>> init_grad_bias = np.zeros([10])
>>> linear = Linear(100, 10, True, L1Regularizer(0.5), L1Regularizer(0.5), init_weight, init_bias, init_grad_weight, init_grad_bias)
creating: createL1Regularizer
creating: createL1Regularizer
creating: createLinear
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.LocallyConnected1D(n_input_frame, input_frame_size, output_frame_size, kernel_w, stride_w=1, propagate_back=True, weight_regularizer=None, bias_regularizer=None, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

The LocallyConnected1D layer works similarly to the TemporalConvolution layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. The input tensor in forward(input) is expected to be a 2D tensor (nInputFrame x inputFrameSize) or a 3D tensor (nBatchFrame x nInputFrame x inputFrameSize).

Parameters:
  • n_input_frame – the input frame channel
  • input_frame_size – the input frame size expected in sequences given into forward()
  • output_frame_size – the output frame size the convolution layer will produce.
  • kernel_w – the kernel width of the convolution
  • stride_w – the step of the convolution in the width dimension.
  • propagate_back – whether to propagate gradient back, default is true.
  • weight_regularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bias_regularizer – instance of [[Regularizer]], applied to the bias.
  • init_weight – initial weight
  • init_bias – initial bias
  • init_grad_weight – initial gradient weight
  • init_grad_bias – initial gradient bias
>>> locallyConnected1D = LocallyConnected1D(10, 6, 12, 5, 5)
creating: createLocallyConnected1D
>>> locallyConnected1D.setWRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
>>> locallyConnected1D.setBRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer

set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.LocallyConnected2D(n_input_plane, input_width, input_height, n_output_plane, kernel_w, kernel_h, stride_w=1, stride_h=1, pad_w=0, pad_h=0, propagate_back=True, wRegularizer=None, bRegularizer=None, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, with_bias=True, data_format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

The LocallyConnected2D layer works similarly to the [[SpatialConvolution]] layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.

Parameters:
  • n_input_plane – the number of expected input planes in the image given into forward()
  • input_width – the expected width of input
  • input_height – the expected height of input
  • n_output_plane – the number of output planes the convolution layer will produce.
  • kernel_w – the kernel width of the convolution
  • kernel_h – the kernel height of the convolution
  • stride_w – the step of the convolution in the width dimension.
  • stride_h – the step of the convolution in the height dimension
  • pad_w – the additional zeros added per width to the input planes.
  • pad_h – the additional zeros added per height to the input planes.
  • propagate_back – whether to propagate gradient back
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]], applied to the bias.
  • init_weight – the optional initial value for the weight
  • init_bias – the optional initial value for the bias
  • init_grad_weight – the optional initial value for the grad_weight
  • init_grad_bias – the optional initial value for the grad_bias
  • with_bias – whether to include a bias
  • data_format – a string value of “NHWC” or “NCHW” to specify the input data format of this layer. In “NHWC” format data is stored in the order of [batch_size, height, width, channels]; in “NCHW” format data is stored in the order of [batch_size, channels, height, width].

>>> locallyConnected2D = LocallyConnected2D(6, 2, 4, 12, 5, 5)
creating: createLocallyConnected2D
>>> locallyConnected2D.setWRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
>>> locallyConnected2D.setBRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.Log(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the log function element-wise to the input Tensor, thus outputting a Tensor of the same dimension.

>>> log = Log()
creating: createLog
class bigdl.nn.layer.LogSigmoid(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This class is a transfer layer corresponding to the log-sigmoid function: f(x) = log(1 / (1 + exp(-x)))

>>> logSigmoid = LogSigmoid()
creating: createLogSigmoid
class bigdl.nn.layer.LogSoftMax(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the LogSoftMax function to an n-dimensional input Tensor. LogSoftMax is defined as: f_i(x) = log(exp(x_i) / a) where a = sum_j exp(x_j).

>>> logSoftMax = LogSoftMax()
creating: createLogSoftMax
class bigdl.nn.layer.LookupTable(n_index, n_output, padding_value=0.0, max_norm=1.7976931348623157e+308, norm_type=2.0, should_scale_grad_by_freq=False, wRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

a convolution of width 1, commonly used for word embeddings

Parameters:wRegularizer – instance of [[Regularizer]](eg. L1 or L2 regularization), applied to the input weights matrices.
>>> lookupTable = LookupTable(1, 1, 1e-5, 1e-5, 1e-5, True, L1Regularizer(0.5))
creating: createL1Regularizer
creating: createLookupTable
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.LookupTableSparse(n_index, n_output, combiner='sum', max_norm=-1.0, wRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

LookupTable for multi-values. Also called embedding_lookup_sparse in TensorFlow.

The input of LookupTableSparse should be a 2D SparseTensor or two 2D SparseTensors. If the input is a SparseTensor, the values are positive integer ids, values in each row of this SparseTensor will be turned into a dense vector. If the input is two SparseTensors, the first tensor should be the integer ids, just like the SparseTensor input. And the second tensor is the corresponding weights of the integer ids.

Parameters:wRegularizer – instance of [[Regularizer]](eg. L1 or L2 regularization), applied to the input weights matrices.
>>> lookupTableSparse = LookupTableSparse(20, 5, "mean", 2, L1Regularizer(0.5))
creating: createL1Regularizer
creating: createLookupTableSparse
>>> indices = np.array([[0, 0, 1, 2], [0, 1, 0, 3]])
>>> values = np.array([2, 4, 1, 2])
>>> weightValues = np.array([2, 0.5, 1, 3])
>>> input = JTensor.sparse(values, indices, np.array([3, 4]))
>>> weight = JTensor.sparse(weightValues, indices, np.array([3, 4]))
>>> layer1 = LookupTableSparse(10, 4, "mean")
creating: createLookupTableSparse
>>> layer1.set_weights(np.arange(1, 41, 1).reshape(10, 4)) # set weight to 1 to 40
>>> output = layer1.forward([input, weight])
>>> expected_output = np.array([[6.5999999 , 7.60000038, 8.60000038, 9.60000038],[ 1., 2., 3., 4.], [5., 6., 7., 8.]])
>>> np.testing.assert_allclose(output, expected_output, rtol=1e-6, atol=1e-6)
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.MM(trans_a=False, trans_b=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Module to perform matrix multiplication on two mini-batch inputs, producing a mini-batch.

Parameters:
  • trans_a – whether or not to transpose the first input matrix
  • trans_b – whether or not to transpose the second input matrix
>>> mM = MM(True, True)
creating: createMM
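A hedged sketch of batched matrix multiplication with MM (the shapes are illustrative assumptions):

import numpy as np
from bigdl.nn.layer import MM

mm = MM(False, False)                            # no transpose on either input
a = np.random.rand(4, 2, 3).astype("float32")    # mini-batch of 4, each 2x3
b = np.random.rand(4, 3, 5).astype("float32")    # mini-batch of 4, each 3x5
out = mm.forward([a, b])                         # expected shape roughly (4, 2, 5)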
class bigdl.nn.layer.MV(trans=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

It is a module to perform matrix vector multiplication on two mini-batch inputs, producing a mini-batch.

Parameters:trans – whether to transpose the matrix before multiplication
>>> mV = MV(True)
creating: createMV
class bigdl.nn.layer.MapTable(module=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Container

This class is a container for a single module which will be applied to all input elements. The member module is cloned as necessary to process all input elements.

>>> mapTable = MapTable(Linear(100,10))
creating: createLinear
creating: createMapTable
class bigdl.nn.layer.MaskedSelect(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Performs a torch.MaskedSelect on a Tensor. The mask is supplied as a tabular argument with the input on the forward and backward passes.

>>> maskedSelect = MaskedSelect()
creating: createMaskedSelect
class bigdl.nn.layer.Masking(mask_value, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Use a mask value to skip timesteps for a sequence

:param mask_value: mask value

>>> masking = Masking(0.0)
creating: createMasking
class bigdl.nn.layer.Max(dim, num_input_dims=-2147483648, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies a max operation over dimension dim

Parameters:
  • dim – max along this dimension
  • num_input_dims – Optional. If in a batch model, set to the inputDims.
>>> max = Max(1)
creating: createMax
class bigdl.nn.layer.Maxout(input_size, output_size, maxout_number, with_bias=True, w_regularizer=None, b_regularizer=None, init_weight=None, init_bias=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

A linear maxout layer. The Maxout layer selects the element-wise maximum value of maxoutNumber Linear(inputSize, outputSize) layers.

:param input_size: the size of each input sample
:param output_size: the size of the module output of each sample
:param maxout_number: number of Linear layers to use
:param with_bias: whether use bias in Linear
:param w_regularizer: instance of [[Regularizer]]
      (eg. L1 or L2 regularization), applied to the input weights matrices.
:param b_regularizer: instance of [[Regularizer]]
       applied to the bias.
:param init_weight: initial weight
:param init_bias: initial bias

>>> maxout = Maxout(2, 5, 3)
creating: createMaxout
class bigdl.nn.layer.Mean(dimension=1, n_input_dims=-1, squeeze=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

It is a simple layer which applies a mean operation over the given dimension. When nInputDims is provided, the input will be considered as batches. Then the mean operation will be applied in (dimension + 1). The input to this layer is expected to be a tensor, or a batch of tensors; when using mini-batch, a batch of sample tensors will be passed to the layer and the user needs to specify the number of dimensions of each sample tensor in the batch using nInputDims.

Parameters:
  • dimension – the dimension to be applied mean operation
  • n_input_dims – specify the number of dimensions that this module will receive. If it is more than the dimension of the input tensors, the first dimension would be considered as batch size
  • squeeze – default is true, which will squeeze the sum dimension; set it to false to keep the sum dimension
>>> mean = Mean(1, 1, True)
creating: createMean
class bigdl.nn.layer.Min(dim=1, num_input_dims=-2147483648, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies a min operation over dimension dim.

Parameters:
  • dim – min along this dimension
  • num_input_dims – Optional. If in a batch model, set to the input_dim.
>>> min = Min(1)
creating: createMin
class bigdl.nn.layer.MixtureTable(dim=2147483647, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Creates a module that takes a table {gater, experts} as input and outputs the mixture of experts (a Tensor or table of Tensors) using a gater Tensor. When dim is provided, it specifies the dimension of the experts Tensor that will be interpolated (or mixed). Otherwise, the experts should take the form of a table of Tensors. This Module works for experts of dimension 1D or more, and for a 1D or 2D gater, i.e. for single examples or mini-batches.

>>> mixtureTable = MixtureTable()
creating: createMixtureTable
>>> mixtureTable = MixtureTable(10)
creating: createMixtureTable
class bigdl.nn.layer.Model(inputs, outputs, jvalue=None, bigdl_type='float', byte_order='little_endian', model_type='bigdl')[source]

Bases: bigdl.nn.layer.Container

A graph container. Each node can have multiple inputs. The output of the node should be a tensor. The output tensor can be connected to multiple nodes. So the module in each node can have a tensor or table input, and should have a tensor output.

The graph container can have multiple inputs and multiple outputs. If there’s one input, the input data fed to the graph module should be a tensor. If there are multiple inputs, the input data fed to the graph module should be a table, which is actually a sequence of tensors. The order of the input tensors should be the same as the order of the input nodes. This also applies to the gradient from the module in the back propagation.

If there’s one output, the module output is a tensor. If there are multiple outputs, the module output is a table, which is actually a sequence of tensors. The order of the output tensors is the same as the order of the output modules. This also applies to the gradient passed to the module in the back propagation.

All inputs should be able to connect to the outputs through some paths in the graph. It is allowed that some successors of the input nodes are not connected to the outputs. If so, these nodes will be excluded from the computation.

We also support initializing a Graph directly from a tensorflow module. In this case, you should pass your tensorflow nodes as inputs and outputs, and also specify the byte_order parameter (“little_endian” or “big_endian”) and the model_type parameter (“bigdl” or “tensorflow”).

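A hedged sketch of building a small graph model with the functional (node) API used elsewhere in this module; the layer sizes are illustrative:

from bigdl.nn.layer import Input, Linear, ReLU, Model

inp = Input()                   # a Node wrapping an Input layer
fc = Linear(4, 2)(inp)          # calling a layer on a node connects it into the graph
act = ReLU()(fc)
model = Model([inp], [act])     # single-input, single-output graph container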
static from_jvalue(jvalue, bigdl_type='float')[source]

Create a Python Model based on the given java value. :param jvalue: Java object created by Py4j :return: A Python Model

static loadModel(modelPath, weightPath=None, bigdl_type='float')[source]

Load a pre-trained Bigdl model.

Parameters:path – The path containing the pre-trained model.
Returns:A pre-trained model.
static load_caffe(model, defPath, modelPath, match_all=True, bigdl_type='float')[source]

Load a pre-trained Caffe model.

Parameters:
  • model – A bigdl model definition which equivalent to the pre-trained caffe model.
  • defPath – The path containing the caffe model definition.
  • modelPath – The path containing the pre-trained caffe model.
Returns:

A pre-trained model.

static load_caffe_model(defPath, modelPath, bigdl_type='float')[source]

Load a pre-trained Caffe model.

Parameters:
  • defPath – The path containing the caffe model definition.
  • modelPath – The path containing the pre-trained caffe model.
Returns:

A pre-trained model.

static load_keras(json_path=None, hdf5_path=None, by_name=False)[source]

Load a pre-trained Keras model.

Parameters:
  • json_path – The json path containing the keras model definition.
  • hdf5_path – The HDF5 path containing the pre-trained keras model weights with or without the model architecture.
Returns:

A bigdl model.

static load_tensorflow(path, inputs, outputs, byte_order='little_endian', bin_file=None, generated_backward=True, bigdl_type='float')[source]

Load a pre-trained Tensorflow model. :param path: The path containing the pre-trained model. :param inputs: The input nodes of this graph :param outputs: The output nodes of this graph :param byte_order: byte order of the file, little_endian or big_endian :param bin_file: the optional bin file produced by the bigdl dump_model util function to store the weights :param generated_backward: whether to generate the backward graph :return: A pre-trained model.

static load_torch(path, bigdl_type='float')[source]

Load a pre-trained Torch model.

Parameters:path – The path containing the pre-trained model.
Returns:A pre-trained model.
node(name, bigdl_type='float')[source]

Return the node that has the given name. If the given name doesn’t match any node, an exception will be thrown. :param name: node name :param bigdl_type: :return:

save_graph_topology(log_path, bigdl_type='float')[source]

Save the current model graph to a folder, which can be displayed in tensorboard by running tensorboard --logdir log_path. :param log_path: path to save the model graph :param bigdl_type: :return:

set_input_formats(input_formats, bigdl_type='float')[source]

set input formats for graph. :param input_formats: list of input format numbers :param bigdl_type: :return:

set_output_formats(output_formats, bigdl_type='float')[source]

set output formats for graph. :param output_formats: list of output format numbers :param bigdl_type: :return:

stop_gradient(stop_layers, bigdl_type='float')[source]

Stop the input gradient of the layers that match the given names: their input gradients are not computed, and they will not contribute to the input gradient computation of layers that depend on them. :param stop_layers: an array of layer names :param bigdl_type: :return:

static train(output, data, label, opt_method, criterion, batch_size, end_when, session=None, bigdl_type='float')[source]
class bigdl.nn.layer.Mul(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Multiply a single scalar factor to the incoming data

>>> mul = Mul()
creating: createMul
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.MulConstant(scalar, inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Multiplies input Tensor by a (non-learnable) scalar constant. This module is sometimes useful for debugging purposes.

Parameters:
  • scalar – scalar constant
  • inplace – Can optionally do its operation in-place without using extra state memory
>>> mulConstant = MulConstant(2.5)
creating: createMulConstant
class bigdl.nn.layer.MultiRNNCell(cells, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

A cell that enables stacking multiple simple rnn cells

>>> cells = []
>>> cells.append(ConvLSTMPeephole3D(4, 3, 3, 3, 1))
creating: createConvLSTMPeephole3D
>>> cells.append(ConvLSTMPeephole3D(4, 3, 3, 3, 1))
creating: createConvLSTMPeephole3D
>>> stacked_convlstm = MultiRNNCell(cells)
creating: createMultiRNNCell
class bigdl.nn.layer.Narrow(dimension, offset, length=1, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Narrow is the application of the narrow operation in a module. The module further supports a negative length in order to handle inputs with an unknown size.

>>> narrow = Narrow(1, 1, 1)
creating: createNarrow
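A hedged sketch of the narrow operation (1-based dimension and offset, following the doctest above; the expected result is an assumption):

import numpy as np
from bigdl.nn.layer import Narrow

narrow = Narrow(1, 2, 2)                     # take 2 elements starting at offset 2 along dim 1
x = np.arange(12, dtype="float32").reshape(3, 4)
y = narrow.forward(x)                        # expected roughly rows 2..3, shape (2, 4)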
class bigdl.nn.layer.NarrowTable(offset, length=1, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Creates a module that takes a table as input and outputs the subtable starting at index offset and having length elements (defaults to 1 element). The elements can be either a table or a Tensor. If length is negative, it means selecting the elements from the offset to the element located at abs(length) from the last element of the input.

Parameters:
  • offset – the start index of table
  • length – the length to select
>>> narrowTable = NarrowTable(1, 1)
creating: createNarrowTable
class bigdl.nn.layer.Negative(inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Create a Negative layer, which computes the negative value of each element of the input tensor.

Parameters:inplace – whether the output tensor reuses the input tensor storage. Default value is false
>>> negative = Negative(False)
creating: createNegative
class bigdl.nn.layer.NegativeEntropyPenalty(beta=0.01, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Penalize the input multinomial distribution if it has low entropy. The input to this layer should be a batch of vectors, each representing a multinomial distribution. The input is typically the output of a softmax layer.

For forward, the output is the same as input and a NegativeEntropy loss of the latent state will be calculated each time. For backward, gradInput = gradOutput + gradLoss

This can be used in reinforcement learning to discourage the policy from collapsing to a single action for a given state, which improves exploration. See the A3C paper for more detail (https://arxiv.org/pdf/1602.01783.pdf).

>>> ne = NegativeEntropyPenalty(0.01)
creating: createNegativeEntropyPenalty

:param beta penalty coefficient

class bigdl.nn.layer.Node(jvalue, bigdl_type, *args)[source]

Bases: bigdl.util.common.JavaValue

Represent a node in a graph. The connections between nodes are directed.

element()[source]
classmethod of(jvalue, bigdl_type='float')[source]
remove_next_edges()[source]
remove_pre_edges()[source]
class bigdl.nn.layer.Normalize(p, eps=1e-10, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Normalizes the input Tensor to have unit L_p norm. The smoothing parameter eps prevents division by zero when the input contains all zero elements (default = 1e-10). p can be the max value of double

>>> normalize = Normalize(1e-5, 1e-5)
creating: createNormalize
class bigdl.nn.layer.NormalizeScale(p, scale, size, w_regularizer=None, eps=1e-10, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

NormalizeScale is composed of normalize and scale; it is equal to the caffe Normalize layer.

Parameters:
  • p – L_p norm
  • eps – smoothing parameter
  • scale – scale parameter
  • size – size of scale input
  • w_regularizer – weight regularizer
>>> layer = NormalizeScale(2.0, scale = 20.0, size = [1, 5, 1, 1])
creating: createNormalizeScale

class bigdl.nn.layer.PReLU(n_output_plane=0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies parametric ReLU, which parameter varies the slope of the negative part.

PReLU: f(x) = max(0, x) + a * min(0, x)

nOutputPlane’s default value is 0, which means PReLU uses the shared version and has only one parameter.

Notice: Please don’t use weight decay on this.

Parameters:n_output_plane – input map number. Default is 0.
>>> pReLU = PReLU(1)
creating: createPReLU
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.Pack(dimension, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Stacks a list of n-dimensional tensors into one (n+1)-dimensional tensor.

>>> layer = Pack(1)
creating: createPack
class bigdl.nn.layer.Padding(dim, pad, n_input_dim, value=0.0, n_index=1, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This module adds pad units of padding to dimension dim of the input. If pad is negative, padding is added to the left, otherwise, it is added to the right of the dimension.

The input to this layer is expected to be a tensor, or a batch of tensors; when using mini-batch, a batch of sample tensors will be passed to the layer and the user needs to specify the number of dimensions of each sample tensor in the batch using n_input_dim.

Parameters:
  • dim – the dimension to be applied padding operation
  • pad – num of the pad units
  • n_input_dim – specify the number of dimensions that this module will receive. If it is more than the dimension of the input tensors, the first dimension would be considered as batch size
  • value – padding value
>>> padding = Padding(1, 1, 1, 1e-5, 1)
creating: createPadding
class bigdl.nn.layer.PairwiseDistance(norm=2, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

It is a module that takes a table of two vectors as input and outputs the distance between them using the p-norm. The input given in forward(input) is a [[Table]] that contains two tensors which must be either a vector (1D tensor) or matrix (2D tensor). If the input is a vector, it must have the size of inputSize. If it is a matrix, then each row is assumed to be an input sample of the given batch (the number of rows means the batch size and the number of columns should be equal to the inputSize).

Parameters:norm – the norm of distance
>>> pairwiseDistance = PairwiseDistance(2)
creating: createPairwiseDistance
class bigdl.nn.layer.ParallelTable(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Container

It is a container module that applies the i-th member module to the i-th input, and outputs an output in the form of Table

>>> parallelTable = ParallelTable()
creating: createParallelTable
class bigdl.nn.layer.Pooler(resolution, scales, sampling_ratio, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Pooler selects the feature map which matches the size of RoI for RoIAlign

Parameters:
  • resolution – the resolution of pooled feature maps. Height equals width.
  • scales – spatial scales of each feature map
  • sampling_ratio – sampling ratio
>>> import numpy as np
>>> feature0 = np.random.rand(1,2,2,2)
>>> feature1 = np.random.rand(1,2,4,4)
>>> feature2 = np.random.rand(1,2,8,8)
>>> features = [feature0, feature1, feature2]
>>> input_rois = np.array([0, 0, 3, 3, 2, 2, 50, 50, 50, 50, 500, 500],dtype='float').reshape(3,4)
>>> m = Pooler(2,[1.0, 0.5, 0.25],2)
creating: createPooler
>>> out = m.forward([features,input_rois])
class bigdl.nn.layer.Power(power, scale=1.0, shift=0.0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply an element-wise power operation with scale and shift. f(x) = (shift + scale * x)^power^

Parameters:
  • power – the exponent.
  • scale – Default is 1.
  • shift – Default is 0.
>>> power = Power(1e-5)
creating: createPower
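A worked sketch of f(x) = (shift + scale * x)^power (the values are easy to check by hand; the layer arguments follow the signature above):

import numpy as np
from bigdl.nn.layer import Power

power = Power(2.0, 1.0, 1.0)                 # f(x) = (1 + 1 * x)^2
x = np.array([[1.0, 2.0, 3.0]], dtype="float32")
y = power.forward(x)                         # expected roughly [[4., 9., 16.]]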
class bigdl.nn.layer.PriorBox(min_sizes, max_sizes=None, aspect_ratios=None, is_flip=True, is_clip=False, variances=None, offset=0.5, img_h=0, img_w=0, img_size=0, step_h=0.0, step_w=0.0, step=0.0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Generate the prior boxes of designated sizes and aspect ratios across all dimensions (H * W). Intended for use with the MultiBox detection method to generate prior boxes.

Parameters:
  • min_sizes – minimum box size in pixels. Can be multiple. Required!
  • max_sizes – maximum box size in pixels. Can be ignored or the same number as min_sizes.
  • aspect_ratios – optional aspect ratios of the boxes. Can be multiple.
  • is_flip – optional bool, default true. If set, flip the aspect ratios.
  • is_clip – whether to clip the prior’s coordinates such that they are within [0, 1]
>>> layer = PriorBox([0.1])
creating: createPriorBox

class bigdl.nn.layer.Proposal(pre_nms_topn, post_nms_topn, ratios, scales, rpn_pre_nms_topn_train=12000, rpn_post_nms_topn_train=2000, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Outputs object detection proposals by applying estimated bounding-box transformations to a set of regular boxes (called “anchors”). rois: holds R regions of interest, each is a 5-tuple (n, x1, y1, x2, y2) specifying an image batch index n and a rectangle (x1, y1, x2, y2). scores: holds scores for the R regions of interest.
>>> layer = Proposal(1000, 200, [0.1, 0.2], [2.0, 3.0])
creating: createProposal

class bigdl.nn.layer.RReLU(lower=0.125, upper=0.3333333333333333, inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the randomized leaky rectified linear unit (RReLU) element-wise to the input Tensor, thus outputting a Tensor of the same dimension. Informally the RReLU is also known as ‘insanity’ layer. RReLU is defined as:

f(x) = max(0,x) + a * min(0, x) where a ~ U(l, u).

In training mode negative inputs are multiplied by a factor drawn from a uniform random distribution U(l, u).

In evaluation mode a RReLU behaves like a LeakyReLU with a constant mean factor a = (l + u) / 2.

By default, l = 1/8 and u = 1/3. If l == u a RReLU effectively becomes a LeakyReLU.

Regardless of operating in in-place mode a RReLU will internally allocate an input-sized noise tensor to store random factors for negative inputs.

The backward() operation assumes that forward() has been called before.

For reference see [Empirical Evaluation of Rectified Activations in Convolutional Network]( http://arxiv.org/abs/1505.00853).

Parameters:
  • lower – lower boundary of uniform random distribution
  • upper – upper boundary of uniform random distribution
  • inplace – optionally do its operation in-place without using extra state memory
>>> rReLU = RReLU(1e-5, 1e5, True)
creating: createRReLU
class bigdl.nn.layer.ReLU(ip=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the rectified linear unit (ReLU) function element-wise to the input Tensor, thus outputting a Tensor of the same dimension.

ReLU is defined as: f(x) = max(0, x) Can optionally do its operation in-place without using extra state memory

>>> relu = ReLU()
creating: createReLU
class bigdl.nn.layer.ReLU6(inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Same as ReLU except that the rectifying function f(x) saturates at x = 6

Parameters:inplace – either True = in-place or False = keeping separate state
>>> reLU6 = ReLU6(True)
creating: createReLU6
class bigdl.nn.layer.Recurrent(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Container

Recurrent module is a container of rnn cells. Different types of rnn cells can be added using the add() function.

>>> recurrent = Recurrent()
creating: createRecurrent
get_hidden_state()[source]

get hidden state and cell at last time step.

Returns:list of hidden state and cell
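A hedged sketch of adding a cell to a Recurrent container (the layer sizes and the batch x time x feature input layout are assumptions):

import numpy as np
from bigdl.nn.layer import Sequential, Recurrent, RnnCell, Tanh

model = Sequential().add(Recurrent().add(RnnCell(4, 3, Tanh())))
x = np.random.rand(2, 5, 4).astype("float32")    # batch of 2, 5 time steps, 4 features
y = model.forward(x)                             # expected shape roughly (2, 5, 3)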
class bigdl.nn.layer.RecurrentDecoder(output_length, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Recurrent

RecurrentDecoder module is a container of rnn cells that is used to make a prediction of the next timestep based on the prediction made at the previous timestep. The input for RecurrentDecoder is dynamically composed during training: the input at t(i) is the output at t(i-1), the input at t(0) is the user input, and the user input has to be batch x stepShape (the shape of the input at a single time step).

Different types of rnn cells can be added using add() function.

>>> recurrent_decoder = RecurrentDecoder(output_length = 5)
creating: createRecurrentDecoder
class bigdl.nn.layer.Replicate(n_features, dim=1, n_dim=2147483647, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Replicate repeats the input nFeatures times along its dim dimension. Notice: no memory copy is made; it sets the stride along the dim-th dimension to zero.

Parameters:
  • n_features – replicate times.
  • dim – dimension to be replicated.
  • n_dim – specify the number of non-batch dimensions.
>>> replicate = Replicate(2)
creating: createReplicate
class bigdl.nn.layer.Reshape(size, batch_mode=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

The forward(input) reshapes the input tensor into a size(0) * size(1) * … tensor, taking the elements row-wise.

Parameters:size – the reshape size
>>> reshape = Reshape([1, 28, 28])
creating: createReshape
>>> reshape = Reshape([1, 28, 28], False)
creating: createReshape
class bigdl.nn.layer.ResizeBilinear(output_height, output_width, align_corner=False, data_format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Resize the input image with bilinear interpolation. The input image must be a float tensor with NHWC or NCHW layout

Parameters:
  • output_height – output height
  • output_width – output width
  • align_corner – align corner or not
  • data_format – the data format of the input image, NHWC or NCHW
>>> resizeBilinear = ResizeBilinear(10, 20, False, "NCHW")
creating: createResizeBilinear
class bigdl.nn.layer.Reverse(dimension=1, is_inplace=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Reverse the input w.r.t given dimension. The input can be a Tensor or Table.

Parameters:dim
>>> reverse = Reverse()
creating: createReverse
>>> reverse = Reverse(1, False)
creating: createReverse
class bigdl.nn.layer.RnnCell(input_size, hidden_size, activation, isInputWithBias=True, isHiddenWithBias=True, wRegularizer=None, uRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

It is a simple RNN. User can pass an activation function to the RNN.

Parameters:
  • input_size – the size of each input vector
  • hidden_size – Hidden unit size in simple RNN
  • activation – activation function. It can also be the name of an existing activation as a string.
  • isInputWithBias – boolean
  • isHiddenWithBias – boolean
  • wRegularizer – instance of [[Regularizer]](eg. L1 or L2 regularization), applied to the input weights matrices.
  • uRegularizer – instance [[Regularizer]](eg. L1 or L2 regularization), applied to the recurrent weights matrices.
  • bRegularizer – instance of [[Regularizer]](../regularizers.md),applied to the bias.
>>> rnn = RnnCell(4, 3, Tanh(), True, True, L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5))
creating: createTanh
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createRnnCell
class bigdl.nn.layer.RoiAlign(spatial_scale, sampling_ratio, pooled_h, pooled_w, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Region of interest aligning (RoIAlign) for Mask-RCNN

The RoIAlign uses average pooling on bilinear-interpolated sub-windows to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of pooledH * pooledW (e.g., 7 * 7).

An RoI is a rectangular window into a conv feature map. Each RoI is defined by a four-tuple (x1, y1, x2, y2) that specifies its top-left corner (x1, y1) and its bottom-right corner (x2, y2).

RoIAlign works by dividing the h * w RoI window into an pooledH * pooledW grid of sub-windows of approximate size h/H * w/W. In each sub-window, compute exact values of input features at four regularly sampled locations, and then do average pooling on the values in each sub-window.

Pooling is applied independently to each feature map channel

Parameters:
  • spatial_scale – spatial scale
  • sampling_ratio – sampling ratio
  • pooled_h – spatial extent in height
  • pooled_w – spatial extent in width
>>> import numpy as np
>>> input_data = np.random.rand(1,2,6,8)
>>> input_rois = np.array([0, 0, 7, 5, 6, 2, 7, 5, 3, 1, 6, 4, 3, 3, 3, 3],dtype='float').reshape(4,4)
>>> m = RoiAlign(1.0,3,2,2)
creating: createRoiAlign
>>> out = m.forward([input_data,input_rois])
class bigdl.nn.layer.RoiPooling(pooled_w, pooled_h, spatial_scale, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Region of interest pooling The RoIPooling uses max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of pooledH * pooledW (e.g., 7 * 7) an RoI is a rectangular window into a conv feature map. Each RoI is defined by a four-tuple (x1, y1, x2, y2) that specifies its top-left corner (x1, y1) and its bottom-right corner (x2, y2). RoI max pooling works by dividing the h * w RoI window into an pooledH * pooledW grid of sub-windows of approximate size h/H * w/W and then max-pooling the values in each sub-window into the corresponding output grid cell. Pooling is applied independently to each feature map channel

Parameters:
  • pooled_w – spatial extent in width
  • pooled_h – spatial extent in height
  • spatial_scale – spatial scale
>>> import numpy as np
>>> input_data = np.random.rand(2,2,6,8)
>>> input_rois = np.array([0, 0, 0, 7, 5, 1, 6, 2, 7, 5, 1, 3, 1, 6, 4, 0, 3, 3, 3, 3],dtype='float64').reshape(4,5)
>>> m = RoiPooling(3,2,1.0)
creating: createRoiPooling
>>> out = m.forward([input_data,input_rois])
class bigdl.nn.layer.SReLU(input_shape, share_axes=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

S-shaped Rectified Linear Unit.

It follows: f(x) = t^r + a^r(x - t^r) for x >= t^r, f(x) = x for t^r > x > t^l, f(x) = t^l + a^l(x - t^l) for x <= t^l.

# References - [Deep Learning with S-shaped Rectified Linear Activation Units](http://arxiv.org/abs/1512.07030)

Parameters:
  • input_shape – shape for tleft, aleft, tright, aright. E.g. for a 4-D input, the shape is the last 3-D.
  • share_axes – the axes along which to share learnable parameters for the activation function. For example, if the incoming feature maps are from a 2D convolution with output shape (batch, height, width, channels), and you wish to share parameters across space so that each filter only has one set of parameters, set share_axes=[1, 2].

>>> srelu = SReLU((2, 3))
creating: createSReLU
>>> srelu = SReLU((2, 2), (1, 2))
creating: createSReLU
>>> from bigdl.nn.initialization_method import Xavier
>>> init = Xavier()
creating: createXavier
>>> srelu = srelu.set_init_method(tLeftInit=init, aLeftInit=init, tRightInit=init, aRightInit=init)
set_init_method(tLeftInit=None, aLeftInit=None, tRightInit=None, aRightInit=None)[source]
class bigdl.nn.layer.Scale(size, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Scale is the combination of CMul and CAdd. It computes the elementwise product of the input and the weight, with the shape of the weight “expanded” to match the shape of the input. Similarly, it expands the bias and performs an elementwise add.

Parameters:size – size of weight and bias
>>> scale = Scale([1,2])
creating: createScale
class bigdl.nn.layer.Select(dim, index, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

A simple layer selecting an index of the input tensor in the given dimension.

Parameters:
  • dimension – the dimension to select
  • index – the index of the dimension to be selected
>>> select = Select(1, 1)
creating: createSelect
class bigdl.nn.layer.SelectTable(index, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Creates a module that takes a table as input and outputs the element at index index (positive or negative). This can be either a table or a Tensor. The gradients of the non-index elements are zeroed Tensors of the same size. This is true regardless of the depth of the encapsulated Tensor as the function used internally to do so is recursive.

Parameters:index – the index to be selected
>>> selectTable = SelectTable(1)
creating: createSelectTable
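A hedged sketch (the 1-based index follows the doctest above; the input table is a list of numpy arrays):

import numpy as np
from bigdl.nn.layer import SelectTable

select = SelectTable(1)                          # pick the first element of the table
t1 = np.random.rand(2, 3).astype("float32")
t2 = np.random.rand(2, 5).astype("float32")
out = select.forward([t1, t2])                   # expected to equal t1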
class bigdl.nn.layer.SequenceBeamSearch(vocab_size, beam_size, alpha, decode_length, eos_id, padding_value, num_hidden_layers, hidden_size, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Find the translated sequence with the highest probability.

Parameters:
  • vocab_size – size of tokens
  • beam_size – number of beams
  • alpha – defining the strength of length normalization
  • decode_length – maximum length to decoded sequence
  • eos_id – id of eos token, used to determine when a sequence has finished
  • padding_value
  • num_hidden_layers – number of hidden layers
  • hidden_size – size of hidden layer

>>> sequenceBeamSearch = SequenceBeamSearch(4, 3, 0.0, 10, 2.0, 1.0, 2, 5)
creating: createSequenceBeamSearch
class bigdl.nn.layer.Sequential(jvalue=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Container

Sequential provides a means to plug layers together in a feed-forward fully connected manner.

>>> echo = Echo()
creating: createEcho
>>> s = Sequential()
creating: createSequential
>>> s = s.add(echo)
static from_jvalue(jvalue, bigdl_type='float')[source]

Create a Python Model based on the given java value. :param jvalue: Java object created by Py4j :return: A Python Model

to_graph()[source]

Convert a sequential model (Sequential) to a graph model (Model).

Returns:A Python graph model
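
A minimal end-to-end sketch of composing layers with Sequential (assuming the BigDL engine has been initialized, e.g. via bigdl.util.common.init_engine()):

    import numpy as np
    from bigdl.nn.layer import Sequential, Linear, ReLU

    model = Sequential()
    model.add(Linear(4, 8))
    model.add(ReLU())
    model.add(Linear(8, 2))

    batch = np.random.rand(3, 4).astype("float32")   # 3 samples, 4 features each
    output = model.forward(batch)                    # expected shape: (3, 2)
    graph = model.to_graph()                         # convert to a graph Model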

class bigdl.nn.layer.SharedStaticUtils[source]
static load(path, bigdl_type='float')[source]

Load a pre-trained Bigdl model.

Parameters:path – The path containing the pre-trained model.
Returns:A pre-trained model.
static of(jvalue, bigdl_type='float')[source]

Create a Python Layer based on the given Java value and the real type.

Parameters:jvalue – Java object created by Py4j
Returns:A Python Layer
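
A hedged round-trip sketch: the path below is hypothetical, and Layer.save is assumed to be available for persisting a model that the static load can read back.

    from bigdl.nn.layer import Linear, Model

    linear = Linear(3, 2)
    linear.save("/tmp/linear.bigdl", True)       # hypothetical path; save(path, over_write) assumed
    restored = Model.load("/tmp/linear.bigdl")   # static load shared via SharedStaticUtils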

class bigdl.nn.layer.Sigmoid(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the Sigmoid function element-wise to the input Tensor, thus outputting a Tensor of the same dimension.

>>> sigmoid = Sigmoid()
creating: createSigmoid
class bigdl.nn.layer.SoftMax(pos=1, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the SoftMax function to an n-dimensional input Tensor, rescaling them so that the elements of the n-dimensional output Tensor lie in the range (0, 1) and sum to 1. Softmax is defined as: f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i).

>>> softMax = SoftMax()
creating: createSoftMax
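
As a sanity-check sketch (engine initialization assumed): forwarding a 2-D batch through SoftMax should yield non-negative rows that each sum to approximately 1.

    import numpy as np
    from bigdl.nn.layer import SoftMax

    softmax = SoftMax()
    x = np.random.randn(2, 5).astype("float32")
    y = softmax.forward(x)
    np.testing.assert_allclose(y.sum(axis=1), np.ones(2, dtype="float32"), rtol=1e-5)
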
class bigdl.nn.layer.SoftMin(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the SoftMin function to an n-dimensional input Tensor, rescaling them so that the elements of the n-dimensional output Tensor lie in the range (0,1) and sum to 1. Softmin is defined as: f_i(x) = exp(-x_i - shift) / sum_j exp(-x_j - shift) where shift = max_i(-x_i).

>>> softMin = SoftMin()
creating: createSoftMin
class bigdl.nn.layer.SoftPlus(beta=1.0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply the SoftPlus function to an n-dimensional input tensor. SoftPlus function: f_i(x) = 1/beta * log(1 + exp(beta * x_i))

Parameters:beta – Controls sharpness of transfer function
>>> softPlus = SoftPlus(1e-5)
creating: createSoftPlus
class bigdl.nn.layer.SoftShrink(the_lambda=0.5, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply the soft shrinkage function element-wise to the input Tensor

SoftShrinkage operator:

       | x - lambda, if x >  lambda
f(x) = | x + lambda, if x < -lambda
       | 0, otherwise
Parameters:the_lambda – lambda, default is 0.5
>>> softShrink = SoftShrink(1e-5)
creating: createSoftShrink
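
A hedged elementwise sketch of the shrinkage rule above, with the_lambda = 0.5 (engine initialization assumed):

    import numpy as np
    from bigdl.nn.layer import SoftShrink

    soft_shrink = SoftShrink(0.5)
    x = np.array([[-1.0, -0.2, 0.0, 0.3, 2.0]], dtype="float32")
    y = soft_shrink.forward(x)   # expected approximately [[-0.5, 0.0, 0.0, 0.0, 1.5]]
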
class bigdl.nn.layer.SoftSign(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply SoftSign function to an n-dimensional input Tensor.

SoftSign function: f_i(x) = x_i / (1+|x_i|)

>>> softSign = SoftSign()
creating: createSoftSign
class bigdl.nn.layer.SparseJoinTable(dimension, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

:: Experimental

Sparse version of JoinTable. The backward pass just passes the original gradOutput back without splitting it, so this layer may only work in Wide&Deep-like models.

Parameters:dimension – the dimension to join along
>>> joinTable = SparseJoinTable(1)
creating: createSparseJoinTable
class bigdl.nn.layer.SparseLinear(input_size, output_size, with_bias=True, backwardStart=-1, backwardLength=-1, wRegularizer=None, bRegularizer=None, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

SparseLinear is the sparse version of the Linear module. SparseLinear differs from Linear in two ways: firstly, SparseLinear's input Tensor is a SparseTensor. Secondly, SparseLinear doesn't propagate the gradient back to the next layer in the backpropagation by default, as the gradInput of SparseLinear is useless and very large in most cases.

However, for models like Wide&Deep, backwardStart and backwardLength are provided to propagate part of the gradient back to the next layer.

Parameters:
  • input_size – the size of each input sample
  • output_size – the size of the module output of each sample
  • backwardStart – backwardStart index, counting from 1
  • backwardLength – backward length
  • with_bias – whether to include a bias
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]] applied to the bias.
  • init_weight – the optional initial value for the weight
  • init_bias – the optional initial value for the bias
  • init_grad_weight – the optional initial value for the grad_weight
  • init_grad_bias – the optional initial value for the grad_bias

>>> sparselinear = SparseLinear(100, 10, True, wRegularizer=L1Regularizer(0.5), bRegularizer=L1Regularizer(0.5))
creating: createL1Regularizer
creating: createL1Regularizer
creating: createSparseLinear
>>> import numpy as np
>>> init_weight = np.random.randn(10, 100)
>>> init_bias = np.random.randn(10)
>>> init_grad_weight = np.zeros([10, 100])
>>> init_grad_bias = np.zeros([10])
>>> sparselinear = SparseLinear(100, 10, True, 1, 5, L1Regularizer(0.5), L1Regularizer(0.5), init_weight, init_bias, init_grad_weight, init_grad_bias)
creating: createL1Regularizer
creating: createL1Regularizer
creating: createSparseLinear
>>> np.random.seed(123)
>>> init_weight = np.random.randn(5, 1000)
>>> init_bias = np.random.randn(5)
>>> sparselinear = SparseLinear(1000, 5, init_weight=init_weight, init_bias=init_bias)
creating: createSparseLinear
>>> input = JTensor.sparse(np.array([1, 3, 5, 2, 4, 6]), np.array([0, 0, 0, 1, 1, 1, 1, 5, 300, 2, 100, 500]), np.array([2, 1000]))
>>> output = sparselinear.forward(input)
>>> expected_output = np.array([[10.09569263, -10.94844246, -4.1086688, 1.02527523, 11.80737209], [7.9651413, 9.7131443, -10.22719955, 0.02345783, -3.74368906]])
>>> np.testing.assert_allclose(output, expected_output, rtol=1e-6, atol=1e-6)
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.SpatialAveragePooling(kw, kh, dw=1, dh=1, pad_w=0, pad_h=0, global_pooling=False, ceil_mode=False, count_include_pad=True, divide=True, format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies 2D average-pooling operation in kWxkH regions by step size dWxdH steps. The number of output features is equal to the number of input planes.

When padW and padH are both -1, we use a padding algorithm similar to the "SAME" padding of tensorflow. That is:

outHeight = Math.ceil(inHeight.toFloat / strideH.toFloat)
outWidth  = Math.ceil(inWidth.toFloat / strideW.toFloat)

padAlongHeight = Math.max(0, (outHeight - 1) * strideH + kernelH - inHeight)
padAlongWidth  = Math.max(0, (outWidth - 1) * strideW + kernelW - inWidth)

padTop  = padAlongHeight / 2
padLeft = padAlongWidth / 2
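
For reference, a plain-Python sketch of the "SAME" output-size and padding computation above (the function name is illustrative and not part of the BigDL API):

    import math

    def same_pad_2d(in_h, in_w, kernel_h, kernel_w, stride_h, stride_w):
        # Output size and top/left padding for "SAME"-style padding, as described above.
        out_h = int(math.ceil(float(in_h) / stride_h))
        out_w = int(math.ceil(float(in_w) / stride_w))
        pad_along_h = max(0, (out_h - 1) * stride_h + kernel_h - in_h)
        pad_along_w = max(0, (out_w - 1) * stride_w + kernel_w - in_w)
        return out_h, out_w, pad_along_h // 2, pad_along_w // 2

    print(same_pad_2d(7, 7, 3, 3, 2, 2))   # (4, 4, 1, 1)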

Parameters:
  • kW – kernel width
  • kH – kernel height
  • dW – step width
  • dH – step height
  • padW – padding width
  • padH – padding height
  • global_pooling – If globalPooling then it will pool over the size of the input by doing kH = input->height and kW = input->width
  • ceil_mode – whether the output size is to be ceiled or floored
  • count_include_pad – whether to include padding when dividing the number of elements in the pooling region
  • divide – whether to do the averaging
  • format – "NCHW" or "NHWC", indicating the input data format

>>> spatialAveragePooling = SpatialAveragePooling(7,7)
creating: createSpatialAveragePooling
>>> spatialAveragePooling = SpatialAveragePooling(2, 2, 2, 2, -1, -1, True, format="NHWC")
creating: createSpatialAveragePooling
set_weights(weights)[source]

Set weights for this layer

Parameters:weights – a list of numpy arrays which represent weight and bias
Returns:
>>> linear = Linear(3,2)
creating: createLinear
>>> linear.set_weights([np.array([[1,2,3],[4,5,6]]), np.array([7,8])])
>>> weights = linear.get_weights()
>>> weights[0].shape == (2,3)
True
>>> np.testing.assert_allclose(weights[0][0], np.array([1., 2., 3.]))
>>> np.testing.assert_allclose(weights[1], np.array([7., 8.]))
>>> relu = ReLU()
creating: createReLU
>>> from py4j.protocol import Py4JJavaError
>>> try:
...     relu.set_weights([np.array([[1,2,3],[4,5,6]]), np.array([7,8])])
... except Py4JJavaError as err:
...     print(err.java_exception)
...
java.lang.IllegalArgumentException: requirement failed: this layer does not have weight/bias
>>> relu.get_weights()
The layer does not have weight/bias
>>> add = Add(2)
creating: createAdd
>>> try:
...     add.set_weights([np.array([7,8]), np.array([1,2])])
... except Py4JJavaError as err:
...     print(err.java_exception)
...
java.lang.IllegalArgumentException: requirement failed: the number of input weight/bias is not consistant with number of weight/bias of this layer, number of input 1, number of output 2
>>> cAdd = CAdd([4, 1])
creating: createCAdd
>>> cAdd.set_weights(np.ones([4, 1]))
>>> (cAdd.get_weights()[0] == np.ones([4, 1])).all()
True
class bigdl.nn.layer.SpatialBatchNormalization(n_output, eps=1e-05, momentum=0.1, affine=True, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, data_format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This layer implements Batch Normalization as described in the paper: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" by Sergey Ioffe, Christian Szegedy. This implementation is useful for inputs coming from convolution layers. For non-convolutional layers, see [[BatchNormalization]]. The operation implemented is:

      ( x - mean(x) )
y = -------------------- * gamma + beta
   standard-deviation(x)

where gamma and beta are learnable parameters. The learning of gamma and beta is optional.

Parameters:
  • n_output – output feature map number
  • eps – avoid divide zero
  • momentum – momentum for weight update
  • affine – affine operation on output or not

  • data_format – a string value (or DataFormat Object in Scala) of "NHWC" or "NCHW" to specify the input data format of this layer. In "NHWC" format data is stored in the order of [batch_size, height, width, channels]; in "NCHW" format data is stored in the order of [batch_size, channels, height, width].

>>> spatialBatchNormalization = SpatialBatchNormalization(1)
creating: createSpatialBatchNormalization
>>> import numpy as np
>>> init_weight = np.array([1.0])
>>> init_grad_weight = np.array([0.0])
>>> init_bias = np.array([0.0])
>>> init_grad_bias = np.array([0.0])
>>> spatialBatchNormalization = SpatialBatchNormalization(1, 1e-5, 0.1, True, init_weight, init_bias, init_grad_weight, init_grad_bias)
creating: createSpatialBatchNormalization
>>> spatialBatchNormalization = SpatialBatchNormalization(1, 1e-5, 0.1, True, init_weight, init_bias, init_grad_weight, init_grad_bias, "NHWC")
creating: createSpatialBatchNormalization
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.SpatialContrastiveNormalization(n_input_plane=1, kernel=None, threshold=0.0001, thresval=0.0001, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Subtractive + divisive contrast normalization.

Parameters:
  • n_input_plane
  • kernel
  • threshold
  • thresval
>>> kernel = np.ones([9,9]).astype("float32")
>>> spatialContrastiveNormalization = SpatialContrastiveNormalization(1, kernel)
creating: createSpatialContrastiveNormalization
>>> spatialContrastiveNormalization = SpatialContrastiveNormalization()
creating: createSpatialContrastiveNormalization
class bigdl.nn.layer.SpatialConvolution(n_input_plane, n_output_plane, kernel_w, kernel_h, stride_w=1, stride_h=1, pad_w=0, pad_h=0, n_group=1, propagate_back=True, wRegularizer=None, bRegularizer=None, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, with_bias=True, data_format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies a 2D convolution over an input image composed of several input planes. The input tensor in forward(input) is expected to be a 3D tensor (nInputPlane x height x width).

Parameters:
  • n_input_plane – The number of expected input planes in the image given into forward()
  • n_output_plane – The number of output planes the convolution layer will produce.
  • kernel_w – The kernel width of the convolution
  • kernel_h – The kernel height of the convolution
  • stride_w – The step of the convolution in the width dimension.
  • stride_h – The step of the convolution in the height dimension
  • pad_w – The additional zeros added per width to the input planes.
  • pad_h – The additional zeros added per height to the input planes.
  • n_group – Kernel group number
  • propagate_back – Propagate gradient back
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]] applied to the bias.
  • init_weight – the optional initial value for the weight
  • init_bias – the optional initial value for the bias
  • init_grad_weight – the optional initial value for the grad_weight
  • init_grad_bias – the optional initial value for the grad_bias
  • with_bias – whether to include a bias
  • data_format – a string value of "NHWC" or "NCHW" to specify the input data format of this layer. In "NHWC" format data is stored in the order of [batch_size, height, width, channels]; in "NCHW" format data is stored in the order of [batch_size, channels, height, width].

>>> spatialConvolution = SpatialConvolution(6, 12, 5, 5)
creating: createSpatialConvolution
>>> spatialConvolution.setWRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
>>> spatialConvolution.setBRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
>>> import numpy as np
>>> init_weight = np.random.randn(1, 12, 6, 5, 5)
>>> init_bias = np.random.randn(12)
>>> init_grad_weight = np.zeros([1, 12, 6, 5, 5])
>>> init_grad_bias = np.zeros([12])
>>> spatialConvolution = SpatialConvolution(6, 12, 5, 5, 1, 1, 0, 0, 1, True, L1Regularizer(0.5), L1Regularizer(0.5), init_weight, init_bias, init_grad_weight, init_grad_bias, True, "NCHW")
creating: createL1Regularizer
creating: createL1Regularizer
creating: createSpatialConvolution
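
A hedged shape-check sketch (engine initialization assumed): with a 3x3 kernel, stride 1 and padding 1, the spatial size is preserved.

    import numpy as np
    from bigdl.nn.layer import SpatialConvolution

    conv = SpatialConvolution(3, 8, 3, 3, 1, 1, 1, 1)   # 3 -> 8 planes, 3x3 kernel, pad 1
    x = np.random.rand(2, 3, 32, 32).astype("float32")  # NCHW batch
    y = conv.forward(x)                                  # expected shape: (2, 8, 32, 32)
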
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.SpatialConvolutionMap(conn_table, kw, kh, dw=1, dh=1, pad_w=0, pad_h=0, wRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This class is a generalization of SpatialConvolution. It uses a generic connection table between input and output features. The SpatialConvolution is equivalent to using a full connection table.

When padW and padH are both -1, we use a padding algorithm similar to the "SAME" padding of tensorflow. That is:

outHeight = Math.ceil(inHeight.toFloat / strideH.toFloat)
outWidth  = Math.ceil(inWidth.toFloat / strideW.toFloat)

padAlongHeight = Math.max(0, (outHeight - 1) * strideH + kernelH - inHeight)
padAlongWidth  = Math.max(0, (outWidth - 1) * strideW + kernelW - inWidth)

padTop  = padAlongHeight / 2
padLeft = padAlongWidth / 2

Parameters:
  • wRegularizer – instance of [[Regularizer]](eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]]applied to the bias.
>>> ct = np.ones([9,9]).astype("float32")
>>> spatialConvolutionMap = SpatialConvolutionMap(ct, 9, 9)
creating: createSpatialConvolutionMap
class bigdl.nn.layer.SpatialCrossMapLRN(size=5, alpha=1.0, beta=0.75, k=1.0, data_format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies Spatial Local Response Normalization between different feature maps. The operation implemented is:

                          x_f
y_f = ---------------------------------------------------
       (k + (alpha/size) * sum_{l=l1 to l2} x_l^2)^beta

where x_f is the input at spatial locations h,w (not shown for simplicity) and feature map f, l1 corresponds to max(0,f-ceil(size/2)) and l2 to min(F, f-ceil(size/2) + size). Here, F is the number of feature maps.

Parameters:
  • size – the number of channels to sum over
  • alpha – the scaling parameter
  • beta – the exponent
  • k – a constant

  • data_format – a string value (or DataFormat Object in Scala) of "NHWC" or "NCHW" to specify the input data format of this layer. In "NHWC" format data is stored in the order of [batch_size, height, width, channels]; in "NCHW" format data is stored in the order of [batch_size, channels, height, width].

>>> spatialCrossMapLRN = SpatialCrossMapLRN()
creating: createSpatialCrossMapLRN
>>> spatialCrossMapLRN = SpatialCrossMapLRN(5, 1.0, 0.75, 1.0, "NHWC")
creating: createSpatialCrossMapLRN
class bigdl.nn.layer.SpatialDilatedConvolution(n_input_plane, n_output_plane, kw, kh, dw=1, dh=1, pad_w=0, pad_h=0, dilation_w=1, dilation_h=1, wRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply a 2D dilated convolution over an input image.

The input tensor is expected to be a 3D or 4D(with batch) tensor.

If input is a 3D tensor nInputPlane x height x width:

owidth  = floor((width + 2 * padW - dilationW * (kW - 1) - 1) / dW) + 1
oheight = floor((height + 2 * padH - dilationH * (kH - 1) - 1) / dH) + 1

Reference Paper: Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions[J]. arXiv preprint arXiv:1511.07122, 2015.

Parameters:
  • n_input_plane – The number of expected input planes in the image given into forward().
  • n_output_plane – The number of output planes the convolution layer will produce.
  • kw – The kernel width of the convolution.
  • kh – The kernel height of the convolution.
  • dw – The step of the convolution in the width dimension. Default is 1.
  • dh – The step of the convolution in the height dimension. Default is 1.
  • pad_w – The additional zeros added per width to the input planes. Default is 0.
  • pad_h – The additional zeros added per height to the input planes. Default is 0.
  • dilation_w – The number of pixels to skip. Default is 1.
  • dilation_h – The number of pixels to skip. Default is 1.
  • init_method – Init method, Default, Xavier.
  • wRegularizer – instance of [[Regularizer]](eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]]applied to the bias.
>>> spatialDilatedConvolution = SpatialDilatedConvolution(1, 1, 1, 1)
creating: createSpatialDilatedConvolution
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.SpatialDivisiveNormalization(n_input_plane=1, kernel=None, threshold=0.0001, thresval=0.0001, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies a spatial division operation on a series of 2D inputs using kernel for computing the weighted average in a neighborhood. The neighborhood is defined for a local spatial region that is the same size as the kernel, and across all features. For an input image, since there is only one feature, the region is only spatial. For an RGB image, the weighted average is taken over RGB channels and a spatial region.

If the kernel is 1D, then it will be used for constructing a separable 2D kernel. The operations will be much more efficient in this case.

The kernel is generally chosen as a gaussian when it is believed that the correlation of two pixel locations decrease with increasing distance. On the feature dimension, a uniform average is used since the weighting across features is not known.

Parameters:
  • nInputPlane – number of input plane, default is 1.
  • kernel – kernel tensor, default is a 9 x 9 tensor.
  • threshold – threshold
  • thresval – threshold value to replace with if data is smaller than threshold
>>> kernel = np.ones([9,9]).astype("float32")
>>> spatialDivisiveNormalization = SpatialDivisiveNormalization(2,kernel)
creating: createSpatialDivisiveNormalization
>>> spatialDivisiveNormalization = SpatialDivisiveNormalization()
creating: createSpatialDivisiveNormalization
class bigdl.nn.layer.SpatialDropout1D(init_p=0.5, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This version performs the same function as Dropout, however it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead.

Parameters:init_p – the probability p

>>> dropout = SpatialDropout1D(0.4)
creating: createSpatialDropout1D
class bigdl.nn.layer.SpatialDropout2D(init_p=0.5, data_format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This version performs the same function as Dropout, however it drops entire 2D feature maps instead of individual elements. If adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout2D will help promote independence between feature maps and should be used instead.

Parameters:
  • init_p – the probability p
  • data_format – 'NCHW' or 'NHWC'. In 'NCHW' mode, the channels dimension (the depth) is at index 1; in 'NHWC' mode it is at index 4.

>>> dropout = SpatialDropout2D(0.4, "NHWC")
creating: createSpatialDropout2D
class bigdl.nn.layer.SpatialDropout3D(init_p=0.5, data_format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This version performs the same function as Dropout, however it drops entire 3D feature maps instead of individual elements. If adjacent voxels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout3D will help promote independence between feature maps and should be used instead.

Parameters:
  • init_p – the probability p
  • data_format – 'NCHW' or 'NHWC'. In 'NCHW' mode, the channels dimension (the depth) is at index 1; in 'NHWC' mode it is at index 4.

>>> dropout = SpatialDropout3D(0.5, "NHWC")
creating: createSpatialDropout3D
class bigdl.nn.layer.SpatialFullConvolution(n_input_plane, n_output_plane, kw, kh, dw=1, dh=1, pad_w=0, pad_h=0, adj_w=0, adj_h=0, n_group=1, no_bias=False, wRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply a 2D full convolution over an input image. The input tensor is expected to be a 3D or 4D(with batch) tensor. Note that instead of setting adjW and adjH, SpatialFullConvolution[Table, T] also accepts a table input with two tensors: T(convInput, sizeTensor) where convInput is the standard input tensor, and the size of sizeTensor is used to set the size of the output (will ignore the adjW and adjH values used to construct the module). This module can be used without a bias by setting parameter noBias = true while constructing the module.

If input is a 3D tensor nInputPlane x height x width:

owidth  = (width - 1) * dW - 2*padW + kW + adjW
oheight = (height - 1) * dH - 2*padH + kH + adjH

Other frameworks call this operation “In-network Upsampling”, “Fractionally-strided convolution”, “Backwards Convolution,” “Deconvolution”, or “Upconvolution.”

Reference Paper: Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431-3440.

Parameters:
  • nInputPlane – The number of expected input planes in the image given into forward()
  • nOutputPlane – The number of output planes the convolution layer will produce.
  • kW – The kernel width of the convolution.
  • kH – The kernel height of the convolution.
  • dW – The step of the convolution in the width dimension. Default is 1.
  • dH – The step of the convolution in the height dimension. Default is 1.
  • padW – The additional zeros added per width to the input planes. Default is 0.
  • padH – The additional zeros added per height to the input planes. Default is 0.
  • adjW – Extra width to add to the output image. Default is 0.
  • adjH – Extra height to add to the output image. Default is 0.
  • nGroup – Kernel group number.
  • noBias – If bias is needed.
  • initMethod – Init method, Default, Xavier, Bilinear.
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]] applied to the bias.

>>> spatialFullConvolution = SpatialFullConvolution(1, 1, 1, 1)
creating: createSpatialFullConvolution
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.SpatialMaxPooling(kw, kh, dw, dh, pad_w=0, pad_h=0, to_ceil=False, format='NCHW', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies 2D max-pooling operation in kWxkH regions by step size dWxdH steps. The number of output features is equal to the number of input planes. If the input image is a 3D tensor nInputPlane x height x width, the output image size will be nOutputPlane x oheight x owidth where

owidth  = op((width + 2*padW - kW) / dW + 1)
oheight = op((height + 2*padH - kH) / dH + 1)

op is a rounding operator. By default, it is floor. It can be changed by calling :ceil() or :floor() methods.

When padW and padH are both -1, we use a padding algorithm similar to the "SAME" padding of tensorflow. That is:

outHeight = Math.ceil(inHeight.toFloat / strideH.toFloat)
outWidth  = Math.ceil(inWidth.toFloat / strideW.toFloat)

padAlongHeight = Math.max(0, (outHeight - 1) * strideH + kernelH - inHeight)
padAlongWidth  = Math.max(0, (outWidth - 1) * strideW + kernelW - inWidth)

padTop  = padAlongHeight / 2
padLeft = padAlongWidth / 2

Parameters:
  • kW – kernel width
  • kH – kernel height
  • dW – step size in width
  • dH – step size in height
  • padW – padding in width
  • padH – padding in height
  • format – “NCHW” or “NHWC”, indicating the input data format
>>> spatialMaxPooling = SpatialMaxPooling(2, 2, 2, 2)
creating: createSpatialMaxPooling
>>> spatialMaxPooling = SpatialMaxPooling(2, 2, 2, 2, -1, -1, True, "NHWC")
creating: createSpatialMaxPooling
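
A hedged shape-check sketch (engine initialization assumed): a 2x2 kernel with stride 2 halves the spatial dimensions.

    import numpy as np
    from bigdl.nn.layer import SpatialMaxPooling

    pool = SpatialMaxPooling(2, 2, 2, 2)
    x = np.random.rand(1, 3, 8, 8).astype("float32")   # NCHW
    y = pool.forward(x)                                 # expected shape: (1, 3, 4, 4)
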
class bigdl.nn.layer.SpatialSeparableConvolution(n_input_channel, n_output_channel, depth_multiplier, kernel_w, kernel_h, stride_w=1, stride_h=1, pad_w=0, pad_h=0, with_bias=True, data_format='NCHW', w_regularizer=None, b_regularizer=None, p_regularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Separable convolutions consist in first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes together the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step.

Parameters:
  • n_input_channel – The number of expected input planes in the image given into forward()
  • n_output_channel – The number of output planes the convolution layer will produce.
  • depth_multiplier – how many internal channels are generated per input channel
  • kernel_w – The kernel width of the convolution
  • kernel_h – The kernel height of the convolution
  • stride_w – The step of the convolution in the width dimension.
  • stride_h – The step of the convolution in the height dimension
  • pad_w – The additional zeros added per width to the input planes.
  • pad_h – The additional zeros added per height to the input planes.
  • with_bias – whether to include a bias
  • data_format – a string value of "NHWC" or "NCHW" to specify the input data format of this layer. In "NHWC" format data is stored in the order of [batch_size, height, width, channels]; in "NCHW" format data is stored in the order of [batch_size, channels, height, width].
  • w_regularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the depth weights matrices.
  • b_regularizer – instance of [[Regularizer]] applied to the pointwise bias.
  • p_regularizer – instance of [[Regularizer]] applied to the pointwise weights.

>>> conv = SpatialSeparableConvolution(6, 12, 1, 5, 5)
creating: createSpatialSeparableConvolution
>>> conv.setWRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
>>> conv.setBRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
>>> conv = SpatialSeparableConvolution(6, 12, 1, 5, 5, 1, 1, 0, 0, True, "NCHW", L1Regularizer(0.5), L1Regularizer(0.5), L1Regularizer(0.5))
creating: createL1Regularizer
creating: createL1Regularizer
creating: createL1Regularizer
creating: createSpatialSeparableConvolution
class bigdl.nn.layer.SpatialShareConvolution(n_input_plane, n_output_plane, kernel_w, kernel_h, stride_w=1, stride_h=1, pad_w=0, pad_h=0, n_group=1, propagate_back=True, wRegularizer=None, bRegularizer=None, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, with_bias=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

>>> spatialShareConvolution = SpatialShareConvolution(1, 1, 1, 1)
creating: createSpatialShareConvolution
>>> import numpy as np
>>> init_weight = np.random.randn(1, 12, 6, 5, 5)
>>> init_bias = np.random.randn(12)
>>> init_grad_weight = np.zeros([1, 12, 6, 5, 5])
>>> init_grad_bias = np.zeros([12])
>>> conv = SpatialShareConvolution(6, 12, 5, 5, 1, 1, 0, 0, 1, True, L1Regularizer(0.5), L1Regularizer(0.5), init_weight, init_bias, init_grad_weight, init_grad_bias)
creating: createL1Regularizer
creating: createL1Regularizer
creating: createSpatialShareConvolution
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.SpatialSubtractiveNormalization(n_input_plane=1, kernel=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies a spatial subtraction operation on a series of 2D inputs using kernel for computing the weighted average in a neighborhood. The neighborhood is defined for a local spatial region that is the same size as the kernel, and across all features. For an input image, since there is only one feature, the region is only spatial. For an RGB image, the weighted average is taken over RGB channels and a spatial region.

If the kernel is 1D, then it will be used for constructing a separable 2D kernel. The operations will be much more efficient in this case.

The kernel is generally chosen as a gaussian when it is believed that the correlation of two pixel locations decrease with increasing distance. On the feature dimension, a uniform average is used since the weighting across features is not known.

Parameters:
  • n_input_plane – number of input plane, default is 1.
  • kernel – kernel tensor, default is a 9 x 9 tensor.
>>> kernel = np.ones([9,9]).astype("float32")
>>> spatialSubtractiveNormalization = SpatialSubtractiveNormalization(2,kernel)
creating: createSpatialSubtractiveNormalization
>>> spatialSubtractiveNormalization = SpatialSubtractiveNormalization()
creating: createSpatialSubtractiveNormalization
class bigdl.nn.layer.SpatialWithinChannelLRN(size=5, alpha=1.0, beta=0.75, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

The local response normalization layer performs a kind of lateral inhibition by normalizing over local input regions. The local regions extend spatially, but are in separate channels (i.e., they have shape 1 x local_size x local_size).

Parameters:
  • size – the side length of the square region to sum over
  • alpha – the scaling parameter
  • beta – the exponent

>>> layer = SpatialWithinChannelLRN()
creating: createSpatialWithinChannelLRN
class bigdl.nn.layer.SpatialZeroPadding(pad_left, pad_right, pad_top, pad_bottom, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Each feature map of a given input is padded with the specified number of zeros. If padding values are negative, then the input is cropped.

Parameters:
  • padLeft – pad left position
  • padRight – pad right position
  • padTop – pad top position
  • padBottom – pad bottom position
>>> spatialZeroPadding = SpatialZeroPadding(1, 1, 1, 1)
creating: createSpatialZeroPadding
class bigdl.nn.layer.SplitTable(dimension, n_input_dims=-1, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Creates a module that takes a Tensor as input and outputs several tables, splitting the Tensor along the specified dimension. Please note that the dimension starts from 1.

The input to this layer is expected to be a tensor, or a batch of tensors; when using mini-batch, a batch of sample tensors will be passed to the layer and the user needs to specify the number of dimensions of each sample tensor in a batch using nInputDims.

Parameters:
  • dimension – to be split along this dimension
  • n_input_dims – specify the number of dimensions that this module will receive. If it is more than the dimension of input tensors, the first dimension would be considered as batch size
>>> splitTable = SplitTable(1, 1)
creating: createSplitTable
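
A hedged sketch (engine initialization assumed): without nInputDims, SplitTable(1) splits a 2 x 3 tensor along its first dimension into a table of two tensors.

    import numpy as np
    from bigdl.nn.layer import SplitTable

    split = SplitTable(1)
    x = np.arange(6).reshape(2, 3).astype("float32")
    out = split.forward(x)     # expected: a list of two arrays, each of shape (3,)
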
class bigdl.nn.layer.Sqrt(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply an element-wise sqrt operation.

>>> sqrt = Sqrt()
creating: createSqrt
class bigdl.nn.layer.Square(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply an element-wise square operation.

>>> square = Square()
creating: createSquare
class bigdl.nn.layer.Squeeze(dim, num_input_dims=-2147483648, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Delete all singleton dimensions or a specific singleton dim.

Parameters:
  • dim – Optional. The dimension to be deleted. Default: delete all dimensions.
  • num_input_dims – Optional. If in a batch model, set to the inputDims.
>>> squeeze = Squeeze(1)
creating: createSqueeze
class bigdl.nn.layer.Sum(dimension=1, n_input_dims=-1, size_average=False, squeeze=True, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

It is a simple layer which applies a sum operation over the given dimension. When nInputDims is provided, the input will be considered as batches. Then the sum operation will be applied in (dimension + 1). The input to this layer is expected to be a tensor, or a batch of tensors; when using mini-batch, a batch of sample tensors will be passed to the layer and the user needs to specify the number of dimensions of each sample tensor in the batch using nInputDims.

Parameters:
  • dimension – the dimension to be applied sum operation
  • n_input_dims – specify the number of dimensions that this module will receive. If it is more than the dimension of input tensors, the first dimension would be considered as batch size
  • size_average – default is false, if it is true, it will return the mean instead
  • squeeze – default is true, which will squeeze the sum dimension; set it to false to keep the sum dimension
>>> sum = Sum(1, 1, True, True)
creating: createSum
class bigdl.nn.layer.TableOperation(operation_layer, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

When the two input tensors have different sizes, the smaller tensor is first expanded to the size of the larger one, and then the table operation is applied.

>>> norm = TableOperation(CMulTable())
creating: createCMulTable
creating: createTableOperation
class bigdl.nn.layer.Tanh(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies the Tanh function element-wise to the input Tensor, thus outputting a Tensor of the same dimension. Tanh is defined as f(x) = (exp(x)-exp(-x))/(exp(x)+exp(-x)).

>>> tanh = Tanh()
creating: createTanh
class bigdl.nn.layer.TanhShrink(bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

A simple layer that, for each element of the input tensor, applies the following operation during the forward process: f(x) = x - tanh(x).

>>> tanhShrink = TanhShrink()
creating: createTanhShrink
class bigdl.nn.layer.TemporalConvolution(input_frame_size, output_frame_size, kernel_w, stride_w=1, propagate_back=True, weight_regularizer=None, bias_regularizer=None, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies a 1D convolution over an input sequence composed of nInputFrame frames. The input tensor in forward(input) is expected to be a 2D tensor (nInputFrame x inputFrameSize) or a 3D tensor (nBatchFrame x nInputFrame x inputFrameSize).

Parameters:
  • input_frame_size – The input frame size expected in sequences given into forward()
  • output_frame_size – The output frame size the convolution layer will produce.
  • kernel_w – The kernel width of the convolution
  • stride_w – The step of the convolution in the width dimension.
  • propagate_back – Whether propagate gradient back, default is true.
  • weight_regularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bias_regularizer – instance of [[Regularizer]] applied to the bias.
  • init_weight – Initial weight
  • init_bias – Initial bias
  • init_grad_weight – Initial gradient weight
  • init_grad_bias – Initial gradient bias

>>> temporalConvolution = TemporalConvolution(6, 12, 5, 5)
creating: createTemporalConvolution
>>> temporalConvolution.setWRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
>>> temporalConvolution.setBRegularizer(L1Regularizer(0.5))
creating: createL1Regularizer
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.TemporalMaxPooling(k_w, d_w=-1, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies 1D max-pooling operation in kW regions by step size dW steps. The input sequence is composed of nInputFrame frames. The input tensor in forward(input) is expected to be a 2D tensor (nInputFrame x inputFrameSize) or a 3D tensor (nBatchFrame x nInputFrame x inputFrameSize).

If the input sequence is a 2D tensor of dimension nInputFrame x inputFrameSize, the output sequence will be nOutputFrame x inputFrameSize where

nOutputFrame = (nInputFrame - k_w) / d_w + 1

Parameters:
  • k_w – kernel width
  • d_w – step size in width, default is -1, means the d_w equals k_w
>>> temporalMaxPooling = TemporalMaxPooling(2, 2)
creating: createTemporalMaxPooling
class bigdl.nn.layer.Threshold(th=1e-06, v=0.0, ip=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Thresholds the input Tensor. If a value in the Tensor is smaller than th, it is replaced with v.

Parameters:
  • th – the threshold to compare with
  • v – the value to replace with
  • ip – inplace mode
>>> threshold = Threshold(1e-5, 1e-5, True)
creating: createThreshold
class bigdl.nn.layer.Tile(dim=1, copies=2, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Replicate 'copies' copies along the 'dim' dimension.

>>> layer = Tile(1, 2)
creating: createTile
class bigdl.nn.layer.TimeDistributed(model, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This layer is intended to apply the contained layer to each temporal time slice of the input tensor.

For instance, the TimeDistributed layer can feed each time slice of the input tensor to the Linear layer.

The input data format is [Batch, Time, Other dims]. The contained layer must not change the length of the Other dims.

>>> td = TimeDistributed(Linear(2, 3))
creating: createLinear
creating: createTimeDistributed
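
A hedged shape-check sketch (engine initialization assumed): the wrapped Linear(2, 3) is applied to every time step, so only the last dimension changes.

    import numpy as np
    from bigdl.nn.layer import TimeDistributed, Linear

    td = TimeDistributed(Linear(2, 3))
    x = np.random.rand(4, 6, 2).astype("float32")   # [Batch, Time, Other dims]
    y = td.forward(x)                                # expected shape: (4, 6, 3)
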
class bigdl.nn.layer.Transformer(vocab_size, hidden_size, num_heads, filter_size, num_hidden_layers, postprocess_dropout, attention_dropout, relu_dropout, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Implementation for Transformer.

>>> layer = Transformer(20, 4, 2, 3, 1, 0.1, 0.1, 0.1)
creating: createTransformer

class bigdl.nn.layer.Transpose(permutations, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Transpose input along specified dimensions

Parameters:permutations – dimension pairs that need to be swapped
>>> transpose = Transpose([(1,2)])
creating: createTranspose
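
A hedged sketch (engine initialization assumed): swapping dimensions 1 and 2 of a 2-D input is an ordinary matrix transpose.

    import numpy as np
    from bigdl.nn.layer import Transpose

    transpose = Transpose([(1, 2)])
    x = np.random.rand(2, 3).astype("float32")
    y = transpose.forward(x)   # expected shape: (3, 2), equal to x.T
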
class bigdl.nn.layer.Unsqueeze(pos, num_input_dims=-2147483648, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Create an Unsqueeze layer. Insert singleton dim (i.e., dimension 1) at position pos. For an input with dim = input.dim(), there are dim + 1 possible positions to insert the singleton dimension.

Parameters:
  • pos – The position at which the singleton dimension will be inserted.
  • num_input_dims – Optional. If in a batch model, set to the inputDim
>>> unsqueeze = Unsqueeze(1, 1)
creating: createUnsqueeze
class bigdl.nn.layer.UpSampling1D(length, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Upsampling layer for 1D inputs. Repeats each temporal step length times along the time axis.

If input’s size is (batch, steps, features), then the output’s size is (batch, steps * length, features)

Parameters:length – integer, upsampling factor.

>>> upsampled1d = UpSampling1D(2)
creating: createUpSampling1D

class bigdl.nn.layer.UpSampling2D(size, data_format='nchw', bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Upsampling layer for 2D inputs. Repeats the heights and widths of the data by size[0] and size[1] respectively.

If the input's data format is NCHW, then the size of the output is (N, C, H * size[0], W * size[1]).

Parameters:
  • size – tuple of 2 integers. The upsampling factors for heights and widths.
  • data_format – DataFormat, NCHW or NHWC

>>> upsampled2d = UpSampling2D([2, 3])
creating: createUpSampling2D
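
A hedged shape-check sketch (engine initialization assumed): with size [2, 3] and NCHW data, heights are doubled and widths tripled.

    import numpy as np
    from bigdl.nn.layer import UpSampling2D

    up = UpSampling2D([2, 3])
    x = np.random.rand(1, 3, 4, 5).astype("float32")   # NCHW
    y = up.forward(x)                                   # expected shape: (1, 3, 8, 15)
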
class bigdl.nn.layer.UpSampling3D(size, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Upsampling layer for 3D inputs. Repeats the 1st, 2nd and 3rd dimensions of the data by size[0], size[1] and size[2] respectively. The input data is assumed to be of the form minibatch x channels x depth x height x width.

Parameters:size – Repeats the depth, height and width dimensions of the data by size[0], size[1] and size[2] respectively.

>>> upsample3d = UpSampling3D([1, 2, 3])
creating: createUpSampling3D

class bigdl.nn.layer.View(sizes, num_input_dims=0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

This module creates a new view of the input tensor using the sizes passed to the constructor. The method setNumInputDims() allows one to specify the expected number of dimensions of the inputs of the modules. This makes it possible to use minibatch inputs when using a size -1 for one of the dimensions.

Parameters:sizes – sizes used to create the new view
>>> view = View([1024,2])
creating: createView
class bigdl.nn.layer.VolumetricAveragePooling(k_t, k_w, k_h, d_t, d_w, d_h, pad_t=0, pad_w=0, pad_h=0, count_include_pad=True, ceil_mode=False, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies 3D average-pooling operation in kTxkWxkH regions by step size dTxdWxdH. The number of output features is equal to the number of input planes / dT. The input can optionally be padded with zeros. Padding should be smaller than half of kernel size. That is, padT < kT/2, padW < kW/2 and padH < kH/2

Parameters:
  • k_t – The kernel size
  • k_w – The kernel width
  • k_h – The kernel height
  • d_t – The step in the time dimension
  • d_w – The step in the width dimension
  • d_h – The step in the height dimension
  • pad_t – The padding in the time dimension
  • pad_w – The padding in the width dimension
  • pad_h – The padding in the height dimension
  • count_include_pad – whether to include padding when dividing the number of elements in pooling region
  • ceil_mode – whether the output size is to be ceiled or floored
>>> volumetricAveragePooling = VolumetricAveragePooling(5, 5, 5, 1, 1, 1)
creating: createVolumetricAveragePooling
class bigdl.nn.layer.VolumetricConvolution(n_input_plane, n_output_plane, k_t, k_w, k_h, d_t=1, d_w=1, d_h=1, pad_t=0, pad_w=0, pad_h=0, with_bias=True, wRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies a 3D convolution over an input image composed of several input planes. The input tensor in forward(input) is expected to be a 4D tensor (nInputPlane x time x height x width).

Parameters:
  • n_input_plane – The number of expected input planes in the image given into forward()
  • n_output_plane – The number of output planes the convolution layer will produce.
  • k_t – The kernel size of the convolution in time
  • k_w – The kernel width of the convolution
  • k_h – The kernel height of the convolution
  • d_t – The step of the convolution in the time dimension. Default is 1
  • d_w – The step of the convolution in the width dimension. Default is 1
  • d_h – The step of the convolution in the height dimension. Default is 1
  • pad_t – Additional zeros added to the input plane data on both sides of the time axis. Default is 0. (kT-1)/2 is often used here.
  • pad_w – The additional zeros added per width to the input planes.
  • pad_h – The additional zeros added per height to the input planes.
  • with_bias – whether with bias
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]] applied to the bias.
>>> volumetricConvolution = VolumetricConvolution(6, 12, 5, 5, 5, 1, 1, 1)
creating: createVolumetricConvolution
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.VolumetricFullConvolution(n_input_plane, n_output_plane, kt, kw, kh, dt=1, dw=1, dh=1, pad_t=0, pad_w=0, pad_h=0, adj_t=0, adj_w=0, adj_h=0, n_group=1, no_bias=False, wRegularizer=None, bRegularizer=None, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Apply a 3D full convolution over an 3D input image, a sequence of images, or a video etc. The input tensor is expected to be a 4D or 5D(with batch) tensor. Note that instead of setting adjT, adjW and adjH, VolumetricFullConvolution also accepts a table input with two tensors: T(convInput, sizeTensor) where convInput is the standard input tensor, and the size of sizeTensor is used to set the size of the output (will ignore the adjT, adjW and adjH values used to construct the module). This module can be used without a bias by setting parameter noBias = true while constructing the module.

If input is a 4D tensor nInputPlane x depth x height x width:

odepth  = (depth - 1) * dT - 2*padT + kT + adjT
owidth  = (width - 1) * dW - 2*padW + kW + adjW
oheight = (height - 1) * dH - 2*padH + kH + adjH

Other frameworks call this operation “In-network Upsampling”, “Fractionally-strided convolution”, “Backwards Convolution,” “Deconvolution”, or “Upconvolution.”

Reference Paper: Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431-3440.

Parameters:
  • nInputPlane – The number of expected input planes in the image given into forward()
  • nOutputPlane – The number of output planes the convolution layer will produce.
  • kT – The kernel depth of the convolution.
  • kW – The kernel width of the convolution.
  • kH – The kernel height of the convolution.
  • dT – The step of the convolution in the depth dimension. Default is 1.
  • dW – The step of the convolution in the width dimension. Default is 1.
  • dH – The step of the convolution in the height dimension. Default is 1.
  • padT – The additional zeros added per depth to the input planes. Default is 0.
  • padW – The additional zeros added per width to the input planes. Default is 0.
  • padH – The additional zeros added per height to the input planes. Default is 0.
  • adjT – Extra depth to add to the output image. Default is 0.
  • adjW – Extra width to add to the output image. Default is 0.
  • adjH – Extra height to add to the output image. Default is 0.
  • nGroup – Kernel group number.
  • noBias – If bias is needed.
  • wRegularizer – instance of [[Regularizer]] (eg. L1 or L2 regularization), applied to the input weights matrices.
  • bRegularizer – instance of [[Regularizer]] applied to the bias.

>>> volumetricFullConvolution = VolumetricFullConvolution(1, 1, 1, 1, 1, 1)
creating: createVolumetricFullConvolution
set_init_method(weight_init_method=None, bias_init_method=None)[source]
class bigdl.nn.layer.VolumetricMaxPooling(k_t, k_w, k_h, d_t, d_w, d_h, pad_t=0, pad_w=0, pad_h=0, bigdl_type='float')[source]

Bases: bigdl.nn.layer.Layer

Applies 3D max-pooling operation in kTxkWxkH regions by step size dTxdWxdH. The number of output features is equal to the number of input planes / dT. The input can optionally be padded with zeros. Padding should be smaller than half of kernel size. That is, padT < kT/2, padW < kW/2 and padH < kH/2

Parameters:
  • k_t – The kernel size
  • k_w – The kernel width
  • k_h – The kernel height
  • d_t – The step in the time dimension
  • d_w – The step in the width dimension
  • d_h – The step in the height dimension
  • pad_t – The padding in the time dimension
  • pad_w – The padding in the width dimension
  • pad_h – The padding in the height dimension
>>> volumetricMaxPooling = VolumetricMaxPooling(5, 5, 5, 1, 1, 1)
creating: createVolumetricMaxPooling

Module contents