Normalization Layers


BatchNormalization

Scala:

val bn = BatchNormalization(nOutput, eps, momentum, affine)

Python:

bn = BatchNormalization(n_output, eps, momentum, affine)

This layer implements Batch Normalization as described in the paper: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" by Sergey Ioffe and Christian Szegedy.

This implementation is useful for inputs NOT coming from convolution layers. For convolution layers, use nn.SpatialBatchNormalization.

The operation implemented is:

        ( x - mean(x) )
  y = -------------------- * gamma + beta
       standard-deviation(x)

where gamma and beta are learnable parameters. The learning of gamma and beta is optional.

Parameters:

 * nOutput - the number of output features (the feature dimension to normalize over)
 * eps - a small value added to the variance to avoid division by zero (default 1e-5)
 * momentum - the momentum used for the running mean and variance estimates (default 0.1)
 * affine - whether to learn the affine transform parameters gamma and beta (default true)
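To make the formula above concrete, here is a minimal NumPy sketch of the training-mode forward pass; the gamma and beta values below are placeholders, since BigDL initializes and learns them internally:

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x has shape (batch, nOutput); statistics are computed per feature
    mean = x.mean(axis=0)
    var = x.var(axis=0)                      # biased variance, as in training mode
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize each feature column
    return x_hat * gamma + beta              # optional affine transform

x = np.array([[1.0, 2.0], [3.0, 6.0]])
print(batch_norm_forward(x, gamma=np.ones(2), beta=np.zeros(2)))
# with gamma=1 and beta=0, each column has zero mean and ~unit variance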

Scala example:

import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.utils.T
import com.intel.analytics.bigdl.tensor.Tensor
import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat

val bn = BatchNormalization(2)
val input = Tensor(T(
             T(1.0f, 2.0f),
             T(3.0f, 6.0f))
            )
val gradOutput = Tensor(T(
             T(1.0f, 2.0f),
             T(3.0f, 6.0f))
)
val output = bn.forward(input)
val gradient = bn.backward(input, gradOutput)
-> print(output) 
# The output depends on random initialization. One possible result:
-0.46433213     -0.2762179      
0.46433213      0.2762179       
[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2]
-> print(gradient)
# The output depends on random initialization. One possible result:
-4.649627E-6    -6.585548E-7    
4.649627E-6     6.585548E-7     
[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2]

Python example:

from bigdl.nn.layer import *
from bigdl.nn.criterion import *
import numpy as np
bn = BatchNormalization(2)
input = np.array([
  [1.0, 2.0],
  [3.0, 6.0]
])
grad_output = np.array([
           [2.0, 3.0],
           [4.0, 5.0]
         ])
output = bn.forward(input)
gradient = bn.backward(input, grad_output)
-> print(output)
# The output depends on random initialization. One possible result:
[[-0.99583918 -0.13030811]
 [ 0.99583918  0.13030811]]
-> print(gradient)
# The output depends on random initialization. One possible result:
[[ -9.97191637e-06  -1.55339364e-07]
 [  9.97191637e-06   1.55339364e-07]]

SpatialBatchNormalization

Scala:

val module = SpatialBatchNormalization(nOutput, eps=1e-5, momentum=0.1, affine=true,
                                           initWeight=null, initBias=null, initGradWeight=null, initGradBias=null)

Python:

module = SpatialBatchNormalization(nOutput, eps=1e-5, momentum=0.1, affine=True)

This layer implements Batch Normalization as described in the paper: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" by Sergey Ioffe and Christian Szegedy.

This implementation is useful for inputs coming from convolution layers. For non-convolutional layers, see BatchNormalization above. The operation implemented is:

        ( x - mean(x) )
  y = -------------------- * gamma + beta
       standard-deviation(x)

  where gamma and beta are learnable parameters.
  The learning of gamma and beta is optional.
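In the spatial variant, the statistics are computed per channel, over the batch and both spatial dimensions. A minimal NumPy sketch of the idea (with gamma = 1 and beta = 0 for simplicity):

import numpy as np

def spatial_batch_norm(x, eps=1e-5):
    # x has shape (N, C, H, W); mean and variance are computed per channel
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(2, 3, 2, 2).astype("float32")
print(spatial_batch_norm(x).mean(axis=(0, 2, 3)))  # ~0 for every channel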

Scala example:

import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.tensor.Tensor
import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat

val layer = SpatialBatchNormalization(3, 1e-3)
val input = Tensor(2, 3, 2, 2).randn()
> print(layer.forward(input))
(1,1,.,.) =
-0.21939678 -0.64394164 
-0.03280549 0.13889995  

(1,2,.,.) =
0.48519397  0.40222475  
-0.9339038  0.4131121   

(1,3,.,.) =
0.39790314  -0.040012743    
-0.009540742    0.21598668  

(2,1,.,.) =
0.32008895  -0.23125978 
0.4053611   0.26305377  

(2,2,.,.) =
-0.3810518  -0.34581286 
0.14797378  0.21226381  

(2,3,.,.) =
0.2558251   -0.2211882  
-0.59388477 -0.00508846 

[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3x2x2]

Python example:

from bigdl.nn.layer import *
import numpy as np

layer = SpatialBatchNormalization(3, 1e-3)
input = np.random.rand(2,3,2,2)
> layer.forward(input)
array([[[[  5.70826093e-03,   9.06338100e-05],
         [ -3.49177676e-03,   1.10401707e-02]],

        [[  1.80168569e-01,  -8.87815133e-02],
         [  2.11335659e-01,   2.11817324e-01]],

        [[ -1.02916014e+00,   4.02444333e-01],
         [ -1.72453150e-01,   5.31806648e-01]]],


       [[[ -3.46255396e-03,  -1.37512591e-02],
         [  3.84721952e-03,   1.93112865e-05]],

        [[  4.65962708e-01,  -5.29752195e-01],
         [ -2.28064612e-01,  -2.22685724e-01]],

        [[  8.49217057e-01,  -9.03094828e-01],
         [  8.56826544e-01,  -5.35586655e-01]]]], dtype=float32)

SpatialCrossMapLRN

Scala:

val spatialCrossMapLRN = SpatialCrossMapLRN(size = 5, alpha = 1.0, beta = 0.75, k = 1.0)

Python:

spatialCrossMapLRN = SpatialCrossMapLRN(size=5, alpha=1.0, beta=0.75, k=1.0)

SpatialCrossMapLRN applies Spatial Local Response Normalization between different feature maps.

                            x_f
  y_f = --------------------------------------------------
         (k + (alpha/size) * sum_{l=l1 to l2} (x_l^2))^beta

where l1 corresponds to `max(0,f-ceil(size/2))` and l2 to `min(F, f-ceil(size/2) + size)`, and `F` is the number of feature maps.
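A minimal NumPy sketch of this cross-map normalization (0-indexed channels, with the window clipped at the channel boundaries); under these assumptions it reproduces the Python example output below:

import numpy as np

def cross_map_lrn(x, size=5, alpha=1.0, beta=0.75, k=1.0):
    # x has shape (N, C, H, W); each channel f is scaled by the sum of
    # squares over a window of `size` adjacent channels centered on f
    n, c, h, w = x.shape
    y = np.empty(x.shape, dtype=np.float32)
    half = size // 2
    for f in range(c):
        l1 = max(0, f - half)
        l2 = min(c, f - half + size)
        scale = k + (alpha / size) * (x[:, l1:l2] ** 2).sum(axis=1)
        y[:, f] = x[:, f] / scale ** beta
    return y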

Scala example:

import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.tensor._
val spatialCrossMapLRN = SpatialCrossMapLRN(5, 0.01, 0.75, 1.0)

val input = Tensor(2, 2, 2, 2).rand()

> print(input)
(1,1,.,.) =
0.42596373  0.20075735  
0.10307904  0.7486494   

(1,2,.,.) =
0.9887414   0.3554662   
0.6291069   0.53952795  

(2,1,.,.) =
0.41220918  0.5463298   
0.40766734  0.08064394  

(2,2,.,.) =
0.58255607  0.027811589 
0.47811228  0.3082057   

[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 2x2x2x2]

> print(spatialCrossMapLRN.forward(input))
(1,1,.,.) =
0.42522463  0.20070718  
0.10301625  0.74769455  

(1,2,.,.) =
0.98702586  0.35537735  
0.6287237   0.5388398   

(2,1,.,.) =
0.41189456  0.5460847   
0.4074261   0.08063166  

(2,2,.,.) =
0.5821114   0.02779911  
0.47782937  0.3081588   

[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x2x2]

Python example:

from bigdl.nn.layer import *
import numpy as np
spatialCrossMapLRN = SpatialCrossMapLRN(5, 0.01, 0.75, 1.0)
> spatialCrossMapLRN.forward(np.array([[[[1, 2],[3, 4]],[[5, 6],[7, 8]]],[[[9, 10],[11, 12]],[[13, 14],[15, 16]]]]))
[array([[[[  0.96269381,   1.88782692],
         [  2.76295042,   3.57862759]],

        [[  4.81346893,   5.66348076],
         [  6.44688463,   7.15725517]]],


       [[[  6.6400919 ,   7.05574226],
         [  7.41468   ,   7.72194815]],

        [[  9.59124374,   9.87803936],
         [ 10.11092758,  10.29593086]]]], dtype=float32)]



SpatialWithinChannelLRN

Scala:

val spatialWithinChannelLRN = SpatialWithinChannelLRN(size = 5, alpha = 1.0, beta = 0.75)

Python:

spatialWithinChannelLRN = SpatialWithinChannelLRN(size=5, alpha=1.0, beta=0.75)

SpatialWithinChannelLRN performs a kind of “lateral inhibition” by normalizing over local input regions. The local regions extend spatially but are in separate channels (i.e., they have shape 1 x size x size).
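A minimal NumPy sketch of the within-channel variant, assuming zero padding at the borders and the additive constant fixed at 1; under these assumptions it reproduces the Python example output below:

import numpy as np

def within_channel_lrn(x, size=5, alpha=1.0, beta=0.75):
    # x has shape (N, C, H, W); each element is normalized by the sum of
    # squares over a size x size spatial window within its own channel
    n, c, h, w = x.shape
    half = size // 2
    sq = x.astype(np.float32) ** 2
    y = np.empty(x.shape, dtype=np.float32)
    for i in range(h):
        for j in range(w):
            window = sq[:, :, max(0, i - half):i + half + 1,
                              max(0, j - half):j + half + 1]
            scale = 1.0 + (alpha / size ** 2) * window.sum(axis=(2, 3))
            y[:, :, i, j] = x[:, :, i, j] / scale ** beta
    return y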

Scala example:

import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.tensor._
val spatialWithinChannelLRN = SpatialWithinChannelLRN(5, 0.01, 0.75)

val input = Tensor(2, 2, 2, 2).rand()

> print(input)
(1,1,.,.) =
0.8658837       0.1297312
0.7559588       0.039047405

(1,2,.,.) =
0.79211944      0.84445393
0.8854509       0.6596644

(2,1,.,.) =
0.96907943      0.7036902
0.90358996      0.5719087

(2,2,.,.) =
0.52309155      0.8838519
0.44981572      0.40950212

[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 2x2x2x2]

> print(spatialWithinChannelLRN.forward(input))
(1,1,.,.) =
0.8655359       0.12967908      
0.75565517      0.03903172      

(1,2,.,.) =
0.7915117       0.843806        
0.8847715       0.6591583       

(2,1,.,.) =
0.9683307       0.70314646      
0.9028918       0.5714668       

(2,2,.,.) =
0.52286804      0.8834743       
0.44962353      0.40932715      

[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x2x2]

Python example:

from bigdl.nn.layer import *
import numpy as np
spatialWithinChannelLRN = SpatialWithinChannelLRN(5, 0.01, 0.75)
> spatialWithinChannelLRN.forward(np.array([[[[1, 2],[3, 4]],[[5, 6],[7, 8]]],[[[9, 10],[11, 12]],[[13, 14],[15, 16]]]]))
array([[[[  0.99109352,   1.98218703],
         [  2.97328043,   3.96437407]],

        [[  4.75394297,   5.70473146],
         [  6.65551996,   7.60630846]]],


       [[[  7.95743227,   8.84159184],
         [  9.72575092,  10.60991001]],

        [[ 10.44729614,  11.2509346 ],
         [ 12.05457211,  12.85821056]]]], dtype=float32)



Normalize

Scala:

val module = Normalize(p, eps=1e-10)

Python:

module = Normalize(p, eps=1e-10, bigdl_type="float")

Normalizes the input Tensor to have unit L_p norm. The smoothing parameter eps prevents division by zero when the input contains all zero elements (default = 1e-10). The input can be 1D, 2D or 4D. If the input is 4D, it should follow the format (n, c, h, w), where n is the batch size, c is the number of channels, h is the height and w is the width.
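A minimal NumPy sketch for the 2D case; exactly where eps enters the denominator is an assumption here, since the layer only uses it to guard against all-zero rows:

import numpy as np

def lp_normalize(x, p=2.0, eps=1e-10):
    # x has shape (batch, d); each row is scaled to unit L_p norm
    norm = (np.abs(x) ** p).sum(axis=1, keepdims=True) ** (1.0 / p)
    return x / (norm + eps)

print(lp_normalize(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])))
# [[0.26726124 0.53452248 0.80178373]
#  [0.45584231 0.56980288 0.68376346]]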

Scala example:

import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.tensor.Tensor
import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat

val module = Normalize(2.0, eps=1e-10)
val input = Tensor(2,3).rand()
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.7075603       0.084298864     0.91339105
0.22373432      0.8704987       0.6936567
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 2x3]

module.forward(input)
res8: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.6107763       0.072768        0.7884524
0.19706465      0.76673317      0.61097115
[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]

Python example:

from bigdl.nn.layer import *
import numpy as np

module = Normalize(2.0, eps=1e-10, bigdl_type="float")
input = np.array([[1, 2, 3],[4, 5, 6]])
module.forward(input)
[array([
[ 0.26726124,  0.53452247,  0.80178368],
[ 0.45584232,  0.56980288,  0.68376344]], dtype=float32)]

SpatialDivisiveNormalization

Scala:

val layer = SpatialDivisiveNormalization()

Python:

layer = SpatialDivisiveNormalization()

Applies a spatial division operation on a series of 2D inputs, using a kernel to compute the weighted average in a neighborhood. The neighborhood is defined for a local spatial region that is the same size as the kernel, and across all features. For an input image, since there is only one feature, the region is only spatial. For an RGB image, the weighted average is taken over the RGB channels and a spatial region.

If the kernel is 1D, it will be used to construct a separable 2D kernel; the operations will be much more efficient in this case.

The kernel is generally chosen to be a Gaussian when it is believed that the correlation of two pixel locations decreases with increasing distance. On the feature dimension, a uniform average is used since the weighting across features is not known.
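A rough single-channel NumPy sketch of the idea: each location is divided by a kernel-weighted estimate of the local standard deviation. The threshold value and the zero padding are assumptions for this sketch, and the border-coefficient rescaling the real layer performs is omitted:

import numpy as np

def divisive_normalization(x, kernel, threshold=1e-4):
    # x: (H, W) single-feature input; kernel: 2D weights
    k = kernel / kernel.sum()        # normalize the kernel to sum to 1
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x.astype(np.float64) ** 2, ((ph, ph), (pw, pw)))
    local_std = np.empty(x.shape)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # kernel-weighted mean of squares estimates the local variance
            local_std[i, j] = np.sqrt((padded[i:i + kh, j:j + kw] * k).sum())
    return x / np.maximum(local_std, threshold)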

Scala example:


import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.tensor._

val layer = SpatialDivisiveNormalization()
val input = Tensor(1, 5, 5).rand()
val gradOutput = Tensor(1, 5, 5).rand()

val output = layer.forward(input)
val gradInput = layer.backward(input, gradOutput)

> println(input)
res19: com.intel.analytics.bigdl.tensor.Tensor[Float] =
(1,.,.) =
0.4022106       0.6872489       0.9712838       0.7769542       0.771034
0.97930336      0.61022973      0.65092266      0.9507807       0.3158211
0.12607759      0.320569        0.9267993       0.47579524      0.63989824
0.713135        0.30836385      0.009723447     0.67723924      0.24405171
0.51036286      0.115807846     0.123513035     0.28398398      0.271164

[com.intel.analytics.bigdl.tensor.DenseTensor of size 1x5x5]

> println(output)
res20: com.intel.analytics.bigdl.tensor.Tensor[Float] =
(1,.,.) =
0.37849638      0.6467289       0.91401714      0.73114514      0.725574
0.9215639       0.57425076      0.6125444       0.89472294      0.29720038
0.11864409      0.30166835      0.8721555       0.4477425       0.60217
0.67108876      0.2901828       0.009150156     0.6373094       0.2296625
0.480272        0.10897984      0.11623074      0.26724035      0.25517625

[com.intel.analytics.bigdl.tensor.DenseTensor of size 1x5x5]

> println(gradInput)
res21: com.intel.analytics.bigdl.tensor.Tensor[Float] =
(1,.,.) =
-0.09343022     -0.25612304     0.25756648      -0.66132677     -0.44575396
0.052990615     0.7899354       0.27205157      0.028260134     0.23150417
-0.115425855    0.21133065      0.53093016      -0.36421964     -0.102551565
0.7222408       0.46287358      0.0010696054    0.26336592      -0.050598443
0.03733714      0.2775169       -0.21430963     0.3175013       0.6600435

[com.intel.analytics.bigdl.tensor.DenseTensor of size 1x5x5]

Python example:

from bigdl.nn.layer import *
import numpy as np

layer = SpatialDivisiveNormalization()
input = np.random.uniform(0, 1, (1, 5, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (1, 5, 5)).astype("float32")

output = layer.forward(input)
gradInput = layer.backward(input, gradOutput)

> output
[array([[[ 0.30657911,  0.75221181,  0.2318386 ,  0.84053135,  0.24818985],
         [ 0.32852787,  0.43504578,  0.0219258 ,  0.47856906,  0.31112722],
         [ 0.12381417,  0.61807972,  0.90043157,  0.57342309,  0.65450585],
         [ 0.00401461,  0.33700454,  0.79859954,  0.64382601,  0.51768768],
         [ 0.38087726,  0.8963666 ,  0.7982524 ,  0.78525543,  0.09658573]]], dtype=float32)]
> gradInput
[array([[[ 0.08059166, -0.4616771 ,  0.11626807,  0.30253756,  0.7333734 ],
         [ 0.2633073 , -0.01641282,  0.40653706,  0.07766753, -0.0237394 ],
         [ 0.10733987,  0.23385212, -0.3291783 , -0.12808481,  0.4035565 ],
         [ 0.56126803,  0.49945205, -0.40531909, -0.18559581,  0.27156472],
         [ 0.28016835,  0.03791744, -0.17803842, -0.27817759,  0.42473239]]], dtype=float32)]

SpatialSubtractiveNormalization

Scala:

val spatialSubtractiveNormalization = SpatialSubtractiveNormalization(nInputPlane = 1, kernel = null)

Python:

spatialSubtractiveNormalization = SpatialSubtractiveNormalization(n_input_plane=1, kernel=None)

SpatialSubtractiveNormalization applies a spatial subtraction operation on a series of 2D inputs, using a kernel to compute the weighted average in a neighborhood. The neighborhood is defined for a local spatial region that is the same size as the kernel, and across all features. For an input image, since there is only one feature, the region is only spatial. For an RGB image, the weighted average is taken over the RGB channels and a spatial region.

If the kernel is 1D, it will be used to construct a separable 2D kernel; the operations will be much more efficient in this case.

The kernel is generally chosen to be a Gaussian when it is believed that the correlation of two pixel locations decreases with increasing distance. On the feature dimension, a uniform average is used since the weighting across features is not known.
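A minimal single-channel NumPy sketch: subtract the kernel-weighted local mean from each location. Zero padding is an assumption here, and the border-coefficient rescaling the real layer performs is omitted, so values near the edges will differ from the layer's output:

import numpy as np

def subtractive_normalization(x, kernel):
    # x: (H, W) single-feature input; kernel: 2D weights
    k = kernel / kernel.sum()        # normalize the kernel to sum to 1
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x.astype(np.float64), ((ph, ph), (pw, pw)))
    out = np.empty(x.shape)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # subtract the weighted average of the neighborhood
            out[i, j] = x[i, j] - (padded[i:i + kh, j:j + kw] * k).sum()
    return out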

Scala example:

import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.tensor._
val kernel = Tensor(3, 3).rand()

> print(kernel)
0.56141114  0.76815456  0.29409808  
0.3599753   0.17142025  0.5243272   
0.62450963  0.28084084  0.17154165  
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x3]


val spatialSubtractiveNormalization = SpatialSubtractiveNormalization(1, kernel)

val input = Tensor(1, 1, 1, 5).rand()

> print(input)
(1,1,.,.) =
0.122356184 0.44442436  0.6394927   0.9349956   0.8226007   

[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 1x1x1x5]

> print(spatialSubtractiveNormalization.forward(input))
(1,1,.,.) =
-0.2427161  0.012936085 -0.08024883 0.15658027  -0.07613802 

[com.intel.analytics.bigdl.tensor.DenseTensor of size 1x1x1x5]



Python example:

from bigdl.nn.layer import *
import numpy as np

kernel = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
spatialSubtractiveNormalization = SpatialSubtractiveNormalization(1, kernel)
> spatialSubtractiveNormalization.forward(np.array([[[[1, 2, 3, 4, 5]]]]))
[array([[[[ 0.,  0.,  0.,  0.,  0.]]]], dtype=float32)]