Initialize the given weight and bias.
the weight to initialize
the data format of the weight, indicating the dimension order of the weight. "output_first" means the output is in the lower dimension; "input_first" means the input is in the lower dimension.
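To show how the data format determines which axis counts as fanIn and which as fanOut, here is a minimal sketch for 2-D weights only. It assumes that "lower dimension" means the leading axis, so an "output_first" weight has shape (nOutput, nInput) and an "input_first" weight has shape (nInput, nOutput). The helper name `fans` is hypothetical and not part of any library.

```python
def fans(shape, data_format):
    """Hypothetical helper: return (fan_in, fan_out) for a 2-D weight shape.

    Assumption: "output_first" stores the weight as (n_output, n_input),
    and "input_first" stores it as (n_input, n_output).
    """
    if len(shape) != 2:
        raise ValueError("this sketch only handles 2-D weights")
    if data_format == "output_first":
        n_out, n_in = shape
    elif data_format == "input_first":
        n_in, n_out = shape
    else:
        raise ValueError(f"unknown data format: {data_format!r}")
    return n_in, n_out


# The same logical layer (128 inputs, 64 outputs) in both layouts.
print(fans((64, 128), "output_first"))
print(fans((128, 64), "input_first"))
```

Both calls describe the same layer, so both report a fanIn of 128 and a fanOut of 64; only the storage order differs.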
varianceNormAverage: whether to normalize the variance by the average of fanIn and fanOut, or by fanOut alone.
A filler based on the paper [He, Zhang, Ren and Sun 2015] that specifically accounts for ReLU nonlinearities.
Aside: for another perspective on the scaling factor, see the derivation of [Saxe, McClelland, and Ganguli 2013 (v3)].
It fills the incoming matrix by randomly sampling Gaussian data with std = sqrt(2 / n), where n is either the average of fanIn and fanOut or fanOut alone, depending on the varianceNormAverage parameter.
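The sampling rule above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: the function names `msra_std` and `msra_fill` are hypothetical, the matrix is returned as nested lists laid out as (fanOut rows, fanIn columns), and `variance_norm_average` mirrors the varianceNormAverage parameter (True selects the average of fanIn and fanOut, False selects fanOut).

```python
import math
import random


def msra_std(fan_in, fan_out, variance_norm_average=True):
    """Standard deviation sqrt(2 / n) used by the He/MSRA filler.

    n is the average of fan_in and fan_out when variance_norm_average is
    True, otherwise just fan_out.
    """
    n = (fan_in + fan_out) / 2 if variance_norm_average else fan_out
    return math.sqrt(2.0 / n)


def msra_fill(fan_in, fan_out, variance_norm_average=True, rng=None):
    """Hypothetical filler: a fan_out x fan_in matrix drawn from N(0, std^2)."""
    rng = rng or random.Random()
    std = msra_std(fan_in, fan_out, variance_norm_average)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


# With fanIn = 100 and fanOut = 300, the average is 200, so std = sqrt(2 / 200).
weights = msra_fill(100, 300, rng=random.Random(0))
print(len(weights), len(weights[0]), msra_std(100, 300))
```

The factor of 2 in the numerator compensates for ReLU zeroing roughly half of its inputs, which is why this filler is preferred over a plain 1 / n scaling for ReLU networks.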