Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with an additional channel dimension, i.e. NCHW layout), as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
The mean and standard deviation are calculated per-dimension over the mini-batches, and γ and β are learnable parameter vectors of size C (where C is the input size). By default, the elements of γ are set to 1 and the elements of β are set to 0. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. These running estimates are kept with a default momentum of 0.1. If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.
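The momentum update described above can be checked directly: after one training-mode forward pass, the buffers move from their initial values (zeros for running_mean, ones for running_var) toward the batch statistics. The sketch below assumes the standard PyTorch behavior that running_var is updated with the unbiased batch variance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.BatchNorm2d(3)            # momentum defaults to 0.1
x = torch.randn(8, 3, 4, 4)

# Per-channel batch statistics over the (N, H, W) dimensions.
batch_mean = x.mean(dim=(0, 2, 3))
batch_var = x.var(dim=(0, 2, 3), unbiased=True)  # running_var uses the unbiased estimate

m.train()
m(x)  # one forward pass updates the running estimates

# new_running = (1 - momentum) * old_running + momentum * batch_stat
expected_mean = 0.9 * 0.0 + 0.1 * batch_mean  # running_mean starts at 0
expected_var = 0.9 * 1.0 + 0.1 * batch_var    # running_var starts at 1
print(torch.allclose(m.running_mean, expected_mean, atol=1e-5))
print(torch.allclose(m.running_var, expected_var, atol=1e-5))
```

Both checks print True: one pass blends 10% of the batch statistics into the stored estimates.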
Because Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it is commonly referred to as Spatial Batch Normalization.
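The per-channel statistics and the biased variance estimator can be reproduced by hand. The following sketch (using affine=False so no γ/β scaling is applied) verifies that the layer's training-mode output matches manual normalization over the (N, H, W) slices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.BatchNorm2d(3, affine=False)  # no learnable γ/β, pure normalization
x = torch.randn(8, 3, 4, 4)

m.train()
out = m(x)

# Statistics per channel over the (N, H, W) slices,
# using the biased variance estimator (unbiased=False).
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + m.eps)
print(torch.allclose(out, manual, atol=1e-5))
```

This prints True; switching unbiased=False to True makes the check fail, which is exactly the distinction noted above.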
Parameters:
- num_features (int) – C from an expected input of size (N,C,H,W)
- eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
- momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
- affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True
- track_running_stats (bool) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics, in both training and eval modes. Default: True
Shape:
- Input: (N,C,H,W)
- Output: (N,C,H,W) (same shape as input)
Examples:

```python
import torch
import torch.nn as nn

# With learnable parameters
m = nn.BatchNorm2d(100)
# Without learnable parameters
m = nn.BatchNorm2d(100, affine=False)
input = torch.randn(20, 100, 35, 45)
output = m(input)
```