Applies Layer Normalization over a mini-batch of inputs, as described in the paper Layer Normalization.
The mean and standard deviation are computed over the last D dimensions, where D is the dimensionality of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). γ and β are learnable affine transform parameters of shape normalized_shape when elementwise_affine is True. The standard deviation is computed via the biased estimator, equivalent to torch.var(input, unbiased=False).
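A minimal sketch (not part of the original page) that reproduces this computation by hand for the (3, 5) case above: normalizing with (x - mean) / sqrt(var + eps), using the biased variance, matches nn.LayerNorm when elementwise_affine is False:

import torch
import torch.nn as nn

x = torch.randn(4, 3, 5)                                  # normalized_shape = (3, 5)
layer_norm = nn.LayerNorm([3, 5], elementwise_affine=False)

# Mean and biased variance over the last 2 dimensions, as described above
mean = x.mean((-2, -1), keepdim=True)
var = x.var((-2, -1), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + layer_norm.eps)

print(torch.allclose(layer_norm(x), manual, atol=1e-6))   # True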
Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane with the affine option, Layer Normalization applies a per-element scale and bias with elementwise_affine.
This layer uses statistics computed from input data in both training and evaluation modes.
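As a rough illustration of this difference (a sketch, not part of the original docs), compare the parameter shapes of BatchNorm2d and LayerNorm for an (N, C, H, W) input: BatchNorm2d keeps one scale and bias per channel, while LayerNorm keeps one per element of normalized_shape:

import torch.nn as nn

C, H, W = 5, 10, 10
bn = nn.BatchNorm2d(C)          # affine=True by default: per-channel scale/bias
ln = nn.LayerNorm([C, H, W])    # elementwise_affine=True by default: per-element scale/bias

print(bn.weight.shape)          # torch.Size([5])
print(ln.weight.shape)          # torch.Size([5, 10, 10])

LayerNorm also keeps no running statistics, so calling ln.eval() does not change how the statistics are computed.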
Parameters:
- normalized_shape (int or list or torch.Size) – input shape from an expected input of size [∗ × normalized_shape[0] × normalized_shape[1] × … × normalized_shape[−1]]. If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size (see the sketch after this parameter list).
- eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
- elementwise_affine (bool) – a boolean value that, when set to True, gives this module learnable per-element affine parameters, initialized to ones (for weights) and zeros (for biases). Default: True.
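A small sketch of the single-integer convention (the input shape (2, 3, 10) is only for illustration): nn.LayerNorm(10) behaves like nn.LayerNorm([10]) and normalizes only the last dimension:

import torch
import torch.nn as nn

x = torch.randn(2, 3, 10)
ln_int = nn.LayerNorm(10)      # single integer ...
ln_list = nn.LayerNorm([10])   # ... treated as the singleton list [10]

# Both normalize only the last dimension; with default initialization the outputs match
print(torch.allclose(ln_int(x), ln_list(x)))   # True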
Variables:
- weight – the learnable weights of the module, of shape normalized_shape, when elementwise_affine is set to True. The values are initialized to 1 (see the sketch after this list).
- bias – the learnable bias of the module, of shape normalized_shape, when elementwise_affine is set to True. The values are initialized to 0.
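A brief sketch (not part of the original page) of how these variables appear on a module, including the elementwise_affine=False case, where they are simply None:

import torch.nn as nn

ln = nn.LayerNorm([3, 5])
print(ln.weight.shape)   # torch.Size([3, 5]), values initialized to 1
print(ln.bias.shape)     # torch.Size([3, 5]), values initialized to 0

ln_plain = nn.LayerNorm([3, 5], elementwise_affine=False)
print(ln_plain.weight, ln_plain.bias)   # None None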
Examples:
# NLP Example
import torch
import torch.nn as nn

batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
# Normalize over the last dimension (the embedding dimension)
layer_norm = nn.LayerNorm(embedding_dim)
# Activate module
layer_norm(embedding)

# Image Example
N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)