Layer Norm

Applies Layer Normalization over a mini-batch of inputs, as described in the paper Layer Normalization.

y = (x − E[x]) / sqrt(Var[x] + ε) * γ + β

The mean and standard deviation are computed over the last D dimensions, where D is the dimension of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). γ and β are learnable affine transform parameters of normalized_shape when elementwise_affine is True. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
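
A minimal sketch (tensor sizes are assumed for illustration) that checks this behavior against a manual computation using the biased variance estimator:

import torch
import torch.nn as nn

x = torch.randn(4, 3, 5)
# Normalize over the last 2 dimensions; disable the affine transform so the
# comparison below only checks the normalization itself.
layer_norm = nn.LayerNorm((3, 5), elementwise_affine=False)

mean = x.mean(dim=(-2, -1), keepdim=True)
var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)  # biased estimator
manual = (x - mean) / torch.sqrt(var + layer_norm.eps)

print(torch.allclose(layer_norm(x), manual, atol=1e-6))  # True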

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.
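
As a brief illustration (the channel and spatial sizes here are assumed), compare the shapes of the learnable scale parameters:

import torch.nn as nn

bn = nn.BatchNorm2d(5)          # one scale/bias per channel: weight shape (5,)
ln = nn.LayerNorm([5, 10, 10])  # one scale/bias per element: weight shape (5, 10, 10)
print(bn.weight.shape)          # torch.Size([5])
print(ln.weight.shape)          # torch.Size([5, 10, 10])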

This layer uses statistics computed from input data in both training and evaluation modes.
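
A small sketch (input sizes assumed) showing that, because there are no running statistics, the output is identical in training and evaluation mode:

import torch
import torch.nn as nn

x = torch.randn(2, 4)
layer_norm = nn.LayerNorm(4)

layer_norm.train()
out_train = layer_norm(x)
layer_norm.eval()
out_eval = layer_norm(x)
print(torch.equal(out_train, out_eval))  # True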

Parameters:

  • normalized_shape (int or list or torch.Size) – input shape from an expected input of size:

[∗ × normalized_shape[0] × normalized_shape[1] × … × normalized_shape[−1]]

If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.

  • eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
  • elementwise_affine (bool) – a boolean value that, when set to True, gives this module learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: True

Variables:

  • weight – the learnable weights of the module, of shape normalized_shape, when elementwise_affine is set to True. The values are initialized to 1.
  • bias – the learnable bias of the module, of shape normalized_shape, when elementwise_affine is set to True. The values are initialized to 0; see the short check after this list.
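
A quick check (normalized_shape chosen arbitrarily) of the parameter shapes and their initialization:

import torch
import torch.nn as nn

layer_norm = nn.LayerNorm([3, 5])
print(layer_norm.weight.shape)                   # torch.Size([3, 5])
print(layer_norm.bias.shape)                     # torch.Size([3, 5])
print(torch.all(layer_norm.weight == 1).item())  # True (initialized to 1)
print(torch.all(layer_norm.bias == 0).item())    # True (initialized to 0)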

Examples

import torch
import torch.nn as nn

# NLP Example
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)
# Activate module
layer_norm(embedding)
# Image Example
N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
# as shown in the image below
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)

(image: Layer Normalization applied over the channel and spatial dimensions C, H, W)

References

Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer Normalization. arXiv:1607.06450.