Layer Norm

Applies Layer Normalization over a mini-batch of inputs, as described in the paper Layer Normalization.

y = (x − E[x]) / sqrt(Var[x] + ε) * γ + β

The mean and standard deviation are computed over the last D dimensions, where D is the dimension of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). γ and β are learnable affine transform parameters of normalized_shape when elementwise_affine is True. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
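
A minimal sketch (tensor sizes are assumed for illustration) that checks this behavior against a manual computation using the biased variance estimator:

import torch
import torch.nn as nn

x = torch.randn(4, 3, 5)
# Normalize over the last 2 dimensions; disable the affine transform so the
# comparison below only checks the normalization itself.
layer_norm = nn.LayerNorm((3, 5), elementwise_affine=False)

mean = x.mean(dim=(-2, -1), keepdim=True)
var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)  # biased estimator
manual = (x - mean) / torch.sqrt(var + layer_norm.eps)

print(torch.allclose(layer_norm(x), manual, atol=1e-6))  # True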

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.
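
As a brief illustration (the channel and spatial sizes here are assumed), compare the shapes of the learnable scale parameters:

import torch.nn as nn

bn = nn.BatchNorm2d(5)          # one scale/bias per channel: weight shape (5,)
ln = nn.LayerNorm([5, 10, 10])  # one scale/bias per element: weight shape (5, 10, 10)
print(bn.weight.shape)          # torch.Size([5])
print(ln.weight.shape)          # torch.Size([5, 10, 10])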

This layer uses statistics computed from input data in both training and evaluation modes.
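
A small sketch (input sizes assumed) showing that, because there are no running statistics, the output is identical in training and evaluation mode:

import torch
import torch.nn as nn

x = torch.randn(2, 4)
layer_norm = nn.LayerNorm(4)

layer_norm.train()
out_train = layer_norm(x)
layer_norm.eval()
out_eval = layer_norm(x)
print(torch.equal(out_train, out_eval))  # True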

Parameters:

  • normalized_shape (int or list or torch.Size) – input shape from an expected input of size:

[∗ × normalized_shape[0] × normalized_shape[1] × … × normalized_shape[−1]]

If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.

  • eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
  • elementwise_affine (bool) – a boolean value that, when set to True, gives this module learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: True

Variables:

  • weight – the learnable weights of the module, of shape normalized_shape, when elementwise_affine is set to True. The values are initialized to 1.
  • bias – the learnable bias of the module, of shape normalized_shape, when elementwise_affine is set to True. The values are initialized to 0; see the short check after this list.
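
A quick check (normalized_shape chosen arbitrarily) of the parameter shapes and their initialization:

import torch
import torch.nn as nn

layer_norm = nn.LayerNorm([3, 5])
print(layer_norm.weight.shape)                   # torch.Size([3, 5])
print(layer_norm.bias.shape)                     # torch.Size([3, 5])
print(torch.all(layer_norm.weight == 1).item())  # True (initialized to 1)
print(torch.all(layer_norm.bias == 0).item())    # True (initialized to 0)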

Examples

import torch
import torch.nn as nn

# NLP Example
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)
# Activate module
layer_norm(embedding)
# Image Example
N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
# as shown in the image below
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)

(image: Layer Normalization applied over the channel and spatial dimensions C, H, W)

References

Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer Normalization. arXiv:1607.06450.