- set seed
import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_lightning.utilities.seed import seed_everything
seed_everything(666)
num = 6
x = torch.rand(4, num)
print(x)
>>> tensor([[0.3119, 0.2701, 0.1118, 0.1012, 0.1877, 0.0181],
[0.3317, 0.0846, 0.5732, 0.0079, 0.2520, 0.5518],
[0.8785, 0.5281, 0.4961, 0.9791, 0.5817, 0.4875],
[0.0650, 0.7506, 0.2634, 0.3684, 0.5035, 0.9089]])
- randomly init a BN
bn = nn.BatchNorm1d(num)
weight = torch.rand(num)
bias = torch.rand(num)
mean = torch.rand(num)
var = torch.rand(num)
bn.weight.data = weight.data
bn.bias.data = bias.data
bn.running_mean.data = mean.data
bn.running_var.data = var.data
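As a quick sanity check (my addition, not part of the original snippet), the module now carries exactly the values we drew above:

# the .data assignments above made the module share these values
assert torch.allclose(bn.weight, weight)
assert torch.allclose(bn.bias, bias)
assert torch.allclose(bn.running_mean, mean)
assert torch.allclose(bn.running_var, var)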
- compare different ops
bn.eval()
y1 = bn(x)
>>> tensor([[ 0.1871, -0.7274, -0.2401, 0.2894, 0.8241, 0.0806],
[ 0.2042, -1.5069, 0.6280, 0.2882, 0.8298, 0.6823],
[ 0.6761, 0.3567, 0.4828, 0.3000, 0.8590, 0.6098],
[-0.0260, 1.2915, 0.0450, 0.2926, 0.8521, 1.0849]],
grad_fn=<NativeBatchNormBackward>)
y2 = F.batch_norm(x, mean, var, weight, bias, eps=1e-5, momentum=0.1, training=False)
>>> tensor([[ 0.1871, -0.7274, -0.2401, 0.2894, 0.8241, 0.0806],
[ 0.2042, -1.5069, 0.6280, 0.2882, 0.8298, 0.6823],
[ 0.6761, 0.3567, 0.4828, 0.3000, 0.8590, 0.6098],
[-0.0260, 1.2915, 0.0450, 0.2926, 0.8521, 1.0849]])
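For reference, the eval-mode result can be reproduced by hand from the running statistics; this is a minimal sketch of my understanding of the formula, using the same eps as above:

# eval mode normalizes with the stored running statistics,
# then applies the affine transform
y_manual = (x - mean) / torch.sqrt(var + 1e-5) * weight + bias
print(torch.allclose(y_manual, y2))  # prints True here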
We can see that the two forms return the same result. But if we call F.batch_norm(..., training=True), we get totally different results:
y2 = F.batch_norm(x, mean, var, weight, bias, eps=1e-5, momentum=0.1, training=True)
>>> tensor([[ 0.1582, -0.3523, -0.2571, 0.2823, 0.8138, -0.5995],
[ 0.1835, -0.9058, 0.7885, 0.2801, 0.8397, 0.7659],
[ 0.8830, 0.4174, 0.6137, 0.3029, 0.9721, 0.6014],
[-0.1577, 1.0812, 0.0864, 0.2886, 0.9407, 1.6796]])
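If I had to guess, training=True makes batch_norm normalize with the statistics of the current batch instead of the running ones; the following check (my own experiment, not taken from the docs) reproduces the output:

# training mode appears to use the per-feature statistics of x itself,
# with the biased variance estimate (divide by N, not N-1)
batch_mean = x.mean(dim=0)
batch_var = x.var(dim=0, unbiased=False)
y_manual = (x - batch_mean) / torch.sqrt(batch_var + 1e-5) * weight + bias
print(torch.allclose(y_manual, y2))  # prints True here

Note also that with training=True the mean and var tensors passed in get updated in place (a running-stat update), which I did not expect.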
My question is: what exactly is the role of the training flag in F.batch_norm?