Dear All,
Out of curiosity, I extracted the network's layers and calculated the output by hand, given a new input.
Here is a very simple regression model:
NeuralNet(
(l0): Linear(in_features=6, out_features=256, bias=True)
(relu): ReLU()
(l00): Linear(in_features=256, out_features=1, bias=True)
)
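For reference, a minimal sketch of how such a module might be defined. The layer names match the printout above, but the forward order (l0 → relu → l00) is my assumption:

import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.l0 = nn.Linear(6, 256)
        self.relu = nn.ReLU()
        self.l00 = nn.Linear(256, 1)

    def forward(self, x):
        # Assumed order: linear -> ReLU -> linear
        return self.l00(self.relu(self.l0(x)))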
My manual calculation:
import numpy as np

ReLU = lambda x: np.maximum(0.0, x)

# GPU torch.Tensor to CPU numpy ndarray
X_data = X_valid.cpu().numpy()

# First layer parameters
W0 = model.l0.weight.cpu().detach().numpy()
b0 = model.l0.bias.cpu().detach().numpy()

# Final layer parameters
W00 = model.l00.weight.cpu().detach().numpy()
b00 = model.l00.bias.cpu().detach().numpy()

# First layer output: (256, 6) @ (6, N) plus the bias tiled to (256, N)
L0 = np.dot(W0, np.transpose(X_data)) + np.tile(np.reshape(b0, (-1, 1)), X_data.shape[0])
L0 = ReLU(L0)

# Final output: (1, 256) @ (256, N) plus the bias tiled to (1, N)
L00 = np.dot(W00, L0) + np.tile(np.reshape(b00, (-1, 1)), X_data.shape[0])
L00 = ReLU(L00)
It works: I get the same results as model(X_data).
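A sketch of the comparison (assuming the model is in eval mode; my hand-computed L00 is (1, N) while the model output is (N, 1), hence the transpose):

model.eval()
with torch.no_grad():
    torch_out = model(X_valid).cpu().numpy()
print(np.allclose(torch_out, L00.T, atol=1e-6))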
However, when I added a BatchNorm1d layer,
NeuralNet(
(l0): Linear(in_features=6, out_features=256, bias=True)
(bn0): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU()
(l00): Linear(in_features=256, out_features=1, bias=True)
)
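Again for reference, a sketch of the updated module; the exact placement of bn0 inside forward() matters, and l0 → bn0 → relu → l00 is only my assumption of the usual convention:

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.l0 = nn.Linear(6, 256)
        self.bn0 = nn.BatchNorm1d(256)
        self.relu = nn.ReLU()
        self.l00 = nn.Linear(256, 1)

    def forward(self, x):
        # Assumed order: linear -> batch norm -> ReLU -> linear
        return self.l00(self.relu(self.bn0(self.l0(x))))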
and used the following script to calculate the output, it no longer worked: I end up with very different results compared with model(X_data).
ReLU = lambda x: np.maximum(0.0, x)

# GPU torch.Tensor to CPU numpy ndarray
X_data = X_valid.cpu().numpy()

# First layer parameters
W0 = model.l0.weight.cpu().detach().numpy()
b0 = model.l0.bias.cpu().detach().numpy()

# Final layer parameters
W00 = model.l00.weight.cpu().detach().numpy()
b00 = model.l00.bias.cpu().detach().numpy()

# First layer output, then ReLU
L0 = np.dot(W0, np.transpose(X_data)) + np.tile(np.reshape(b0, (-1, 1)), X_data.shape[0])
L0 = ReLU(L0)

# Batch normalization layer: running statistics and affine parameters
bn_mean = model.bn0.running_mean.cpu().numpy()
bn_var = model.bn0.running_var.cpu().numpy()
bn_gamma = model.bn0.weight.cpu().detach().numpy()
bn_beta = model.bn0.bias.cpu().detach().numpy()  # previously a typo here!
bn_epsilon = model.bn0.eps

# Tile the (256,) vectors to (256, N) so they match L0 element-wise
bn_mean = np.tile(np.reshape(bn_mean, (-1, 1)), L0.shape[1])
bn_var = np.tile(np.reshape(bn_var, (-1, 1)), L0.shape[1])
bn_gamma = np.tile(np.reshape(bn_gamma, (-1, 1)), L0.shape[1])
bn_beta = np.tile(np.reshape(bn_beta, (-1, 1)), L0.shape[1])

# Normalize, then scale and shift: (x - mean) / sqrt(var + eps) * gamma + beta
L0 = (L0 - bn_mean) / np.sqrt(bn_var + bn_epsilon) * bn_gamma + bn_beta

# Final output
L00 = np.dot(W00, L0) + np.tile(np.reshape(b00, (-1, 1)), X_data.shape[0])
L00 = ReLU(L00)
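To help localize the mismatch, one possible sanity check (a sketch, assuming eval mode) is to feed the pre-BN activations through the real bn0 module and compare against my hand-rolled version:

model.eval()  # in eval mode BatchNorm1d uses the running statistics
with torch.no_grad():
    z = model.l0(X_valid)                # pre-BN activations, shape (N, 256)
    bn_ref = model.bn0(z).cpu().numpy()  # PyTorch's own batch norm output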
What's wrong with my BatchNorm1d layer calculation? I used the formula from the PyTorch documentation.
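For reference, this is the formula from the BatchNorm1d docs that I followed:

y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta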