Using a simple example, after initializing the model:
import numpy as np
import torch
from torch import nn
from torch import tensor
from torch import optim
torch.manual_seed(42)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
X = xor_input = tensor([[0,0], [0,1], [1,0], [1,1]]).float().to(device)
Y = xor_output = tensor([[0],[1],[1],[0]]).float().to(device)
# Use tensor.shape to get the shape of the matrix/tensor.
num_data, input_dim = X.shape
print('Inputs Dim:', input_dim) # i.e. n=2
num_data, output_dim = Y.shape
print('Output Dim:', output_dim)
print('No. of Data:', num_data) # i.e. n=4
hidden_dim = 5
learning_rate = 0.3
model = nn.Sequential(
# Use nn.Linear to get our simple perceptron.
nn.Linear(input_dim, hidden_dim),
# Use nn.Sigmoid to get our sigmoid non-linearity.
nn.Sigmoid(),
# Second layer neurons.
nn.Linear(hidden_dim, output_dim),
nn.Sigmoid()
).to(device)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.L1Loss()
Before the first backward pass, the parameters in the optimizer’s param_groups don’t have any .grad tensors yet, e.g. this returns None:
optimizer.param_groups[0]['params'][0].grad
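(As a sanity check on my side, the optimizer seems to hold references to the very same tensor objects that model.parameters() yields, so both report the missing .grad at this point:)
# The optimizer's parameter list holds the same tensor objects as
# model.parameters(); neither has a .grad before the first backward pass.
for p_model, p_opt in zip(model.parameters(), optimizer.param_groups[0]['params']):
    assert p_model is p_opt
    assert p_opt.grad is None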
After the backward pass:
predictions = model(X)
loss = criterion(predictions, Y)
loss.backward()
optimizer.param_groups[0]['params'][0].grad
Now the optimizer’s param_groups contain populated .grad tensors, e.g. optimizer.param_groups[0]['params'][0].grad now returns:
tensor([[ 0.0002,  0.0002],
        [-0.0005,  0.0003],
        [-0.0000,  0.0000],
        [-0.0000, -0.0002],
        [ 0.0003, -0.0001]])
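(As far as I can tell, this is the same tensor object that the model’s first layer holds, so no copy is being made for the optimizer; a quick check I ran:)
# The first Sequential entry is the first nn.Linear; its weight gradient
# appears to be the very same tensor the optimizer sees.
print(model[0].weight.grad)  # prints the same values as above
print(model[0].weight.grad is optimizer.param_groups[0]['params'][0].grad)  # True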
I understand that these are the values that get applied to the parameters (scaled by the learning rate) during the .step() function.
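(For reference, my mental model of what .step() does for plain SGD is just moving each parameter by -lr times its gradient; a minimal sketch of my understanding, ignoring momentum and weight decay, not PyTorch’s actual implementation:)
# Sketch of a plain SGD step (no momentum, no weight decay, no dampening).
with torch.no_grad():
    for group in optimizer.param_groups:
        for p in group['params']:
            if p.grad is not None:
                p -= group['lr'] * p.grad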
The .grad tensors would come from loss.backward(), but I don’t see any interaction between the L1Loss object and the SGD optimizer object, so the tensors from model.parameters() must be what is holding these backward values.

But how are the values in these .grad tensors obtained?