Using a simple example, after initializing the model:

```
import numpy as np
import torch
from torch import nn
from torch import tensor
from torch import optim

torch.manual_seed(42)
# Use the 'cuda' device string (not 'gpu') when a GPU is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

X = xor_input = tensor([[0,0], [0,1], [1,0], [1,1]]).float().to(device)
Y = xor_output = tensor([[0],[1],[1],[0]]).float().to(device)

# Use tensor.shape to get the shape of the matrix/tensor.
num_data, input_dim = X.shape
print('Inputs Dim:', input_dim) # i.e. n=2
num_data, output_dim = Y.shape
print('Output Dim:', output_dim)
print('No. of Data:', num_data) # i.e. n=4

hidden_dim = 5
learning_rate = 0.3

model = nn.Sequential(
    # Use nn.Linear to get our simple perceptron.
    nn.Linear(input_dim, hidden_dim),
    # Use nn.Sigmoid to get our sigmoid non-linearity.
    nn.Sigmoid(),
    # Second layer neurons.
    nn.Linear(hidden_dim, output_dim),
    nn.Sigmoid()
).to(device)  # keep the model on the same device as the data

optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.L1Loss()
```

Before the first backward pass, the parameters in the optimizer's param groups don't have any `.grad` tensors yet, e.g. this returns `None`:

```
optimizer.param_groups[0]['params'][0].grad
```

After running a forward pass and calling `loss.backward()`:

```
predictions = model(X)
loss = criterion(predictions, Y)
loss.backward()
optimizer.param_groups[0]['params'][0].grad
```

Now the parameters in the optimizer's `param_groups` do have `.grad` tensors, e.g. `optimizer.param_groups[0]['params'][0].grad` now returns:

```
tensor([[ 0.0002,  0.0002],
        [-0.0005,  0.0003],
        [-0.0000,  0.0000],
        [-0.0000, -0.0002],
        [ 0.0003, -0.0001]])
```
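
Incidentally, the same values can also be read directly from the model's first layer, since (as far as I can tell) the gradient is stored on the parameter tensor itself:

```
# The gradient lives on the Parameter tensor, so reading it through the
# model gives the same values as going through optimizer.param_groups.
print(model[0].weight.grad)
```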

I understand that these are the values the optimizer applies during the `.step()` call.
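
As far as I understand, a plain SGD step (no momentum, no weight decay) is roughly equivalent to this sketch, which just scales each `.grad` by the learning rate and subtracts it from the parameter:

```
# Rough sketch of what I assume a plain SGD step does with the gradients;
# not PyTorch's actual implementation, just the textbook update rule.
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p -= learning_rate * p.grad
```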

The `.grad` tensors would come from `loss.backward()`, but I don't see any interaction between the `L1Loss` object and the `SGD` optimizer object, so the tensors from `model.parameters()` must be the ones holding these backward values.
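
For example, this check (if I'm reading the API right) suggests the optimizer simply stores references to the very same `Parameter` objects the model owns:

```
# The optimizer was built from model.parameters(), so its param_groups
# hold the same Parameter objects; whatever .grad loss.backward() fills
# in on the model's parameters is therefore visible to the optimizer.
print(optimizer.param_groups[0]['params'][0] is model[0].weight)  # expected: True
```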

But **how are these values in the `.grad` tensors obtained?**