Hi,

I’m currently trying to implement plain SGD by hand, without using a built-in optimizer.

What I’m running into is that after the first mini-batch, my loss tensor loses its requires_grad status, so the next call to `backward()` throws an error.

Here is my code:

```
import torch

def generate_data():
    data = torch.rand(1000, 2)
    label = ((data[:, 0] + 0.3 * data[:, 1]) > 0.5).to(torch.int)
    return data[:, 0], label

input, label = generate_data()

# Make mini-batches.
inputs = torch.split(input, 32)
labels = torch.split(label, 32)

# Define the two parameters to optimize
b1 = torch.autograd.Variable(torch.tensor([0.01]), requires_grad=True)
b2 = torch.autograd.Variable(torch.tensor([0.01]), requires_grad=True)

for epoch in range(15):
    for x, y in zip(inputs, labels):
        b1.grad = None
        b2.grad = None
        # Calculate p_x as per the formula above
        p_x = 1 / (1 + torch.exp(-(b1 + b2 * x)))
        # Calculate the negative log-likelihood
        l = -(y * torch.log(p_x) + (1 - y) * torch.log(1 - p_x)).sum()
        l.backward()
        # Gradients of the loss w.r.t. the parameters
        delta_b1 = b1.grad
        delta_b2 = b2.grad
        # Update the parameters according to the SGD formula
        with torch.no_grad():
            b1 = b1 - 0.01 * delta_b1
            b2 = b2 - 0.01 * delta_b2
```
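My suspicion is that `b1 = b1 - 0.01 * delta_b1` inside `torch.no_grad()` rebinds `b1` to a brand-new tensor that has `requires_grad=False`, which would explain why the next batch's loss has no grad. Here is a minimal sketch of what I think the fix looks like, with the update done in-place so the parameters stay the same leaf tensors (toy one-feature data stands in for my real mini-batches) — is this the right approach?

```python
import torch

# Toy data: one feature, binary labels (stand-in for my real mini-batches)
torch.manual_seed(0)
x = torch.rand(32)
y = (x > 0.5).to(torch.float)

b1 = torch.tensor([0.01], requires_grad=True)
b2 = torch.tensor([0.01], requires_grad=True)

for step in range(100):
    b1.grad = None
    b2.grad = None
    p_x = torch.sigmoid(b1 + b2 * x)
    # Negative log-likelihood of the mini-batch
    l = -(y * torch.log(p_x) + (1 - y) * torch.log(1 - p_x)).sum()
    l.backward()
    with torch.no_grad():
        # In-place update: b1 and b2 remain the same leaf tensors,
        # so requires_grad stays True for the next iteration.
        b1 -= 0.01 * b1.grad
        b2 -= 0.01 * b2.grad

print(b1.requires_grad, b2.requires_grad)  # both stay True
```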

P.S. I am aware that Variable is deprecated, but I am working around code I was given.

Any help would be greatly appreciated.