I’m trying to learn neural networks and PyTorch at the same time, so as an exercise I decided to extend the autograd tutorial to classify 2D points using mini-batches. This is the solution I’ve arrived at:

```
from sklearn.datasets import make_moons
import numpy as np
import torch
from torch.autograd import Variable

# Generate some test data
X, Y = make_moons(noise=0.2, random_state=100, n_samples=1000)

# Convert the data to tensors (no gradients needed for inputs and targets)
dtype = torch.FloatTensor
x_train = Variable(torch.from_numpy(X).type(dtype), requires_grad=False)
y_train = Variable(torch.from_numpy(Y).type(dtype), requires_grad=False)

# Batch, input layer, hidden layer, output layer sizes
N, D_in, H, D_out = 64, x_train.data.shape[1], 25, 1
epochs = 1000
learning_rate = 0.001
n_batches = int(np.ceil(x_train.data.shape[0] / N))

# Create the model parameters: one hidden layer and one output unit
W1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
b1 = Variable(torch.zeros(1, H).type(dtype), requires_grad=True)
W2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
b2 = Variable(torch.zeros(1, D_out).type(dtype), requires_grad=True)

for epoch in range(epochs):
    for i in range(n_batches):
        x_batch = x_train[i*N:(i+1)*N, :]
        y_batch = y_train[i*N:(i+1)*N]

        # Forward pass: linear -> ReLU -> linear -> sigmoid
        y_pred = ((x_batch.mm(W1) + b1).clamp(min=0).mm(W2) + b2).sigmoid()

        # Binary cross-entropy loss
        loss = -(1/N)*(y_batch*(y_pred.log()) + (1 - y_batch)*(1 - y_pred).log()).sum()
        if epoch % 250 == 0 and i == 0:
            print("Epoch {}, loss {}".format(epoch, loss.data[0]))

        # Backward pass
        loss.backward()

        # Gradient descent step
        W1.data -= learning_rate*W1.grad.data
        b1.data -= learning_rate*b1.grad.data
        W2.data -= learning_rate*W2.grad.data
        b2.data -= learning_rate*b2.grad.data

        # Reset the gradients before the next batch
        W1.grad.data.zero_()
        b1.grad.data.zero_()
        W2.grad.data.zero_()
        b2.grad.data.zero_()
```

The problem is that it doesn’t learn anything. When I hand-coded the same model without autograd it worked fine, so I think I’m missing something about how to use autograd.
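
One thing that may be worth checking is whether the shapes of `y_batch` and `y_pred` agree inside the loss. The following is only a minimal sketch, assuming a PyTorch version with NumPy-style broadcasting; the tensors are random stand-ins with the same shapes as in the code above, not the actual data, and `product`/`product_fixed` are names made up for illustration:

```python
import torch

N = 64
y_batch = torch.rand(N)     # shape (N,), like a slice of y_train
y_pred = torch.rand(N, 1)   # shape (N, 1), like the network output

# With shapes (N,) and (N, 1), broadcasting expands the elementwise
# product to an (N, N) matrix instead of N per-sample terms.
product = y_batch * y_pred.log()
print(product.shape)

# Reshaping the targets to (N, 1) keeps the product at (N, 1).
product_fixed = y_batch.view(-1, 1) * y_pred.log()
print(product_fixed.shape)
```

If the first shape printed is `(N, N)` rather than `(N, 1)`, the loss is summing over mismatched prediction/target pairs, which could explain why nothing is learned. Reshaping `y_train` to a column once, before the training loop, would be one way to rule this out.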