Newbie Autograd + mini-batch question

I’m trying to learn neural networks and PyTorch at the same time, so as an exercise I figured I would extend the autograd tutorial to classify 2D points while also using mini-batches. This is the solution I’ve arrived at:

from sklearn.datasets import make_moons
import numpy as np
import torch
from torch.autograd import Variable

# Generate some test data
X, Y = make_moons(noise=0.2, random_state=100, n_samples=1000)

# Create the model:

dtype = torch.FloatTensor

x_train = Variable(torch.from_numpy(X).type(dtype), requires_grad=False)
y_train = Variable(torch.from_numpy(Y).type(dtype), requires_grad=False)

# Batch, input layer, hidden layer, output layer sizes
N, D_in, H, D_out = 64, x_train.data.shape[1], 25, 1

epochs = 1000
learning_rate = 0.001
n_batches = int(np.ceil(x_train.data.shape[0]/N))

W1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
b1 = Variable(torch.zeros(1, H).type(dtype), requires_grad=True)

W2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
b2 = Variable(torch.zeros(1, D_out).type(dtype), requires_grad=True)

for _ in range(epochs):
    for i in range(n_batches):
        x_batch = x_train[i*N:(i+1)*N,:]
        y_batch = y_train[i*N:(i+1)*N]
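        # Forward pass: one hidden layer with ReLU, sigmoid on the output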
        y_pred = ((x_batch.mm(W1) + b1).clamp(min=0).mm(W2) + b2).sigmoid()

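        # Binary cross-entropy, averaged over the mini-batch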
        loss = -(1/N) * (y_batch * y_pred.log() + (1 - y_batch) * (1 - y_pred).log()).sum()

        if (_ % 250 == 0 and i == 0):
            print("Epoch {}, loss {}".format(_, loss.data[0]))

        loss.backward()

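        # Plain SGD update applied directly to the underlying tensors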
        W1.data -= learning_rate*W1.grad.data
        b1.data -= learning_rate*b1.grad.data
        W2.data -= learning_rate*W2.grad.data
        b2.data -= learning_rate*b2.grad.data

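        # Gradients accumulate in autograd, so zero them before the next batch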
        W1.grad.data.zero_()
        b1.grad.data.zero_()
        W2.grad.data.zero_()
        b2.grad.data.zero_()

The problem is that it doesn’t learn anything. When I hand-coded the same model without using autograd it worked fine, so I think I’m missing something about how autograd is meant to be used.

Hi,

The problem is in the computation of your loss: you have a 1/N, which in Python 2 is equal to 0. So the value of your loss is always 0 and independent of your network. Replacing it with 1./N should fix the problem.
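For reference, this is the integer-division behaviour being described (a minimal illustration, not taken from the original code):

N = 64
print(1/N)    # Python 2: 0 (integer division); Python 3: 0.015625
print(1./N)   # 0.015625 in both versions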

I’m running Python 3.6, so (unfortunately) that is not the issue here. This is typically what the loss looks like:

Epoch 0, loss 133.6978302001953
Epoch 1, loss 48.95029067993164
Epoch 2, loss 47.533546447753906
Epoch 3, loss 46.9423942565918
Epoch 4, loss 46.53371047973633

And then it never gets better than ~44-46

Maybe it is a weight initialization problem; you could try another initialization method.
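For example, one common scheme is to scale each weight matrix by 1/sqrt(fan_in) so the initial pre-activations have roughly unit variance instead of variance equal to the fan-in. A minimal sketch against the variables in the original post (the scaling factors here are an assumption, not something suggested elsewhere in the thread):

import math
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
D_in, H, D_out = 2, 25, 1

# Scale the standard-normal draws by 1/sqrt(fan_in) so the hidden
# pre-activations start near unit variance and the sigmoid does not saturate.
W1 = Variable(torch.randn(D_in, H).type(dtype) / math.sqrt(D_in), requires_grad=True)
b1 = Variable(torch.zeros(1, H).type(dtype), requires_grad=True)

W2 = Variable(torch.randn(H, D_out).type(dtype) / math.sqrt(H), requires_grad=True)
b2 = Variable(torch.zeros(1, D_out).type(dtype), requires_grad=True)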