Minimal working example of optim.SGD

I’d like to learn how to use optim.SGD, but I have not found a minimal working example. It must be a piece of code that works on its own, and it should be minimal in the sense that anything that can be deleted without affecting the usage of SGD has been deleted.

By this definition, the example at http://pytorch.org/docs/master/optim.html is not a working example, since it does not run on its own.

The example at the following URL is not minimal.
http://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html#optimization-and-training

Could anybody show an MWE? Thanks.

Do you want to learn about why SGD works, or just how to use it?

I attempted to make a minimal example of SGD. I hope this helps!

import torch
import torch.nn as nn
import torch.optim as optim

# Let's make some data for a linear regression.
A = 3.1415926
b = 2.7189351
error = 0.1
N = 100  # number of data points

# Input data
X = torch.randn(N, 1)

# (noisy) target values that we want to learn
t = A * X + b + torch.randn(N, 1) * error

# Create the model, the optimizer, and the loss function.
model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

# Run training
niter = 50
for _ in range(niter):
    optimizer.zero_grad()
    predictions = model(X)
    loss = loss_fn(predictions, t)
    loss.backward()
    optimizer.step()

    print("-" * 50)
    print("error = {}".format(loss.item()))
    print("learned A = {}".format(list(model.parameters())[0].item()))
    print("learned b = {}".format(list(model.parameters())[1].item()))

Here is a more minimal MWE.

import torch
import torch.optim as optim

N = 64

x = torch.randn(N, 1)
y = x.clone()  # target equals input, so the model should learn A ~ 1, b ~ 0

A = torch.randn(1, 1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

optimizer = optim.SGD([A, b], lr=1e-1)
for t in range(10):
    print('-' * 50)
    optimizer.zero_grad()
    # print(A.grad, b.grad)
    y_pred = torch.matmul(x, A) + b
    loss = ((y_pred - y) ** 2).mean()
    print(t, loss.item())
    loss.backward()
    optimizer.step()
    # print([A, b])

If you add a line of code to this, print(predictions.grad), right after the backward call, it prints None all the way through. Is that how it is supposed to work? Reading the autograd documentation, I got the impression that a gradient tensor is supposed to come out of it.
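
That is expected: autograd only populates .grad on leaf tensors that have requires_grad=True, which here means the model parameters. predictions is an intermediate (non-leaf) tensor, so its gradient is not kept after backward() unless you ask for it with retain_grad(). A minimal sketch of the difference (assuming current PyTorch; the variable names are just illustrative, not part of the example above):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
X = torch.randn(8, 1)
t = torch.randn(8, 1)

predictions = model(X)       # non-leaf tensor, produced by an operation
predictions.retain_grad()    # ask autograd to keep its gradient
loss = nn.MSELoss()(predictions, t)
loss.backward()

print(model.weight.grad)     # populated: leaf tensor with requires_grad=True
print(predictions.grad)      # populated only because of retain_grad();
                             # without that call it would stay None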