# Minimal working example of optim.SGD

I’d like to learn how to use SGD, but I have not found a minimal working example (MWE). It should be a piece of code that runs on its own, and it should be minimal in the sense that anything that can be deleted without affecting the usage of SGD has been deleted.

By this definition, the example at http://pytorch.org/docs/master/optim.html is not a working example, since it does not run on its own.

The example at the following URL is not minimal.
http://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html#optimization-and-training

Could anybody show an MWE? Thanks.

Do you want to learn about why SGD works, or just how to use it?

I attempted to make a minimal example of SGD. I hope this helps!

``````
import torch
import torch.nn as nn
import torch.optim as optim

# Let's make some data for a linear regression.
A = 3.1415926
b = 2.7189351
error = 0.1
N = 100 # number of data points

# Data (plain tensors; the Variable wrapper is not needed since PyTorch 0.4)
X = torch.randn(N, 1)

# (noisy) Target values that we want to learn.
t = A * X + b + torch.randn(N, 1) * error

# Creating a model, making the optimizer, defining loss
model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

# Run training
niter = 50
for _ in range(0, niter):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    predictions = model(X)
    loss = loss_fn(predictions, t)
    loss.backward()        # compute gradients of the loss
    optimizer.step()       # update the parameters

    print("-" * 50)
    print("error = {}".format(loss.item()))
    print("learned A = {}".format(model.weight.item()))
    print("learned b = {}".format(model.bias.item()))
``````
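To see what `optimizer.step()` actually does, it may help to compare it with a hand-written update. For plain SGD (no momentum, weight decay, or other options), each step subtracts `lr * grad` from every parameter. The sketch below trains two identical linear models, one with `optim.SGD` and one by hand, and checks that they end up with the same parameters. The seed, the noiseless targets, and the names `model_a` and `model_b` are assumptions made for illustration and are not taken from the posts in this thread.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so the run is repeatable

X = torch.randn(100, 1)
t = 3.0 * X + 2.0  # hypothetical noiseless targets, for illustration only

lr = 0.05
model_a = nn.Linear(1, 1)
model_b = nn.Linear(1, 1)
model_b.load_state_dict(model_a.state_dict())  # identical starting weights

optimizer = optim.SGD(model_a.parameters(), lr=lr)
loss_fn = nn.MSELoss()

for _ in range(20):
    # Model A: let the optimizer apply the update.
    optimizer.zero_grad()
    loss_fn(model_a(X), t).backward()
    optimizer.step()

    # Model B: apply the same update by hand.
    model_b.zero_grad()
    loss_fn(model_b(X), t).backward()
    with torch.no_grad():  # the update itself must not be tracked by autograd
        for p in model_b.parameters():
            p -= lr * p.grad

# Compare the two sets of parameters up to floating-point tolerance.
same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print("parameters match:", same)
```

Once options such as `momentum` or `weight_decay` are passed to `optim.SGD`, the hand-written loop above no longer matches, which is one reason to use the optimizer object rather than updating parameters manually.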

Here is a more minimal MWE.

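``````
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable  # needed for Variable below

N = 64

x0 = torch.randn(N, 1)
x = Variable(x0)

# Assumed definitions: A and b are not defined in the post as shown.
# Parameters passed to SGD must be leaf tensors with requires_grad=True.
A = torch.randn(1, 1, requires_grad=True)
b = torch.randn(1, 1, requires_grad=True)

optimizer = optim.SGD([A, b], lr=1e-1)
for t in range(10):
    print('-' * 50)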