Simple custom optimizer in a NN via model.parameters()?

Given a “standard” NN in PyTorch:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

from AugustVonDezent import Adaam  # the custom optimizer being tested

learning_rate = 0.01
BATCH_SIZE = 64

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=BATCH_SIZE)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()
loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.03)

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        pred = model(X)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()

        loss.backward()
        
        # Question: can the weights be updated manually here (instead of the
        # optimizer.step() call below), e.g. with something like:
        # for param in model.parameters():
        #     param = ...

        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss: >7f} [{current: >5d}/{size: >5d}]")

def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")




epochs = 2

# Custom optimizer (Adaam)
for t in range(epochs):
    optimizer = Adaam(model.parameters(), lr=0.35)

    print(f"Epoch {t + 1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
    print("Done!")

# Standard optimizer
for t in range(epochs):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.35)
    print(f"Epoch {t + 1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)

I know that one can access the parameters by looping over the model itself:

for param in model.parameters():
    param.grad = None  # same effect as optimizer.zero_grad()

However, is it possible to adjust the weights in such a for loop, so that one can implement a “simple” custom optimizer without touching the optim library?

For example, I’m looking for something like this:

for param in model.parameters():
    param = param - learning_rate * loss

I know I can create a custom optimizer by following the optim optimizer examples (I already did that), but I would like to add the backpropagation and adjust the weights as shown above. Does anyone have a quick tip on how to do that? Thanks in advance!

I mean, no one forces you to use optimizers.
In the end, optimizers run a for loop over the parameters as well, just with more complex update rules on top. You can write exactly such a loop yourself and use model.zero_grad() to reset the gradients, or set the gradients directly to None.
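
For instance, a minimal sketch of that manual gradient reset (assuming model is the NeuralNetwork instance from your code):

    for param in model.parameters():
        param.grad = None   # same effect as model.zero_grad(set_to_none=True)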

Hey! Thanks for your answer and for taking the time!

Right, but how do I update the weights? If I remove optimizer.zero_grad() and optimizer.step() from my code and add this for loop instead:

    for param in model.parameters():
        param.grad = None
        param.grad = -learning_rate * loss

I get RuntimeError: assigned grad has data of a different size. Am I doing something wrong? I’ve read this can be related to the device (CPU vs. GPU), but as far as I can tell I never declare which device the NN uses. Thanks again for your time!

This should be

    param = param - learning_rate * grad(loss)

That also explains the mismatch error: learning_rate and loss are both scalars, while your weights (and their gradients) are matrices, so you cannot assign a scalar to param.grad.

Also, could you explain why you don’t want to use the torch.optim.Optimizer class? It exists to update your weights efficiently and scalably.
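
To see the shape difference concretely, here is a tiny toy example (the tensor names are just for illustration, not from your code):

    import torch

    w = torch.randn(512, 784, requires_grad=True)   # a weight matrix like the one in nn.Linear(28*28, 512)
    loss = w.sum()                                   # some scalar loss
    loss.backward()
    print(loss.shape)     # torch.Size([])        -> 0-dimensional scalar
    print(w.grad.shape)   # torch.Size([512, 784]) -> same shape as the weight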

It’s for a presentation, so the implementation should be simple and easy to understand. I feel using the optim implementation makes it a bit bulky for what I’m trying to do.

I have tried

    param = param - learning_rate * loss

before, but it doesn’t really affect the loss. What is the grad function you are calling, or is it just a typo? Because your code says:

    param = param - learning_rate * grad(loss)

Thanks for taking the time! Appreciate it!

You need to use param.grad, which is the derivative of the loss with respect to the parameter, so it would be

    param = param - learning_rate * param.grad

As @JuanFMontesinos stated, you need to use

    param = param - learning_rate * param.grad

The grad I wrote was just pseudo-code to say that you need the gradient of the loss, not the loss value itself, when performing gradient descent. When you call loss.backward(), autograd computes the gradients via reverse-mode automatic differentiation and stores each one in the .grad attribute of the corresponding parameter.
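
A small self-contained example of that behaviour (toy tensor, not your model):

    import torch

    w = torch.randn(3, requires_grad=True)
    loss = (w ** 2).sum()
    print(w.grad)    # None - nothing has been backpropagated yet
    loss.backward()
    print(w.grad)    # tensor equal to 2 * w, same shape as w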

Thanks for your help, both of you!

That clarified a lot. One last thing: I have tried your suggestion before, and I always get this error:

    param = param - learning_rate * param.grad
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

Looking around the internet, people suggest using torch.no_grad() before updating the weights, so I’ve tried different things out, for example:

    for param in model.parameters():
        with torch.no_grad():
            param -= learning_rate * param.grad

however, the error still exists. Any idea why that is?

Do you still have the param.grad = None line inside your update loop? If so, remove it. (And check that you’ve called loss.backward() beforehand.)

Your param.grad attribute is currently None, so you’ve either overwritten it with None or you haven’t called loss.backward().

If you find None, it means either that you didn’t call loss.backward() or that your backpropagation is broken due to a bug.
So the pseudo-code would be

    for param in parameters:
        param.grad = None
    out = forward(inputs)
    loss = criterion(out, gt)
    loss.backward()
    for param in parameters:
        param = param - lr * param.grad
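
As actual PyTorch code, a minimal sketch of that loop could look like the following (it assumes model, loss_fn, X, y and learning_rate as in your code; note that the update has to modify the parameter in-place inside torch.no_grad(), otherwise you only rebind a local variable):

    # reset gradients (same role as optimizer.zero_grad())
    for param in model.parameters():
        param.grad = None

    pred = model(X)                      # forward pass
    loss = loss_fn(pred, y)
    loss.backward()                      # fills param.grad for every parameter

    with torch.no_grad():                # do not track the update itself
        for param in model.parameters():
            param -= learning_rate * param.grad   # in-place SGD step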

Thanks for the help, everyone!

I was able to find a working solution and am posting it here in case somebody else runs into the same problem. Given the code base from above, I changed the following:


def update_function(param, grad, loss, learning_rate):
    return param - learning_rate * grad

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        pred = model(X)

        model.zero_grad()
        loss = loss_fn(pred, y)
        loss.backward()

        # manual parameter update instead of optimizer.step()
        with torch.no_grad():
            for x in model.parameters():
                updated_val = update_function(x, x.grad, loss, learning_rate)
                x.copy_(updated_val)

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss: >7f} [{current: >5d}/{size: >5d}]")

It works fine now and the weights are being updated correctly!
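
A note on the x.copy_(updated_val) part: copying inside torch.no_grad() overwrites the parameter’s data in-place, so the model keeps the same parameter objects and autograd still works on the next forward pass. As far as I can tell, the in-place subtraction suggested above would be an equivalent, slightly shorter variant:

    with torch.no_grad():
        for x in model.parameters():
            x -= learning_rate * x.grad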

Thanks, and have a great one, everyone!