Performing mini-batch gradient descent or stochastic gradient descent on a mini-batch

Joseph_Santarcangelo · July 16, 2018, 7:01pm

Hello, I have created a data-loader object with the batch size parameter set to five, and I run the following code. I would like some clarification: is the following code performing mini-batch gradient descent, or stochastic gradient descent on a mini-batch?

import torch
import numpy as np
import matplotlib.pyplot as plt
from torch import nn,optim
from torch.utils.data import Dataset, DataLoader
class Data(Dataset):
    def __init__(self):
        self.x=torch.arange(-3,3,0.1).view(-1, 1)
        self.y=-3*self.x+1+0.1*torch.randn(self.x.size())
        self.len=self.x.shape[0]
    def __getitem__(self,index):
        return self.x[index],self.y[index]
    def __len__(self):
        return self.len
class linear_regression(nn.Module):
    def __init__(self,input_size,output_size):
        super(linear_regression,self).__init__()
        self.linear=nn.Linear(input_size,output_size)
    def forward(self,x):
        yhat=self.linear(x)
        return yhat

model=linear_regression(1,1)
optimizer = optim.SGD(model.parameters(), lr = 0.01)
criterion = nn.MSELoss()
dataset=Data()
trainloader=DataLoader(dataset=dataset,batch_size=5)
LOSS=[]

n=1;
for epoch in range(5):
    for x,y in trainloader:
        yhat=model(x)
        loss=criterion(yhat,y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        LOSS.append(loss)


ptrblck · July 16, 2018, 10:17pm

I’m not sure what “stochastic gradient descent on a mini-batch” means; as far as I understand, stochastic gradient descent uses only one sample per update by definition.
Because you use a batch size of 5, your code applies mini-batch gradient descent.

Joseph_Santarcangelo · July 16, 2018, 11:35pm

Thanks for the response. My confusion comes from the fact that the code included below performs SGD by taking a tensor that is the size of my training set and performing an update one sample at a time, i.e. for each iteration of the loop, an SGD update is performed for every sample in the tensor.
In the initial code, the data loader in the inner loop provides a tensor the size of my mini-batch. Why does it not follow the same procedure as above, i.e. for each iteration, performing an update on each sample in the tensor?

import torch
import numpy as np
import matplotlib.pyplot as plt
from torch import nn,optim
from torch.utils.data import Dataset, DataLoader
class Data(Dataset):
   def __init__(self):
       self.x=torch.arange(-3,3,0.1).view(-1, 1)
       self.y=-3*self.x+1+0.1*torch.randn(self.x.size())
       self.len=self.x.shape[0]
   def __getitem__(self,index):
       return self.x[index],self.y[index]
   def __len__(self):
       return self.len
class linear_regression(nn.Module):
   def __init__(self,input_size,output_size):
       super(linear_regression,self).__init__()
       self.linear=nn.Linear(input_size,output_size)
   def forward(self,x):
       yhat=self.linear(x)
       return yhat

model=linear_regression(1,1)
optimizer = optim.SGD(model.parameters(), lr = 0.01)
criterion = nn.MSELoss()
dataset=Data()


x,y=dataset[:]
LOSS=[]

n=1;
for epoch in range(5):
       yhat=model(x)
       loss=criterion(yhat,y)
       optimizer.zero_grad()
       loss.backward()
       optimizer.step()
       LOSS.append(loss)

ptrblck · July 17, 2018, 6:43am

In your current code snippet you are assigning x to your complete dataset, i.e. you are performing batch gradient descent.
In the former code your DataLoader provided batches of size 5, so you used mini-batch gradient descent.
If you use a dataloader with batch_size=1 or slice each sample one by one, you would be applying stochastic gradient descent.
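To make the three cases concrete, here is a minimal sketch that counts how many optimizer updates one epoch would perform for each batch size. It uses TensorDataset as a stand-in for the Data class from the thread, and the helper name updates_per_epoch is made up for this illustration.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the thread's Data class: noisy points on the line y = -3x + 1.
x = torch.arange(-3, 3, 0.1).view(-1, 1)
y = -3 * x + 1 + 0.1 * torch.randn(x.size())
dataset = TensorDataset(x, y)
n = len(dataset)

def updates_per_epoch(batch_size):
    """Hypothetical helper: number of batches (= optimizer steps) in one epoch."""
    loader = DataLoader(dataset, batch_size=batch_size)
    return sum(1 for _ in loader)

regimes = {
    "stochastic (batch_size=1)": 1,   # one sample per update
    "mini-batch (batch_size=5)": 5,   # five samples per update
    "batch (batch_size=n)": n,        # whole dataset per update
}
counts = {name: updates_per_epoch(bs) for name, bs in regimes.items()}
for name, count in counts.items():
    print(name, "->", count, "updates per epoch")
```

The optimizer never sees the batch size directly; it only steps once per call to optimizer.step(), so the regime is decided entirely by how much data goes into each loss computation.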

The averaged or summed loss will be computed based on your batch size. E.g. if your batch size is 5 and you are using your criterion with its default setting size_average=True (reduction='mean' in current PyTorch releases), the average of the losses for each sample in the batch will be calculated and used to compute the gradients.
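A small sketch of that averaging, assuming a recent PyTorch where the criterion takes a reduction argument (the thread refers to the older size_average flag). The random tensors and the single nn.Linear layer are placeholders, not the data from the thread.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(1, 1)
x = torch.randn(5, 1)   # one mini-batch of 5 samples (placeholder data)
y = torch.randn(5, 1)
criterion = nn.MSELoss()  # default reduction is 'mean'

# Loss of the batch vs. the mean of the individual sample losses.
batch_loss = criterion(model(x), y)
per_sample = nn.MSELoss(reduction='none')(model(x), y)
manual_mean = per_sample.mean()

# Gradient of the batch loss.
model.zero_grad()
batch_loss.backward()
batch_grad = model.weight.grad.clone()

# Average of the gradients computed one sample at a time.
per_sample_grads = []
for i in range(5):
    model.zero_grad()
    criterion(model(x[i:i + 1]), y[i:i + 1]).backward()
    per_sample_grads.append(model.weight.grad.clone())
mean_grad = torch.stack(per_sample_grads).mean(dim=0)

print(torch.allclose(batch_loss, manual_mean), torch.allclose(batch_grad, mean_grad, atol=1e-6))
```

So a batch of 5 produces one update along the averaged gradient, rather than five separate per-sample updates.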

Joseph_Santarcangelo · July 17, 2018, 5:23pm

Thanks for clearing that up! I was confused because I could not see the explicit relationship between the data loader object and the optimizer.

jon · August 24, 2018, 3:33am

Sorry to necro this post, but it bothers me: why is it named a stochastic gradient descent optimizer even if we are doing full batch gradient descent? Is there any difference between what vanilla gradient descent would do and what the SGD optimizer would do when we run it with the full batch?

jon · August 24, 2018, 3:39am

Ahh, actually sorry, it’s just a mismatch in terminology. The SGD optimizer is vanilla gradient descent (i.e. literally all it does is subtract the gradient * the learning rate from the weight, as expected). See here: How SGD works in pytorch
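A quick way to check this is to compare one optimizer.step() against the hand-computed update. This is a sketch with placeholder data, and it assumes optim.SGD is left at its defaults (no momentum, no weight decay).

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
lr = 0.01
model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=lr)  # defaults: momentum=0, weight_decay=0

x = torch.randn(8, 1)   # placeholder batch
y = torch.randn(8, 1)

loss = nn.MSELoss()(model(x), y)
optimizer.zero_grad()
loss.backward()

# Plain gradient descent computed by hand: w <- w - lr * grad
w_before = model.weight.detach().clone()
grad = model.weight.grad.detach().clone()
expected = w_before - lr * grad

optimizer.step()
w_after = model.weight.detach().clone()

print(torch.allclose(w_after, expected))
```

The "stochastic" part comes from which samples feed the loss, not from anything the optimizer itself does.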

vinaykumar2491 · October 22, 2018, 5:32am

If you want to store the whole computational graph then it’s okay to use LOSS.append(loss); but if you are just looking to store the loss value then use LOSS.append(loss.item()).

Joseph_Santarcangelo:

LOSS=[]

n=1;
for epoch in range(5):
    for x,y in trainloader:
        yhat=model(x)
        loss=criterion(yhat,y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        LOSS.append(loss)
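The difference between the two can be seen in a minimal sketch like the one below, where the model and data are placeholders rather than the ones from the thread:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(1, 1)
x = torch.randn(5, 1)   # placeholder batch
y = torch.randn(5, 1)

loss = nn.MSELoss()(model(x), y)

# The loss tensor still references the autograd graph that produced it.
has_graph = loss.grad_fn is not None

# .item() copies the scalar out as a plain Python float with no graph attached.
value = loss.item()

LOSS = []
LOSS.append(value)
print(has_graph, type(value).__name__)
```

Appending the tensor itself keeps each iteration's graph reachable for as long as the list lives, which is usually not what you want when you only plot the values.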

Also, shouldn’t you be accumulating the loss (for each batch) within each epoch and then appending it to the LOSS list, like this (below):

LOSS=[]
...
for epoch in range(5):
    running_loss = 0     #accumulates loss of each batch
    for x,y in trainloader:
        # do something......
        loss.backward()
        optimizer.step()
        running_loss += loss.item()   # accumulating loss
    
    LOSS.append(running_loss)   # saving final loss for each epoch
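For reference, a runnable version of that pattern might look like the sketch below. The elided step is filled in with the forward pass from the original post, and TensorDataset plus a bare nn.Linear stand in for the Data and linear_regression classes, so treat those substitutions as assumptions.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
x = torch.arange(-3, 3, 0.1).view(-1, 1)
y = -3 * x + 1 + 0.1 * torch.randn(x.size())
trainloader = DataLoader(TensorDataset(x, y), batch_size=5)

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

LOSS = []
for epoch in range(5):
    running_loss = 0.0                   # accumulates the loss of each batch
    for xb, yb in trainloader:
        loss = criterion(model(xb), yb)  # forward pass on one mini-batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()      # plain float, no graph kept
    LOSS.append(running_loss)            # one entry per epoch

print(LOSS)
```

Dividing running_loss by len(trainloader) before appending would give the mean batch loss per epoch instead of the sum, which is easier to compare across different batch sizes.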

Joseph_Santarcangelo · October 22, 2018, 8:58pm

Thanks Vinay Kumar, I found out about item(); it’s super useful.
As for the loss, I was just recording the average loss for each batch.

Jaddi_abd_elaziz · June 2, 2020, 8:28pm

Just a question: if we write running_loss += loss.item() after each update of the parameters, I think we are computing the following:
for epoch in range(5):
    running_loss = 0 #accumulates loss of each batch
    for x,y in trainloader:
        running_loss += loss.item()

That is, over one epoch we get F* = F(x,y, w_{0}) + F(x,y, w_{1}) + F(x,y, w_{2}) + …, where F is the objective function and each term uses the weights as they were at that step.
However, our objective function is defined with a single w:
F(data,w) = F(x,y,w) + F(x,y,w) + F(x,y,w) + … for (x,y) in data
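The two quantities can be compared directly. The sketch below is only an illustration, with TensorDataset and nn.Linear standing in for the thread's classes: it accumulates the running loss during one training epoch, then re-evaluates every batch with the weights held fixed at their final values.

```python
import math
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
x = torch.arange(-3, 3, 0.1).view(-1, 1)
y = -3 * x + 1 + 0.1 * torch.randn(x.size())
trainloader = DataLoader(TensorDataset(x, y), batch_size=5)

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Running sum: each term is evaluated at the weights of that step (w_0, w_1, ...).
running_loss = 0.0
for xb, yb in trainloader:
    loss = criterion(model(xb), yb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()

# Objective at a single fixed w: a second pass with no parameter updates.
with torch.no_grad():
    fixed_loss = sum(criterion(model(xb), yb).item() for xb, yb in trainloader)

print("running sum over changing weights:", running_loss)
print("sum at fixed final weights:", fixed_loss)
```

The running sum is a cheap training-time proxy; if the value of the objective at one specific set of weights is needed, a separate evaluation pass like the second loop is the usual way to get it.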