Does it make sense to call "loss_per_sample.backward()" while training a NN?

amitoz · July 9, 2020, 3:49pm

One way to train a NN is:

for epoch in range(1,30): 
     pred  = model(whole_input_data)
     loss = myLoss(pred, whole_target_data)
     opt.zero_grad()
     loss.backward()
     opt.step()
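For reference, here is a self-contained version of the full-batch loop above. It is a minimal sketch. The toy linear-regression data, the `nn.Linear` model, `nn.MSELoss` as a stand-in for `myLoss`, and SGD with `lr=0.1` are all assumptions, not the poster's actual setup.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data: y = 3x + 1 plus a little noise
whole_input_data = torch.randn(1000, 1)
whole_target_data = 3 * whole_input_data + 1 + 0.1 * torch.randn(1000, 1)

model = nn.Linear(1, 1)
myLoss = nn.MSELoss()  # assumed stand-in for the poster's loss function
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(1, 30):
    pred = model(whole_input_data)          # forward pass on all samples at once
    loss = myLoss(pred, whole_target_data)  # one scalar for the whole dataset
    opt.zero_grad()                         # clear gradients from the previous step
    loss.backward()                         # gradients of the full-batch loss
    opt.step()                              # exactly one parameter update per epoch

print(f"final full-batch loss: {loss.item():.4f}")
```

Each epoch performs exactly one `opt.step()`, and the forward pass needs the whole dataset in memory at once.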

If whole_input_data is very large, one resorts to training in mini-batches like this:

for epoch in range(1,300):
     for batch_x, batch_y in loader:
           pred = model(batch_x)
           loss_batch = myLoss(pred, batch_y)
           opt.zero_grad()
           loss_batch.backward()
           opt.step()
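A runnable sketch of the mini-batch loop, assuming the same hypothetical toy data and a `DataLoader` over a `TensorDataset` with `batch_size=32` (both the batch size and `lr=0.1` are arbitrary choices). Here `backward()` is called on `loss_batch`, the loss computed for the current batch.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: y = 3x + 1 plus a little noise
X = torch.randn(1000, 1)
y = 3 * X + 1 + 0.1 * torch.randn(1000, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Linear(1, 1)
myLoss = nn.MSELoss()  # assumed stand-in for the poster's loss function
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(1, 11):
    for batch_x, batch_y in loader:
        pred = model(batch_x)
        loss_batch = myLoss(pred, batch_y)
        opt.zero_grad()
        loss_batch.backward()  # backward on the loss actually computed for this batch
        opt.step()             # one parameter update per mini-batch

# Evaluate on the whole dataset without tracking gradients
with torch.no_grad():
    full_loss = myLoss(model(X), y).item()
print(f"full-dataset loss after training: {full_loss:.4f}")
```

With 1000 samples and `batch_size=32`, each epoch performs 32 parameter updates (the last batch holds only 8 samples).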

The downside of the above is that one has to increase the number of epochs.

Does it make sense to call the backward() method on loss_per_sample, as in the following, to train a NN?

for epoch in range(1, 3000):
     for x_per_sample, y_per_sample in zip(X_train, y_train):
           pred = model(x_per_sample)
           loss_per_sample = myLoss(pred, y_per_sample)
           opt.zero_grad()
           loss_per_sample.backward()
           opt.step()

Will it train the NN?
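A self-contained version of the per-sample loop, under the same toy-data assumptions as the other sketches. Iterating over the tensors with `zip` yields rows without a batch dimension, and many layers and losses expect a leading batch dimension, so the sketch adds one with `unsqueeze(0)`. The `lr=0.01` is an assumption, chosen smaller than in the batched sketches because each update is driven by a single sample.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data: y = 3x + 1 plus a little noise
X_train = torch.randn(200, 1)
y_train = 3 * X_train + 1 + 0.1 * torch.randn(200, 1)

model = nn.Linear(1, 1)
myLoss = nn.MSELoss()  # assumed stand-in for the poster's loss function
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1, 6):
    # Each iteration yields one row of shape (1,)
    for x_per_sample, y_per_sample in zip(X_train, y_train):
        pred = model(x_per_sample.unsqueeze(0))  # restore a batch dimension: (1, 1)
        loss_per_sample = myLoss(pred, y_per_sample.unsqueeze(0))
        opt.zero_grad()
        loss_per_sample.backward()
        opt.step()                               # one update per sample

# Evaluate on the whole dataset without tracking gradients
with torch.no_grad():
    final_loss = myLoss(model(X_train), y_train).item()
print(f"full-dataset loss: {final_loss:.4f}")
```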

Nikronic · July 10, 2020, 7:15am

Hi,

The main reason for batching is that when the dataset is too big, you cannot compute the forward and backward passes over all of it at once because of hardware limits (e.g. memory) or numerical issues. Each batch instead serves as a representation of your whole data, which is why you see more fluctuation in your loss.
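To illustrate how each batch stands in for the whole data, the sketch below (same hypothetical toy regression) computes the weight gradient on many random mini-batches and compares it with the full-batch gradient. The batch sizes and the 200 draws per size are arbitrary choices.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy regression problem
X = torch.randn(1000, 1)
y = 3 * X + 1 + 0.1 * torch.randn(1000, 1)
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()

def weight_grad(idx):
    """Gradient of the mean loss over the samples in `idx` w.r.t. the weight."""
    model.zero_grad()
    loss_fn(model(X[idx]), y[idx]).backward()
    return model.weight.grad.item()

full_grad = weight_grad(torch.arange(len(X)))  # gradient over the whole dataset

stds = {}
for batch_size in (1, 8, 64, 512):
    # 200 random mini-batches of this size (no repeats within a batch)
    grads = torch.tensor([weight_grad(torch.randperm(len(X))[:batch_size])
                          for _ in range(200)])
    stds[batch_size] = grads.std().item()
    print(f"batch_size={batch_size:4d}  mean={grads.mean().item():+.3f}  "
          f"std={stds[batch_size]:.3f}  full-batch={full_grad:+.3f}")
```

The mean of the mini-batch gradients should stay roughly near the full-batch gradient, while the spread shrinks as the batch grows. That spread is the source of the extra fluctuation in the loss, and it is largest at `batch_size=1`.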

About the third scenario, you can see it as the case batch_size=1. Intuitively, you still have the disadvantage of needing more epochs. Another issue is that a single sample cannot carry enough information for training, so your model probably will not fit well: by the time it sees a new sample after 100 epochs, the features learned in the first epoch may have vanished.
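The reading of the third scenario as `batch_size=1` can be checked directly. The sketch below, assuming a hypothetical `nn.Linear(3, 1)` model, random data, and plain SGD with `lr=0.05`, trains two copies of the same initial model: one with the `zip` loop from the question and one with `DataLoader(batch_size=1, shuffle=False)`. It then compares the resulting parameters.

```python
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X_train = torch.randn(100, 3)  # hypothetical inputs
y_train = torch.randn(100, 1)  # hypothetical targets

base = nn.Linear(3, 1)
loss_fn = nn.MSELoss()

def train(model, batches):
    """One pass of plain SGD over an iterable of (inputs, targets) pairs."""
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for xb, yb in batches:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    return model

# (a) the per-sample loop from the question, with a batch dimension restored
per_sample = train(copy.deepcopy(base),
                   ((x.unsqueeze(0), t.unsqueeze(0)) for x, t in zip(X_train, y_train)))

# (b) batch_size=1 without shuffling visits the same samples in the same order
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=1, shuffle=False)
batch_of_one = train(copy.deepcopy(base), loader)

same = all(torch.allclose(p, q)
           for p, q in zip(per_sample.parameters(), batch_of_one.parameters()))
print("identical parameters:", same)
```

With `shuffle=True` the order of samples would differ between epochs, but each update would still be driven by a single sample.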
Really, this situation needs to be examined mathematically, in terms of the optimizer's state, the loss function, and how optima are found.
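One way to write this down, as a sketch assuming plain SGD without momentum and a loss averaged over the batch $B_t$ used at step $t$ ($\theta$ are the parameters, $\eta$ the learning rate, $N$ the dataset size, $\ell$ the per-sample loss):

```latex
\theta_{t+1} = \theta_t - \eta \, g_t,
\qquad
g_t = \frac{1}{|B_t|} \sum_{i \in B_t} \nabla_\theta \, \ell\big(f_{\theta_t}(x_i), y_i\big)
\\[4pt]
\text{full batch: } |B_t| = N \;(1 \text{ update per epoch}), \quad
\text{mini-batch: } |B_t| = b \;(\lceil N/b \rceil \text{ updates per epoch}), \quad
\text{per sample: } |B_t| = 1 \;(N \text{ updates per epoch})
\\[4pt]
\mathbb{E}_{B_t}[g_t] = \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \, \ell\big(f_{\theta_t}(x_i), y_i\big)
\quad \text{for } B_t \text{ drawn uniformly at random}
```

The variance of $g_t$ around that expectation shrinks roughly as $1/|B_t|$, so the per-sample case gives the noisiest steps. Optimizers with internal state, such as SGD with momentum or Adam, additionally average gradients across steps, so the update no longer depends on $B_t$ alone.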

Best

amitoz · July 10, 2020, 1:57pm

Hey buddy,

Thanks for your reply. From the research I did, it would still train the neural network, but, as expected, it takes even more epochs and the loss decays with strong fluctuations. The book Deep Learning with PyTorch briefly touches on this topic in section 7.2.4.

Best

Nikronic · July 10, 2020, 4:57pm

Yes exactly, that makes more sense.
Actually, I am just waiting for the day I finish my university exams so that I can read that book THOROUGHLY!

Thank you for sharing.