Multiple forward passes, 1 backward pass instead of packed sequences

Hi, the packed-sequence documentation is spotty and hard to work with (it’s also unclear how the packing of y is handled in the loss function).

However, it should be mathematically equivalent to run batch_size separate forward passes and then do a single backward pass on the sum of those losses. Will PyTorch handle this?

i.e.:

# model(x_i) here returns the loss for a single sequence
loss_1 = model(x_1)
loss_2 = model(x_2)
loss_3 = model(x_3)
total_loss = loss_1 + loss_2 + loss_3

total_loss.backward()

# equivalent to a padded input with batch_size = 3, no?

Yes, PyTorch will handle this correctly: the gradient of a sum is the sum of the gradients, so backpropagating the summed loss gives the same result as a batched forward pass with a summed loss.
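If you want to convince yourself, a quick sanity check along these lines should show matching gradients (the Linear model, MSE loss, and shapes below are placeholders, and the equivalence assumes a summed loss and no batch-dependent layers such as BatchNorm):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
criterion = nn.MSELoss(reduction='sum')  # summed loss so both variants are directly comparable
x = torch.randn(3, 4)
y = torch.randn(3, 2)

# variant A: three forward passes, one backward pass on the summed loss
model.zero_grad()
total_loss = sum(criterion(model(x[i:i+1]), y[i:i+1]) for i in range(3))
total_loss.backward()
grads_a = [p.grad.clone() for p in model.parameters()]

# variant B: a single batched forward pass
model.zero_grad()
criterion(model(x), y).backward()
grads_b = [p.grad.clone() for p in model.parameters()]

print(all(torch.allclose(a, b) for a, b in zip(grads_a, grads_b)))  # prints True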


@smth thanks. So, having just re-read the docs, it looks like I can basically do this as well?

simulate a batch of 32 by running one sequence at a time and calling .step() only once

for ex_i in range(32):
    # gradients accumulate on every call to backward()
    # calling step() after 32 backward() calls simulates a batch of 32
    loss = someLoss(f(x[ex_i]), y[ex_i])
    loss.backward()

optimizer.step()
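Fleshing that out into something runnable (the Linear model, MSE loss, and random data below are just placeholders): the two details that seem to matter are zeroing the gradients before the accumulation loop and, if the goal is to mimic an averaged batch loss, dividing each per-example loss by the effective batch size.

import torch
import torch.nn as nn

f = nn.Linear(4, 2)
someLoss = nn.MSELoss()
optimizer = torch.optim.SGD(f.parameters(), lr=0.1)
x = torch.randn(32, 4)
y = torch.randn(32, 2)

optimizer.zero_grad()                      # clear any gradients left over from the previous step
for ex_i in range(32):
    loss = someLoss(f(x[ex_i]), y[ex_i])
    (loss / 32).backward()                 # scale so the accumulated grads equal the mean over the batch
optimizer.step()

Alternatively, one could leave the losses unscaled and lower the learning rate instead; either way, the accumulated gradients correspond to a sum over the 32 examples unless they are divided down.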