Hi Mona!
I assume that your use case is to pass a batch of tensors of different sizes
through your model, calculate a loss, backpropagate, and then optimize
your model’s parameters.
You will not be able to pass your batch of tensors through your model as
a single batch (unless you zero-fill or sample, etc.) because pytorch does
not support such “ragged” tensors.
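For example, something like the following fails if you try to build one batch tensor out of differently-sized samples:

```python
import torch

# two samples whose first dimensions differ
a = torch.randn(200, 512)
b = torch.randn(300, 512)

# raises a RuntimeError because stack() expects every tensor to be the same size
torch.stack([a, b])
```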
So you will have to pass your individual tensors through your model one by
one* (probably giving them a leading singleton batch dimension), at the
possible detriment of not making fully efficient use of your gpu (and / or cpu)
pipelines.
You could either loop over the individual samples in your batch, passing
them through your model, and accumulate the per-sample losses together
into a batch_loss, then backpropagate once by calling batch_loss.backward(),
and then call opt.step().
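Here is a minimal sketch of that first approach. The names model, loss_fn, opt, samples, and targets (and the toy shapes) are just made-up placeholders, not anything from your actual code:

```python
import torch

# made-up stand-ins -- your actual model, loss, optimizer, and data will differ
model = torch.nn.Linear(512, 10)
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# a "ragged batch": samples whose sizes differ from one another
samples = [torch.randn(200, 512), torch.randn(150, 512), torch.randn(300, 512)]
targets = [torch.randn(200, 10), torch.randn(150, 10), torch.randn(300, 10)]

opt.zero_grad()
batch_loss = 0.0
for x, y in zip(samples, targets):
    pred = model(x.unsqueeze(0))                             # leading singleton batch dimension
    batch_loss = batch_loss + loss_fn(pred, y.unsqueeze(0))  # accumulate per-sample losses
batch_loss.backward()                                        # backpropagate once for the whole batch
opt.step()                                                   # one optimizer step per batch
```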
Or you could call loss.backward() for each sample separately (.backward()
accumulates the gradients until you call .zero_grad()) and then call
opt.step(), which will act on the accumulated gradients.
Or you could backpropagate and call opt.step() for each individual sample
in the “batch.” (The extent to which batches of samples help the optimization
process is a nuanced question, but batches do help keep your gpu pipelines
full.)
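Concretely, the second alternative looks almost the same as the sketch above (it reuses those placeholder names), and moving opt.step() and opt.zero_grad() inside the loop gives you the third, per-sample-step variant:

```python
# second approach: backward() per sample, but step() once per batch
# (reuses the placeholder model, loss_fn, opt, samples, targets from the sketch above)
opt.zero_grad()
for x, y in zip(samples, targets):
    pred = model(x.unsqueeze(0))
    loss = loss_fn(pred, y.unsqueeze(0))
    loss.backward()               # gradients accumulate across the per-sample calls
opt.step()                        # acts on the accumulated gradients

# third approach: move opt.step() and opt.zero_grad() inside the loop to take
# an optimizer step for each individual sample
```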
In general, I would recommend the first approach (where you loop over
the samples for the forward passes, but then backpropagate and optimize
just once per batch).
*) If some of the samples in your training set have the same size – let’s
say that you can find four samples in your training set that all have shape
[200, 512] – you could package like-sized samples together into non-ragged
batch tensors – in this example case, a batch of shape [4, 200, 512] – and
pass them as single tensors through your model. You might not get the full
benefit of having a batch size of 64, but you will still likely make better use
of your gpu than if you had passed the four samples through your model
separately.
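As a rough sketch of that bucketing idea (again with made-up placeholder data), you could key the samples by their shapes and stack each like-sized group into a single non-ragged batch tensor:

```python
import torch
from collections import defaultdict

# made-up ragged samples: four of shape [200, 512] and one of shape [300, 512]
samples = [torch.randn(200, 512), torch.randn(200, 512), torch.randn(300, 512),
           torch.randn(200, 512), torch.randn(200, 512)]

buckets = defaultdict(list)
for x in samples:
    buckets[tuple(x.shape)].append(x)     # group samples that share the same shape

for shape, group in buckets.items():
    batch = torch.stack(group)            # e.g. the four [200, 512] samples stack into [4, 200, 512]
    print(shape, batch.shape)
    # pred = model(batch)                 # pass the whole like-sized group through at once
```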
Best.
K. Frank