Hey there,
I am currently trying to build an RNN to detect certain events in a video input stream.
Let’s say the RNN rolls out over a given input sequence and produces an output at each timestep t, but during training I want to backpropagate only after every 4th timestep, t_i+3.
Do I have to sum the losses at t_i, t_i+1, t_i+2 and t_i+3 and run backprop on that sum, or do I only have to run backprop on the loss at timestep t_i+3?
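A minimal sketch of the setup described above (often called truncated backpropagation through time), assuming a plain `nn.RNN`, a hypothetical linear classification head and made-up sizes: the sequence is processed in chunks of 4 timesteps, the 4 per-timestep losses of a chunk are summed, and backprop runs once per chunk. Detaching the hidden state between chunks is what stops gradients from flowing back further than the current chunk.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch, seq_len, in_feat, hidden, n_classes = 2, 12, 8, 16, 3
k = 4  # backpropagate after every 4th timestep

rnn = nn.RNN(in_feat, hidden, batch_first=True)
head = nn.Linear(hidden, n_classes)                      # hypothetical classifier head
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(batch, seq_len, in_feat)                 # e.g. per-frame features
y = torch.randint(0, n_classes, (batch, seq_len))        # one label per timestep

h = None
chunk_losses = []
for start in range(0, seq_len, k):
    x_chunk = x[:, start:start + k]                      # (batch, k, in_feat)
    y_chunk = y[:, start:start + k]                      # (batch, k)
    out, h = rnn(x_chunk, h)                             # out: (batch, k, hidden)
    logits = head(out)                                   # (batch, k, n_classes)
    # Sum the per-timestep losses t_i .. t_i+3 of this chunk.
    loss = sum(criterion(logits[:, t], y_chunk[:, t]) for t in range(logits.size(1)))
    optimizer.zero_grad()
    loss.backward()                                      # gradients flow back through all 4 steps
    h = h.detach()                                       # cut the graph so the next chunk stops here
    optimizer.step()
    chunk_losses.append(loss.item())
```

The chunk size `k` is the only knob: a larger `k` lets gradients reach further into the past at the cost of more memory.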

My understanding is that running backprop on the sum of the losses from t_i to t_i+3 is like running backprop after each timestep. Is this assumption correct?

Here is an image; the pink backpropagation path is what I would like to achieve.

You’re correct. You can extract whatever portion of the output you want to run backprop on and use that in the calculation of your loss function. This is very common in many-to-one RNNs, where a series of inputs is used to predict a single outcome.
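A minimal many-to-one sketch, with hypothetical sizes and a hypothetical linear head: only the last timestep's output goes into the loss, yet the gradient still reaches the first input frame, because the last hidden state depends on every earlier timestep.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, seq_len, in_feat, hidden, n_classes = 2, 5, 8, 16, 3   # hypothetical sizes
rnn = nn.RNN(in_feat, hidden, batch_first=True)
head = nn.Linear(hidden, n_classes)                            # hypothetical classifier
criterion = nn.CrossEntropyLoss()

x = torch.randn(batch, seq_len, in_feat, requires_grad=True)  # track grads w.r.t. the inputs
y = torch.randint(0, n_classes, (batch,))                     # one label per sequence

out, _ = rnn(x)                       # (batch, seq_len, hidden)
last = out[:, -1, :]                  # keep only the last timestep's output
loss = criterion(head(last), y)
loss.backward()

# Nonzero: the loss on the last step still backpropagates to the first frame.
first_frame_grad = x.grad[:, 0].abs().sum().item()
```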

@aplassard Thanks! Just to be clear: hypothetically, if I want to run backpropagation after each timestep t_i, I have to sum all the errors from t_0 to t_n with torch.sum(), and that is like running backprop after each timestep t_i?
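A small check of that equivalence, assuming no `optimizer.step()` happens between the per-timestep `backward()` calls: because gradients are linear, one `backward()` on the summed loss yields the same `.grad` as calling `backward()` on each timestep's loss and letting PyTorch accumulate. Sizes and the classification head are hypothetical. If the weights were updated after every timestep instead, the two would no longer match.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, in_feat, hidden, n_classes = 2, 4, 8, 16, 3   # hypothetical sizes
rnn = nn.RNN(in_feat, hidden, batch_first=True)
head = nn.Linear(hidden, n_classes)                            # hypothetical classifier
criterion = nn.CrossEntropyLoss()
x = torch.randn(batch, seq_len, in_feat)
y = torch.randint(0, n_classes, (batch, seq_len))

def per_step_losses():
    logits = head(rnn(x)[0])                     # (batch, seq_len, n_classes)
    return [criterion(logits[:, t], y[:, t]) for t in range(seq_len)]

# (a) One backward call on the summed loss.
rnn.zero_grad()
torch.stack(per_step_losses()).sum().backward()
grad_sum = rnn.weight_hh_l0.grad.clone()

# (b) One backward call per timestep, letting .grad accumulate.
rnn.zero_grad()
losses = per_step_losses()
for t, loss in enumerate(losses):
    loss.backward(retain_graph=t < seq_len - 1)  # the losses share one graph
grad_each = rnn.weight_hh_l0.grad.clone()

same = torch.allclose(grad_sum, grad_each, atol=1e-6)
```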

One more thing: I am using Cross Entropy Loss. I give it the output of my RNN, of size (batch, seq_length, feature_size), and it tells me it only accepts (batch, feature_size). Will there be a problem with backprop if I change my outputs with tensor.view() to have the size (batch, seq_length * feature_size)?
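A sketch of two ways to hand per-timestep logits to `nn.CrossEntropyLoss`, which expects the class dimension second, i.e. input `(N, C)` or `(N, C, d1, ...)`. Sizes are hypothetical, and `feature_size` is assumed here to be the number of classes. `view`, `reshape` and `permute` are ordinary differentiable ops, so they do not break backprop; the concern with `(batch, seq_length * feature_size)` is rather that the loss would then treat every (timestep, class) pair as a separate class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, n_classes = 2, 5, 3                                   # hypothetical sizes
logits = torch.randn(batch, seq_len, n_classes, requires_grad=True)  # RNN output per timestep
target = torch.randint(0, n_classes, (batch, seq_len))               # class index per timestep
criterion = nn.CrossEntropyLoss()

# Option 1: fold time into the batch dimension -> (batch * seq_len, n_classes).
loss_flat = criterion(logits.reshape(-1, n_classes), target.reshape(-1))

# Option 2: move the class dimension to position 1 -> (batch, n_classes, seq_len).
loss_perm = criterion(logits.permute(0, 2, 1), target)

# reshape/permute are differentiable, so gradients reach the original tensor.
loss_flat.backward()
has_grad = logits.grad is not None
```

With the default `reduction="mean"`, both options average over all `batch * seq_len` timesteps and give the same value.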

EDIT: Nvm, I changed the loss to Binary Cross Entropy; it should work that way.
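A minimal sketch of the binary cross-entropy route, assuming each event is an independent yes/no label per timestep. It uses `nn.BCEWithLogitsLoss`, a variant that takes raw logits and applies the sigmoid internally (more numerically stable than a sigmoid followed by `nn.BCELoss`); it accepts any input shape as long as the float target has the same shape. Sizes are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, n_events = 2, 5, 3                                  # hypothetical sizes
logits = torch.randn(batch, seq_len, n_events, requires_grad=True)   # raw RNN outputs (no sigmoid)
target = torch.randint(0, 2, (batch, seq_len, n_events)).float()     # 1 = event present at that timestep
criterion = nn.BCEWithLogitsLoss()                                   # applies the sigmoid internally

loss = criterion(logits, target)   # input and target only need the same shape
loss.backward()
```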