It’s a task that has a label for each time_steps for a batch and I have to calculate the loss for that label. I previously asked about the case that the action recognition task is labeled per frame instead of per video, but I didn’t get any answers. If anyone knows a toy task related to the above, please let me know.
What do you exactly mean by a toy task? a simple toy example where output is of shape
[batch, time_steps, output_dim] is language modeling. Check this link for a simple example.