I am trying to implement an action detection model using an LSTM, following this figure:

Here x is the sequence of features extracted by a CNN (ResNet50 in my case), and y is the class predicted at each time step.

I use a batch size of 1 and a learning rate of 0.00001. The target for each sequence looks like this:

```
[32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 7, 7, 7, 7, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36]
```

I compute the cross-entropy loss at every time step and then average the per-step losses over the sequence.
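To make that computation concrete, here is a minimal, framework-independent sketch of per-time-step cross-entropy averaged over a sequence. It is not my actual training code: the function names `softmax_cross_entropy` and `sequence_loss` and the toy logits are made up for illustration, and I assume the model outputs raw, unnormalized scores (logits) for every time step.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy for one time step: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def sequence_loss(logits_seq, targets):
    """Average the per-time-step cross-entropy over the whole sequence."""
    if len(logits_seq) != len(targets):
        raise ValueError("need exactly one target class per time step")
    losses = [softmax_cross_entropy(l, t) for l, t in zip(logits_seq, targets)]
    return sum(losses) / len(losses)

# Toy example (hypothetical numbers): 3 time steps, 4 classes.
logits_seq = [
    [2.0, 0.5, 0.1, 0.1],
    [0.2, 0.2, 3.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]
targets = [0, 2, 1]
loss = sequence_loss(logits_seq, targets)
print(loss)
```

As far as I understand, this should match what a framework's cross-entropy loss with mean reduction computes when it is given all time steps of one sequence at once.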

The resulting loss curve looks unusual.

Do you have any idea what is happening?

I'm also not sure whether my loss computation is right.

What is the correct way to compute the loss in this case?

Is it possible that the problem comes from the way the loss is computed?

Thank you.