Why does the output of my model go to zero after a few hundred steps? Is this exploding gradients?

After a few hundred steps the output of the model(s) goes to zero. What are the possible causes? Could it be exploding gradients?

My algorithm is more than 1000 lines of code, so I don’t know whether posting all of it would be welcome here or whether I should focus on one problem.

I have tried to clip the gradients in several ways. I don’t know if any of them work, because the ‘zeroing’ problem looks the same whether these clipping attempts are active in the code or commented out.

        loss_l.backward();
        torch::nn::utils::clip_grad_norm_(actor.parameters(), 0.4);

        // torch::nn::utils::clip_grad_norm_(actor_optimizer.parameters(), 0.1);
        // for (auto& param : actor.parameters()) {
        //     torch::nn::utils::clip_grad_norm_(param, 0.1);
        // }
        
        // torch::nn::utils::clip_grad_norm_(actor.parameters(), 0.5);

        // for (auto& param_group : actor_optimizer.param_groups()) {
        //     for (auto& param : param_group.params()) {
        //         torch::nn::utils::clip_grad_norm_(param, 0.03);
        //     }
        // }

        // for (auto& param : actor.parameters()) {
        //     torch::nn::utils::clip_grad_norm_(param, 1.1);
        // }

        actor_optimizer.step();

It’s hard to tell from this alone whether your gradients are exploding, so I would recommend checking the norm of (some of) the gradients to see whether their magnitude increases during training.
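
A minimal sketch of one way to log that in C++, assuming `actor` and `loss_l` are the module and loss from your snippet: it sums the squared L2 norms of all parameter gradients right after `backward()` and prints the total norm, so you can watch whether it blows up over the first few hundred steps.

        #include <cmath>     // at the top of the file
        #include <iostream>

        // Right after loss_l.backward(): accumulate the squared L2 norm
        // of each parameter's gradient, then take the square root to get
        // the total gradient norm for this step.
        double total_sq = 0.0;
        for (const auto& param : actor.parameters()) {
            if (param.grad().defined()) {
                const double n = param.grad().norm(2).item<double>();
                total_sq += n * n;
            }
        }
        const double total_norm = std::sqrt(total_sq);
        std::cout << "grad norm: " << total_norm << std::endl;

I believe `torch::nn::utils::clip_grad_norm_` also returns the total norm it computed, so logging the return value of your existing call may be enough instead.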
