Hi there. To save GPU memory, I use FP16 in my work, just like nnUNet does: they define a `GradScaler()` to scale the loss and update the gradients. But I modified the code for my own use case:
I now update the loss and step the optimizer twice per iteration (one loss and one optimizer for each of my two networks). Compared with the original update order, i.e. `optimizer1.step()` followed by `optimizer2.step()`, what is the difference? Can it still work as expected?
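Roughly, the kind of modification I mean looks like the sketch below, following the multiple-optimizers recipe from the PyTorch AMP examples: every `backward()` and `step()` goes through the same scaler, and `update()` is called only once per iteration. The models, losses, and data here are simplified placeholders, not my actual networks:

```python
import torch
import torch.nn.functional as F

# Placeholder models/optimizers standing in for my real networks.
model1 = torch.nn.Linear(16, 16).cuda()
model2 = torch.nn.Linear(16, 16).cuda()
optimizer1 = torch.optim.SGD(model1.parameters(), lr=1e-3)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(4, 16).cuda()
targets = torch.randn(4, 16).cuda()

optimizer1.zero_grad()
optimizer2.zero_grad()
with torch.cuda.amp.autocast():
    loss1 = F.mse_loss(model1(inputs), targets)
    loss2 = F.mse_loss(model2(inputs), targets)

# Both backward passes go through the same scaler...
scaler.scale(loss1).backward()
scaler.scale(loss2).backward()
# ...each optimizer is stepped through the scaler...
scaler.step(optimizer1)
scaler.step(optimizer2)
# ...but update() runs only once, after all step() calls.
scaler.update()
```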
I found that your code works for me too when I tried it in the console, but it still didn't fix the AssertionError. I know now that `GradScaler()` itself works fine; the error was reported for another bug unrelated to it. I re-checked my code and found what was wrong.
It sounds dumb: in my mean-teacher framework, I am training with the teacher model's weights loaded from a trained model, and I forgot to remove the teacher model from the `torch.no_grad()` block. Because of that, no gradients were flowing back at all, which is probably what caused the AssertionError.
After fixing this, no errors are reported anymore.
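For anyone hitting the same thing, here is my reconstruction of the bug and the fix as a minimal sketch (the `student`, `teacher`, data, and optimizer below are placeholders for my actual setup). If I understand the failure mode correctly, `scaler.step()` asserts when `backward()` produced no gradients for that optimizer, which is exactly what happens when the whole forward pass sits inside `torch.no_grad()`:

```python
import torch
import torch.nn.functional as F

# Placeholder networks standing in for my real mean-teacher models.
student = torch.nn.Linear(16, 16).cuda()
teacher = torch.nn.Linear(16, 16).cuda()
optimizer = torch.optim.SGD(student.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()
inputs = torch.randn(4, 16).cuda()

# Buggy version: the student forward (and thus the loss) also sat inside
# torch.no_grad(), so backward() recorded no gradients and
# scaler.step(optimizer) raised the AssertionError.
# with torch.no_grad():
#     teacher_out = teacher(inputs)
#     student_out = student(inputs)   # no autograd graph here -- the bug
#     loss = F.mse_loss(student_out, teacher_out)

# Fixed version: only the teacher forward stays under no_grad.
with torch.no_grad():
    teacher_out = teacher(inputs)      # teacher gives targets, no gradients
with torch.cuda.amp.autocast():
    student_out = student(inputs)      # student keeps its autograd graph
    loss = F.mse_loss(student_out, teacher_out)

optimizer.zero_grad()
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```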