Hi, during training I found that batches containing certain data samples consistently produce very large gradients. However, since I have so many training samples, I cannot figure out which samples in a batch are responsible for these bad gradients.
Is there any way in PyTorch to print the norm of the gradient with respect to an intermediate feature, separately for each data sample in a batch? That is, given the pipeline

data >> fc_layer1 >> fc1_feature >> fc_layer2 >> fc2_feature >> softmaxloss_layer >> loss

I want to print norm( d(loss)/d(fc1_feature) ) for every data sample.
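For concreteness, here is a minimal self-contained sketch of the setup I described (the layer sizes, data, and labels are made up for illustration; softmaxloss_layer corresponds to nn.CrossEntropyLoss, which combines softmax and NLL):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up sizes just for illustration
batch_size, in_dim, hidden_dim, num_classes = 4, 8, 16, 3

fc_layer1 = nn.Linear(in_dim, hidden_dim)
fc_layer2 = nn.Linear(hidden_dim, num_classes)
softmaxloss_layer = nn.CrossEntropyLoss()  # softmax + NLL in one module

data = torch.randn(batch_size, in_dim)
labels = torch.randint(0, num_classes, (batch_size,))

fc1_feature = fc_layer1(data)    # <- I want norm(d(loss)/d(fc1_feature)) per sample here
fc2_feature = fc_layer2(fc1_feature)
loss = softmaxloss_layer(fc2_feature, labels)
loss.backward()
```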
Thank you very much for the help!