I’m testing how well the models available in torchvision handle, among other things, both images and audio. For the audio, I first extract MFCC features from each clip and convert them into an image, since I’ve seen several people do this and it appears to be fairly common practice.
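In case it helps, the audio-to-image step looks roughly like this. This is a simplified sketch: the file path, the number of MFCCs, and the 224x224 resize are illustrative, and I’m assuming torchaudio for the MFCC extraction and a mono clip.

```python
import torch
import torchaudio

# Placeholder path and parameters; the real values come from my dataset code.
waveform, sample_rate = torchaudio.load("clip.wav")          # (channels, time), assumed mono
mfcc_transform = torchaudio.transforms.MFCC(
    sample_rate=sample_rate,
    n_mfcc=40,                                               # illustrative value
)
mfcc = mfcc_transform(waveform)                              # (1, n_mfcc, time)

# Replicate to 3 channels and resize so the "image" matches what AlexNet expects.
image = mfcc.expand(3, -1, -1).unsqueeze(0)                  # (1, 3, n_mfcc, time)
image = torch.nn.functional.interpolate(image, size=(224, 224), mode="bilinear")
```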
However, after the first training iteration, some of the model’s weights turn into NaN, and the model subsequently returns NaN predictions. I checked the weights with this code:

for a in [6, 8]:
    print(a, model_a.features[a].weight)
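A broader check over all parameters (and their gradients) that I can also run looks like this; it uses the same model_a handle as above:

```python
import torch

# Report every parameter (and gradient, if present) that contains NaN values.
for name, param in model_a.named_parameters():
    if torch.isnan(param).any():
        print(name, "contains NaN weights")
    if param.grad is not None and torch.isnan(param.grad).any():
        print(name, "contains NaN gradients")
```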
I’m currently using a pretrained AlexNet (though I eventually want to try this pipeline with other models such as VGG, GoogLeNet, etc.). Up until now I had been using a learning rate of 0.001, but after finding a topic here on the forums where someone suggested that a high learning rate can cause exploding gradients, I lowered it to 0.000001. That did not solve the problem.
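For reference, a simplified sketch of my setup is below. The optimizer, loss, and number of classes shown here are placeholders; the pretrained AlexNet and the lowered learning rate are the actual values I described above.

```python
import torch
import torchvision

num_classes = 10  # placeholder for my actual number of classes

# Pretrained AlexNet with the final classifier layer swapped for my task.
model_a = torchvision.models.alexnet(weights=torchvision.models.AlexNet_Weights.DEFAULT)
model_a.classifier[6] = torch.nn.Linear(4096, num_classes)

optimizer = torch.optim.SGD(model_a.parameters(), lr=1e-6)  # lowered from 1e-3
criterion = torch.nn.CrossEntropyLoss()

# torch.autograd.set_detect_anomaly(True) is something I could also enable to
# pinpoint where the NaNs first appear during backward.
```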
Help would be greatly appreciated!
Thank you to everyone who reads this