I’ve been experimenting with tensor decomposition on a VGG16 model, based on this repo.
What I’m trying to do is split each conv layer into smaller parts, apply Tucker decomposition to each part, and then put the parts back together into a conv layer, as in the baseline model.
The baseline model takes each Conv2d layer from the VGG model and breaks it into 3 smaller Conv2d layers.
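
To make the structure concrete, here is a minimal sketch of what "one Conv2d replaced by 3 smaller Conv2d layers" can look like. It assumes the usual Tucker-2 layout (a 1x1 conv that reduces the input channels, a core conv with the original kernel size, and a 1x1 conv that restores the output channels). The function name `tucker2_conv_stack` and the ranks `rank_in` / `rank_out` are made up for illustration and are not taken from the repo. The weights here are freshly initialised; the actual decomposition of `layer.weight` is left out.

```python
import torch
import torch.nn as nn


def tucker2_conv_stack(layer: nn.Conv2d, rank_in: int, rank_out: int) -> nn.Sequential:
    """Build a three-layer stack with the same input/output shape as `layer`.

    Assumption: Tucker-2 layout (1x1 -> kxk core -> 1x1). Only the shapes are
    set up; the weights are NOT filled from a decomposition of layer.weight.
    """
    # 1x1 conv: project in_channels down to rank_in
    first = nn.Conv2d(layer.in_channels, rank_in, kernel_size=1, bias=False)
    # core conv: keeps the original spatial behaviour on the reduced channels
    core = nn.Conv2d(rank_in, rank_out, kernel_size=layer.kernel_size,
                     stride=layer.stride, padding=layer.padding,
                     dilation=layer.dilation, bias=False)
    # 1x1 conv: project rank_out back up to out_channels (carries the bias)
    last = nn.Conv2d(rank_out, layer.out_channels, kernel_size=1,
                     bias=layer.bias is not None)
    return nn.Sequential(first, core, last)


conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
stack = tucker2_conv_stack(conv, rank_in=16, rank_out=32)
x = torch.randn(1, 64, 32, 32)
print(conv(x).shape, stack(x).shape)  # both layers map to the same output shape
```

The ranks 16 and 32 are arbitrary example values; in practice they would come from whatever rank selection the decomposition uses.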
I printed the model parameters before and after training. They were fine before training.
After training, both the loss and the parameters became NaN.
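
One way to narrow down where the NaNs first appear is to check the parameters and gradients after each backward pass instead of only at the end. The helper below is a rough sketch of that idea; `non_finite_tensors` is a hypothetical name, not something from the repo or from PyTorch, and the `nn.Linear` model is only a stand-in for the decomposed VGG16.

```python
import torch
import torch.nn as nn


def non_finite_tensors(model: nn.Module) -> list:
    """Return names of parameters (and gradients) that contain NaN or inf."""
    bad = []
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            bad.append(name)
        if param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(name + ".grad")
    return bad


# Stand-in model for illustration only
model = nn.Linear(4, 2)
print(non_finite_tensors(model))

# Simulate a weight that blew up during training
with torch.no_grad():
    model.weight[0, 0] = float("nan")
print(non_finite_tensors(model))
```

Calling this right after `loss.backward()` in the training loop should show which layer goes bad first. If the gradients turn out to be exploding, `torch.nn.utils.clip_grad_norm_` or a lower learning rate might be worth trying, though I have not confirmed that this is the cause here.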
You can check out my code in this colab.
Any suggestions would be much appreciated.