I’m trying to port an architecture to PyTorch. The architecture itself is very simple; however, I cannot replicate the results I get in Caffe. My suspicion is that it has something to do with weight decay.
How do you specify weight decay for all layers? So far I have been using the `weight_decay` parameter of the Adadelta optimizer, but this does not seem to be correct, and I guess it's the main reason I am not able to get results as good as the ones I get in Caffe.
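For context, one possible source of the mismatch: the optimizer-level `weight_decay` in PyTorch applies to every parameter handed to the optimizer, biases included, whereas Caffe scales the solver's `weight_decay` by a per-parameter `decay_mult`, which many prototxt files set to 0 for biases. Below is a minimal sketch of splitting parameters into two groups to mimic that, assuming the Caffe model really does use `decay_mult: 0` on biases (not confirmed here). The helper name `build_param_groups` and the example decay value are hypothetical.

```python
def build_param_groups(named_params, weight_decay, no_decay_suffixes=("bias",)):
    """Split (name, parameter) pairs into a decayed and a non-decayed group.

    Hypothetical helper: parameters whose name ends with one of
    `no_decay_suffixes` get weight_decay 0, all others get `weight_decay`.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if name.endswith(no_decay_suffixes):
            no_decay.append(param)
        else:
            decay.append(param)
    # Same list-of-dicts layout that torch.optim optimizers accept
    # as per-parameter options.
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Intended use with PyTorch (needs torch and a model, so left as a comment;
# 5e-4 is only a placeholder value, not taken from any real solver file):
#   groups = build_param_groups(model.named_parameters(), weight_decay=5e-4)
#   optimizer = torch.optim.Adadelta(groups)
```

If the Caffe definition uses the same decay on every blob, the split is unnecessary and a single `weight_decay` value passed to the optimizer should already cover all layers.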
Any help is greatly appreciated.