How do you specify weight decay for all layers (as Caffe does) in PyTorch?

I’m trying to port an architecture to PyTorch. The architecture itself is very simple, but I cannot replicate the results I get in Caffe. My suspicion is that it has something to do with weight decay.
How do you specify weight decay for all layers? So far I have been using the weight decay parameter of the Adadelta optimizer, but this does not seem to be correct, and I guess it’s the main reason I am not able to get results as good as the ones I get in Caffe.
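For reference, the setup described above looks roughly like the sketch below. The model, the layer sizes, and the decay value `5e-4` are placeholders for illustration, not the actual network or the Caffe solver settings.

```python
import torch
import torch.nn as nn

# Placeholder model; the real architecture is not shown in this post.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)

# A single weight_decay value passed to the optimizer is applied to every
# parameter handed over by model.parameters(), weights and biases alike.
# The value 5e-4 is an assumed example, not taken from the Caffe solver.
optimizer = torch.optim.Adadelta(model.parameters(), weight_decay=5e-4)

# Inspect which decay each parameter group actually receives.
decays = [group["weight_decay"] for group in optimizer.param_groups]
num_tensors = sum(len(group["params"]) for group in optimizer.param_groups)
print(decays, num_tensors)
```

With this form there is one parameter group, so all layers share the same decay. Caffe's solver also has a global `weight_decay`, but each layer can scale it through `decay_mult`, so that may be one place where the two setups differ.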
Any help is greatly appreciated.
