I’ve reimplemented a Caffe network in PyTorch, training with identical data splits, augmentation, loss weights, and learning parameters. Yet while I can get decent results, my network is still not nearly as good as the original Caffe model. The only way I can get somewhat close is by adding a learning rate decay, which the original paper doesn’t use; the authors train with plain Adam plus weight decay. When I do the same, my results are noticeably worse.
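For reference, here is a minimal sketch of the optimizer setup I described, using `torch.optim.Adam`'s built-in `weight_decay`. The model and the `lr`/`weight_decay` values are placeholders, not the paper's actual hyperparameters:

```python
import torch
import torch.nn as nn

# Placeholder model; my real network mirrors the Caffe architecture.
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))

# Plain Adam with weight decay, as in the paper. Note that
# torch.optim.Adam folds weight_decay into the gradient (an L2 penalty),
# which is not the same as the decoupled decay of torch.optim.AdamW.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

criterion = nn.MSELoss()
x, y = torch.randn(4, 10), torch.randn(4, 1)

# One training step.
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

One thing I'm unsure about is whether Caffe's `weight_decay` semantics under its Adam solver match PyTorch's L2-penalty formulation at all.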
What could I be missing here?