DeepLab-LargeFOV version 2 trains in Caffe but not in PyTorch

Yes. I am absolutely sure that the data ingestion is exactly the same as in Caffe. To verify this, I also trained a PyTorch model using the output of the Caffe-DeepLab model’s data layer as its input. Hence the input to both networks was identical.

Another concern might be the initialization. To keep the two setups as similar as possible, I even converted Caffe-DeepLab’s “init.caffemodel” (provided by Liang-Chieh Chen) to a PyTorch-compatible OrderedDict and used it for initialization.
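For anyone who wants to check the same thing, the conversion can be sketched roughly as below. This is only a sketch under assumptions, not the exact script I used: `caffe_params_to_state_dict`, `diff_keys` and `name_map` are hypothetical names, and the input is assumed to be a mapping from Caffe layer name to a list of arrays (weights first, optional bias second), such as one built from pycaffe's `net.params[layer][i].data`.

```python
from collections import OrderedDict


def caffe_params_to_state_dict(caffe_params, name_map=None):
    """Build a PyTorch-style state dict from per-layer Caffe blobs.

    caffe_params: mapping layer name -> [weights, bias (optional)].
    name_map:     optional mapping Caffe layer name -> PyTorch module name
                  (hypothetical; depends on how the PyTorch model is defined).
    """
    name_map = name_map or {}
    state = OrderedDict()
    for layer, blobs in caffe_params.items():
        prefix = name_map.get(layer, layer)
        state[prefix + ".weight"] = blobs[0]
        if len(blobs) > 1:
            state[prefix + ".bias"] = blobs[1]
    return state


def diff_keys(converted, model_keys):
    """Return (missing, unexpected) keys relative to the model's own keys."""
    converted_keys = set(converted)
    model_keys = set(model_keys)
    missing = sorted(model_keys - converted_keys)
    unexpected = sorted(converted_keys - model_keys)
    return missing, unexpected


if __name__ == "__main__":
    # Strings stand in for the real weight arrays in this toy example.
    params = OrderedDict([("conv1_1", ["W1", "b1"]), ("fc8", ["W8", "b8"])])
    state = caffe_params_to_state_dict(params, name_map={"fc8": "classifier"})
    print(list(state))
    print(diff_keys(state, ["conv1_1.weight", "conv1_1.bias", "fc8.weight"]))
```

Comparing the converted keys against the keys of the PyTorch model's own `state_dict()` is a cheap way to catch a layer that silently kept its random initialization.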

To apply the ‘poly’ learning-rate policy correctly, I am following the advice given elsewhere on this forum.
Hyperparameters such as batch size, learning rate, weight decay, and momentum are also the same.
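For reference, Caffe's ‘poly’ policy computes the learning rate as base_lr * (1 - iter / max_iter) ^ power. A minimal sketch in plain Python follows; the function name `poly_lr` and the numbers in the example are illustrative assumptions, not values taken from the actual solver file.

```python
def poly_lr(base_lr, iteration, max_iter, power):
    """Learning rate under Caffe's 'poly' policy.

    lr = base_lr * (1 - iteration / max_iter) ** power
    """
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return base_lr * (1.0 - iteration / float(max_iter)) ** power


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the original solver).
    base_lr, max_iter, power = 0.001, 20000, 0.9
    for it in (0, 5000, 10000, 15000, 20000):
        print(it, poly_lr(base_lr, it, max_iter, power))
```

In a PyTorch training loop the returned value would typically be written into each entry of `optimizer.param_groups` under the `'lr'` key before the optimizer step, so that the schedule is updated per iteration as in Caffe rather than per epoch.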

I was hoping to find out whether anyone else has been able to match the Caffe performance in PyTorch. It’s only a few hours of training with an easy setup.