PyTorch vs Caffe2 performance difference

I am trying to deploy the ResNet-50 from torchvision using Caffe2. However, I notice some differences when I run it in PyTorch compared to Caffe2:

  • in contrast to what I would expect, the Caffe2 model is slower
  • the Caffe2 model uses 6 times more memory than the PyTorch variant
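
For context, the usual way to get a torchvision model into Caffe2 is via ONNX; a minimal sketch of that path (the exact code is in the notebook linked below, so the file name and the pretrained flag here are just placeholders) looks like this:

    import numpy as np
    import onnx
    import torch
    import torchvision
    import caffe2.python.onnx.backend as caffe2_backend

    # Trace the torchvision ResNet-50 into an ONNX graph with a fixed input shape.
    model = torchvision.models.resnet50(pretrained=True).eval()
    dummy = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy, "resnet50.onnx")

    # Load the ONNX graph and prepare a Caffe2 GPU backend for it.
    onnx_model = onnx.load("resnet50.onnx")
    rep = caffe2_backend.prepare(onnx_model, device="CUDA:0")

    # One forward pass; the input has to be float32.
    x = np.random.randn(1, 3, 224, 224).astype(np.float32)
    outputs = rep.run([x])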

I have uploaded a full notebook here: https://gist.github.com/dseuss/cfd016652d4c3805c066228985b88c2c
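
For what it is worth, a simple way to compare the two is to time repeated forward passes with the GPU synchronised, along these lines (a minimal sketch, reusing the hypothetical model and rep from the snippet above, not the exact notebook code):

    import time
    import torch

    def avg_runtime(fn, n_runs=50):
        # One warm-up call, then average over n_runs synchronised forward passes.
        fn()
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            fn()
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / n_runs

    x_torch = torch.randn(1, 3, 224, 224, device="cuda")
    x_np = x_torch.cpu().numpy()  # already float32

    model_cuda = model.cuda()
    print("pytorch:", avg_runtime(lambda: model_cuda(x_torch)))
    print("caffe2: ", avg_runtime(lambda: rep.run([x_np])))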

I am also stumbling a bit over the following warning, which I get even though I have checked that n_np.dtype == np.float32:

    CUDA operators do not support 64-bit doubles, please use arr.astype(np.float32) or np.int32 for ints. Blob: input_0 type: float64
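
My understanding is that this warning comes from the FeedBlob call that the Caffe2 backend makes for input_0, and that an explicit float32 cast right before feeding should avoid it. A minimal sketch of what I mean (the array shape is made up, and the standalone FeedBlob call only mirrors what I assume the backend does internally):

    import numpy as np
    from caffe2.proto import caffe2_pb2
    from caffe2.python import core, workspace

    # Stand-in for the notebook's input array; the shape is made up.
    n_np = np.random.randn(1, 3, 224, 224).astype(np.float32)

    # Cast and make the buffer contiguous immediately before feeding the blob;
    # an intermediate copy (e.g. np.array([...]) over Python floats) silently
    # comes back as float64 and triggers the warning on CUDA.
    x = np.ascontiguousarray(n_np, dtype=np.float32)

    device_opts = core.DeviceOption(caffe2_pb2.CUDA, 0)
    workspace.FeedBlob("input_0", x, device_option=device_opts)
    print(workspace.FetchBlob("input_0").dtype)  # expect float32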