Converting torchvision ResNet to Caffe2

Hello, I am trying to use the trained torchvision ResNet in Caffe2. I’ve renamed all layers and weights as necessary to load the weights and I get pretty good results, although the accuracy in Caffe2 is slightly lower than in PyTorch. Comparing layer by layer, I’ve found that the first conv gives the same output in both frameworks, but the first batch norm does not.

How does PyTorch’s “running_var” correspond to Caffe2’s “_riv” (running inverse variance)? I think this is where the problem is. I’ve tried copying the value over unchanged as well as taking its reciprocal (is that what “inverse” is supposed to mean?).
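
One way I could narrow this down is to recompute the inference-time batch norm by hand and see which reading of the stored statistic reproduces a reference output. Below is a minimal NumPy sketch of that check. In eval mode, PyTorch's BatchNorm2d computes y = (x - running_mean) / sqrt(running_var + eps) * weight + bias, with eps defaulting to 1e-5. The function names `bn_inference` and `best_interpretation` and the three candidate readings of the statistic are my own assumptions for illustration, not anything taken from either framework, and the inputs here are synthetic stand-ins for the real conv output and weights.

```python
import numpy as np


def bn_inference(x, gamma, beta, mean, stat, eps=1e-5, interpretation="variance"):
    """Inference-time batch norm over an NCHW array.

    `stat` is the stored per-channel statistic. The candidate readings
    below are guesses to test, not documented framework behaviour:
      "variance":         stat = var
      "inverse_variance": stat = 1 / var
      "inverse_std":      stat = 1 / sqrt(var + eps)
    """
    if interpretation == "variance":
        inv_std = 1.0 / np.sqrt(stat + eps)
    elif interpretation == "inverse_variance":
        inv_std = 1.0 / np.sqrt(1.0 / stat + eps)
    elif interpretation == "inverse_std":
        inv_std = stat
    else:
        raise ValueError("unknown interpretation: %s" % interpretation)
    shape = (1, -1, 1, 1)  # broadcast per-channel vectors over N, H, W
    scale = (gamma * inv_std).reshape(shape)
    return (x - mean.reshape(shape)) * scale + beta.reshape(shape)


def best_interpretation(x, gamma, beta, mean, stat, reference, eps=1e-5):
    """Return the reading of `stat` whose output is closest to `reference`,
    together with the max absolute error of every candidate."""
    errors = {}
    for name in ("variance", "inverse_variance", "inverse_std"):
        y = bn_inference(x, gamma, beta, mean, stat, eps, name)
        errors[name] = float(np.max(np.abs(y - reference)))
    return min(errors, key=errors.get), errors


# Synthetic stand-ins; in practice x would be the first conv's output and
# the per-channel vectors would come from the converted weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))
gamma = rng.uniform(0.5, 1.5, size=3)
beta = rng.normal(size=3)
mean = rng.normal(size=3)
var = rng.uniform(0.5, 2.0, size=3)

reference = bn_inference(x, gamma, beta, mean, var, interpretation="variance")
name, errors = best_interpretation(x, gamma, beta, mean, var, reference)
print(name, errors)
```

With real data, `reference` would be the batch norm output dumped from whichever framework is being matched, and `stat` the value stored under “_riv”. If none of the three candidates gives a small error, the mismatch is probably somewhere else, for example a different epsilon or a swapped scale and bias.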