I’ve been trying to run MobileNetV2 Quantized on devices with ARM CPUs. It keeps showing the following error messages:
RuntimeError: [enforce fail at predictor.cc:13] BlobIsTensorType(*blob, CPU). Blob is not a CPU Tensor: 325
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x78 (0x7fa0459da0 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/lib/libc10.so)
frame #1: <unknown function> + 0x10edb6c (0x7fa155bb6c in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/lib/libcaffe2.so)
frame #2: <unknown function> + 0x10ee560 (0x7fa155c560 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/lib/libcaffe2.so)
frame #3: caffe2::Predictor::operator()(std::vector<caffe2::Tensor, std::allocator<caffe2::Tensor> > const&, std::vector<caffe2::Tensor, std::allocator<caffe2::Tensor> >*) + 0x270 (0x7fa155d540 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/lib/libcaffe2.so)
frame #4: <unknown function> + 0x4db14 (0x7fa23afb14 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/caffe2/python/caffe2_pybind11_state.cpython-35m-aarch64-linux-gnu.so)
frame #5: <unknown function> + 0x8f264 (0x7fa23f1264 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/caffe2/python/caffe2_pybind11_state.cpython-35m-aarch64-linux-gnu.so)
<omitting python frames>
I’ve tried this on ARMv8 (Jetson TX2) and ARMv7 (Raspberry pi 2B, I think).
Here’s the step I took:
- clone the pytorch repo, master branch, submodule init (very recently, so QNNPack is there, both in caffe2 and third_party directories
- run script to build caffe2
- acquire
init_net.pb
&predict_net.pb
- run according to the tutorial
# On TX2
$ /path/to/pytorch/scripts/build_tegra_x1.sh
# On Rasp pi
$ /path/to/pytorch/scripts/build_raspbian.sh
Here’s my code sample to run the inference:
import numpy as np
from caffe2.python import core, workspace
def main():
with open('/home/nvidia/mnv2/init_net.pb', 'rb') as f:
init_net = f.read()
with open('/home/nvidia/mnv2/predict_net.pb', 'rb') as f:
predict_net = f.read()
p = workspace.Predictor(init_net, predict_net)
img = np.random.randn(1, 3, 224, 224) # actual code is reading ImageNet, resizing and stuff
results = p.run([img])
if '__main__' == __name__:
main()
Note: with TX2, I’ve tried build with/without CUDA. The same error occurs on all occasions.
I have also done some digging on my own. I know for sure qnnp_create_convolution2d_nhwc_q8
was called. Doesn’t seem like issue with QNNPack to me, thus.
The error comes from file caffe2/predictor/predictor.cc
, in whichever operator()
, when calling exportOutputTensor()
, which triggers the CAFFE_ENFORCE
fail on line 13. Seems to me the inference is almost successful, right before the output is exported.
Also, I’ve tried the tutorial. With squeezenet, everything is fine. Merely changing the pb files breaks things.
Is there some reason for this failure? I’m wondering if I didn’t compile QNNPack related stuff correctly, given that this is my first time with Caffe2? Also, why is the output name “325” in the error message??
Thank you!