[Caffe2] MobileNetV2 Quantized using caffe2: BlobIsTensorType(*blob, CPU). Blob is not a CPU Tensor: 325

I’ve been trying to run MobileNetV2 Quantized on devices with ARM CPUs. It keeps failing with the following error:

RuntimeError: [enforce fail at predictor.cc:13] BlobIsTensorType(*blob, CPU). Blob is not a CPU Tensor: 325
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x78 (0x7fa0459da0 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/lib/libc10.so)
frame #1: <unknown function> + 0x10edb6c (0x7fa155bb6c in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/lib/libcaffe2.so)
frame #2: <unknown function> + 0x10ee560 (0x7fa155c560 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/lib/libcaffe2.so)
frame #3: caffe2::Predictor::operator()(std::vector<caffe2::Tensor, std::allocator<caffe2::Tensor> > const&, std::vector<caffe2::Tensor, std::allocator<caffe2::Tensor> >*) + 0x270 (0x7fa155d540 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/lib/libcaffe2.so)
frame #4: <unknown function> + 0x4db14 (0x7fa23afb14 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/caffe2/python/caffe2_pybind11_state.cpython-35m-aarch64-linux-gnu.so)
frame #5: <unknown function> + 0x8f264 (0x7fa23f1264 in /media/nvidia/093a3e14-f70d-4f2d-93fa-3bf25a2fcc17/nvidia/pytorch/build/caffe2/python/caffe2_pybind11_state.cpython-35m-aarch64-linux-gnu.so)
<omitting python frames>

I’ve tried this on ARMv8 (Jetson TX2) and ARMv7 (Raspberry pi 2B, I think).

Here are the steps I took:

  1. clone the pytorch repo (master branch) and initialize the submodules (done very recently, so QNNPACK is there, in both the caffe2 and third_party directories)
  2. run script to build caffe2
  3. acquire init_net.pb & predict_net.pb
  4. run according to the tutorial
# On TX2
$ /path/to/pytorch/scripts/build_tegra_x1.sh

# On Rasp pi
$ /path/to/pytorch/scripts/build_raspbian.sh

Here’s my code sample to run the inference:

import numpy as np
from caffe2.python import workspace


def main():
    # Read the serialized init and predict nets
    with open('/home/nvidia/mnv2/init_net.pb', 'rb') as f:
        init_net = f.read()
    with open('/home/nvidia/mnv2/predict_net.pb', 'rb') as f:
        predict_net = f.read()

    p = workspace.Predictor(init_net, predict_net)
    # actual code is reading ImageNet, resizing and stuff;
    # cast to float32 since that's what the net expects
    img = np.random.randn(1, 3, 224, 224).astype(np.float32)
    results = p.run([img])


if __name__ == '__main__':
    main()

Note: on the TX2, I’ve tried building both with and without CUDA. The same error occurs in all cases.

I have also done some digging on my own. I know for sure that qnnp_create_convolution2d_nhwc_q8 was called, so this doesn’t seem like an issue with QNNPACK to me.

The error comes from caffe2/predictor/predictor.cc: operator() calls exportOutputTensor(), which triggers the CAFFE_ENFORCE failure on line 13. It seems to me the inference is almost successful and fails right before the output is exported.

Also, I’ve tried the tutorial. With squeezenet, everything is fine. Merely changing the pb files breaks things.

Is there some reason for this failure? I’m wondering if I didn’t compile the QNNPACK-related parts correctly, given that this is my first time with Caffe2. Also, why is the output named “325” in the error message?

Thank you!

If you print your graph in a human-readable form using print(onnx.helper.printable_graph(model.graph)) you’ll probably realize that the blob 325 is a Dropout module of the model. I’m facing the same problem with MobileNetV2 exported from PyTorch to Caffe2 using ONNX. If I omit (remove) the Dropout module when building the model in PyTorch and export it again, there’s no error when executing using predictor.
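For reference, here’s a minimal sketch of that inspection step, assuming your exported ONNX file is named mobilenetv2.onnx (adjust the path for your setup):

import onnx

# Load the ONNX model exported from PyTorch (path is just an example)
model = onnx.load("mobilenetv2.onnx")

# Print the graph in human-readable form and look for the blob named "325"
print(onnx.helper.printable_graph(model.graph))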

I don’t know if someone has a solution for this. Meanwhile you can try executing the model with Dropout as follows:

import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import workspace

# Run the weight-initialization net once to populate the workspace
init_def = caffe2_pb2.NetDef()
with open("init_net.pb", "rb") as f:
    init_def.ParseFromString(f.read())
    workspace.RunNetOnce(init_def.SerializeToString())

# Load the prediction net
predict_def = caffe2_pb2.NetDef()
with open("predict_net.pb", "rb") as f:
    predict_def.ParseFromString(f.read())

print('Running net...')

# Feed the input array into the input blob ("0") of the network;
# the input blob must exist before the net is created
inputArray = np.random.randn(1, 3, 224, 224).astype(np.float32)  # replace with your preprocessed image batch
workspace.FeedBlob('0', inputArray)

workspace.CreateNet(predict_def.SerializeToString())
workspace.RunNetOnce(predict_def)

img_out = workspace.FetchBlob("468")  # Fetch the result from the output blob ("468") of the network
print(img_out)

Of course, you have to replace the id number of each blob in FeedBlob and FetchBlob, but you can obtain them easily after printing your graph in human-readable form.
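If you don’t want to go back through ONNX, you can also read the candidate blob names straight from the deserialized net; a small sketch reusing the predict_def from above:

# The net's external inputs/outputs are the blobs to feed and fetch
print('inputs: ', list(predict_def.external_input))
print('outputs:', list(predict_def.external_output))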

I hope it helps. I’m currently still looking for a way to execute it using predictor, because I need to run it in an Android app and FeedBlob doesn’t seem to be usable in there.

Hi, I’m the OP. Thank you for your reply. This has been resolved on my end; here’s what I found, in case it helps.

The problem with the official model is that 325 refers to an Int8Tensor, which cannot be the output of a network for some reason I’m not familiar with.

Also, in case you are using a pytorch version that doesn’t contain this PR of mine (https://github.com/pytorch/pytorch/pull/15047), you might experience a bug related to fetching an Int8Tensor. It seems the fix is not included in the latest v1.0.1 yet, so if you happen to encounter some weird issue there, that PR has a simple fix.

HTH.

Hi Terry. Thanks for your answer. I’ll try it and see if it solves my problem in Python. Nevertheless, I wonder if you know how to fix it in Caffe2 for Android Studio, since that is my actual target platform and where I’m having the problem too.

I don’t really know if there’s any way to get the C++ headers and .a files of Caffe2 for Android Studio by building with your PR. I still don’t quite understand how the Caffe2/Android Studio linking works.

I hope you can help me.

Thanks.

I don’t think you need my PR to run a network successfully. Fetching an Int8Tensor is more of a development convenience for checking intermediate results.

I’m not familiar with Android. Also, ONNX has tripped me up so many times that I ended up writing a conversion tool from scratch, so I’m afraid I won’t be of much help there.

However, if you are okay with posting some error messages, I might be able to help if I’ve seen them before. 😀

The error is the same one you posted (Blob is not a CPU Tensor); it refers to the blob just before the output blob, which is a Dropout module in my network. If I remove the Dropout module from the original network architecture, the error is gone and everything runs well. I’ve seen reports of the same behavior before: https://github.com/facebookarchive/caffe2/issues/2165.
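In case it’s useful, here is a rough sketch of removing the Dropout before export. It assumes a torchvision-style MobileNetV2 where the Dropout sits at classifier[0]; adjust it for your own model definition:

import torch
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True).eval()

# Replace the Dropout with a no-op so it never reaches the exported graph
# (assumes classifier[0] is the Dropout, as in torchvision's MobileNetV2)
model.classifier[0] = torch.nn.Identity()

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, 'mobilenetv2_nodropout.onnx')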

Thanks.

Well, if there are several reports about it, I’m guessing ONNX is doing something incorrectly. I’m not familiar with ONNX, so I can’t say for sure. Like I said, I no longer use it for conversion.

By the way, have you tried deserializing the predict_net.pb file? That’s how I discovered the error. Like I said in that issue, the official model is not marking the correct blob as the output. I don’t know if ONNX is behind this, though.

Just printing the NetDef will be fine.

Here’s the problem with the original model:

...
op {
  type: "Int8Softmax"
  output: "325"
  ...
}
op {
  type: "Dequantize"
  input: "325"
  output: "network_output"
  ...
}
...
external_output: "325"

which should be external_output: "network_output" at the end.
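If you don’t want to re-export anything, a possible workaround (just a sketch, not something shipped with the model) is to patch the NetDef so it exports the dequantized blob instead:

from caffe2.proto import caffe2_pb2

predict_def = caffe2_pb2.NetDef()
with open('predict_net.pb', 'rb') as f:
    predict_def.ParseFromString(f.read())

# Point the network's external output at the dequantized blob
# instead of the Int8 blob "325"
del predict_def.external_output[:]
predict_def.external_output.append('network_output')

with open('predict_net_fixed.pb', 'wb') as f:
    f.write(predict_def.SerializeToString())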

See if your problem is similar to this one. HTH.