RuntimeError: Mobile optimized model cannot be inferenced on GPU

I used `torch.utils.mobile_optimizer.optimize_for_mobile()` to optimize my model, and I wanted to benchmark it against other optimization techniques for faster inference, but it gives me a runtime error when I try to run it on the GPU:
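Here is roughly what I'm doing, with a toy conv model standing in for my actual network (the model, input shape, and variable names below are placeholders, not my real code):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Placeholder model standing in for the real network
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.ReLU(),
).eval()
example = torch.randn(1, 3, 32, 32)

# Trace, then apply the mobile optimization passes
traced_model = torch.jit.trace(model, example)
optimized_traced_model = optimize_for_mobile(traced_model)

# CPU inference on the optimized model works
with torch.no_grad():
    cpu_out = optimized_traced_model(example)

# Moving the optimized model to CUDA and calling it is what fails;
# the unoptimized traced copy runs on the GPU without issues
if torch.cuda.is_available():
    gpu_model = traced_model.to("cuda")
    with torch.no_grad():
        gpu_out = gpu_model(example.to("cuda"))
```

As far as I understand, `optimize_for_mobile` rewrites convolutions into prepacked XNNPACK ops (`prepacked::conv2d_clamp_prepack` in the traceback below), which only ship CPU kernels.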

RuntimeError                              Traceback (most recent call last)
<ipython-input-15-c705c21babec> in <module>
      1 optimized_traced_model = mobile_optimizer.optimize_for_mobile(traced_model)
----> 3 get_ipython().run_line_magic('timeit', 'with torch.no_grad(): optimized_traced_model(example1)')

~/anaconda3/envs/e2r/lib/python3.7/site-packages/IPython/core/ in run_line_magic(self, magic_name, line, _stack_depth)
   2324                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2325             with self.builtin_trap:
-> 2326                 result = fn(*args, **kwargs)
   2327             return result

<decorator-gen-60> in timeit(self, line, cell, local_ns)

~/anaconda3/envs/e2r/lib/python3.7/site-packages/IPython/core/ in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    189         if callable(arg):

~/anaconda3/envs/e2r/lib/python3.7/site-packages/IPython/core/magics/ in timeit(self, line, cell, local_ns)
   1161             for index in range(0, 10):
   1162                 number = 10 ** index
-> 1163                 time_number = timer.timeit(number)
   1164                 if time_number >= 0.2:
   1165                     break

~/anaconda3/envs/e2r/lib/python3.7/site-packages/IPython/core/magics/ in timeit(self, number)
    167         gc.disable()
    168         try:
--> 169             timing = self.inner(it, self.timer)
    170         finally:
    171             if gcold:

<magic-timeit> in inner(_it, _timer)

~/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
    graph(%input, %weight, %bias, %stride:int[], %padding:int[], %dilation:int[], %groups:int):
        %output_min_max : None = prim::Constant()
        %packed_weight_bias = prepacked::conv2d_clamp_prepack(
                              ~~~~~~~~~ <--- HERE
            %weight, %bias, %stride, %padding, %dilation, %groups,
            %output_min_max, %output_min_max)
RuntimeError: Could not run 'prepacked::conv2d_clamp_prepack' with arguments from the 'CUDA' backend. 'prepacked::conv2d_clamp_prepack' is only available for these backends: [CPU].

Also, when I run it on the CPU, the optimized model is more than an order of magnitude slower than the unoptimized model.

I don’t know which optimizations are applied in `optimize_for_mobile`, but I would assume you would see a speedup on a mobile platform, i.e. not necessarily on an x86 architecture.
Did you deploy the model and profile it on a mobile device?
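If you want to sanity-check the CPU numbers outside of `%timeit`, something like this compares the two runtimes with `torch.utils.benchmark` (the toy conv model here is just a stand-in for your network):

```python
import torch
from torch.utils.benchmark import Timer
from torch.utils.mobile_optimizer import optimize_for_mobile

# Stand-in model; replace with your real network and input
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval()
x = torch.randn(1, 3, 64, 64)

traced = torch.jit.trace(model, x)
optimized = optimize_for_mobile(traced)

with torch.no_grad():
    # blocked_autorange handles warmup and picks the number of iterations
    plain = Timer(stmt="m(x)", globals={"m": traced, "x": x}).blocked_autorange()
    opt = Timer(stmt="m(x)", globals={"m": optimized, "x": x}).blocked_autorange()

print(f"traced:    {plain.median * 1e3:.3f} ms")
print(f"optimized: {opt.median * 1e3:.3f} ms")
```

`Timer` also warms up the models before measuring, so the one-time weight prepacking in the optimized model shouldn't dominate the reported median.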

No, I am specifically interested in mobile devices with a GPU, such as the NVIDIA Jetson.

For inference use cases on the Jetson platform, you could check out the jetson-inference repository, which provides some utilities and examples.