RuntimeError: Mobile-optimized model cannot run inference on GPU

I used mobile_optimizer.optimize_for_mobile() to optimize my model and wanted to benchmark it against other optimization techniques for faster inference, but it raises a runtime error when I try to run it on the GPU.
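For context, this is roughly the flow I am using (a minimal sketch, not my exact code; the stand-in model, the input shape, and the device moves are assumptions):

import torch
import torch.nn as nn
from torch.utils import mobile_optimizer

# Stand-in for the real network (the actual model is not shown here)
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).eval()
example1 = torch.randn(1, 3, 224, 224)

# Trace the eager model, then apply the mobile optimization passes
traced_model = torch.jit.trace(model, example1)
optimized_traced_model = mobile_optimizer.optimize_for_mobile(traced_model)

# Moving the optimized module and the input to the GPU triggers the error below
optimized_traced_model = optimized_traced_model.to('cuda')
example1 = example1.cuda()

with torch.no_grad():
    optimized_traced_model(example1)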

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-c705c21babec> in <module>
      1 optimized_traced_model = mobile_optimizer.optimize_for_mobile(traced_model)
      2 
----> 3 get_ipython().run_line_magic('timeit', 'with torch.no_grad(): optimized_traced_model(example1)')

~/anaconda3/envs/e2r/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2324                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2325             with self.builtin_trap:
-> 2326                 result = fn(*args, **kwargs)
   2327             return result
   2328 

<decorator-gen-60> in timeit(self, line, cell, local_ns)

~/anaconda3/envs/e2r/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

~/anaconda3/envs/e2r/lib/python3.7/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell, local_ns)
   1161             for index in range(0, 10):
   1162                 number = 10 ** index
-> 1163                 time_number = timer.timeit(number)
   1164                 if time_number >= 0.2:
   1165                     break

~/anaconda3/envs/e2r/lib/python3.7/site-packages/IPython/core/magics/execution.py in timeit(self, number)
    167         gc.disable()
    168         try:
--> 169             timing = self.inner(it, self.timer)
    170         finally:
    171             if gcold:

<magic-timeit> in inner(_it, _timer)

~/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
    graph(%input, %weight, %bias, %stride:int[], %padding:int[], %dilation:int[], %groups:int):
        %output_min_max : None = prim::Constant()
        %packed_weight_bias = prepacked::conv2d_clamp_prepack(
                              ~~~~~~~~~ <--- HERE
            %weight, %bias, %stride, %padding, %dilation, %groups,
            %output_min_max, %output_min_max)
RuntimeError: Could not run 'prepacked::conv2d_clamp_prepack' with arguments from the 'CUDA' backend. 'prepacked::conv2d_clamp_prepack' is only available for these backends: [CPU].

Also, when I run it on the CPU, the optimized model is more than an order of magnitude slower than the un-optimized model.
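This is roughly how I am timing the two modules on the CPU (a sketch using the standard-library timeit, assuming traced_model, optimized_traced_model, and example1 from the snippet above; in the notebook I use the %timeit magic instead):

import timeit
import torch

def bench(module, inp, runs=50):
    # Average wall-clock time per forward pass, after one warm-up call
    with torch.no_grad():
        module(inp)
        return timeit.timeit(lambda: module(inp), number=runs) / runs

cpu_input = example1.cpu()
print('traced   :', bench(traced_model, cpu_input))
print('optimized:', bench(optimized_traced_model.cpu(), cpu_input))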

I don't know which optimizations are applied in optimize_for_mobile, but I would assume that you would see a speedup on a mobile platform, i.e. not necessarily on an x86 architecture.
Did you deploy the model and profile it on a mobile device?
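One way to see what the pass actually changed is to print the TorchScript graphs of the two modules (a sketch, assuming the variable names from the snippets above); the optimized graph should contain the prepacked::conv2d_clamp_* ops from the error message, which are only implemented for the CPU backend:

print(traced_model.graph)            # plain traced graph with aten::conv2d etc.
print(optimized_traced_model.graph)  # optimized graph with prepacked::conv2d_clamp_* ops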

No, I am specifically looking at mobile devices with a GPU, such as the NVIDIA Jetson.

For inference use cases on the Jetson platform, you could check out the jetson-inference repository, which provides some utilities and examples.
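If you want to stay close to the PyTorch workflow above, one common route onto the Jetson GPU (only a sketch, under the assumption that your model traces and exports cleanly; jetson-inference and TensorRT can consume ONNX models) is to export the un-optimized model to ONNX and build a TensorRT engine from it on the device:

import torch
import torch.nn as nn

# Export the eager (un-optimized) model; ONNX export traces it internally.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).eval()  # stand-in model
example1 = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    example1,
    'model.onnx',
    input_names=['input'],
    output_names=['output'],
    opset_version=11,
)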