How to convert darts to ONNX

crook52 · July 17, 2020, 9:38am

Hi.
Ultimately, I want to convert the darts/cnn model to TFLite.
As a first step, I tried to convert it to ONNX with the code below.

import torch
import torch.nn as nn
import genotypes
from model import NetworkCIFAR as Network

genotype = eval("genotypes.%s" % 'DARTS')
model = Network(36, 10, 20, True, genotype)
model.load_state_dict(torch.load('./weights.pt'))
model = model.cuda()

onnx_model_path = './darts_model.onnx'
dummy_input = torch.randn(8, 3, 32, 32).cuda()  # must be on the same device as the model
input_names = ['image_array']
output_names = ['category']
torch.onnx.export(model,dummy_input, onnx_model_path,
                  input_names=input_names, output_names=output_names)

However, the export failed.
The error is below.

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/__init__.py", line 168, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/utils.py", line 69, in export
    use_external_data_format=use_external_data_format)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/utils.py", line 488, in _export
    fixed_batch_size=fixed_batch_size)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/utils.py", line 334, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/utils.py", line 291, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, _force_outplace=False, _return_inputs_states=True)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 278, in _get_trace_graph
    outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 361, in forward
    self._force_outplace,
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 351, in wrapper
    out_vars, _ = _flatten(outs)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type NoneType

Does torch.onnx.export not support the DARTS model?
Could you please tell me how to fix this so that I can convert it to ONNX?

I have also asked the same question in an issue on the DARTS repository; sorry for the cross-post.

Thank you!

My environment

Ubuntu 16.04
Python 3.6.10
CUDA 9.0
PyTorch 0.3.1 (to search for the model), 1.5.1 (to convert to ONNX)

ptrblck · July 19, 2020, 8:23am

Based on the error message, it seems that an intermediate activation is None instead of a valid tensor.
Is your model working fine in PyTorch eager mode and JIT (without the ONNX export)?

crook52 · July 20, 2020, 8:02am

Thanks for your reply.
Am I right to understand PyTorch eager mode as the normal mode?
If so, the output of model(dummy_input) is below. I think it is correct.

>>> out = model(dummy_input)
>>> out
(tensor([[-0.0391, -0.0840,  0.1382, -0.0397,  0.0157, -0.0448, -0.0603, -0.0823,
          0.0025, -0.0009],
        [ 0.0257,  0.0031, -0.1880,  0.0768, -0.1047, -0.0392,  0.1393, -0.0419,
         -0.0437,  0.0032],
        [ 0.0090,  0.0097, -0.0768, -0.0383, -0.0220, -0.2048, -0.1315,  0.0117,
         -0.0538, -0.0613],
        [ 0.0438, -0.1284,  0.0325,  0.0441,  0.0736,  0.1941, -0.0407, -0.0634,
          0.1074, -0.0407],
        [ 0.0440,  0.0194,  0.0147,  0.0859,  0.2149, -0.0393,  0.1640,  0.0369,
         -0.1021, -0.1820],
        [-0.3142, -0.0726, -0.0694, -0.1064, -0.1595,  0.2461,  0.1174,  0.2102,
          0.1790,  0.2188],
        [-0.0522,  0.0327, -0.1626, -0.0955,  0.0625, -0.0061,  0.0662,  0.0667,
          0.1003,  0.0635],
        [ 0.1054, -0.0456,  0.0922,  0.0559,  0.1422, -0.1924, -0.2107, -0.0572,
         -0.0424, -0.1007]], device='cuda:0', grad_fn=<AddmmBackward>), tensor([[-0.0779, -0.4483,  0.5459, -0.4263,  0.3033, -0.0147,  0.1823,  0.2561,
          0.3321, -0.8131],
        [-0.1185, -0.1932,  0.2465,  0.3930, -0.0634,  0.2440, -0.0587, -0.5931,
          0.0938,  0.2163],
        [ 0.0699, -0.2207,  0.5958, -0.0778, -0.1024, -0.1841,  0.5211,  0.0760,
          0.2308,  0.1463],
        [ 0.1858, -0.0432,  0.3188, -0.0905,  0.1415, -0.6925,  0.1487, -0.2300,
          1.0883,  0.1186],
        [-0.1471,  0.1120,  0.3354, -0.3918,  0.0748, -0.5318,  0.0106, -0.4543,
          1.2513,  0.1778],
        [ 0.4499,  0.0425,  0.3949, -0.8790, -0.1463, -0.4942, -0.4362, -0.3380,
          0.3257,  0.2104],
        [ 0.2962,  0.0098,  0.6569,  0.0520,  0.1627, -0.4044,  0.2104, -0.2278,
          0.2411,  0.0337],
        [ 0.0591, -0.0795,  0.9120, -0.5483, -0.2887, -0.2304, -0.3799, -0.5769,
          0.5903, -0.6071]], device='cuda:0', grad_fn=<AddmmBackward>))

As for the JIT, I don’t know much about it, so could you please tell me how to check that?
Here is the model code. Is it possible to convert to ONNX even if ‘forward’ has an ‘if’ in it?

class NetworkCIFAR(nn.Module):

  def __init__(self, C, num_classes, layers, auxiliary, genotype):
    super(NetworkCIFAR, self).__init__()
    self._layers = layers
    self._auxiliary = auxiliary

    stem_multiplier = 3 #RGB
    C_curr = stem_multiplier*C #C is input_channels
    self.stem = nn.Sequential(
      nn.Conv2d(3, C_curr, 3, padding=1, bias=False),
      nn.BatchNorm2d(C_curr)
    )
    
    C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
    self.cells = nn.ModuleList()
    reduction_prev = False
    for i in range(layers):
      if i in [layers//3, 2*layers//3]:
        C_curr *= 2
        reduction = True
      else:
        reduction = False
      cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, reduction_prev)
      reduction_prev = reduction
      self.cells += [cell]
      C_prev_prev, C_prev = C_prev, cell.multiplier*C_curr
      if i == 2*layers//3:
        C_to_auxiliary = C_prev

    if auxiliary:
      self.auxiliary_head = AuxiliaryHeadCIFAR(C_to_auxiliary, num_classes)
    self.global_pooling = nn.AdaptiveAvgPool2d(1)
    self.classifier = nn.Linear(C_prev, num_classes)

  def forward(self, input):
    logits_aux = None
    s0 = s1 = self.stem(input)
    for i, cell in enumerate(self.cells):
      s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
      if i == 2*self._layers//3:
        if self._auxiliary and self.training:
          logits_aux = self.auxiliary_head(s1)
    out = self.global_pooling(s1)
    logits = self.classifier(out.view(out.size(0),-1))
    return logits, logits_aux

I’m sorry for all of the questions.
Thank you for your time.

ptrblck · July 21, 2020, 3:49am

Yes, sorry for the unclear naming. By “eager” mode I meant the normal Python usage.

Good to see the model is working generally.
Could you, for the sake of debugging, remove logits_aux from the forward method, return only the logits, and retry the export?
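If you would rather not edit the repository code, a thin wrapper that drops the second output should have the same effect. Here is a minimal sketch under some assumptions: TinyTwoHeadNet is a hypothetical stand-in for NetworkCIFAR and LogitsOnly is a hypothetical wrapper name, neither is part of DARTS, and torch.jit.trace is used only as a quick check of the tracing step.

```python
import torch
import torch.nn as nn


class TinyTwoHeadNet(nn.Module):
    """Hypothetical stand-in for NetworkCIFAR: returns (logits, logits_aux)."""

    def __init__(self):
        super().__init__()
        self.classifier = nn.Linear(4, 3)
        self.auxiliary_head = nn.Linear(4, 3)

    def forward(self, x):
        logits_aux = None
        if self.training:
            logits_aux = self.auxiliary_head(x)
        return self.classifier(x), logits_aux


class LogitsOnly(nn.Module):
    """Hypothetical wrapper that keeps only the first output of the wrapped net."""

    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, x):
        logits, _ = self.net(x)  # discard logits_aux, which may be None
        return logits


wrapped = LogitsOnly(TinyTwoHeadNet()).eval()
x = torch.randn(2, 4)

out = wrapped(x)                      # a single tensor, never None
traced = torch.jit.trace(wrapped, x)  # tracing sees only tensor outputs
```

For the real model, the same wrapper would go around the NetworkCIFAR instance, and the wrapper would be passed to torch.onnx.export in place of model.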

ebarsoum · July 21, 2020, 4:37am

It looks like it fails in tracing. Can you try torch.jit.trace and see whether it works?
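In case it helps, this is roughly what such a check looks like. It is only a sketch: the small nn.Sequential model here is a hypothetical placeholder, so substitute your own model and dummy input.

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for the real network.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()

example = torch.randn(8, 3, 32, 32)

# Tracing runs the model once on the example input and records the operations.
traced = torch.jit.trace(model, example)

with torch.no_grad():
    eager_out = model(example)
    traced_out = traced(example)

# If tracing succeeded, the traced module should reproduce the eager output.
matches = torch.allclose(eager_out, traced_out, atol=1e-6)
print("traced output matches eager output:", matches)
```

If torch.jit.trace raises here, the ONNX export will most likely fail for the same reason, since the export traces the model as well.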

crook52 · July 28, 2020, 6:20am

I’m sorry for my late reply.
After removing logits_aux as you advised, it worked!!!
I cannot thank you enough!

But why couldn’t it be converted with logits_aux? Here is the auxiliary head for reference:

class AuxiliaryHeadCIFAR(nn.Module):

  def __init__(self, C, num_classes):
    """assuming input size 8x8"""
    super(AuxiliaryHeadCIFAR, self).__init__()
    self.features = nn.Sequential(
      nn.ReLU(inplace=True),
      nn.AvgPool2d(5, stride=3, padding=0, count_include_pad=False), # image size = 2 x 2
      nn.Conv2d(C, 128, 1, bias=False),
      nn.BatchNorm2d(128),
      nn.ReLU(inplace=True),
      nn.Conv2d(128, 768, 2, bias=False),
      nn.BatchNorm2d(768),
      nn.ReLU(inplace=True)
    )
    self.classifier = nn.Linear(768, num_classes)

  def forward(self, x):
    x = self.features(x)
    x = self.classifier(x.view(x.size(0),-1))
    return x

crook52 · July 28, 2020, 6:34am

Thank you for your reply, and sorry for my late reply.

I tried it, and torch.jit.trace doesn’t work either.

>>> torch.jit.trace(model,dummy_input)
/home/XXXX/darts/cnn/eval-EXP-20200710-150423/utils.py:105: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  mask = Variable(torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 875, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 1027, in trace_module
    module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
RuntimeError: Only tensors, lists and tuples of tensors can be output from traced functions

If I remove logits_aux, following ptrblck’s advice, it works well.
I don’t know why, so I would appreciate it if you could let me know if you find the cause.
Thank you!

ptrblck · July 28, 2020, 9:10am

My best guess is that tracing the model didn’t go through the conditions where logits_aux is set to a tensor, so it stayed None until the return statement. This could happen, e.g., if you called model.eval() before exporting it.
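You can check which branch runs in plain eager mode. Below is a minimal sketch, assuming a hypothetical TinyAuxNet that copies only the logits_aux logic from the forward method posted above; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


class TinyAuxNet(nn.Module):
    """Hypothetical miniature of the logits_aux logic in NetworkCIFAR.forward."""

    def __init__(self, auxiliary=True):
        super().__init__()
        self._auxiliary = auxiliary
        self.classifier = nn.Linear(4, 3)
        self.auxiliary_head = nn.Linear(4, 3)

    def forward(self, x):
        logits_aux = None
        # Same condition as in the posted forward: auxiliary head only in training.
        if self._auxiliary and self.training:
            logits_aux = self.auxiliary_head(x)
        return self.classifier(x), logits_aux


net = TinyAuxNet()
x = torch.randn(2, 4)

net.train()
_, aux_train = net(x)  # branch taken: logits_aux is a tensor

net.eval()
_, aux_eval = net(x)   # branch skipped: logits_aux stays None

print("training mode, logits_aux is None:", aux_train is None)
print("eval mode, logits_aux is None:", aux_eval is None)
```

Whenever the second print reports True for your model, the returned tuple contains None, which is the type the tracer rejects.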

crook52 · July 30, 2020, 1:20am

Thank you for your comment!
So the takeaway is that I have no choice but to remove logits_aux when tracing the model.