How to convert darts to ONNX

Hi.
Ultimately, I want to convert the darts/cnn model to TFLite.
As a first step, I tried to convert it to ONNX with the code below.

import torch
import torch.nn as nn
import genotypes
from model import NetworkCIFAR as Network

genotype = eval("genotypes.%s" % 'DARTS')
model = Network(36, 10, 20, True, genotype)
model.load_state_dict(torch.load('./weights.pt'))
model = model.cuda()

onnx_model_path = './darts_model.onnx'
dummy_input = torch.randn(8, 3, 32, 32).cuda()  # keep the input on the same device as the model
input_names = ['image_array']
output_names = ['category']
torch.onnx.export(model,dummy_input, onnx_model_path,
                  input_names=input_names, output_names=output_names)

However, the export failed.
The error is below.

 Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/__init__.py", line 168, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/utils.py", line 69, in export
    use_external_data_format=use_external_data_format)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/utils.py", line 488, in _export
    fixed_batch_size=fixed_batch_size)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/utils.py", line 334, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/onnx/utils.py", line 291, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, _force_outplace=False, _return_inputs_states=True)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 278, in _get_trace_graph
    outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 361, in forward
    self._force_outplace,
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 351, in wrapper
    out_vars, _ = _flatten(outs)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type NoneType

Does torch.onnx.export not support darts?
Could you please tell me how to fix this so that I can convert the model to ONNX?

Lastly, I have also asked this same question in an issue on the darts repository. Sorry for the cross-post.

Thank you!

My environment

  • Ubuntu 16.04
  • Python 3.6.10
  • CUDA 9.0
  • PyTorch 0.3.1 (to search the model), 1.5.1 (to convert to ONNX)

Based on the error message, it seems that an intermediate activation is None instead of a valid tensor.
Is your model working fine in PyTorch eager mode and JIT (without the ONNX export)?

Thanks for your reply.
Is it correct to understand PyTorch eager mode as the normal mode?
If so, the output of model(dummy_input) is below. I think it is correct.

>>> out = model(dummy_input)
>>> out
(tensor([[-0.0391, -0.0840,  0.1382, -0.0397,  0.0157, -0.0448, -0.0603, -0.0823,
          0.0025, -0.0009],
        [ 0.0257,  0.0031, -0.1880,  0.0768, -0.1047, -0.0392,  0.1393, -0.0419,
         -0.0437,  0.0032],
        [ 0.0090,  0.0097, -0.0768, -0.0383, -0.0220, -0.2048, -0.1315,  0.0117,
         -0.0538, -0.0613],
        [ 0.0438, -0.1284,  0.0325,  0.0441,  0.0736,  0.1941, -0.0407, -0.0634,
          0.1074, -0.0407],
        [ 0.0440,  0.0194,  0.0147,  0.0859,  0.2149, -0.0393,  0.1640,  0.0369,
         -0.1021, -0.1820],
        [-0.3142, -0.0726, -0.0694, -0.1064, -0.1595,  0.2461,  0.1174,  0.2102,
          0.1790,  0.2188],
        [-0.0522,  0.0327, -0.1626, -0.0955,  0.0625, -0.0061,  0.0662,  0.0667,
          0.1003,  0.0635],
        [ 0.1054, -0.0456,  0.0922,  0.0559,  0.1422, -0.1924, -0.2107, -0.0572,
         -0.0424, -0.1007]], device='cuda:0', grad_fn=<AddmmBackward>), tensor([[-0.0779, -0.4483,  0.5459, -0.4263,  0.3033, -0.0147,  0.1823,  0.2561,
          0.3321, -0.8131],
        [-0.1185, -0.1932,  0.2465,  0.3930, -0.0634,  0.2440, -0.0587, -0.5931,
          0.0938,  0.2163],
        [ 0.0699, -0.2207,  0.5958, -0.0778, -0.1024, -0.1841,  0.5211,  0.0760,
          0.2308,  0.1463],
        [ 0.1858, -0.0432,  0.3188, -0.0905,  0.1415, -0.6925,  0.1487, -0.2300,
          1.0883,  0.1186],
        [-0.1471,  0.1120,  0.3354, -0.3918,  0.0748, -0.5318,  0.0106, -0.4543,
          1.2513,  0.1778],
        [ 0.4499,  0.0425,  0.3949, -0.8790, -0.1463, -0.4942, -0.4362, -0.3380,
          0.3257,  0.2104],
        [ 0.2962,  0.0098,  0.6569,  0.0520,  0.1627, -0.4044,  0.2104, -0.2278,
          0.2411,  0.0337],
        [ 0.0591, -0.0795,  0.9120, -0.5483, -0.2887, -0.2304, -0.3799, -0.5769,
          0.5903, -0.6071]], device='cuda:0', grad_fn=<AddmmBackward>))

Regarding the JIT, I don’t know much about it, so could you please tell me how to check that?
Here’s the model code. Is it possible to convert to ONNX even if ‘forward’ has an ‘if’ in it?

class NetworkCIFAR(nn.Module):

  def __init__(self, C, num_classes, layers, auxiliary, genotype):
    super(NetworkCIFAR, self).__init__()
    self._layers = layers
    self._auxiliary = auxiliary

    stem_multiplier = 3 #RGB
    C_curr = stem_multiplier*C #C is input_channels
    self.stem = nn.Sequential(
      nn.Conv2d(3, C_curr, 3, padding=1, bias=False),
      nn.BatchNorm2d(C_curr)
    )
    
    C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
    self.cells = nn.ModuleList()
    reduction_prev = False
    for i in range(layers):
      if i in [layers//3, 2*layers//3]:
        C_curr *= 2
        reduction = True
      else:
        reduction = False
      cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, reduction_prev)
      reduction_prev = reduction
      self.cells += [cell]
      C_prev_prev, C_prev = C_prev, cell.multiplier*C_curr
      if i == 2*layers//3:
        C_to_auxiliary = C_prev

    if auxiliary:
      self.auxiliary_head = AuxiliaryHeadCIFAR(C_to_auxiliary, num_classes)
    self.global_pooling = nn.AdaptiveAvgPool2d(1)
    self.classifier = nn.Linear(C_prev, num_classes)

  def forward(self, input):
    logits_aux = None
    s0 = s1 = self.stem(input)
    for i, cell in enumerate(self.cells):
      s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
      if i == 2*self._layers//3:
        if self._auxiliary and self.training:
          logits_aux = self.auxiliary_head(s1)
    out = self.global_pooling(s1)
    logits = self.classifier(out.view(out.size(0),-1))
    return logits, logits_aux

I’m sorry for all of the questions.
Thank you for your time.

Yes, sorry for the unclear naming. By “eager” mode I meant the normal Python usage.

Good to see the model is working generally.
Could you, for the sake of debugging, remove logits_aux from the forward, return just the logits, and retry the export?
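
If editing the forward directly is inconvenient, a small wrapper might achieve the same thing. This is only a minimal sketch: TinyNet is a hypothetical stand-in for NetworkCIFAR, and LogitsOnly and export_logits_only are names made up for illustration, not part of darts or PyTorch.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for NetworkCIFAR; returns (logits, logits_aux)."""

    def __init__(self):
        super().__init__()
        self.classifier = nn.Linear(4, 2)
        self.auxiliary_head = nn.Linear(4, 2)

    def forward(self, x):
        logits_aux = None
        if self.training:
            logits_aux = self.auxiliary_head(x)
        return self.classifier(x), logits_aux


class LogitsOnly(nn.Module):
    """Wraps a two-output model and returns only the main logits."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        logits, _ = self.model(x)  # drop logits_aux, which may be None
        return logits


def export_logits_only(model, dummy_input, path):
    # Same export call as in the question, applied to the wrapped model.
    torch.onnx.export(LogitsOnly(model).eval(), dummy_input, path,
                      input_names=['image_array'], output_names=['category'])


wrapped = LogitsOnly(TinyNet()).eval()
dummy_input = torch.randn(8, 4)
# Tracing works here because the wrapper returns a single tensor.
traced = torch.jit.trace(wrapped, dummy_input)
```

With the real model you would pass your loaded NetworkCIFAR instance and a CUDA dummy input to export_logits_only instead of the toy objects above.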

It looks like it fails in tracing. Can you try torch.jit.trace and see whether it works?
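
A rough way to check tracing on its own, without the ONNX export, could look like the sketch below. The two toy modules and the can_trace helper are hypothetical and only there to show the pattern; with your model you would pass the NetworkCIFAR instance and your dummy input instead.

```python
import torch
import torch.nn as nn


class ReturnsTensors(nn.Module):
    """Hypothetical toy model whose outputs are all tensors."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        out = self.fc(x)
        return out, out * 2


class ReturnsNone(nn.Module):
    """Hypothetical toy model whose second output is None."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x), None


def can_trace(model, example):
    """Return True if torch.jit.trace succeeds on the model."""
    try:
        torch.jit.trace(model, example)
        return True
    except RuntimeError as err:
        print("tracing failed:", err)
        return False


example = torch.randn(8, 4)
ok_tensors = can_trace(ReturnsTensors(), example)
ok_none = can_trace(ReturnsNone(), example)
```

If the second call fails while the first succeeds, the problem is the None in the outputs rather than anything ONNX-specific.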

I’m sorry for my late reply.
After removing logits_aux as you advised, it worked!!!
I cannot thank you enough!

However, why couldn’t it be converted with logits_aux?

class AuxiliaryHeadCIFAR(nn.Module):

  def __init__(self, C, num_classes):
    """assuming input size 8x8"""
    super(AuxiliaryHeadCIFAR, self).__init__()
    self.features = nn.Sequential(
      nn.ReLU(inplace=True),
      nn.AvgPool2d(5, stride=3, padding=0, count_include_pad=False), # image size = 2 x 2
      nn.Conv2d(C, 128, 1, bias=False),
      nn.BatchNorm2d(128),
      nn.ReLU(inplace=True),
      nn.Conv2d(128, 768, 2, bias=False),
      nn.BatchNorm2d(768),
      nn.ReLU(inplace=True)
    )
    self.classifier = nn.Linear(768, num_classes)

  def forward(self, x):
    x = self.features(x)
    x = self.classifier(x.view(x.size(0),-1))
    return x

Thank you for your reply, and sorry for my late reply.

Yes, torch.jit.trace doesn’t work.

>>> torch.jit.trace(model,dummy_input)
/home/XXXX/darts/cnn/eval-EXP-20200710-150423/utils.py:105: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  mask = Variable(torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 875, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/home/XXXX/darts/cnn/eval-EXP-20200710-150423/venv/lib/python3.6/site-packages/torch/jit/__init__.py", line 1027, in trace_module
    module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
RuntimeError: Only tensors, lists and tuples of tensors can be output from traced functions

If I remove logits_aux following ptrblck’s advice, it works well.
I don’t know why, so I would appreciate it if you could let me know when you find out.
Thank you!

My best guess is that tracing the model didn’t go through the branch where logits_aux is set to a tensor, so it stayed None until the return statement. This could happen, e.g., if you called model.eval() before exporting it.
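
To illustrate the guess, here is a minimal sketch with a hypothetical toy model (TwoHeadNet is not from darts) that uses the same logits_aux pattern as the posted forward. If I remember correctly, torch.onnx.export also puts the model into inference mode by default, which would have the same effect as calling model.eval() yourself, but please treat that as an assumption to verify for your PyTorch version.

```python
import torch
import torch.nn as nn


class TwoHeadNet(nn.Module):
    """Hypothetical toy model with the same logits_aux pattern as NetworkCIFAR."""

    def __init__(self, auxiliary=True):
        super().__init__()
        self._auxiliary = auxiliary
        self.classifier = nn.Linear(4, 2)
        self.auxiliary_head = nn.Linear(4, 2)

    def forward(self, x):
        logits_aux = None
        if self._auxiliary and self.training:
            logits_aux = self.auxiliary_head(x)
        return self.classifier(x), logits_aux


model = TwoHeadNet()
x = torch.randn(8, 4)

model.train()
_, aux_train = model(x)  # auxiliary branch runs, so this is a tensor

model.eval()
_, aux_eval = model(x)   # branch is skipped, so this stays None

print(type(aux_train).__name__, aux_eval)
```

In eval mode the second output is None, which is the kind of value the tracer rejects.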

Thank you for your comment!
So, in order to trace the model, I have no choice but to remove logits_aux.