torch.onnx.export of a PyTorch model is slow - expected completion time?

Hello,

I’m trying to speed up my model inference. It’s a standard PyTorch module - no custom ops, just PyTorch convolution layers.

The export code is copied from the tutorial (optional) Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime — PyTorch Tutorials 1.9.0+cu102 documentation:

import json
import sys
from pathlib import Path

import torch

if __name__ == '__main__':
    model_str_or_path = sys.argv[1]
    model_path = Path(model_str_or_path).expanduser()

    # when the path exists, we assume it's a custom model saved locally
    if model_path.exists():
        with open(Path(model_path, "separator.json"), "r") as stream:
            enc_conf = json.load(stream)

        xumx_model, model_nsgt, jagged_slicq_sample = load_target_models(
            model_str_or_path=model_path, pretrained=True, sample_rate=enc_conf["sample_rate"], device="cpu"
        )
    else:
        raise FileNotFoundError(f"model path {model_path} does not exist")

    xumx_model.eval()

    # Input to the model
    torch_out = xumx_model(jagged_slicq_sample)

    # Export the model
    torch.onnx.export(xumx_model,                # model being run
                      jagged_slicq_sample,       # model input (or a tuple for multiple inputs)
                      "xumx_slicq.onnx",         # where to save the model (can be a file or file-like object)
                      export_params=True,        # store the trained parameter weights inside the model file
                      opset_version=11,          # the ONNX version to export the model to
                      do_constant_folding=True,  # whether to execute constant folding for optimization
                      input_names = ['input'],   # the model's input names
                      output_names = ['output'], # the model's output names
                      dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                    'output' : {0 : 'batch_size'}})

My model has ~6.7 million parameters (torchinfo output):

===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
OpenUnmix                                     --                        --
Total params: 6,669,912
Trainable params: 6,669,912
Non-trainable params: 0
Total mult-adds (G): 194.27
===============================================================================================
Input size (MB): 28.63
Forward/backward pass size (MB): 9359.33
Params size (MB): 26.68
Estimated Total Size (MB): 9414.64

The trained .pth weights file is 28M on disk.

My export command has been running for 3 hours and has not completed yet. Is that an expected export time for a model of this size?

Thanks.

No, 3 hours is not expected - I guess your script might be hanging.
In case the process is still running, you could try to attach gdb to it and check the backtrace to see where it’s hanging.


It actually ended up working. I used strace to see if it was hung or progressing, and could see some activity:

[pid 238336] munmap(0x7fb9bb300000, 262144) = 0
[pid 238336] munmap(0x7fb9bb7c0000, 262144) = 0
[pid 238336] munmap(0x7fb9bb800000, 262144) = 0
[pid 238336] munmap(0x7fb9bb700000, 262144) = 0
[pid 238336] munmap(0x7fb9bb380000, 262144) = 0
[pid 238336] munmap(0x7fb9bb180000, 262144) = 0
[pid 238336] mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9bb8c0000
[pid 238336] mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9bb880000
[pid 238336] mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9bb800000
[pid 238336] mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9bb7c0000

The whole thing ended up taking 8 (!) hours:

(umx-gpu-numba) sevagh:xumx-sliCQ $ ls -latrh xumx_slicq.onnx
-rw-r--r--. 1 sevagh sevagh 39M Jul 29 04:28 xumx_slicq.onnx

The verbose output shows a lot of info:

  %247701 : Long(device=cpu) = onnx::Constant[value={0}]()
  %247702 : Long(device=cpu) = onnx::Constant[value={1}]()
  %247703 : Long(1, device=cpu) = onnx::Range(%247701, %247700, %247702)
  %247704 : Long(4, strides=[1], device=cpu) = onnx::Shape(%247669)
  %247705 : Long(device=cpu) = onnx::Constant[value={3}]()
  %247706 : Long(device=cpu) = onnx::Gather[axis=0](%247704, %247705)
  %247707 : Long(device=cpu) = onnx::Cast[to=7](%247706)
  %247708 : Long(device=cpu) = onnx::Constant[value={0}]()
  %247709 : Long(device=cpu) = onnx::Constant[value={1}]()
  %247710 : Long(644, device=cpu) = onnx::Range(%247708, %247707, %247709)
  %247712 : Long(1, strides=[1], device=cpu) = onnx::Unsqueeze[axes=[0]](%247670)
  %247713 : Long(1, strides=[1], device=cpu) = onnx::Unsqueeze[axes=[0]](%247671)
  %247715 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={1}]()
  %247716 : Long(184, device=cpu) = onnx::Slice(%247710, %247712, %247713, %263646, %247715)
  %247717 : Long(4, strides=[1], device=cpu) = onnx::Constant[value=-1  1  1  1 [ CPULongType{4} ]]()
  %247718 : Long(1, 1, 1, 1, strides=[1, 1, 1, 1], device=cpu) = onnx::Reshape(%247689, %247717)
  %247719 : Long(3, strides=[1], device=cpu) = onnx::Constant[value=-1  1  1 [ CPULongType{3} ]]()
  %247720 : Long(2, 1, 1, strides=[1, 1, 1], device=cpu) = onnx::Reshape(%247696, %247719)
  %247721 : Long(2, strides=[1], device=cpu) = onnx::Constant[value=-1  1 [ CPULongType{2} ]]()
  %247722 : Long(1, 1, strides=[1, 1], device=cpu) = onnx::Reshape(%247703, %247721)
  %247723 : Long(1, 2, 1, 1, strides=[2, 1, 1, 1], device=cpu) = onnx::Add(%247718, %247720)
  %247724 : Long(1, 2, 1, 1, strides=[2, 1, 1, 1], device=cpu) = onnx::Add(%247723, %247722)

It would be nice if torch.onnx.export had a built-in progress bar (tqdm-style), though, so that it’s not silent for 8 hours.

Taking 8 hours to create a 39MB file seems terrible, and I still don’t think this can be expected behavior.
Good to hear the script finally finished, but I guess the export might have been slower than the model training? :stuck_out_tongue:
In case you can reproduce it, could you create an issue on GitHub, please?


I’m a bit late here but I filed it on GitHub finally: Torch.onnx.export is very slow, takes 8 hours · Issue #63734 · pytorch/pytorch · GitHub
It should be reproducible.