Converting to ONNX raises CUDA out of memory error

I currently have a basic Jupyter notebook that follows this article. I hope to later convert the ONNX file into a TFLite file.

However, when I run the code linked in the article followed by the line below, I get a CUDA out of memory error. I also ran this on Google Colab and got the same error, so I assume my hardware is not the issue.

torch.onnx.export(model, input_batch, '../model/deeplab_model_pytorch.onnx', verbose=True)

Does anybody know why this is causing an error?

Are you able to run the forward pass using the current input_batch?
If I’m not mistaken, torch.onnx.export traces the model, so it needs to execute a forward pass with the given input to record all operations.
If the forward pass works before calling the export operation, could you try to export this model in a new script with an empty GPU? Your current script might already be using some memory.
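
One way to try this from a notebook is to launch the export in a separate Python interpreter, so it starts with its own CUDA context and shares no tensors with the notebook. The sketch below uses only the standard library. The helper name `run_in_fresh_interpreter` is made up for illustration and is not part of PyTorch. Note that the notebook kernel keeps whatever GPU memory it has already allocated until it is restarted, so for a truly empty GPU the kernel would need to be restarted or shut down first.

```python
import subprocess
import sys


def run_in_fresh_interpreter(py_args):
    """Run `python <py_args...>` in a new process.

    Returns (returncode, stdout, stderr). The new process does not share
    tensors or cached allocations with the calling process.
    """
    result = subprocess.run(
        [sys.executable, *py_args],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


# Harmless demo; for the real case, pass the path of an export script instead.
returncode, stdout, stderr = run_in_fresh_interpreter(["-c", "print('fresh process')"])
print(returncode, stdout.strip())
```

With a hypothetical script `export_onnx.py` that loads the model and calls torch.onnx.export, the call would be `run_in_fresh_interpreter(["export_onnx.py"])`, and any CUDA error would show up in the returned stderr.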

I was able to run a forward pass with the current input_batch before calling the export operation. I ended up resolving the issue by using a smaller, random tensor as the dummy input (torch.randn(1,3,224,224)) and passing that to the export function in place of the input_batch from my first post.
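
A rough way to see why the smaller dummy input helps: a dense tensor needs the product of its dimensions times the bytes per element, and for a convolutional model the intermediate activations grow roughly with the spatial area of the input. The sketch below only does that arithmetic. The 1024x1024 shape is an illustrative assumption, since the shape of the original input_batch is not given in this thread.

```python
import math


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor; 4 bytes per element corresponds to float32."""
    return math.prod(shape) * bytes_per_element


small = tensor_bytes((1, 3, 224, 224))    # the dummy input used in this post
large = tensor_bytes((1, 3, 1024, 1024))  # hypothetical larger image, for comparison only

print(f"224x224 input:   {small / 1e6:.2f} MB")
print(f"1024x1024 input: {large / 1e6:.2f} MB")
print(f"area ratio:      {large / small:.1f}x")
```

The input tensor itself is small in both cases; the point is the ratio, which the activations recorded during the traced forward pass would roughly follow.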

However, I’m running into a different issue regarding the following 2 cells.

Cell #1:

# https://discuss.pytorch.org/t/when-i-export-deeplab-v3-using-torch-onnx-export-i-find-that-the-onnx-model-has-two-outputs-why/92401
import torch

# A small random input is enough for tracing the model.
dummy_input = torch.randn(1, 3, 224, 224).to('cuda')
dummy_output = model(dummy_input)  # sanity-check the forward pass first

torch.onnx.export(model, dummy_input, '../models/deeplab_model_pytorch.onnx', verbose=True,
                  input_names=['input_img'], output_names=['output_img'], opset_version=11)

Cell #2:

import onnx
from onnx_tf.backend import prepare

model_onnx = onnx.load('../models/deeplab_model_pytorch.onnx')

tf_rep = prepare(model_onnx)
tf_rep.export_graph('../models/deeplab_model_tf.pb')

When tf_rep.export_graph runs in the second cell, I get the following error:

RuntimeError: Resize coordinate_transformation_mode=pytorch_half_pixel is not supported in Tensorflow.

I have read that changing the opset version to 10 in Cell #1 removes this runtime error, but I then run into a different problem: the model is converted incorrectly between ONNX and TensorFlow. Would you have a suggestion on how to fix this?

No, I’m unfortunately completely unfamiliar with onnx_tf and its support for PyTorch.
I’m also unsure what pytorch_half_pixel refers to. It doesn’t sound like a data type (float16), so I assume it’s related to the coordinate transformation named in the error message.
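
For context, `coordinate_transformation_mode` is an attribute of the ONNX Resize operator that says how an output index is mapped back to a coordinate in the input before interpolating, so it is not a data type. As the ONNX operator documentation describes it, `pytorch_half_pixel` is the same as `half_pixel` except that it maps to 0 when the resized length is 1. The sketch below writes out those formulas in plain Python for illustration only. The function name is made up here, only three modes are covered, and this is not how onnx_tf or PyTorch implement the op.

```python
def original_coordinate(x_resized, scale, length_resized, mode):
    """Map an output index to an input coordinate along one axis.

    Formulas follow the ONNX Resize operator description, where
    `scale` is length_resized / length_original.
    """
    if mode == "half_pixel":
        return (x_resized + 0.5) / scale - 0.5
    if mode == "pytorch_half_pixel":
        if length_resized > 1:
            return (x_resized + 0.5) / scale - 0.5
        return 0.0
    if mode == "asymmetric":
        return x_resized / scale
    raise ValueError(f"mode not covered by this sketch: {mode}")


# Upsampling by 2x: the two half-pixel modes agree.
print(original_coordinate(0, 2.0, 4, "half_pixel"))
print(original_coordinate(0, 2.0, 4, "pytorch_half_pixel"))
# Resizing an axis to a single element: only here do they differ.
print(original_coordinate(0, 0.5, 1, "half_pixel"))
print(original_coordinate(0, 0.5, 1, "pytorch_half_pixel"))
```

In these formulas the two modes differ only when an axis is resized to length 1.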