How can I tell where underlying operations/optimisations are coming from?

mvhv · March 1, 2022, 4:35am

I’m exporting a model to ONNX for use in OpenCV, and have had to avoid a few ops that aren’t supported.

A previous version of my model exports to ONNX successfully, but after making some changes, I am now getting the following error:

RuntimeError: Exporting the operator resolve_conj to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

I’m not using the resolve_conj() op explicitly in my code, so I assume it’s inserted during JIT, or it’s a sub-operation inside another op. My understanding is that this op is used to resolve a conjugate view to a tangible tensor, so my guess is that it’s related to one of the view() or permute() ops I am using.

How can I figure out where it’s been inserted in the graph, and how can I link that back to my actual code so I can make the appropriate changes to allow ONNX export?

ptrblck · March 1, 2022, 6:47am

If the error is raised in the PyTorch backend (not ONNXRuntime or so), you might be able to use:

export TORCH_SHOW_CPP_STACKTRACES=1

to get a better stacktrace.
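The export line above is POSIX shell syntax. A shell-independent sketch is to set the same variable from Python. Setting it before torch is imported is a cautious assumption about when the variable is read, not something confirmed in this thread.

```python
import os

# Set the variable first so it is already in the environment
# by the time torch's native code looks it up (assumed, see above).
os.environ["TORCH_SHOW_CPP_STACKTRACES"] = "1"

import torch  # noqa: E402  (deliberately imported after the variable is set)
```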

mvhv · March 1, 2022, 8:35am

Thanks for the reply, but the error doesn’t appear to be in the C++ backend. I should have been more explicit but I tried to leave out the extraneous details.

The error is being raised in Python when the op can’t be found, after I call torch.onnx.export(). Here’s the abridged trace:

torch.onnx.export(opencv_encoder, image, opencv_model_output, opset_version=11, verbose=True, input_names=["input"], output_names=["patch_scores"])#, "image_score"])
opencv_model = onnx.load(opencv_model_output)
onnx.checker.check_model(opencv_model)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\__init__.py:316, in export(...)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\utils.py:107, in export(...)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\utils.py:724, in _export(...)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\utils.py:497, in _model_to_graph(...)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\utils.py:216, in _optimize_graph(...)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\__init__.py:373, in _run_symbolic_function(...)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\utils.py:1028, in _run_symbolic_function(...)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\utils.py:982, in _find_symbolic_in_registry(...)

File ~\.virtualenvs\pytorch-jk_rFARN\lib\site-packages\torch\onnx\symbolic_registry.py:125, in get_registered_op(...)

It looks like it’s failing because there’s a resolve_conj() call somewhere and exporting that op to ONNX hasn’t been implemented. Is that the right understanding here, or do you think that I’ve hit a more significant bug somewhere?

I certainly expect that some ops won’t work with ONNX, but I’m a bit stuck on how to go about finding where the op is coming from, aside from simply adding and removing ops until the export succeeds.

For posterity here’s the environment I’m currently testing in:

Windows 10 20H2
Python 3.9.6
PyTorch 1.10.2+cpu installed via Pipenv 2021.5.29

ptrblck · March 1, 2022, 8:45am

No, I think your assumption is correct, and I would expect to see the failing op in the stacktrace.
I.e. are function calls shown in _run_symbolic_function or any other failed call?
Each call should point to a line of code, and I would hope the failing operation is shown there too (with its call history).
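If the stacktrace doesn't name the offending line, one way to get from an op name back to a line of Python is to trace the model with torch.jit.trace and ask each graph node for its recorded source range. This is a sketch under assumptions: find_op_sources and FirstColumn are hypothetical names, and aten::select is only a stand-in for the op being hunted (for example aten::resolve_conj). Whether a given op shows up in the trace at all depends on the PyTorch version.

```python
import torch
from torch import nn


def find_op_sources(module, example_input, op_kind):
    """Trace `module` and return the recorded source range of every `op_kind` node."""
    traced = torch.jit.trace(module, example_input)
    # inlined_graph flattens calls into submodules, so nested ops are visible too.
    return [
        node.sourceRange()
        for node in traced.inlined_graph.nodes()
        if node.kind() == op_kind
    ]


class FirstColumn(nn.Module):
    """Hypothetical stand-in model: selects column 0 of its input."""

    def forward(self, x):
        return x[:, 0]


x = torch.rand(10, 5)
# Swap "aten::select" for the op named in the export error.
for source in find_op_sources(FirstColumn().eval(), x, "aten::select"):
    print(source)
```

Each printed entry is the source information the tracer stored for that node, which typically includes the forward() line that produced it.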

mvhv · March 1, 2022, 9:33am

The traces I was getting don’t have parameters in them, and in retrospect I probably should have just used the debugger to step through the symbolic function parsing, but I was thinking there might be an obvious way to deal with this.

I’ve figured out where the problem was though. Because I was debugging a workaround, I had thoroughly peppered my code with print() statements, and it appears that printing a tensor slice invokes resolve_conj() on the tensor.

Here’s a minimal reproducible example:

import torch
from torch import nn

class BadFirst(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x_slice = x[:, 0]
        print(f"x_slice: {x_slice}")
        return x_slice

if __name__ == "__main__":
    m = BadFirst().eval()
    x = torch.rand(10, 5)
    
    res = m(x) # this works
    torch.onnx.export(m, x, "badfirst.onnx") # this doesn't

I assumed the JIT trace would end up trimming those branches from the graph with a backward pass before passing the result to the ONNX optimiser, but it seems like that isn’t the case.
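A minimal sketch of one possible workaround, assuming the export path traces the model as it does here: skip the debug print while the tracer is recording. GuardedFirst and count_ops are hypothetical names. The check below uses torch.jit.trace directly rather than torch.onnx.export, so it only shows that the traced graph is clean, not that every exporter version will accept it.

```python
import torch
from torch import nn


class GuardedFirst(nn.Module):
    """Hypothetical variant of BadFirst that skips its debug print while tracing."""

    def forward(self, x):
        x_slice = x[:, 0]
        # torch.jit.is_tracing() is True while the tracer is recording ops,
        # so the print (and any op it triggers on the tensor) is skipped then.
        if not torch.jit.is_tracing():
            print(f"x_slice: {x_slice}")
        return x_slice


def count_ops(module, example_input, op_kind):
    """Trace `module` and count the graph nodes whose kind equals `op_kind`."""
    traced = torch.jit.trace(module, example_input)
    return sum(1 for node in traced.inlined_graph.nodes() if node.kind() == op_kind)


m = GuardedFirst().eval()
x = torch.rand(10, 5)
res = m(x)  # the eager call still prints
print(count_ops(m, x, "aten::resolve_conj"))  # number of resolve_conj nodes in the trace
```

torch.onnx.is_in_onnx_export() is a narrower guard if the print should stay active during ordinary tracing and be skipped only during ONNX export.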

Anyway, thanks for the sanity check. I’ll check the repo to see if anyone has mentioned this previously and I’ll open an issue if not.

jinfagang · September 23, 2022, 3:07pm

Hi, I got the same error, but unfortunately I can’t fix it by finding a print call somewhere; my code doesn’t contain one.

Somehow it’s caused by a slice, but this shouldn’t happen?

lib/python3.9/site-packages/torch/onnx/utils.py", line 1805, in _run_symbolic_function
    raise errors.UnsupportedOperatorError(
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::resolve_conj' to ONNX opset version 14 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues