ONNX export for operator Tensor.repeat

Hi all, I ran into a problem when exporting a model that includes an x.repeat() call to ONNX.
To reproduce, here is a simple model similar to mine (the dimension sizes are ad hoc, chosen for convenience):

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

        self.conv1 = nn.Conv2d(3, 16, 3, stride=2)
        self.norm1 = nn.LayerNorm([16, 3, 3])
        self.relu1 = nn.PReLU()

        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(144, 32)
        self.softmax1 = nn.Softmax(dim=-1)
        self.softmax2 = nn.Softmax(dim=-1)
    
    def forward(self, x, graph):
        bat_sz = x.shape[0]
        agent_n = x.shape[1]

        x = x.view(bat_sz*agent_n, 3, 8, 8)
        x = self.relu1(self.norm1(self.conv1(x)))
        # print('x shape: {}'.format(x.shape))

        code = x.view(bat_sz, agent_n, -1)

        code = code.view(bat_sz, 1, agent_n, 144)
        code = code.repeat(1, agent_n, 1, 1)

        # print(code.shape)

        x = code.view(25, 144)
        x = self.dropout(x)
        x = self.fc(x)
        x1 = self.softmax1(x)
        x = torch.cat((x, x1), dim=1)
        x2 = self.softmax2(x)
        return x1, x2

The export code is as follows:

model = Model()
model.cuda()

# Export to ONNX
model.eval()
x = torch.randn(1, 5, 3, 8, 8, requires_grad=True, device='cuda')
graph = torch.randint(2, [1, 5, 5], device='cuda').to(torch.long)
o1, o2 = model(x, graph)

# Export model.onnx with batch_size=1
print('\nExporting model.onnx ...')
torch.onnx.export(model,
                  (x, graph),
                  'model.onnx',
                  opset_version=9,
                  verbose=True,
                  export_params=True,
                  input_names=['x', 'graph'],
                  output_names=['out1', 'out2'],
                  #dynamic_axes={'input': {0: 'batch_size'},  # variable length axes
                                #'output': {0: 'batch_size'}}
                  )

The export itself went fine, but when loading the model in C++ I got:

........
VERBOSE: ModelImporter.cpp:107: Parsing node: Softmax_42 [Softmax]
VERBOSE: ModelImporter.cpp:123: Searching for input: 72
VERBOSE: ModelImporter.cpp:129: Softmax_42 [Softmax] inputs: [72 -> (25, 32)], 
VERBOSE: ImporterContext.hpp:122: Registering layer: Softmax_42 for ONNX node: Softmax_42
VERBOSE: ImporterContext.hpp:97: Registering tensor: out1_1 for ONNX tensor: out1
VERBOSE: ModelImporter.cpp:180: Softmax_42 [Softmax] outputs: [out1 -> (25, 32)], 
VERBOSE: ModelImporter.cpp:107: Parsing node: Concat_43 [Concat]
VERBOSE: ModelImporter.cpp:123: Searching for input: 72
VERBOSE: ModelImporter.cpp:123: Searching for input: out1
VERBOSE: ModelImporter.cpp:129: Concat_43 [Concat] inputs: [72 -> (25, 32)], [out1 -> (25, 32)], 
VERBOSE: ImporterContext.hpp:122: Registering layer: Concat_43 for ONNX node: Concat_43
VERBOSE: ImporterContext.hpp:97: Registering tensor: 74 for ONNX tensor: 74
VERBOSE: ModelImporter.cpp:180: Concat_43 [Concat] outputs: [74 -> (25, 64)], 
VERBOSE: ModelImporter.cpp:107: Parsing node: Softmax_44 [Softmax]
VERBOSE: ModelImporter.cpp:123: Searching for input: 74
VERBOSE: ModelImporter.cpp:129: Softmax_44 [Softmax] inputs: [74 -> (25, 64)], 
VERBOSE: ImporterContext.hpp:122: Registering layer: Softmax_44 for ONNX node: Softmax_44
VERBOSE: ImporterContext.hpp:97: Registering tensor: out2_1 for ONNX tensor: out2
VERBOSE: ModelImporter.cpp:180: Softmax_44 [Softmax] outputs: [out2 -> (25, 64)], 
VERBOSE: ModelImporter.cpp:494: Marking out1_1 as output: out1
VERBOSE: ModelImporter.cpp:494: Marking out2_1 as output: out2
WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
INTERNAL_ERROR: Assertion failed: res[sizeDim] <= inputSize && "Output size must be less than or equal to input size."
../builder/cudnnBuilderGraphShapeAnalyzer.cpp:671
Aborting...

ERROR: ../builder/cudnnBuilderGraphShapeAnalyzer.cpp (671) - Assertion Error in symbolicSlice: 0 (res[sizeDim] <= inputSize && "Output size must be less than or equal to input size.")

If I remove the repeat line from forward(), everything is fine and the model can be loaded in C++.

Any help?

If I cannot get away with broadcasting, I usually avoid repeat by using view followed by expand. These seem (or at least used to seem) to be better in terms of performance and better supported.
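As a rough sketch of what I mean, applied to the shapes in your forward(): the helper names below are made up for illustration, and I have not checked whether this gets rid of the assertion in your C++ loader. It only shows that expand gives the same tensor as repeat here.

```python
import torch


def tile_agents_repeat(code: torch.Tensor) -> torch.Tensor:
    """Approach from the question: copy the data agent_n times along dim 1."""
    agent_n = code.shape[2]
    return code.repeat(1, agent_n, 1, 1)


def tile_agents_expand(code: torch.Tensor) -> torch.Tensor:
    """Alternative: broadcast the size-1 dim 1 with expand (no data copy)."""
    agent_n = code.shape[2]
    # -1 means "leave this dimension unchanged"
    return code.expand(-1, agent_n, -1, -1)


# Same sizes as in the question: batch 1, 5 agents, 144 features
bat_sz, agent_n, feat = 1, 5, 144
code = torch.randn(bat_sz, agent_n, feat).view(bat_sz, 1, agent_n, feat)

repeated = tile_agents_repeat(code)
expanded = tile_agents_expand(code)

same_shape = repeated.shape == expanded.shape
same_values = torch.equal(repeated, expanded)

# The expanded tensor is a non-contiguous view (stride 0 along dim 1),
# so flatten it with reshape (or .contiguous().view) rather than a plain view.
flat = expanded.reshape(bat_sz * agent_n * agent_n, feat)
```

In your model that would mean replacing the repeat line with the expand call and changing the following view(25, 144) to a reshape, since a plain view does not work on the expanded tensor.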

Best regards

Thomas