Transformer model to ONNX, batch inference

I implemented the CLTR model (crowd and localisation transformer), which uses Conditional DETR. I was able to create an ONNX model with input size [12, 3, 256, 256], similar to my image. However, I would like to process images in batches, and I was not able to do so, because a 5-D input such as (batch_size, 12, 3, 256, 256) is not supported.
Any suggestions on how I could solve this problem?
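
One possible workaround, as a rough sketch only: fold the batch dimension into the leading axis so the model receives (batch_size * 12, 3, 256, 256), then split the outputs back per image. This assumes the model treats the 12 entries independently and that the ONNX model was exported with a dynamic first axis (for example via the `dynamic_axes` argument of `torch.onnx.export`); neither is confirmed here. The helper names `fold_batch` and `unfold_batch`, and the fake output shape, are hypothetical and only illustrate the reshaping.

```python
import numpy as np

def fold_batch(images):
    """Merge (batch, crops, C, H, W) into (batch * crops, C, H, W)."""
    b, n, c, h, w = images.shape
    return images.reshape(b * n, c, h, w), (b, n)

def unfold_batch(outputs, batch_and_crops):
    """Split the leading axis of the outputs back into (batch, crops, ...)."""
    b, n = batch_and_crops
    return outputs.reshape(b, n, *outputs.shape[1:])

# Dummy batch of 4 images, each stored as 12 crops of 3 x 256 x 256.
batch = np.zeros((4, 12, 3, 256, 256), dtype=np.float32)
folded, dims = fold_batch(batch)  # folded has shape (48, 3, 256, 256)

# Stand-in for the model output: assume one row of 10 values per crop.
# In practice this would come from running the ONNX model on `folded`.
fake_outputs = np.arange(folded.shape[0] * 10, dtype=np.float32).reshape(-1, 10)
restored = unfold_batch(fake_outputs, dims)  # shape (4, 12, 10)
```

With this layout, `restored[i, j]` holds the output for crop `j` of image `i`, since a row-major reshape keeps the crops of one image adjacent.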
