I exported a PyTorch model (a two-stage Faster R-CNN) to ONNX and ran inference with ONNX Runtime.
The export succeeded and the output shapes are as expected, but the results differ wildly between the CPU and CUDA execution providers: the CPU EP matches eager-mode execution, while the CUDA EP produces very different results. To debug, I exposed every node's output with this:
for node in onnx_model.graph.node:
    for output in node.output:
        onnx_model.graph.output.extend([onnx.ValueInfoProto(name=output)])
and compared the two EPs layer by layer. The outputs start to diverge sharply at torch.expand, which is confusing: expand is a seemingly trivial operation, so I don't see why CPU and CUDA would produce different results there. Can anyone please help me with this issue?
def postprocess(
    self,
    proposals: Tensor,
    objectness: Tensor,
    image_shapes: Tensor,
    features: List[Tensor],
):
    @torch.jit.script
    def _postprocess(
        proposals: Tensor,
        objectness: Tensor,
        image_shapes: Tensor,
        features: List[Tensor],
        min_size: float,
        score_thresh: float,
        nms_thresh: float,
        pre_nms_top_n: int,
        post_nms_top_n: int,
        proposal_dim: int,
        num_anchors_per_location: int,
    ):
        ....
        # top_n_idx: Tensor of shape [batch_size, pre_nms_top_n]
        top_n_idx = objectness.topk(pre_nms_top_n, dim=1)[1]  # same output on CPU and CUDA
        # torch.gather is equivalent to the following advanced indexing:
        #   image_range = torch.arange(num_images)
        #   batch_idx = image_range.unsqueeze(1)
        #   objectness = objectness[batch_idx, top_n_idx]
        #   levels = levels[batch_idx, top_n_idx]
        #   proposals = proposals[batch_idx, top_n_idx]
        # objectness: Tensor of shape [batch_size, pre_nms_top_n]
        objectness = torch.gather(objectness, 1, top_n_idx)  # same output on CPU and CUDA
        # levels: Tensor of shape [batch_size, pre_nms_top_n]
        levels = torch.gather(levels, 1, top_n_idx)  # same output on CPU and CUDA
        # proposals: Tensor of shape [batch_size, pre_nms_top_n, 4]
        # to use gather, unsqueeze the index and expand its last dim
        top_n_idx = top_n_idx.unsqueeze(2).expand(-1, -1, proposal_dim)  # very different CPU vs CUDA; divergence starts at expand
        proposals = torch.gather(proposals, 1, top_n_idx)  # also very different
        ....
        return proposals, objectness
    return _postprocess(
        proposals,
        objectness,
        image_shapes,
        features,
        self.min_size,
        self.score_thresh,
        self.nms_thresh,
        self.pre_nms_top_n(),
        self.post_nms_top_n(),
        self.box_coder.proposal_dim,
        self.anchor_generator.num_anchors_per_location()[0],
    )
Thank you in advance for your time and help!