Using different conv2d ops with pre-trained models

Hello, I would like to know the proper practice for using a different convolution operator when performing inference with pre-trained models from TorchVision, such as those found at https://pytorch.org/vision/main/models.html. These models include one or more convolutional layers, but during inference all convolutions are executed with PyTorch’s default implementation (on the GPU, via cuDNN).

To change this behavior and use my own custom convolution operator, or one from an external library such as CUTLASS (https://github.com/NVIDIA/cutlass/blob/main/examples/python/02_pytorch_extension_grouped_gemm.ipynb), I would like to know whether:

  1. It is better to modify PyTorch’s source code, specifically the call to torch.nn.Conv2d, to import and use CUTLASS or my custom convolution operator. For this, I assume it would be necessary to recompile PyTorch or TorchVision from source.
  2. It would be better to modify the model definitions in the TorchVision repository (https://github.com/pytorch/vision/tree/main/torchvision/models) to use my custom convolution operator directly in their implementation.
  3. The use of custom operators, as described here https://pytorch.org/tutorials/advanced/cpp_extension.html, would be a better approach.

I would like to know the best practice, or whether there is another approach that is simpler or more user-friendly and minimizes changes to PyTorch’s source code. Thank you.

I think the least invasive option is (3): register custom operators, copy-paste the TorchVision model, and replace the convolutions with your custom operator.
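For illustration, a minimal, untested sketch of the replacement step. Here my_conv2d is a placeholder for whatever custom or CUTLASS kernel you end up calling; it falls back to F.conv2d just so the sketch runs:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

def my_conv2d(x, weight, bias, stride, padding, dilation, groups):
    # Placeholder: call your custom / CUTLASS convolution here.
    # Using the reference implementation so the sketch is runnable.
    return F.conv2d(x, weight, bias, stride, padding, dilation, groups)

class CustomConv2d(nn.Conv2d):
    # Subclassing nn.Conv2d means the pre-trained weights load unchanged;
    # only the forward pass is redirected to the custom operator.
    def forward(self, x):
        return my_conv2d(x, self.weight, self.bias, self.stride,
                         self.padding, self.dilation, self.groups)

def swap_convs(module):
    # Recursively replace every nn.Conv2d in the model with CustomConv2d,
    # copying the existing pre-trained parameters over.
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Conv2d):
            new = CustomConv2d(child.in_channels, child.out_channels,
                               child.kernel_size, stride=child.stride,
                               padding=child.padding, dilation=child.dilation,
                               groups=child.groups, bias=child.bias is not None)
            new.load_state_dict(child.state_dict())
            setattr(module, name, new)
        else:
            swap_convs(child)

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
swap_convs(model)

The same swap works whether you copy the model file or load it straight from torchvision.models; subclassing nn.Conv2d keeps the state_dict keys identical, so the pre-trained checkpoint loads as-is.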

Hi, @soulitzer. Could you be more specific about how this could be done? From what you said, I understand that I should modify the model internally, for instance by taking the ResNet definition (vision/torchvision/models/resnet.py at main · pytorch/vision · GitHub) and changing every torch.nn.Conv2d to my convolution. Am I right?

I guess what I was suggesting was more of making a literal copy of the model code from TorchVision.

The feasibility of this approach does depend on how self-contained the model you are targeting is.

@soulitzer, how can I make a literal copy of the model code? Isn’t it just the resnet.py file from the pytorch/vision GitHub repository that I linked in my previous reply?

My approach is only to change the conv2d operation, nothing else. I do not intend to re-train the model, or to modify the weights, the inputs, or anything else.

I only intend to perform the convolution with the CUTLASS library (similarly to this example with Python), but using an operator that the library allows emitting, like this GEMM.

An additional question is whether the tensor layout of torch.nn.Conv2d is compatible with the layout CUTLASS expects:

import cutlass

# Input tensor: [N, H, W, C] under the channels-last layout
N, H, W, C = [32, 28, 28, 64]

# Weight tensor: [K, R, S, C] under the channels-last layout
K, R, S = [128, 3, 3]

# Stride, padding, and dilation
stride = (2, 2)
padding = (1, 1)
dilation = (1, 1)

# Compute the output size [N, P, Q, K]
N, P, Q, K = cutlass.Conv2d.output_size((N, H, W, C), (K, R, S, C), padding, stride, dilation)
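For reference, my understanding is that torch.nn.Conv2d defaults to [N, C, H, W] inputs and [K, C, R, S] weights, so I assume I would need a conversion roughly like this (untested):

import torch

# Shapes matching the CUTLASS example above
N, C, H, W = 32, 64, 28, 28
K, R, S = 128, 3, 3

# PyTorch defaults: input [N, C, H, W], weight [K, C, R, S]
x = torch.randn(N, C, H, W, device="cuda", dtype=torch.float16)
w = torch.randn(K, C, R, S, device="cuda", dtype=torch.float16)

# channels_last keeps the logical [N, C, H, W] shape but stores the
# data as NHWC in memory, which is what the CUTLASS example expects
x_nhwc = x.to(memory_format=torch.channels_last)
w_krsc = w.to(memory_format=torch.channels_last)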

Could you give some feedback on all of this?

If you mean literally copy-pasting from the resnet.py link you provided, then yes, that is what I mean.

I’m not sure about the specifics, but while testing you can have your nn.Conv2d replacement run each sample twice, once with your version and once with the original, to check that the results match.
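Something along these lines, assuming a replacement module like the CustomConv2d sketched above; the tolerances are a guess and would need tuning, especially for fp16:

import torch
import torch.nn.functional as F

def check_conv(conv, x, atol=1e-3, rtol=1e-3):
    # Run the same input through the custom forward and through
    # PyTorch's reference conv2d, then compare the outputs.
    with torch.no_grad():
        out_custom = conv(x)
        out_ref = F.conv2d(x, conv.weight, conv.bias, conv.stride,
                           conv.padding, conv.dilation, conv.groups)
    max_diff = (out_custom - out_ref).abs().max().item()
    assert torch.allclose(out_custom, out_ref, atol=atol, rtol=rtol), \
        f"outputs differ, max abs diff: {max_diff}"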