Hi, I just added a pass on the TorchScript IR that converts BertLayer to a FasterTransformer encoder, but I find the model is slow after converting it to TorchScript. I collected an nvprof profile and found one time-consuming activity:

Type Time(%) Time Calls Avg Min Max Name
GPU activities: 57.50% 1.49484s 25200 59.319us 3.2000us 151.55us _ZN2at6native27unrolled_elementwise_kernelIZZZNS0_21copy_device_to_deviceERNS_14TensorIteratorEbENKUlvE0_clEvENKUlvE2_clEvEUlfE_NS_6detail5ArrayIPcLi2EEE16OffsetCalculatorILi1EjESC_NS0_6memory15LoadWithoutCastENSD_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_

I looked at my final TorchScript IR, and I guess the reason is that aten::contiguous is executed several times on every run, like:

aten::contiguous is needed for the tensors that are sent to the custom op, because they are first transformed with .transpose(-1, -2), but aten::contiguous seems to be time-consuming. Is there any way to convert the model weights to constants in the TorchScript IR so that aten::contiguous(weights) gets folded into a constant tensor, or is there something else I can do to avoid aten::contiguous? Thank you very much!

.contiguous() copies the data if it isn’t stored in a contiguous block of memory, e.g. after a transpose.
If your kernel needs to work on a contiguous array and you need to permute the tensor (i.e. you cannot pass it in the expected shape from the beginning), I don’t think there is a workaround.
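To make the copy concrete, here is a minimal sketch (the tensor names and shapes are made up for illustration) showing that transpose only returns a view, while .contiguous() on that view allocates new memory and copies into it:

```python
import torch

w = torch.randn(4, 8)

# transpose() returns a view: same storage, swapped strides, no copy.
wt = w.transpose(-1, -2)
print(wt.is_contiguous())
print(wt.data_ptr() == w.data_ptr())

# contiguous() on a non-contiguous view allocates new memory and copies.
# This is the device-to-device copy kernel that shows up in the profile.
wc = wt.contiguous()
print(wc.is_contiguous())
print(wc.data_ptr() == w.data_ptr())
```

If this runs inside forward, the copy is paid on every call, once per transposed tensor.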

Thank you for the response. Can I freeze the weights to constants in the TorchScript IR? I mean, I just want to have Tensor t = model.weight.data.transpose(-1, -2).contiguous() in the TorchScript IR; if I convert model.weight to a constant, then Tensor t will be folded into a constant by an optimization pass.
My current IR is:

Are you calling the transpose operation in the __init__ method of your model, the forward or somewhere else?
Could you transpose the parameter before and pass it to the model directly?
Also, don’t use the .data attribute as it might yield unwanted side effects.
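A rough sketch of what transposing the parameter beforehand could look like. The module name PreTransposed is hypothetical, and torch.matmul only stands in for the custom op, which isn't shown in this thread. The idea is to pay for the copy once in __init__ and keep the result as a buffer:

```python
import torch
import torch.nn as nn


class PreTransposed(nn.Module):
    """Hypothetical wrapper: stores the weight already transposed and contiguous."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        # One-time copy at construction. detach() is used instead of .data.
        self.register_buffer(
            "weight_t", linear.weight.detach().transpose(-1, -2).contiguous()
        )
        self.register_buffer("bias", linear.bias.detach().clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stand-in for the custom op; no transpose/contiguous per call.
        return torch.matmul(x, self.weight_t) + self.bias


linear = nn.Linear(8, 4)
module = PreTransposed(linear)
scripted = torch.jit.script(module)
print(scripted.graph)
```

With this layout the scripted graph only reads the buffer, so the transpose and the copy no longer run in forward.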

Yes, you should be able to get the tensor by its name. However, based on the IR it seems your graph contains the transpose and contiguous op, as it’s apparently needed in a custom layer.
I’m unsure how you would avoid these operations.
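On the Python side, getting a tensor by its name could look like the sketch below. The Tiny module and the name "dense.weight" are placeholders, not the names from the actual model; a C++ pass would need its own lookup, which isn't covered here.

```python
import torch
import torch.nn as nn


class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(8, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dense(x)


scripted = torch.jit.script(Tiny())

# Parameters of a scripted module are addressable by their qualified name.
params = dict(scripted.named_parameters())
weight = params["dense.weight"]

# state_dict() exposes the same tensors under the same names.
same = scripted.state_dict()["dense.weight"]
print(weight.shape, torch.equal(weight, same))
```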

I want to get the actual tensor behind %weight, transpose it, create a new constant node from the result, and insert that node into the graph. How do I get the tensor by name? Thank you very much!
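One option that might cover this without a hand-written pass is torch.jit.freeze, which inlines the parameters of an eval-mode scripted module into the graph as constants. The sketch below uses a made-up Layer module with torch.matmul in place of the custom op. Whether the transpose and contiguous calls on the inlined weight actually get folded away is an assumption that should be checked by inspecting the printed graph.

```python
import torch
import torch.nn as nn


class Layer(nn.Module):
    """Hypothetical layer that transposes its weight inside forward."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(4, 8))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_t = self.weight.transpose(-1, -2).contiguous()
        return torch.matmul(x, w_t)  # stand-in for the custom op


model = Layer().eval()  # freeze expects a module in eval mode
scripted = torch.jit.script(model)
frozen = torch.jit.freeze(scripted)

# Inspect the graph: the weight should now appear as a constant rather
# than an attribute lookup on the module.
print(frozen.graph)

x = torch.randn(2, 8)
print(torch.allclose(frozen(x), model(x)))
```

If the folding doesn't happen for the custom op's inputs, storing the transposed weight ahead of time remains the fallback.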