Hi, I just added a pass over TorchScript IR to convert BertLayer to the FasterTransformer Encoder; however, I found the model is slow after conversion to TorchScript. I profiled with nvprof and found a time-consuming activity:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 57.50% 1.49484s 25200 59.319us 3.2000us 151.55us _ZN2at6native27unrolled_elementwise_kernelIZZZNS0_21copy_device_to_deviceERNS_14TensorIteratorEbENKUlvE0_clEvENKUlvE2_clEvEUlfE_NS_6detail5ArrayIPcLi2EEE16OffsetCalculatorILi1EjESC_NS0_6memory15LoadWithoutCastENSD_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_
I inspected my final TorchScript IR, and I guess the reason is that every run executes aten::contiguous several times, like:
%1752 : Float(*, *, requires_grad=1, device=cuda:0) = aten::contiguous(%1153, %21)
aten::contiguous is needed for tensors that will be sent to the custom op, because they are first transformed by .transpose(-1, -2), but aten::contiguous seems time-consuming. So is there any way I can convert model weights to constants in the TorchScript IR, so that aten::contiguous(weights) gets folded into a constant tensor, or something else I can do to avoid aten::contiguous? Thank you very much!
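One option, if you can script and freeze the module yourself, is torch.jit.freeze, which inlines module attributes as prim::Constant nodes and runs constant propagation, so a transpose + contiguous applied to a weight can be folded at freeze time. A minimal sketch (the toy module and shapes below are made up for illustration):

```python
import torch
import torch.nn as nn

class TransposedLinear(nn.Module):
    # toy stand-in for a layer that transposes its weight on every forward
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(8, 4))

    def forward(self, x):
        # the pattern from the IR above: transpose, then contiguous
        w = self.weight.transpose(-1, -2).contiguous()
        return x @ w

m = torch.jit.script(TransposedLinear()).eval()  # freeze requires eval mode
frozen = torch.jit.freeze(m)
# After freezing, the weight is a prim::Constant, and constant propagation
# can fold the transpose/contiguous into the constant itself.
print(frozen.graph)
```

If the fold succeeds, the transpose/contiguous on the weight no longer runs per forward; inspect frozen.graph to verify.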
.contiguous() copies the data if it isn't stored in a contiguous memory layout, e.g. after a transpose.
If your kernel needs to work on a contiguous array and you need to permute the tensor (i.e. you cannot pass it in the expected shape from the beginning), I don’t think there is a workaround.
Thank you for the response. So, can I freeze the weights to constants in the TorchScript IR? I mean, I just want to add Tensor t = model.weight.data.transpose(-1, -2).contiguous() in the TorchScript IR; if I convert model.weight to a constant, then tensor t will be optimized to a constant by the constant-propagation pass.
My current IR is:
%1128 : Float(*, *, requires_grad=1, device=cuda:0) = prim::GetAttr[name="weight"](%1126)
%1436 : Float(*, *, requires_grad=1, device=cuda:0) = aten::transpose(%1128, %10, %1147)
%1610 : Float(*, *, requires_grad=1, device=cuda:0) = aten::contiguous(%1436, %21)
And I want to convert it to:
%1060 : Tensor = prim::Constant[value=<Tensor>]()
Are you calling the transpose operation in the __init__ method of your model, in forward, or somewhere else?
Could you transpose the parameter before and pass it to the model directly?
Also, don’t use the .data attribute, as it might yield unwanted side effects.
Sorry, I need to write a pass over the TorchScript IR, so I can't control the model; I want to convert the model's weights to constants purely within the IR pass.
Or can I get the actual tensor of a model weight in a TorchScript pass? If I have:
%weight : Tensor = prim::GetAttr[name="weight"](%1)
can I get the actual tensor from the Value %weight?
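From Python, one way is to walk the chain of prim::GetAttr nodes from the Value back to the module object. A sketch (the helper name resolve_getattr and the toy module are my own, not a PyTorch API):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(8, 4))

    def forward(self, x):
        return x @ self.weight.transpose(-1, -2)

scripted = torch.jit.script(M())

def resolve_getattr(module, value):
    # Follow prim::GetAttr nodes from a Value back to the Python object.
    node = value.node()
    if node.kind() == "prim::GetAttr":
        owner = resolve_getattr(module, node.input())
        return getattr(owner, node.s("name"))
    # base case: the graph's self argument is the module itself
    return module

for node in scripted.graph.findAllNodes("prim::GetAttr"):
    if node.s("name") == "weight":
        w = resolve_getattr(scripted, node.output())
        print(w.shape)  # the actual parameter tensor
```

The same walk works from a C++ pass with Node::kind(), Node::s(), and the owning Module, if your pass lives there.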
Yes, you should be able to get the tensor by its name. However, based on the IR, it seems your graph contains the contiguous op because it's apparently needed in a custom layer, so I'm not sure how you would like to avoid these operations.
I want to get the actual tensor of %weight, then create a new Constant node from this tensor after transposing it, and insert it into the graph. So how do I get the tensor by its name? Thank you very much!
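Putting the pieces together, here is a rough sketch of such a rewrite in Python (the toy module and the matching logic are my own assumptions; it handles only the happy path, and a real pass should also verify that the transpose dims are actually (-1, -2)): find the GetAttr, precompute transpose(-1, -2).contiguous() on the real tensor, insert it with Graph.insertConstant, and rewire the uses of the contiguous result:

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(8, 4))

    def forward(self, x):
        w = self.weight.transpose(-1, -2).contiguous()
        return x @ w

scripted = torch.jit.script(M()).eval()
graph = scripted.graph

for getattr_node in graph.findAllNodes("prim::GetAttr"):
    if getattr_node.s("name") != "weight":
        continue
    # Precompute the folded tensor from the real parameter.
    folded = scripted.weight.detach().transpose(-1, -2).contiguous()
    for use in list(getattr_node.output().uses()):
        tr = use.user
        if tr.kind() != "aten::transpose":
            continue
        for use2 in list(tr.output().uses()):
            cont = use2.user
            if cont.kind() != "aten::contiguous":
                continue
            # Insert a prim::Constant holding the folded tensor and make
            # sure it dominates the node whose output it replaces.
            const_val = graph.insertConstant(folded)
            const_val.node().moveBefore(getattr_node)
            cont.output().replaceAllUsesWith(const_val)

# Remove the now-dead GetAttr/transpose/contiguous nodes.
torch._C._jit_pass_dce(graph)
print(graph)
```

The equivalent C++ pass uses graph->insertConstant(IValue(tensor)), Node::moveBefore, and Value::replaceAllUsesWith, which is likely what you want if your pass is registered natively.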