Hi, I am trying to create a TorchScript module of Facebook’s deep learning recommendation model (DLRM) using the torch.jit.script() method. The conversion fails with the following runtime error:
RuntimeError:
cannot call a value of type 'Tensor':
File "dlrm_s_pytorch.py", line 275
# return x
# approach 2: use Sequential container to wrap all layers
return layers(x)
~~~~~~ <--- HERE
'DLRM_Net.apply_mlp' is being compiled since it was called from 'DLRM_Net.sequential_forward'
File "dlrm_s_pytorch.py", line 343
def sequential_forward(self, dense_x, lS_o, lS_i):
# process dense features (using bottom mlp), resulting in a row vector
x = self.apply_mlp(dense_x, self.bot_l)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
# debug prints
# print("intermediate")
'DLRM_Net.sequential_forward' is being compiled since it was called from 'DLRM_Net.forward'
File "dlrm_s_pytorch.py", line 337
def forward(self, dense_x, lS_o, lS_i):
if self.ndevices <= 1:
return self.sequential_forward(dense_x, lS_o, lS_i)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
else:
return self.parallel_forward(dense_x, lS_o, lS_i)
To recreate the error:
- Clone the DLRM repository and install the requirements.
<activate virtual environment>
git clone https://github.com/facebookresearch/dlrm.git
cd dlrm
pip install -r requirements.txt
- Add the following line in dlrm_s_pytorch.py after line 179 to solve a type conversion issue:
n = n.item()
- Add the following snippet in dlrm_s_pytorch.py after the architecture object is initialized:
dlrm_jit = torch.jit.script(dlrm)
sys.exit() # successful exit after compiling, no need to train
- Run the following command:
python dlrm_s_pytorch.py --arch-sparse-feature-size=32 --arch-embedding-size="70446-298426-33086-133729-61823" --data-size=20480 --arch-mlp-bot="256-256-128-32" --arch-mlp-top="256-64-1" --max-ind-range=400000 --data-generation=random --loss-function=bce --nepochs=5 --round-targets=True --learning-rate=1.0 --mini-batch-size=2048
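The traceback points at apply_mlp, which receives the layer container as an argument and then calls it. As far as I understand, TorchScript treats an unannotated function argument as a Tensor, which would explain the "cannot call a value of type 'Tensor'" message. Below is a minimal sketch that seems to show the same pattern without DLRM. The class names Broken and Workaround and the helper try_script are hypothetical and only for illustration; they are not part of dlrm_s_pytorch.py, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


class Broken(nn.Module):
    """Hypothetical module that mirrors the apply_mlp(x, layers) pattern."""

    def __init__(self):
        super().__init__()
        self.bot_l = nn.Sequential(nn.Linear(4, 2), nn.ReLU())

    def apply_mlp(self, x, layers):
        # 'layers' has no type annotation, so TorchScript assumes a Tensor
        # and refuses to call it.
        return layers(x)

    def forward(self, x):
        return self.apply_mlp(x, self.bot_l)


class Workaround(nn.Module):
    """Hypothetical variant that calls the submodule directly."""

    def __init__(self):
        super().__init__()
        self.bot_l = nn.Sequential(nn.Linear(4, 2), nn.ReLU())

    def forward(self, x):
        # The submodule is accessed as an attribute instead of being
        # passed around as a function argument.
        return self.bot_l(x)


def try_script(module):
    """Return None if scripting succeeds, else the error message."""
    try:
        torch.jit.script(module)
        return None
    except Exception as err:  # broad on purpose: only the outcome matters here
        return str(err)


broken_error = try_script(Broken())
workaround_error = try_script(Workaround())

print("Broken scripted OK:", broken_error is None)
print("Workaround scripted OK:", workaround_error is None)
```

If this is the same problem, calling self.bot_l(x) and self.top_l(x) directly inside the forward methods, instead of going through apply_mlp, might be one way around it. I have not verified that against the full DLRM model.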