I'm trying to use the JIT to train my model, but training fails even for a plain resnet18 in data-parallel mode.
The only change I made was adding the following two lines:
https://github.com/pytorch/examples/compare/master...daquexian:demo_jit
Here is the error message:
=> creating model 'resnet18'
Traceback (most recent call last):
  File "default_train_imagenet.py", line 399, in <module>
    main()
  File "default_train_imagenet.py", line 113, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "default_train_imagenet.py", line 236, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "default_train_imagenet.py", line 287, in train
    loss.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
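For reference, this RuntimeError is what autograd raises whenever `backward()` is called on a tensor that is not attached to any graph. The sketch below reproduces that in isolation on a toy tensor. It is not the ImageNet script, and the helper name `backward_error` is made up for illustration. It only suggests that the loss at line 287 reached `backward()` without a `grad_fn`, not why that happened.

```python
import torch


def backward_error(loss):
    """Hypothetical helper: return the error text raised by loss.backward(), or None."""
    try:
        loss.backward()
    except RuntimeError as exc:
        return str(exc)
    return None


# A loss built only from tensors that do not require grad has no grad_fn.
detached_loss = torch.ones(3).sum()

# A loss built from a tensor that requires grad is attached to the graph.
attached_loss = torch.ones(3, requires_grad=True).sum()

detached_msg = backward_error(detached_loss)
attached_msg = backward_error(attached_loss)
print(detached_msg)
```

Printing `loss.requires_grad` and `loss.grad_fn` just before `loss.backward()` in the training loop is a cheap way to check whether the model output has lost its graph.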
The JIT in PyTorch is promising and powerful: it cuts the run time by more than half when I only run the forward pass repeatedly. However, there do not seem to be any practical examples of training a model with the JIT.
I would really appreciate it if someone could help me fix this error, or post a correct, runnable, practical training example.
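To make the request concrete, here is a minimal sketch of the kind of training step meant here, under some loud assumptions: a single device (no DataParallel), a toy `nn.Sequential` model instead of resnet18, random data, and `torch.jit.trace` as the way the model is compiled. It has not been checked against PyTorch 1.0.0 specifically, and it does not cover the data-parallel case where the error above occurs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for resnet18 (assumption: any small module works for the sketch).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Trace the model with an example input of the shape used in training.
example_input = torch.randn(16, 4)
traced = torch.jit.trace(model, example_input)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(traced.parameters(), lr=0.1)

# Random batch standing in for the real data loader.
inputs = torch.randn(16, 4)
targets = torch.randint(0, 2, (16,))

# Keep a copy of the parameters to confirm that the step changes them.
params_before = [p.detach().clone() for p in traced.parameters()]

# One ordinary training step, run through the traced module.
output = traced(inputs)
loss = criterion(output, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

params_changed = any(
    not torch.equal(before, after.detach())
    for before, after in zip(params_before, traced.parameters())
)
```

If `loss.requires_grad` is true in a single-device setup like this but false once the model is wrapped for data-parallel training, that would point at the wrapping step rather than at the JIT itself.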
By the way, here is my system info:
Python 3.6.7 (default, Oct 25 2018, 09:16:13)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.0.0'
>>>