Is there any practical example of training a model using the JIT?

I’m trying to use the JIT to train my model, but I failed when simply training resnet18 with DataParallel.

I only added the following two lines:

https://github.com/pytorch/examples/compare/master...daquexian:demo_jit
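
Roughly, the change amounts to the sketch below (a paraphrase, not a copy of the diff; the exact lines are in the link above, and the input shape here is just a placeholder):

import torch
import torchvision.models as models

# Create the model as in the original imagenet example.
model = models.resnet18()

# The added lines, roughly: trace the model with a dummy input of the
# training shape, then keep wrapping it in DataParallel as the example
# script already does. (32x3x224x224 is only a placeholder shape.)
model = torch.jit.trace(model, torch.rand(32, 3, 224, 224))
model = torch.nn.DataParallel(model).cuda()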

And here is the error message:

=> creating model 'resnet18'
Traceback (most recent call last):
  File "default_train_imagenet.py", line 399, in <module>
    main()
  File "default_train_imagenet.py", line 113, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "default_train_imagenet.py", line 236, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "default_train_imagenet.py", line 287, in train
    loss.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

The JIT in PyTorch is promising and powerful: it saves more than half of the time when I run only the forward pass repeatedly. But there seem to be no practical examples of training a model using the JIT.
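
A timing loop along these lines shows the forward-pass difference (a sketch, not my exact measurement code; the batch size and iteration count are placeholders, and a CUDA device is assumed):

import time
import torch
import torchvision.models as models

model = models.resnet18().cuda().eval()
x = torch.rand(32, 3, 224, 224, device="cuda")
traced = torch.jit.trace(model, x)

def bench(m, n=100):
    # Warm up once, then time n forward passes without autograd.
    with torch.no_grad():
        m(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n):
            m(x)
        torch.cuda.synchronize()
    return time.time() - start

print("eager :", bench(model))
print("traced:", bench(traced))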

I would really appreciate it if someone could help me solve this error, or post a correct, runnable and practical training example :slight_smile:

By the way, my system info is:

Python 3.6.7 (default, Oct 25 2018, 09:16:13) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.0.0'
>>> 

DataParallel is such a fundamental module. Frankly speaking, I can’t imagine that the JIT is incompatible with it and that no one on the PyTorch team has tested the combination. That would make the JIT completely impractical for training anything but toy models.