I have an autoencoder that I am training with DDP (DistributedDataParallel). I wanted to try to improve performance by using MKL-DNN, so I tried to convert the model to MKL-DNN using the following lines, but at runtime I get an AssertionError. Is MKL-DNN not supported with DDP, or am I doing something wrong? Any help would be highly appreciated.
Error:
Traceback (most recent call last):
  File "/N/u2/p/pulasthiiu/git/deepLearning_MDS/nnprojects/Mnist/AutoEncodertDDPDataGenMKL.py", line 156, in <module>
    main()
  File "/N/u2/p/pulasthiiu/git/deepLearning_MDS/nnprojects/Mnist/AutoEncodertDDPDataGenMKL.py", line 105, in main
    ddp_model = DDP(autoencoderMKL)
  File "/N/u2/p/pulasthiiu/python3.8/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 344, in __init__
    assert any((p.requires_grad for p in module.parameters())), (
AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
Traceback (most recent call last):
  File "/N/u2/p/pulasthiiu/python3.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/N/u2/p/pulasthiiu/python3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/N/u2/p/pulasthiiu/python3.8/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/N/u2/p/pulasthiiu/python3.8/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/N/u2/p/pulasthiiu/python3.8/bin/python3', '-u', '/N/u2/p/pulasthiiu/git/deepLearning_MDS/nnprojects/Mnist/AutoEncodertDDPDataGenMKL.py', '-w', '40', '-ep', '10', '-bs', '8000', '-rc', '1024', '-ds', '640000', '-l', '768x576x432x324']' returned non-zero exit status 1.
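For context, the assertion comes from a sanity check at the top of DDP's constructor: it requires at least one parameter with `requires_grad=True` before it will wrap a module. A minimal torch-free sketch of that check (the `Param` and `Module` classes here are hypothetical stand-ins, not PyTorch classes):

```python
# Hypothetical stand-ins for torch.nn.Parameter / torch.nn.Module,
# just to illustrate the check DDP runs in its __init__.

class Param:
    def __init__(self, requires_grad=True):
        self.requires_grad = requires_grad

class Module:
    def __init__(self, params):
        self._params = params

    def parameters(self):
        return iter(self._params)

def ddp_precheck(module):
    # Mirrors the failing line from the traceback above:
    # assert any((p.requires_grad for p in module.parameters())), ...
    assert any(p.requires_grad for p in module.parameters()), (
        "DistributedDataParallel is not needed when a module doesn't "
        "have any parameter that requires a gradient."
    )

# A module whose parameters still track gradients passes the check.
ddp_precheck(Module([Param(requires_grad=True)]))

# A module that exposes no grad-requiring parameters (which is what an
# mkldnn-converted model can look like to DDP) trips the same
# AssertionError shown in the traceback.
try:
    ddp_precheck(Module([]))
except AssertionError as e:
    print("raised:", e)
```

So the error means DDP sees a module with no trainable parameters after the MKL-DNN conversion, not a failure inside the conversion itself.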
I don't believe that `to_mkldnn()` modifies the underlying model, just the memory format of the tensors; please let me know if I am wrong. We need more info on the autoencoder model and what that looks like. Could you also include the code for the model? Does it have any parameters?
Sorry about the late reply. It is a simple autoencoder; I just have some logic to add layers when I specify the number of layers in the autoencoder (the code is below). Am I using the `to_mkldnn()` function incorrectly?
Thanks for the model. I just verified that it is failing, and mkldnn does change the model layers. I don't have a lot of context on MKL-DNN, but I created an issue on GitHub to track this and loop in the right people: Support for mkldnn + ddp · Issue #56024 · pytorch/pytorch · GitHub.