I’m trying to run this tutorial locally for one parameter server and two workers.
The problem is that I’m getting the error below:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
  File "rpc_parameter_server.py", line 228, in run_worker
    run_training_loop(rank, num_gpus, train_loader, test_loader)
  File "rpc_parameter_server.py", line 187, in run_training_loop
RuntimeError: Error on Node 0: one of the variables needed for gradient computation has been modified by an inplace operation: [CPUFloatType [32, 1, 3, 3]] is at version 5; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
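For context, here is a minimal single-process sketch of how this class of error can be triggered in plain PyTorch. It is not taken from the tutorial: the function name reproduce_inplace_error and the tiny weight tensor are made up for illustration. I'm only assuming that something similar could happen in my setup, for example if one worker's update changes a shared parameter in place while another worker's backward pass still needs the old value.

```python
import torch


def reproduce_inplace_error(detect_anomaly: bool = False):
    """Trigger the autograd version-counter error on a toy tensor.

    Hypothetical helper for illustration only; not part of the tutorial.
    Returns the error message, or None if backward unexpectedly succeeds.
    """
    # A stand-in for a model parameter.
    weight = torch.ones(3, requires_grad=True)

    # Forward pass: the multiplication saves `weight` for its backward.
    loss = (weight * weight).sum()

    # In-place update between forward and backward, similar to what an
    # optimizer step does. This bumps the tensor's version counter.
    with torch.no_grad():
        weight.add_(1.0)

    try:
        # Anomaly detection (the hint in the error message) makes autograd
        # also report which forward operation produced the failing node.
        with torch.autograd.set_detect_anomaly(detect_anomaly):
            loss.backward()
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(reproduce_inplace_error())
```

In this toy case, calling backward before the in-place update (or running the forward pass again after it) avoids the error. I don't know yet whether the same ordering issue applies to the two-worker run.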
Here’s my torch version if needed:
pip3 freeze | grep torch
Thanks in advance for any advice!