My PyTorch code runs fine on macOS and even on Windows, but the same code gets stuck on CentOS 6.3.
I debugged with ipdb and found that the code was stuck in the F.conv2d function. In top, the process shows up as sleeping with almost no CPU usage:
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
19285 work  20   0 2106m 906m  22m S  0.3  0.6 0:31.14 python
The environment was created with Anaconda (Python 2.7/3.6), and the PyTorch version is 0.4.0.
I have spent a long time trying to resolve this problem without success. Do you have any suggestions? Thank you so much!
Are you running your code on CPU or GPU, and are you using multiprocessing?
for ii, (data, label) in tqdm(enumerate(train_dataloader)):
    input = Variable(data)
    target = Variable(label)
    optimizer.zero_grad()
    score = model(input)  # stuck here
    loss = criterion(score, target)
    loss.backward()
    optimizer.step()
On CPU, with no multiprocessing, I think…
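For anyone hitting a similar silent hang, here is a rough sketch of one way to see where Python is blocked without attaching a debugger. It uses the standard-library faulthandler module (Python 3.3+, so not the 2.7 environment). The helper name run_with_watchdog, the timeout, and the log file name are made up for illustration and are not part of PyTorch.

```python
import faulthandler


def run_with_watchdog(step, timeout_s=60.0, log_path="hang_traceback.log"):
    """Run step(); if it is still running after timeout_s seconds,
    write the stack of every Python thread to log_path.

    Hypothetical helper: the name and defaults are assumptions.
    """
    with open(log_path, "w") as log:
        # Arm a timer thread that dumps tracebacks to the log file.
        # exit=False keeps the process alive after the dump.
        faulthandler.dump_traceback_later(timeout_s, file=log, exit=False)
        try:
            return step()
        finally:
            # Disarm the timer once the step has finished (or raised).
            faulthandler.cancel_dump_traceback_later()
```

In the loop above, the forward pass could be wrapped as score = run_with_watchdog(lambda: model(input)). If the call hangs, the log file should show the Python frames leading to the blocking call, though it cannot show what is happening inside native code.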
I reinstalled CentOS 6.3 and then upgraded glibc to 2.14 and then to 2.17, following the error messages PyTorch 0.4.0 printed at runtime.
Now everything is ok.
By the way, PyTorch 0.3.1 worked well before I upgraded glibc (it was at 2.12). So I think the latest PyTorch 0.4.0 may not handle older glibc versions well: it appears to deadlock without printing any error or warning, and just hangs at F.conv2d in torch/nn/modules/conv.py(301).
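Since the glibc version turned out to matter here, a quick check before training may save some debugging time. This is only a sketch: the 2.17 threshold comes from the upgrade described in this thread, not from any official PyTorch requirement, and the helper names parse_version and glibc_at_least are made up for illustration.

```python
import platform


def parse_version(text):
    """Turn '2.17' into (2, 17) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_at_least(required, detected=None):
    """Return True/False for a glibc version check, or None if the
    C library is not glibc or its version cannot be determined.

    Hypothetical helper: pass detected explicitly to test a known version.
    """
    if detected is None:
        lib, detected = platform.libc_ver()
        if lib != "glibc" or not detected:
            return None
    return parse_version(detected) >= parse_version(required)


if __name__ == "__main__":
    # "2.17" is the version that worked in this thread (an assumption,
    # not a documented minimum).
    print(glibc_at_least("2.17"))
```

Comparing tuples rather than strings matters because "2.9" sorts after "2.12" as text but before it as a version.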
Thank you all the same 