Very weird behaviour when running the same code on two different GPUs

I had a K40 GPU in my computer. Last week, I added a 1080 to the same machine.

In my first experiment, I observed identical results on both GPUs. Then I ran my second script on both GPUs. In this case, I consistently got good results on the K40 while consistently getting awful results on the 1080, for exactly the same code.

At first, I thought the only reason for such divergent outputs would be the random seeds, so I fixed the seeds like this:

import numpy
import torch

torch.manual_seed(3)
torch.cuda.manual_seed_all(3)
numpy.random.seed(3)
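
(For reference, beyond seeding, cuDNN's algorithm selection can also introduce run-to-run differences. A minimal sketch of the extra flags commonly set for determinism, assuming the standard torch.backends.cudnn attributes:)

# Ask cuDNN to use deterministic algorithms and disable the
# auto-tuner, which can pick different kernels on different runs/GPUs.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False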

But this did not solve the issue. I don't believe the issue is randomness, because I was consistently getting good results on the K40 and consistently getting bad results on the 1080. Moreover, I tried exactly the same code on 2 other computers with 4 other 1080 GPUs and always got good results. So the problem has to be with the 1080 I recently plugged in.

I suspect the problem might be the driver or the way I installed PyTorch. But it is still weird that I only get bad results for some of the experiments; for the others, the results are identical.
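
(In case it helps with diagnosis, something like the following could be used to compare the software stack each GPU sees. This is just a minimal sketch; the device indices reported will depend on how the cards are enumerated on the machine.)

import torch

print(torch.__version__)               # PyTorch build
print(torch.version.cuda)              # CUDA version PyTorch was built with
print(torch.backends.cudnn.version())  # cuDNN version
for i in range(torch.cuda.device_count()):
    # name and compute capability of each visible GPU
    print(i, torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))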

Unfortunately, I cannot share my code. But the code that works on both GPUs uses a ResNet, while the one that fails on my new GPU uses a large convolutional network. Again, the problem has to be with my new GPU, since the same code works everywhere else.

Can anyone help me with this?

Update: the problem seems to be solved after reinstalling PyTorch!
