I had a K40 GPU in my computer. Last week, I added a 1080 to the same computer.
In my first experiment, I observed identical results on both GPUs. Then I ran my second code on both GPUs. In this case, I “constantly” got good results on the K40 while “constantly” getting awful results on the 1080 for “exactly the same code”.
At first, I thought the only explanation for such different outputs was the random seeds in the code. So I fixed the seeds like this:
import numpy
import torch

torch.manual_seed(3)
torch.cuda.manual_seed_all(3)
numpy.random.seed(3)
But this did not solve the issue. I believe the issue cannot be randomness, because I was “constantly” getting good results on the K40 and “constantly” getting bad results on the 1080. Moreover, I tried exactly the same code on 2 other computers and 4 other 1080 GPUs and always got good results. So the problem has to be with the 1080 I recently plugged in.
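In case it helps anyone reproduce this, a stricter version of the seeding above might look like the sketch below. The helper name `seed_everything` is my own (hypothetical), and I am only assuming the cuDNN flags are relevant here: they ask cuDNN for deterministic kernels and turn off its autotuner, which may choose different algorithms on different GPUs.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 3) -> None:
    """Hypothetical helper: seed every RNG and request deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Silently ignored when CUDA is not available.
    torch.cuda.manual_seed_all(seed)
    # Assumption: cuDNN algorithm selection could differ between the two cards.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Re-seeding should reproduce the same random draw.
seed_everything(3)
a = torch.rand(4)
seed_everything(3)
b = torch.rand(4)
print(torch.equal(a, b))
```

This only rules out run-to-run variation on one device; it does not guarantee that two different GPU models produce bitwise-identical outputs.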
I suspect the problem might be the driver or the way I installed PyTorch. But it is still weird that I only get bad results for “some” of the experiments. For the other experiments, I got identical results.
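To make the driver/install question easier to answer, here is a minimal sketch that collects the version and device information I can report. The function name `describe_environment` is hypothetical; the calls themselves are standard PyTorch queries, and the script should also run on a CPU-only install (it then lists no devices).

```python
import torch


def describe_environment() -> dict:
    """Hypothetical helper: gather versions and per-GPU details for a bug report."""
    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,         # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "devices": [],
    }
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            info["devices"].append({
                "index": i,
                "name": torch.cuda.get_device_name(i),
                # (major, minor) compute capability of this card
                "capability": torch.cuda.get_device_capability(i),
            })
    return info


env = describe_environment()
print(env)
```

Comparing this output between the machine that fails and one of the machines that works might show whether the PyTorch, CUDA, or cuDNN versions differ.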
Unfortunately, I cannot share my code. But the code that works on both GPUs uses a ResNet, while the code that fails on my new GPU uses a large convolutional network. Again, the problem has to be with my new GPU, as the same code works everywhere else.
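Since I cannot post the real network, one check that can be shared is to run the same small module on the CPU and on every visible GPU and compare the outputs. This is only a sketch: `max_abs_diff_vs_cpu` is a hypothetical helper, and the tiny `nn.Sequential` below is a stand-in, not my actual model.

```python
import torch
import torch.nn as nn


def max_abs_diff_vs_cpu(module: nn.Module, x: torch.Tensor) -> dict:
    """Hypothetical helper: largest absolute output difference between the
    CPU result and the result on each visible GPU."""
    module = module.eval()
    diffs = {}
    with torch.no_grad():
        reference = module(x)  # CPU result used as the baseline
        for i in range(torch.cuda.device_count()):
            device = torch.device("cuda", i)
            out = module.to(device)(x.to(device)).cpu()
            label = f"{torch.cuda.get_device_name(i)} (cuda:{i})"
            diffs[label] = (out - reference).abs().max().item()
        module.to("cpu")  # restore the module to its original device
    return diffs


torch.manual_seed(3)
# Stand-in network (assumption): two plain convolution layers.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
)
x = torch.randn(2, 3, 32, 32)
diffs = max_abs_diff_vs_cpu(net, x)
print(diffs)  # empty dict when no GPU is visible
```

I would expect small float32 differences on every device; a much larger value on one card only would point at that card, its driver, or the kernels selected for it.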
Can anyone help me with this?