Non-deterministic on GPU

Peng · September 13, 2017, 1:58pm

Hi, I get non-deterministic results when I run my code several times on a GPU.
I printed the loss at a few points: there are minor differences at the beginning, and the values are completely different by the end.
Is there an implementation error in my code? I use torchtext to process the data and I run on a single GPU. I set the random seed for torch and torch.cuda, and also for numpy and the random module.
I also call model.train() when training and model.eval() when testing.
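A minimal sketch of the seeding described above, for readers who want to compare against their own setup. The helper name `set_seed` and the seed value 1234 are illustrative assumptions and are not taken from train.py.

```python
import random

import numpy as np
import torch


def set_seed(seed):
    """Seed every random number generator the training script may touch."""
    random.seed(seed)          # Python's built-in random module
    np.random.seed(seed)       # NumPy's global generator
    torch.manual_seed(seed)    # PyTorch CPU generator
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)  # PyTorch generators on every visible GPU


# Re-seeding with the same value reproduces the same CPU random draws.
set_seed(1234)
first = torch.rand(3)
set_seed(1234)
second = torch.rand(3)
print(torch.equal(first, second))
```

Seeding like this makes the random number streams repeatable; it does not by itself make every GPU kernel deterministic.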
I have seen people say this is caused by the randomness of the GPU, but my results differ by 1% to 2%, which is a really big gap.
Below is my model code, which is simply a CNN for text classification.

https://github.com/Impavidity/kim_cnn/blob/master/model.py

import torch
import torch.nn as nn

import torch.nn.functional as F

class KimCNN(nn.Module):
    def __init__(self, config):
        super(KimCNN, self).__init__()
        output_channel = config.output_channel
        target_class = config.target_class
        words_num = config.words_num
        words_dim = config.words_dim
        embed_num = config.embed_num
        embed_dim = config.embed_dim
        self.mode = config.mode
        Ks = 3 # There are three conv net here
        if config.mode == 'multichannel':
            input_channel = 2
        else:
            input_channel = 1

This file has been truncated.

And this is train.py:
https://github.com/Impavidity/kim_cnn/blob/master/train.py

Can someone help me out here? Thanks.

mratsim · September 13, 2017, 6:13pm

cuDNN's convolution code is non-deterministic; the deterministic version is slower.
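A short sketch of how the deterministic path can be requested from PyTorch, assuming a PyTorch version that exposes the `torch.backends.cudnn.deterministic` and `torch.backends.cudnn.benchmark` flags (older releases may not have them). This is not taken from the linked repository.

```python
import torch

# Ask cuDNN to use only deterministic convolution algorithms (usually slower).
torch.backends.cudnn.deterministic = True

# Turn off the cuDNN auto-tuner, which may otherwise select a different
# convolution algorithm from one run to the next.
torch.backends.cudnn.benchmark = False
```

Set these once at the start of the script, before the model is built and before the first forward pass. Operations that do not go through cuDNN are not covered by these flags.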

See these bug reports for Torch and TensorFlow. I saw the same question on the Nvidia forums too.
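As general background (an explanation that is not stated in this thread, and not a confirmed diagnosis of this model): floating-point addition is not associative, so a parallel reduction that accumulates its terms in a different order from run to run can return different sums. The plain-Python sketch below illustrates the effect on the CPU; the function name `sum_in_order` is hypothetical.

```python
def sum_in_order(values, order):
    """Add up values[i] for i in order, left to right, in double precision."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [1e16, 1.0, -1e16]

# Same three numbers, two different accumulation orders.
small_term_absorbed = sum_in_order(values, [0, 1, 2])  # 1.0 is lost when added to 1e16
small_term_kept = sum_in_order(values, [0, 2, 1])      # large terms cancel first

print(small_term_absorbed, small_term_kept)  # prints 0.0 1.0
```

In a real network the per-operation discrepancies are far smaller than in this contrived case, but they can compound over many training updates, which is consistent with losses that start out nearly identical and drift apart later.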