RTX 3080 is even slower than my old 2070 Super

I was running a simple network and found that my new RTX 3080 is even slower than my old 2070 Super. I timed each step of the training loop, and the backward pass in particular is really slow.
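For reference, this is roughly how the per-step timing can be measured; it is only a sketch (the function name and the split into forward/backward/update are mine), but the torch.cuda.synchronize() calls matter, since CUDA kernels launch asynchronously and naive timing would otherwise mostly measure queueing.

import time
import torch

def time_step(model, x, y, criterion, optimizer):
    # Assumes model and tensors already live on the GPU and a few warm-up
    # iterations have been run; synchronize so previously queued work
    # does not leak into the measurement.
    torch.cuda.synchronize()

    t0 = time.perf_counter()
    out = model(x)                      # forward pass
    torch.cuda.synchronize()
    t1 = time.perf_counter()

    loss = criterion(out, y)
    loss.backward()                     # backward pass
    torch.cuda.synchronize()
    t2 = time.perf_counter()

    optimizer.step()                    # parameter update
    optimizer.zero_grad()
    torch.cuda.synchronize()
    t3 = time.perf_counter()

    return t1 - t0, t2 - t1, t3 - t2    # forward, backward, update times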

I am using PyTorch 1.7.1 with CUDA 11.2.

The slowdown might be caused by cudnn heuristics that aren't yet tuned for the 3080, e.g. in cudnn 8.0.5. Based on your setup description I assume you've built PyTorch from source, since you are using CUDA 11.2 while the 1.7.1 binaries ship with CUDA 11.0? If that's the case, which cudnn version did you use to build it?
Could you also post your model, so that we can check for regressions in internal cudnn builds?
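In case it helps, you can print the versions your build is actually using with the standard version queries:

import torch

print(torch.__version__)                  # PyTorch version
print(torch.version.cuda)                 # CUDA version PyTorch was built with
print(torch.backends.cudnn.version())     # cudnn version, e.g. 8004 -> 8.0.4
print(torch.cuda.get_device_name(0))      # should report the RTX 3080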


I use a simple network.

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, D_in=24, H=[128, 128, 128, 128], D_out=1, drate=0.0):
        super(Net, self).__init__()
        # Five fully connected layers: 24 -> 128 -> 128 -> 128 -> 128 -> 1
        self.linear1 = torch.nn.Linear(D_in, H[0], bias=True)
        self.linear2 = torch.nn.Linear(H[0], H[1], bias=True)
        self.linear3 = torch.nn.Linear(H[1], H[2], bias=True)
        self.linear4 = torch.nn.Linear(H[2], H[3], bias=True)
        self.linear5 = torch.nn.Linear(H[3], D_out, bias=True)
        # Shared dropout layer (disabled by default since drate=0.0)
        self.dlayer = torch.nn.Dropout(p=drate)

    def forward(self, x):
        x = self.dlayer(x)
        x = F.relu(self.linear1(x))
        x = self.dlayer(x)
        x = F.relu(self.linear2(x))
        x = self.dlayer(x)
        x = F.relu(self.linear3(x))
        x = self.dlayer(x)
        x = F.relu(self.linear4(x))
        x = self.dlayer(x)
        x = self.linear5(x)
        return x

The cudnn version is 8.0.4 (torch.backends.cudnn.version() gives 8004).
I used Adam to train the network.
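For completeness, the training loop looks roughly like this; the learning rate, batch size, and random data below are placeholders to make the sketch self-contained, not my exact settings:

import torch

device = torch.device("cuda")
model = Net().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is a placeholder
criterion = torch.nn.MSELoss()

# Random data just for illustration; the real inputs have 24 features.
x = torch.randn(1024, 24, device=device)
y = torch.randn(1024, 1, device=device)

for step in range(100):
    optimizer.zero_grad()
    out = model(x)
    loss = criterion(out, y)
    loss.backward()          # this is the part that is much slower on the 3080
    optimizer.step()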