PyTorch Training Network with cuDNN Inference Network

Hi,

I am training LeNet for handwritten digit classification on Google Colab using PyTorch with the MNIST dataset, and saving the weights and biases of each layer of the network to text files. I then load these parameters into a LeNet built locally (for inference) with the cuDNN library.
INPUT->CONV1->POOL1->CONV2->POOL2->FC1->RELU->FC2->SMAX

PyTorch seems to return weights in NCHW format, so I am using NCHW format for all tensors in the cuDNN library. The flattened weights and biases are read from the text files into buffers.
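For reference, this is roughly how I check the parameter layouts before writing them out (just a sketch; it uses the same net_gpu as in the saving snippet further down):

for name, param in net_gpu.named_parameters():
    # conv weights come out as [out_channels, in_channels, kH, kW],
    # linear weights as [out_features, in_features]
    print(name, tuple(param.shape))

# Expected for the LeNet below:
#   conv1.weight (20, 1, 5, 5)    conv1.bias (20,)
#   conv2.weight (50, 20, 5, 5)   conv2.bias (50,)
#   fc1.weight   (500, 800)       fc1.bias   (500,)
#   fc2.weight   (10, 500)        fc2.bias   (10,)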

For the same test dataset, I get a classification error of ~5% on the cloud, whereas with the same weights and biases loaded into the locally created network I get an error of ~10%. Can anyone suggest a reason for this difference? Is there some detail between PyTorch and cuDNN that I am missing?

Here is a small PyTorch snippet for creating the network:

import numpy as np
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    # network structure
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 50, 5)
        self.fc1 = nn.Linear(50 * 4 * 4, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.max_pool2d(self.conv1(x), (2, 2))
        x = F.max_pool2d(self.conv2(x), (2, 2))
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.softmax(self.fc2(x), dim=1)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        return np.prod(size)
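As a quick sanity check on the structure above (a minimal sketch, assuming the LeNet class as defined), a dummy forward pass confirms that the flattened size after POOL2 is 50*4*4 = 800 for 28x28 MNIST inputs:

import torch

net = LeNet()
with torch.no_grad():
    out = net(torch.randn(2, 1, 28, 28))  # dummy batch of two 28x28 images
print(out.shape)  # torch.Size([2, 10])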

Here is a small PyTorch snippet for saving the weights and biases:

for name, param in net_gpu.named_parameters():
    v1 = name.partition('.')[0]  # layer name, e.g. "conv1"
    v2 = name.partition('.')[2]  # parameter name, e.g. "weight" or "bias"
    arr = np.array(param.data.cpu())
    f = open(v1 + "_" + v2 + ".txt", "w")
    for val in arr:
        np.savetxt(f, val.flat)  # one value per line, C order
    f.close()
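To rule out the text files themselves, I also round-trip one of them (a minimal sketch; conv1_weight.txt is the file produced by the loop above):

saved = np.loadtxt('conv1_weight.txt')                   # flat 1-D array, one value per line
orig = net_gpu.conv1.weight.data.cpu().numpy().ravel()   # same C-order flattening
print(saved.shape, orig.shape)                           # both (500,) for conv1
print(np.allclose(saved, orig, atol=1e-6))               # should print True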

Here is a cuDNN snippet for the inference network:

// Conv1 layer
cudnnConvolutionForward(cudnnHandle, &alpha, dataTensor,
    data, conv1filterDesc, pconv1, conv1Desc,
    conv1algo, workspace, m_workspaceSize, &beta,
    conv1Tensor, conv1);
cudnnAddTensor(cudnnHandle, &alpha, conv1BiasTensor,
    pconv1bias, &alpha, conv1Tensor, conv1);

// Pool1 layer
cudnnPoolingForward(cudnnHandle, poolDesc, &alpha, conv1Tensor,
    conv1, &beta, pool1Tensor, pool1);

// Conv2 layer
cudnnConvolutionForward(cudnnHandle, &alpha, pool1Tensor,
    pool1, conv2filterDesc, pconv2, conv2Desc,
    conv2algo, workspace, m_workspaceSize, &beta,
    conv2Tensor, conv2);
cudnnAddTensor(cudnnHandle, &alpha, conv2BiasTensor,
    pconv2bias, &alpha, conv2Tensor, conv2);

// Pool2 layer
cudnnPoolingForward(cudnnHandle, poolDesc, &alpha, conv2Tensor,
    conv2, &beta, pool2Tensor, pool2);

// FC1 layer
// Forward propagate neurons using weights (fc1 = pfc1'*pool2)
cublasSgemm(cublasHandle, CUBLAS_OP_T, CUBLAS_OP_N,
    ref_fc1.outputs, m_batchSize, ref_fc1.inputs,
    &alpha,
    pfc1, ref_fc1.inputs,
    pool2, ref_fc1.inputs,
    &beta,
    fc1, ref_fc1.outputs);
// Add bias using GEMM's "beta" (fc1 += pfc1bias*1_vec')
cublasSgemm(cublasHandle, CUBLAS_OP_N, CUBLAS_OP_N,
    ref_fc1.outputs, m_batchSize, 1,
    &alpha,
    pfc1bias, ref_fc1.outputs,
    onevec, 1,
    &alpha,
    fc1, ref_fc1.outputs);

// ReLU activation
cudnnActivationForward(cudnnHandle, fc1Activation, &alpha,
    fc1Tensor, fc1, &beta, fc1Tensor, fc1relu);

// FC2 layer
// Forward propagate neurons using weights (fc2 = pfc2'*fc1relu)
cublasSgemm(cublasHandle, CUBLAS_OP_T, CUBLAS_OP_N,
    ref_fc2.outputs, m_batchSize, ref_fc2.inputs,
    &alpha,
    pfc2, ref_fc2.inputs,
    fc1relu, ref_fc2.inputs,
    &beta,
    fc2, ref_fc2.outputs);
// Add bias using GEMM's "beta" (fc2 += pfc2bias*1_vec')
cublasSgemm(cublasHandle, CUBLAS_OP_N, CUBLAS_OP_N,
    ref_fc2.outputs, m_batchSize, 1,
    &alpha,
    pfc2bias, ref_fc2.outputs,
    onevec, 1,
    &alpha,
    fc2, ref_fc2.outputs);

// Softmax
cudnnSoftmaxForward(cudnnHandle, CUDNN_SOFTMAX_ACCURATE, CUDNN_SOFTMAX_MODE_CHANNEL,
    &alpha, fc2Tensor, fc2, &beta, fc2Tensor, result);
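In case it helps narrow things down, my plan for debugging is to dump the intermediate activations on the PyTorch side for a single test image and compare them layer by layer against the corresponding cuDNN device buffers (conv1, pool1, conv2, ...). A rough sketch, where single_image is a hypothetical (1, 1, 28, 28) test tensor:

import numpy as np
import torch

activations = {}

def save_output(name):
    def hook(module, inputs, output):
        activations[name] = output.detach().cpu().numpy()
    return hook

net_gpu.conv1.register_forward_hook(save_output('conv1'))
net_gpu.conv2.register_forward_hook(save_output('conv2'))
net_gpu.fc1.register_forward_hook(save_output('fc1'))
net_gpu.fc2.register_forward_hook(save_output('fc2'))

with torch.no_grad():
    net_gpu(single_image.cuda())  # single_image: hypothetical (1, 1, 28, 28) test tensor

for name, act in activations.items():
    np.savetxt(name + '_out.txt', act.ravel())  # one file per layer for comparison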

Any suggestions on this would be highly appreciated. Thanks!

You could enable cuDNN API logging for both workloads and compare the logs to isolate potential differences between the two runs.

Thanks @ptrblck

I can get cuDNN logs on the local machine by setting environment variables. To get the log for the PyTorch code on Google Colab, I was using:

import os
os.environ['CUDNN_LOGINFO_DBG'] = '1'
os.environ['CUDNN_LOGDEST_DBG'] = 'log.txt'

However, it doesn't seem to write any log to the .txt file. I also tried printing the log to the console via stdout, but that doesn't work either. Could you suggest how to get the log on Google Colab with PyTorch?

Unfortunately, I don't know how cuDNN logging can be enabled in Jupyter notebooks or Google Colab, and I also don't know whether it's a limitation of the former or the latter.
One common issue users see when trying to set env variables inside a notebook is the order:
e.g. setting CUDA_VISIBLE_DEVICES or CUDA_LAUNCH_BLOCKING after the CUDA context has been created wouldn't have any effect. Maybe you are facing a similar issue and could move the os.environ calls to the very beginning of the notebook.
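I.e. something along these lines in the very first cell, before importing torch or running any CUDA code (just a sketch):

import os
os.environ['CUDNN_LOGINFO_DBG'] = '1'
os.environ['CUDNN_LOGDEST_DBG'] = 'log.txt'

import torch  # import torch only after the env variables are set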

Moving the os.environ calls to the very beginning saved the log to the .txt file on Google Colab. That was helpful!

Thanks @ptrblck