Hi,

I am training LeNet for handwritten digit classification on Google Colab using PyTorch with the MNIST dataset, and saving the weights and biases of the network to text files, one file per layer. I then load these parameters into a locally created LeNet (for inference) built with the cuDNN library.

INPUT->CONV1->POOL1->CONV2->POOL2->FC1->RELU->FC2->SMAX

PyTorch appears to store weights in NCHW order, so I use the NCHW format for all tensors on the cuDNN side. The flattened weights and biases are read from the text files into buffers.
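PyTorch's conv filters are stored as (out_channels, in_channels, kH, kW) and the Linear weights as (out_features, in_features), both row-major, which is what the cuDNN filter descriptors in NCHW/KCRS order expect. A quick check of the exported shapes (illustrative, assuming the class shown below and 28x28 MNIST inputs):

```python
# Illustrative: print the shapes of the parameters being exported.
for name, param in net_gpu.named_parameters():
    print(name, tuple(param.shape))
# Expected output:
# conv1.weight (20, 1, 5, 5)   -- (out_channels, in_channels, kH, kW)
# conv1.bias   (20,)
# conv2.weight (50, 20, 5, 5)
# conv2.bias   (50,)
# fc1.weight   (500, 800)      -- (out_features, in_features), 800 = 50*4*4
# fc1.bias     (500,)
# fc2.weight   (10, 500)
# fc2.bias     (10,)
```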

On the same test dataset, I get a classification error of ~5% in the cloud, whereas the locally created network using the same weights and biases gives an error of ~10%. Can anyone suggest a reason for this difference? Is there some detail between PyTorch and cuDNN that I am missing?
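For context, this is roughly how the error is measured on the PyTorch side (a sketch with illustrative names, not the exact script):

```python
# Illustrative evaluation loop for the cloud-side error figure.
import torch

net_gpu.eval()
errors, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:          # same MNIST test set used locally
        images, labels = images.cuda(), labels.cuda()
        preds = net_gpu(images).argmax(dim=1)   # class with the highest softmax score
        errors += (preds != labels).sum().item()
        total += labels.size(0)
print(f"classification error: {100.0 * errors / total:.2f}%")
```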

**Here is a short PyTorch snippet for creating the network:**

```python
import numpy as np
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    # network structure
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 50, 5)
        self.fc1 = nn.Linear(50 * 4 * 4, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.max_pool2d(self.conv1(x), (2, 2))
        x = F.max_pool2d(self.conv2(x), (2, 2))
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.softmax(self.fc2(x))
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]   # all dimensions except the batch dimension
        return np.prod(size)
```
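As a quick sanity check, the class above can be exercised on a dummy MNIST-sized input (illustrative only):

```python
# Illustrative smoke test on a dummy 28x28 input.
import torch

net = LeNet()
out = net(torch.randn(1, 1, 28, 28))   # NCHW: batch=1, 1 channel, 28x28
print(out.shape)                        # torch.Size([1, 10]) -- one score per digit class
```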

**Here is a short PyTorch snippet for saving the weights and biases:**

```python
import numpy as np

for name, param in net_gpu.named_parameters():
    v1 = name.partition('.')[0]        # layer name, e.g. "conv1"
    v2 = name.partition('.')[2]        # parameter name, e.g. "weight"
    arr = np.array(param.data.cpu())
    f = open(v1 + "_" + v2 + ".txt", "w")
    for val in arr:
        np.savetxt(f, val.flat)        # append each slice flattened row-major
    f.close()
```
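A minimal round-trip check along these lines (illustrative; the filename follows the naming scheme above) could rule out problems in the export itself:

```python
# Illustrative round-trip check: reload one saved file and compare it to the
# live parameter, to make sure nothing is lost or reordered by the flattening.
import numpy as np

saved = np.loadtxt("conv1_weight.txt", dtype=np.float32)
live = net_gpu.conv1.weight.data.cpu().numpy().ravel()   # (20, 1, 5, 5) flattened row-major
print(saved.shape, live.shape)                            # both (500,)
print(np.allclose(saved, live, atol=1e-5))                # expect True
```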

**Here is a cuDNN snippet for the inference network:**

```cpp
// Conv1 layer
cudnnConvolutionForward(cudnnHandle, &alpha, dataTensor,
                        data, conv1filterDesc, pconv1, conv1Desc,
                        conv1algo, workspace, m_workspaceSize, &beta,
                        conv1Tensor, conv1);
cudnnAddTensor(cudnnHandle, &alpha, conv1BiasTensor,
               pconv1bias, &alpha, conv1Tensor, conv1);

// Pool1 layer
cudnnPoolingForward(cudnnHandle, poolDesc, &alpha, conv1Tensor,
                    conv1, &beta, pool1Tensor, pool1);

// Conv2 layer
cudnnConvolutionForward(cudnnHandle, &alpha, pool1Tensor, pool1,
                        conv2filterDesc, pconv2, conv2Desc,
                        conv2algo, workspace, m_workspaceSize, &beta,
                        conv2Tensor, conv2);
cudnnAddTensor(cudnnHandle, &alpha, conv2BiasTensor,
               pconv2bias, &alpha, conv2Tensor, conv2);

// Pool2 layer
cudnnPoolingForward(cudnnHandle, poolDesc, &alpha, conv2Tensor,
                    conv2, &beta, pool2Tensor, pool2);

// FC1 layer
// Forward propagate neurons using weights (fc1 = pfc1'*pool2)
cublasSgemm(cublasHandle, CUBLAS_OP_T, CUBLAS_OP_N,
            ref_fc1.outputs, m_batchSize, ref_fc1.inputs,
            &alpha, pfc1, ref_fc1.inputs, pool2, ref_fc1.inputs,
            &beta, fc1, ref_fc1.outputs);
// Add bias using GEMM's "beta" (fc1 += pfc1bias*1_vec')
cublasSgemm(cublasHandle, CUBLAS_OP_N, CUBLAS_OP_N,
            ref_fc1.outputs, m_batchSize, 1,
            &alpha, pfc1bias, ref_fc1.outputs, onevec, 1,
            &alpha, fc1, ref_fc1.outputs);

// ReLU activation
cudnnActivationForward(cudnnHandle, fc1Activation, &alpha, fc1Tensor,
                       fc1, &beta, fc1Tensor, fc1relu);

// FC2 layer
// Forward propagate neurons using weights (fc2 = pfc2'*fc1relu)
cublasSgemm(cublasHandle, CUBLAS_OP_T, CUBLAS_OP_N,
            ref_fc2.outputs, m_batchSize, ref_fc2.inputs,
            &alpha, pfc2, ref_fc2.inputs, fc1relu, ref_fc2.inputs,
            &beta, fc2, ref_fc2.outputs);
// Add bias using GEMM's "beta" (fc2 += pfc2bias*1_vec')
cublasSgemm(cublasHandle, CUBLAS_OP_N, CUBLAS_OP_N,
            ref_fc2.outputs, m_batchSize, 1,
            &alpha, pfc2bias, ref_fc2.outputs, onevec, 1,
            &alpha, fc2, ref_fc2.outputs);

// Softmax loss
cudnnSoftmaxForward(cudnnHandle, CUDNN_SOFTMAX_ACCURATE,
                    CUDNN_SOFTMAX_MODE_CHANNEL, &alpha, fc2Tensor,
                    fc2, &beta, fc2Tensor, result);
```
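In case it helps pinpoint the FC layers, here is a rough NumPy sanity check (illustrative, not my actual code) of the layout assumption behind the cublasSgemm calls: the PyTorch Linear weight of shape (out_features, in_features), dumped row-major, is read column-major with a leading dimension of ref_fc1.inputs, so the CUBLAS_OP_T call should reproduce PyTorch's x @ W.T for every sample in the batch.

```python
# Illustrative check that the column-major GEMM matches nn.Linear's x @ W.T + b.
import numpy as np

outputs, inputs, batch = 500, 800, 4                     # FC1 sizes: 800 = 50*4*4
W = np.random.randn(outputs, inputs).astype(np.float32)  # PyTorch layout (out, in)
b = np.random.randn(outputs).astype(np.float32)
x = np.random.randn(batch, inputs).astype(np.float32)    # flattened pool2, NCHW order

torch_style = x @ W.T + b              # what nn.Linear computes per sample

A = W.T                                # row-major dump read column-major with lda = inputs
X = x.T                                # (inputs x batch), one column per sample
gemm_style = A.T @ X + b[:, None]      # OP_T on A, OP_N on X, bias added via the second GEMM
print(np.allclose(torch_style.T, gemm_style, atol=1e-3))   # expect True
```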

Any suggestions on this would be highly appreciated. Thanks!