Hi, I’m trying to understand the CUDA implementation and how to increase performance of the neural network but I’m facing the following issue and I will like any guidance on the topic. I’m performing a very simplistic forward pass for a random tensor (code attached). However, I’m getting better timing using the CPU when compared with the GPU (a result I didn’t expected)
import random
from time import time
import torch
class SmallModel(torch.nn.Module):
def __init__(self, in_f) -> None:
super().__init__()
self.cnn = torch.nn.Sequential(
torch.nn.Linear(in_f, 200),
torch.nn.ReLU(),
torch.nn.Linear(200,200),
torch.nn.ReLU(),
torch.nn.Linear(200,200),
torch.nn.ReLU(),
torch.nn.Linear(200,100)
)
def forward(self, x:torch.Tensor)->torch.Tensor:
print(x.device)
return self.cnn(x)
device = 'cuda'
a = torch.randn((100,100))
a = a.to(device)
m = SmallModel(100, device)
m = m.to(device)
start_time = time()
b = m(a)
print(f"Total time: {(time() - start_time)*1000} ms with device {device}")
The timing for this evaluation is:
with CUDA: 398 ms
with CPU: 1.50 ms
The specifications of my computer are:
GPU: Nvidia Geforce 1660 TI 6GB with CUDA 11.7
CPU: AMD Ryzen 7 2700X 8 core
Memory: 24 GB
Pytorch version: 1.13.0+cu117
I will sincerely appreciate any hint on this topic.