I trained a small, simple CNN for image classification with the same PyTorch code on two GPUs: a Colab Free K80 and a Paperspace Gradient P6000.
You can see both notebooks here (I’ve printed the details of each GPU to confirm which one was used):
- Colab K80: Google Colab
- Gradient P6000: Paperspace Console
For convenience, here are the relevant parts of the code:
The Model:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 16, kernel_size=4, stride=1, padding=0)
            self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0)
            self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=0)
            self.fc1 = nn.Linear(39200, 512)  # 39200 = 32 * 35 * 35 after the two conv/pool stages
            self.fc2 = nn.Linear(512, 5)
            self.do = nn.Dropout()

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.reshape(x.shape[0], -1)  # flatten to (batch, 39200)
            x = self.do(F.relu(self.fc1(x)))
            x = self.fc2(x)
            return x
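(Note: fc1’s in_features of 39200 equals 32 * 35 * 35, which is what the two conv/pool stages produce for 150x150 RGB inputs; the input size below is my inference from that arithmetic, not something printed in the notebooks. A quick shape sanity check:)

    # Shape sanity check. The 150x150 input size is inferred from
    # fc1's in_features: 39200 = 32 * 35 * 35.
    net = Net()
    dummy = torch.randn(2, 3, 150, 150)  # batch of 2 assumed 150x150 RGB images
    print(net(dummy).shape)  # torch.Size([2, 5])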
The Training Loop:
    import time

    s = time.time()
    model.train()
    for i in range(epoch):  # model, criterion, optimizer, trainloaders, epoch are defined earlier
        total_loss = 0
        total_sample = 0
        total_correct = 0
        for image, label in trainloaders:
            image = image.to('cuda')
            label = label.to('cuda')
            out = model(image)
            loss = criterion(out, label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
            total_sample += len(label)
            total_correct += torch.sum(torch.max(out, 1)[1] == label).item() * 1.0
        print(f"epoch {i} loss:{total_loss/total_sample} acc:{total_correct/total_sample}")
    e = time.time()  # TRAINING TIME
    print(e - s)
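The loop above assumes model, criterion, optimizer, trainloaders, and epoch are already defined. For context, the setup is along these lines (the optimizer choice and learning rate here are placeholders; the exact values are in the linked notebooks):

    # Setup sketch -- the optimizer and learning rate are placeholders,
    # not necessarily what the linked notebooks use.
    model = Net().to('cuda')
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    epoch = 20  # matches the 20-epoch timing comparison below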
As far as I know, the P6000 should outperform the K80, but when I measure training time with the code above, the K80 needs only ~110 s to train the model for 20 epochs, while the P6000 needs ~140 s (you can see the outputs in the notebooks linked above).
I’ve run the code several times, restarted the kernel, and tried again on another day, but it always shows a similar result. I’ve also tried using torch.cuda.synchronize(), but the result is still the same.
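Concretely, the synchronize attempt wraps the timing like this (a sketch; the point is that time.time() then measures completed GPU work rather than merely queued kernels):

    torch.cuda.synchronize()  # flush any pending GPU work before starting the timer
    s = time.time()
    # ... the same training loop as above ...
    torch.cuda.synchronize()  # wait for all kernels to finish before stopping the timer
    e = time.time()
    print(e - s)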
I realize this only happens with my PyTorch code: when I use TensorFlow, the P6000 is much faster than the K80.
Why does this happen?