Hi, I’m pretty new to PyTorch and I’m trying to fine-tune a BERT model for my own task.
The problem is that the `.to(device)` call is extremely slow: moving the transformer to the GPU takes about 20 minutes.
I found some test code in the PyTorch GitHub repo and ran it:
```python
import torch
import torch.nn as nn
import timeit

print("Beginning..")
t0 = timeit.default_timer()
if torch.cuda.is_available():
    torch.cuda.manual_seed(2809)
    torch.backends.cudnn.deterministic = True
    device = torch.device('cuda:0')
    ngpus = torch.cuda.device_count()
    print("Using {} GPU(s)...".format(ngpus))
print("Setup takes {:.2f}".format(timeit.default_timer()-t0))

t1 = timeit.default_timer()
model = nn.Sequential(
    nn.Conv2d(3, 6, 3, 1, 1),
    nn.ReLU(),
    nn.Conv2d(6, 1, 3, 1, 1)
)
print("Model init takes {:.2f}".format(timeit.default_timer()-t1))

if torch.cuda.is_available():
    t2 = timeit.default_timer()
    model = model.to(device)
    print("Model to device takes {:.2f}".format(timeit.default_timer()-t2))
    t3 = timeit.default_timer()
    torch.cuda.synchronize()
    print("Cuda Synch takes {:.2f}".format(timeit.default_timer()-t3))
print('done')
```
The output is:
```
import torch...
Beginning..
Using 1 GPU(s)...
Setup takes 0.00
Model init takes 0.00
Model to device takes 952.94
Cuda Synch takes 0.00
done
```
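To narrow down where those ~950 seconds go, one option is to time CUDA initialization, a first tiny kernel, and the model transfer as separate steps. This is only a diagnostic sketch, not a known fix: the helper `timed` and the `timings` dict are hypothetical names made up for illustration, and the script falls back to the CPU when no GPU is visible so it still runs anywhere. Each step calls `torch.cuda.synchronize()` so that asynchronous GPU work is counted in the step that queued it.

```python
import timeit

import torch
import torch.nn as nn


def timed(label, fn):
    """Run fn(), wait for any queued GPU work, print and return elapsed seconds."""
    t0 = timeit.default_timer()
    result = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # include asynchronous GPU work in this step
    elapsed = timeit.default_timer() - t0
    print("{} takes {:.2f}s".format(label, elapsed))
    return result, elapsed


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Same toy model as in the script above.
model = nn.Sequential(
    nn.Conv2d(3, 6, 3, 1, 1),
    nn.ReLU(),
    nn.Conv2d(6, 1, 3, 1, 1),
)

timings = {}
if device.type == "cuda":
    # Step 1: initialize PyTorch's CUDA state on its own.
    _, timings["cuda init"] = timed("CUDA init", lambda: torch.cuda.init())
    # Step 2: allocate a one-element tensor and launch one elementwise kernel.
    _, timings["first kernel"] = timed(
        "First kernel", lambda: torch.ones(1, device=device) + 1
    )
# Step 3: the transfer that looked slow in the original script.
model, timings["model to device"] = timed("Model to device", lambda: model.to(device))
```

If most of the time shows up in step 1 or 2 rather than step 3, the delay is probably not about copying the weights themselves.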
This is my environment:
```
Pytorch version is: 1.7.0
Cuda version is: 10.1
cuDNN version is : 7604
Arch version is : sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37
```
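The arch list above can be compared against the card's compute capability. Below is a rough, hypothetical helper (`arch_coverage` is not a PyTorch function) that applies NVIDIA's general compatibility rules as I understand them: a binary built for `sm_XY` runs on devices with the same major version X and an equal or higher minor version, while PTX built for `compute_XY` can be JIT-compiled by the driver for any equal or higher compute capability. It assumes every entry looks like `sm_XY` or `compute_XY` with a single-digit minor version.

```python
def arch_coverage(capability, arch_list):
    """Classify how a GPU with compute capability (major, minor) is served by
    a list of compiled targets such as ["sm_37", ..., "compute_37"].

    Returns "native" if a compatible prebuilt binary (sm_XY) exists,
    "ptx-jit" if only PTX (compute_XY) is available to JIT-compile,
    or "unsupported" if neither applies.
    """
    major, minor = capability
    has_ptx = False
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        # "37" -> (3, 7); assumes the last digit is the minor version.
        a_major, a_minor = int(num[:-1]), int(num[-1])
        if kind == "sm" and a_major == major and a_minor <= minor:
            return "native"  # binary runs on same major, equal/newer minor
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            has_ptx = True  # PTX can be compiled for equal/newer capability
    return "ptx-jit" if has_ptx else "unsupported"
```

On the machine in question it could be called as `arch_coverage(torch.cuda.get_device_capability(0), torch.cuda.get_arch_list())`. An RTX 3090 reports compute capability `(8, 6)`, which with the arch list above falls into the `"ptx-jit"` case, meaning the driver would have to compile PTX instead of loading prebuilt binaries; whether that accounts for the 952 seconds is not confirmed here.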
System information:
- OS: Windows 10
- Graphics card: NVIDIA GeForce RTX 3090
- Processor: AMD Ryzen 9 5900X 12-Core Processor, 3693 MHz
- Motherboard: ROG STRIX B550-F GAMING (WI-FI)
- Memory: 16 GB