I am writing a script to register two images using PyTorch, and I tested the code on both a V100 and a P100 GPU. It turns out the P100 (running time about 30 seconds) is much slower than the V100 (2-3 seconds). Usually the V100 gives only a 2x or 3x speedup over the P100, so this is roughly a 10x gap. Does anyone have an opinion on what causes such a large difference?
Environment:
Ubuntu: 16.04
PyTorch: 1.1.0
CUDA: 9.2
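For completeness, this is how I confirm the toolchain on each machine (just a sanity-check sketch; device index 0 is an assumption):

import torch

print(torch.__version__)               # 1.1.0
print(torch.version.cuda)              # 9.2
print(torch.backends.cudnn.version())  # cuDNN build in use
print(torch.cuda.get_device_name(0))   # which GPU the script actually sees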
My code looks like this:
import torch

# fixed and moving are numpy volumes, assumed already shaped (N, C, D, H, W)
fixed = torch.from_numpy(fixed).float().cuda()
moving = torch.from_numpy(moving).float().cuda()

# build theta on the GPU first, then mark it as a leaf tensor that requires grad
# (calling .cuda() after requires_grad=True would make theta a non-leaf,
#  so the optimizer would never update it)
theta = torch.eye(3, 4).unsqueeze(0).float().cuda().requires_grad_()
optim = torch.optim.Adam([theta])  # stand-in; my actual optimizer and loss_fn are defined elsewhere

for i in range(max_iteration):
    grid = torch.nn.functional.affine_grid(theta, fixed.size())
    output = torch.nn.functional.grid_sample(moving, grid)
    optim.zero_grad()
    loss = loss_fn(fixed, output)
    loss.backward()
    optim.step()
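In case the measurement method matters: I take wall-clock time around the whole loop, calling torch.cuda.synchronize() before each clock reading, since CUDA kernels launch asynchronously and the Python line can return before the GPU work actually finishes. A minimal sketch (loss_fn and max_iteration as above):

import time
import torch

torch.cuda.synchronize()  # drain pending GPU work before starting the clock
start = time.perf_counter()

for i in range(max_iteration):
    grid = torch.nn.functional.affine_grid(theta, fixed.size())
    output = torch.nn.functional.grid_sample(moving, grid)
    optim.zero_grad()
    loss = loss_fn(fixed, output)
    loss.backward()
    optim.step()

torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
print("elapsed: {:.2f} s".format(time.perf_counter() - start))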