Hello everyone,
While using PyTorch's fasterrcnn_resnet50_fpn, I noticed that after passing a batch of images through the ResNet backbone there is a time interval (about 0.22 s for a batch of 8 images) during which some subsequent GPU tensor operations (e.g. torch.unique, but not torch.max) have to wait before they complete. The model and the tensors involved in the timing are on the GPU. Here is code to reproduce it:
from torchvision.models.detection import fasterrcnn_resnet50_fpn
import torch
import time

_backbone = fasterrcnn_resnet50_fpn(pretrained=True).cuda()
# Freeze the backbone and the box head
for name, param in _backbone.named_parameters():
    if name.startswith('backbone') or name.startswith('roi_heads.box_head'):
        param.requires_grad = False
backbone = _backbone.backbone
transform = _backbone.transform

# The images are created on the CPU; the transform (normalize, resize, batch)
# runs there, and the batched tensor is moved to the GPU for the backbone call.
images = [torch.rand(3, 768, 1024) for _ in range(8)]
foo = torch.tensor([0, 1, 2, 3]).cuda()
images, _ = transform(images, None)
base_features = backbone(images.tensors.cuda())

t = time.time()
# torch.max(foo)  # max returns almost immediately
torch.unique(foo)  # unique takes ~0.22 s here
print(time.time() - t)

t = time.time()
torch.unique(foo)  # a second call is fast again
print(time.time() - t)
On the setup described below, this prints:
0.22019076347351074
0.0008745193481445312
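One hypothesis worth checking: CUDA kernels are launched asynchronously, so backbone(...) may return as soon as its kernels are queued, and the first operation that has to wait for the GPU then absorbs the remaining backbone time. torch.max(foo) returns a GPU tensor without waiting, while torch.unique needs the output size on the host. The sketch below tests this by calling torch.cuda.synchronize() before starting the timer. It is a minimal sketch under that assumption: a chain of matrix multiplications is a hypothetical stand-in for the backbone, and the helper name time_unique is made up for illustration. It falls back to the CPU, where everything is synchronous, so it also runs without a GPU.

```python
import time

import torch


def time_unique(sync_first: bool, size: int = 1024, iters: int = 10) -> float:
    """Time torch.unique on a tiny tensor right after a burst of heavy work.

    The matmul loop is a hypothetical stand-in for the backbone forward pass.
    With sync_first=True, the queued work is finished before the timer starts.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.rand(size, size, device=device)
    foo = torch.tensor([0, 1, 2, 3], device=device)

    # Enqueue heavy work; on CUDA these calls return before the kernels finish.
    for _ in range(iters):
        x = x @ x
        x = x / x.norm()  # keep the values bounded

    if sync_first and device.type == "cuda":
        torch.cuda.synchronize()  # wait for the queued kernels to finish

    t = time.perf_counter()
    torch.unique(foo)
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure unique itself has finished
    return time.perf_counter() - t


# Warm-up call so one-time CUDA/cuBLAS initialization is not timed below.
time_unique(sync_first=True)

print(f"unique without prior sync: {time_unique(sync_first=False):.4f} s")
print(f"unique with prior sync:    {time_unique(sync_first=True):.4f} s")
```

If the hypothesis holds, only the first measurement should be large on a GPU. For timing the backbone itself, the same idea applies: synchronize before reading the clock on both sides of the call.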
Setup:
Python: 3.7
torch: 1.7.0
torchvision: 0.8.1
cudatoolkit: 10.1
GPU: NVIDIA RTX 2080 Ti
Ubuntu: 18.04.4 LTS
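The setup details above can also be collected programmatically, which makes it easier to compare the different machines. This is a small sketch; the helper name environment_report is hypothetical. torch.version.cuda reports the CUDA version PyTorch was built with, and platform.platform() gives the kernel/OS string rather than the Ubuntu release name. PyTorch also ships a fuller report via `python -m torch.utils.collect_env`.

```python
import platform

import torch


def environment_report() -> dict:
    """Collect version details like the ones in the setup list above."""
    report = {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "cuda (torch build)": torch.version.cuda,  # None for CPU-only builds
        "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
        "os": platform.platform(),
    }
    try:
        import torchvision
        report["torchvision"] = torchvision.__version__
    except ImportError:
        report["torchvision"] = None  # torchvision not installed
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```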
I also tried a 1080 Ti and a Titan X, as well as earlier Python/PyTorch/torchvision releases, with almost the same results.
Thanks in advance!