This is my collate function for the data loader:
import torch

device = "cuda"

def collate_fn(batch):
    x = torch.stack([torch.from_numpy(item[:-1]) for item in batch])
    y = torch.stack([torch.from_numpy(item[1:]) for item in batch])
    if device == "cuda":
        return x.to(device, non_blocking=True), y.to(device, non_blocking=True)
    else:
        return x.to(device), y.to(device)
And I initialised the data loader with pin_memory=True:

train_dataloader = DataLoader(train_dataset, batch_sampler=train_batch_sampler, collate_fn=collate_fn, pin_memory=True)
It gives me this error:

RuntimeError: cannot pin 'torch.cuda.IntTensor' only dense CPU tensors can be pinned

After digging in, I found out the DataLoader copies the host data into a pinned buffer after collate_fn returns, but by that point my tensors have already been moved to the GPU.
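For reference, the conventional pattern I'm comparing against is to return plain CPU tensors from collate_fn, keep pin_memory=True, and do the device transfer in the training loop. A minimal self-contained sketch of that pattern (the toy list-of-arrays dataset and batch size here are made up for illustration):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

def collate_fn(batch):
    # return CPU tensors; the DataLoader pins them after this returns
    x = torch.stack([torch.from_numpy(item[:-1]) for item in batch])
    y = torch.stack([torch.from_numpy(item[1:]) for item in batch])
    return x, y

# toy dataset: each item is a small numpy sequence
data = [np.arange(i, i + 5, dtype=np.int64) for i in range(4)]
loader = DataLoader(data, batch_size=2, collate_fn=collate_fn, pin_memory=True)

for x, y in loader:
    # non_blocking=True only overlaps the copy when the source is pinned
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
```

With this split, the pinning stays the DataLoader's job and collate_fn never produces a CUDA tensor for it to choke on.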
So I changed the collate_fn to pin the tensors on the CPU first, then move them to the device, and set pin_memory to False:
device = "cuda"

def collate_fn(batch):
    x = torch.stack([torch.from_numpy(item[:-1]) for item in batch])
    y = torch.stack([torch.from_numpy(item[1:]) for item in batch])
    if device == "cuda":
        return x.pin_memory().to(device, non_blocking=True), y.pin_memory().to(device, non_blocking=True)
    else:
        return x.to(device), y.to(device)
train_dataloader = DataLoader(train_dataset, batch_sampler=train_batch_sampler, collate_fn=collate_fn, pin_memory=False)
And I checked whether it works with:

x, y = next(iter(train_dataloader))
print(x.is_pinned())
print(x.device)
but it prints:
False
cuda:0
What I expected:
True
cuda:0
It doesn't pin the tensor, although it does move it to the CUDA device. What I wanted was for the batch to first be staged in a pinned host array, and then transferred from that pinned array to device memory. Is there anything I'm doing wrong, or is there anything else I should do to accomplish that?
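To narrow it down, I also tried checking the intermediate pinned tensor directly, rather than the tensor that comes back from the loader (guarded so it runs on a machine without a GPU):

```python
import torch

x_cpu = torch.arange(8)  # ordinary pageable host tensor

if torch.cuda.is_available():
    x_pinned = x_cpu.pin_memory()          # page-locked host copy
    print(x_pinned.is_pinned())            # True on the host-side copy
    x_gpu = x_pinned.to("cuda", non_blocking=True)
    # the CUDA copy is a different tensor; is_pinned() only means
    # anything for CPU tensors, so this prints False
    print(x_gpu.is_pinned(), x_gpu.device)
else:
    # pin_memory() needs a CUDA device, so nothing to check here
    print("no CUDA device available")
```

So the pinned host buffer does exist in between, but the tensor I get out of the loader is the CUDA copy, which never reports is_pinned() == True.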