Normally I move all of my data to the GPU before training, but I have a dataset that is too big for the GPU's memory (it easily fits into system RAM). So rather than moving the entire dataset to the GPU, I changed my code to move only the mini batches to the GPU as needed. I've done some testing with datasets that fit entirely on the GPU, and I find that moving the mini batches to the GPU as needed greatly increases training time (about 1.5x for my data). I thought that moving larger chunks of data to the GPU at a time and then splitting each chunk into mini batches might help. However, this turned out to be about twice as slow as just moving the mini batches as needed. Any suggestions on a better way to move data to the GPU would be appreciated.
Original Approach:
import torch

X = torch.from_numpy(X).cuda()
y = torch.from_numpy(y).cuda()
tensor_dataset = torch.utils.data.TensorDataset(X, y)
train_loader = torch.utils.data.DataLoader(tensor_dataset, batch_size=256, shuffle=True)

model.train()
for epoch in range(1, n_epochs + 1):
    for batch, (data, target) in enumerate(train_loader, 1):
        optimizer.zero_grad()
        outputs = model(data)
        loss = loss_function(outputs, target)
        loss.backward()
        optimizer.step()
Mini Batch Approach:
X = torch.from_numpy(X)
y = torch.from_numpy(y)
tensor_dataset = torch.utils.data.TensorDataset(X, y)
train_loader = torch.utils.data.DataLoader(tensor_dataset, batch_size=256, shuffle=True)

model.train()
for epoch in range(1, n_epochs + 1):
    for batch, (data, target) in enumerate(train_loader, 1):
        data = data.cuda()
        target = target.cuda()
        optimizer.zero_grad()
        outputs = model(data)
        loss = loss_function(outputs, target)
        loss.backward()
        optimizer.step()
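One variant of this I've seen suggested but haven't benchmarked on my data: keep the host-side tensors in pinned (page-locked) memory with pin_memory=True on the DataLoader, then copy with non_blocking=True so the host-to-device transfer can overlap with other work. A rough sketch of the change:

# Same as the mini batch approach, but with pinned host memory and
# asynchronous host-to-device copies (not benchmarked on my data).
train_loader = torch.utils.data.DataLoader(
    tensor_dataset, batch_size=256, shuffle=True, pin_memory=True)

model.train()
for epoch in range(1, n_epochs + 1):
    for batch, (data, target) in enumerate(train_loader, 1):
        # With a pinned source, these copies are asynchronous with
        # respect to the host, so the CPU can keep issuing work.
        data = data.cuda(non_blocking=True)
        target = target.cuda(non_blocking=True)
        optimizer.zero_grad()
        outputs = model(data)
        loss = loss_function(outputs, target)
        loss.backward()
        optimizer.step()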
Chunked Approach:
X = torch.from_numpy(X)
y = torch.from_numpy(y)
tensor_dataset = torch.utils.data.TensorDataset(X, y)
chunk_loader = torch.utils.data.DataLoader(tensor_dataset, batch_size=100000, shuffle=True)

model.train()
for epoch in range(1, n_epochs + 1):
    for chunk_data, chunk_target in chunk_loader:
        chunk_data = chunk_data.cuda()
        chunk_target = chunk_target.cuda()
        tmp_tensor_dataset = torch.utils.data.TensorDataset(chunk_data, chunk_target)
        train_loader = torch.utils.data.DataLoader(tmp_tensor_dataset, batch_size=256, shuffle=False)
        for batch, (data, target) in enumerate(train_loader, 1):
            optimizer.zero_grad()
            outputs = model(data)
            loss = loss_function(outputs, target)
            loss.backward()
            optimizer.step()
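Another idea I've seen for hiding the copy entirely: prefetch the next batch onto the GPU on a separate CUDA stream while the current batch is computing. The CUDAPrefetcher class below is my own rough sketch, not something I've verified helps on my data; it assumes the wrapped DataLoader was built with pin_memory=True so the non_blocking=True copies are truly asynchronous.

class CUDAPrefetcher:
    # Wraps a DataLoader and copies the next batch to the GPU on a side
    # stream while the current batch is being processed.
    def __init__(self, loader):
        self.loader = iter(loader)
        self.stream = torch.cuda.Stream()
        self._preload()

    def _preload(self):
        try:
            self.next_data, self.next_target = next(self.loader)
        except StopIteration:
            self.next_data, self.next_target = None, None
            return
        with torch.cuda.stream(self.stream):
            self.next_data = self.next_data.cuda(non_blocking=True)
            self.next_target = self.next_target.cuda(non_blocking=True)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_data is None:
            raise StopIteration
        # Make the default stream wait until the side-stream copy is done.
        torch.cuda.current_stream().wait_stream(self.stream)
        data, target = self.next_data, self.next_target
        # Tell the caching allocator these tensors are now used on the
        # default stream, so their memory isn't reused too early.
        data.record_stream(torch.cuda.current_stream())
        target.record_stream(torch.cuda.current_stream())
        self._preload()
        return data, target

# train_loader here is the CPU-side, pin_memory=True loader from the
# pinned-memory sketch above.
model.train()
for epoch in range(1, n_epochs + 1):
    for data, target in CUDAPrefetcher(train_loader):
        optimizer.zero_grad()
        outputs = model(data)
        loss = loss_function(outputs, target)
        loss.backward()
        optimizer.step()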