Hi, a few beginner questions:
Using a single 1080 Ti GPU, PyTorch 0.4.
My model is a simple feedforward net with 5 hidden layers of 100 ReLU units each.
I have data sets of roughly 50-450 KB; each data set is stored on my regular HDD as a .mat or .pt file, with the x's and y's stored as PyTorch tensors.
Right now I'm seeing roughly 3-5% GPU utilization (per the Windows Task Manager), while dedicated GPU memory usage sits at around 5 GB out of 11 GB.
Now, here is my code:
```python
x_train_org, y_train_org = torch.load('data.pt')

# Shuffle the full dataset once
rand_idx = torch.randperm(x_train_org.size(0))
x_train_org = x_train_org[rand_idx, :]
y_train_org = y_train_org[rand_idx, :]

training_data = data_utils.TensorDataset(x_train_org, y_train_org)
train_data_loader = data_utils.DataLoader(training_data, batch_size=batch_size,
                                          shuffle=True, pin_memory=True)

for i in range(ep_num):
    estimator.train()
    for x, y in train_data_loader:
        x, y = x.to(device), y.to(device)
        pred_log_probs = estimator(x)
        model_optimizer.zero_grad()
        loss1 = cost_func(pred_log_probs.permute([0, 2, 1]), y)
        loss1.backward()
        model_optimizer.step()

    # Evaluate on the full training set once per epoch
    estimator.eval()
    with torch.no_grad():
        pred_log_probs = estimator(x_train_org.to(device))
        train_loss[i + 1] = cost_func(pred_log_probs.permute([0, 2, 1]),
                                      y_train_org.to(device)).item()
```
So my questions are:
Is there something wrong with the flow of my code?
Should I call “.to(device)” on the training data directly before feeding it to the DataLoader?
Is there any reason to use the DataLoader at all in the case of a single GPU, when all the data is already stored as PyTorch tensors?
Is there anything I can do to speed things up?
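To make question 2 concrete, this is the variant I mean: moving the whole dataset to the GPU once, before building the TensorDataset, so the loader yields GPU tensors directly and the per-batch `.to(device)` copy disappears. The shapes below are made-up stand-ins for my real data, just to have something runnable:

```python
import torch
from torch.utils import data as data_utils

# Hypothetical shapes standing in for the real data set
x_train_org = torch.randn(1000, 20, 10)
y_train_org = torch.randint(0, 10, (1000, 20))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the full data set to the device once, up front; the DataLoader then
# yields device tensors directly, so no per-batch host-to-device copy is needed.
# Note: pin_memory only makes sense for CPU tensors, so it stays off here.
training_data = data_utils.TensorDataset(x_train_org.to(device),
                                         y_train_org.to(device))
train_data_loader = data_utils.DataLoader(training_data, batch_size=64,
                                          shuffle=True)
```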
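And for question 3, the alternative I have in mind is dropping the DataLoader entirely and slicing mini-batches from the tensors myself, since everything already fits in memory. A minimal sketch (again with made-up shapes, and the training step elided):

```python
import torch

# Hypothetical shapes standing in for the real data set
x_train_org = torch.randn(1000, 20, 10)
y_train_org = torch.randint(0, 10, (1000, 20))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_gpu, y_gpu = x_train_org.to(device), y_train_org.to(device)

batch_size = 64
# Shuffle once per epoch, then slice mini-batches directly from the tensors
perm = torch.randperm(x_gpu.size(0), device=device)
for start in range(0, x_gpu.size(0), batch_size):
    idx = perm[start:start + batch_size]
    x, y = x_gpu[idx], y_gpu[idx]
    # ... forward / backward / optimizer step as in the loop above ...
```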