Hi, sorry for the naive questions; I've just started learning.
I have a custom dataset that loads an image and its label, and I use it with a DataLoader with shuffle=True. My questions are:
- When the DataLoader shuffles the batches, does it shuffle both the images and the labels, or just the images? The answer here https://stackoverflow.com/questions/65402802/pytorch-shuffle-dataloader?rq=1 says that only the images are shuffled, not the labels. (I put a small sanity check I plan to run right below this list.)
- I need to accumulate the prediction outputs across the whole epoch to compute some epoch-level scores; I use accuracy here for simplicity. My code is below. Is it correct to simply accumulate the prediction outputs across all batches of train_dataloader and compute the scores at the end of the epoch? (See the check I sketched after the code.)
Thank you
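For the first question, here is a tiny sanity check I was planning to run, with a toy TensorDataset standing in for my real dataset, just to see whether the image/label pairs stay aligned after shuffling:

## Sanity check: does shuffling keep (image, label) pairs together?
import torch
from torch.utils.data import DataLoader, TensorDataset
# Toy stand-in: "image" i is the tensor [i] and its label is also i,
# so a pair is aligned exactly when image == label
toy_data = TensorDataset(torch.arange(10).unsqueeze(1), torch.arange(10))
toy_loader = DataLoader(toy_data, batch_size=4, shuffle=True)
for imgs, labels in toy_loader:
    # The batch order changes between epochs, but each image should
    # still arrive with its own label
    assert torch.equal(imgs.squeeze(1), labels)
    print(imgs.squeeze(1).tolist(), labels.tolist())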
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

## Custom dataset class
class MyDataset(Dataset):
    def __init__(self, label_csv):
        # CSV with columns: <img_id>, <label>
        self.label_df = pd.read_csv(label_csv)

    def __len__(self):
        return len(self.label_df)

    def __getitem__(self, idx):
        # Return one (image, label) pair; the DataLoader fetches these by index
        img_id, label = self.label_df.iloc[idx]
        img = read_and_preprocess_image(img_id)
        return img, label
## Create datasets and dataloaders
training_data = MyDataset(train_label_csv)
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True, num_workers=8)
validation_data = MyDataset(val_label_csv)
val_dataloader = DataLoader(validation_data, batch_size=batch_size, shuffle=True, num_workers=8)
## Loop through all epochs
train_size = len(training_data)  # total number of training samples
for epoch in range(num_epoch):
    running_loss = 0.0
    pred_epoch = []
    label_epoch = []
    for inputs, labels in train_dataloader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Make prediction
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        # Compute loss
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        # Accumulate loss and per-sample predictions/labels for the epoch
        running_loss += loss.detach() * inputs.size(0)
        pred_epoch.extend(preds.tolist())
        label_epoch.extend(labels.tolist())
    epoch_loss = running_loss / train_size
    epoch_correct = sum(pred_epoch[i] == label_epoch[i] for i in range(len(pred_epoch)))
    epoch_acc = epoch_correct / train_size
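For the second question, this is the epoch-end check I was considering, to compare my manual accuracy against sklearn (assuming pred_epoch and label_epoch are the plain Python lists accumulated above; f1_score is just an example of another epoch-level score I might want):

## Epoch-end scores from the accumulated predictions
from sklearn.metrics import accuracy_score, f1_score
# Should match epoch_acc computed manually above
epoch_acc_check = accuracy_score(label_epoch, pred_epoch)
# Other epoch-level scores work the same way on the accumulated lists
epoch_f1 = f1_score(label_epoch, pred_epoch, average="macro")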