Hello all.
I recently noticed that len(dataloader) is not the same as len(dataloader.dataset).
Based on the Udacity PyTorch course, I tried to calculate accuracy with the following lines of code:
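accuracy = 0
for imgs, labels in dataloader_test:
    preds = model(imgs)
    values, indexes = preds.topk(k=1, dim=1)
    # reshape labels to (batch, 1) so the comparison is element-wise, not broadcast to (batch, batch)
    result = (indexes == labels.view(*indexes.shape)).float()
    accuracy += torch.mean(result)
print(f'acc_val = {accuracy / len(dataloader_test)}')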
For the record, Udacity wrote this:
test_loss = 0
accuracy = 0
# Turn off gradients for validation, saves memory and computations
with torch.no_grad():
    for images, labels in testloader:
        log_ps = model(images)
        test_loss += criterion(log_ps, labels)
        ps = torch.exp(log_ps)
        top_p, top_class = ps.topk(1, dim=1)
        equals = top_class == labels.view(*top_class.shape)
        accuracy += torch.mean(equals.type(torch.FloatTensor))
train_losses.append(running_loss/len(trainloader))
test_losses.append(test_loss/len(testloader))
print("Epoch: {}/{}.. ".format(e+1, epochs),
      "Training Loss: {:.3f}.. ".format(running_loss/len(trainloader)),
      "Test Loss: {:.3f}.. ".format(test_loss/len(testloader)),
      "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
And as you can see below, the validation accuracy is reported like this:
"Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
So len(testloader) must match the size of the whole test set. Also, a few lines above it, in:
test_losses.append(test_loss/len(testloader))
it's dividing the loss by len(testloader), so it should be equal to the size of the whole test set, otherwise it doesn't make sense!
In my case, len(dataloader_test) only prints 313.
My dataloader_test is defined as follows:
import torch.utils.data as data
from torchvision import datasets

# `transformations` is defined elsewhere (not shown in this post)
dataset_train = datasets.MNIST(root='MNIST', train=True, transform=transformations, download=True)
dataset_test = datasets.MNIST(root='MNIST', train=False, transform=transformations, download=True)
dataloader_train = data.DataLoader(dataset_train, batch_size=32, shuffle=True, num_workers=2)
dataloader_test = data.DataLoader(dataset_test, batch_size=32, shuffle=False, num_workers=2)
print(f'test dataloader size: {len(dataloader_test)}')
So what am I missing here? Why am I getting 313 for len(dataloader_test) while I should be getting 10K for the MNIST test set?
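For reference, here is a minimal sketch of the two lengths being compared. It assumes the default drop_last=False and uses a synthetic TensorDataset of 10,000 items as a stand-in for the MNIST test set (the fake predictions and the 9,000-correct split are made up for illustration, not taken from the post). It suggests that len() of a DataLoader counts batches, while len() of its dataset counts samples, and that averaging per-batch accuracies over the number of batches only approximates the per-sample accuracy when the last batch is smaller.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the MNIST test set: 10,000 samples, 10 classes.
num_samples, batch_size = 10_000, 32
labels = torch.randint(0, 10, (num_samples,))

# Fake "predictions": correct for the first 9,000 samples, wrong for the rest.
preds = labels.clone()
preds[9_000:] = (preds[9_000:] + 1) % 10

loader = DataLoader(TensorDataset(preds, labels), batch_size=batch_size, shuffle=False)

num_batches = len(loader)        # number of batches: ceil(10000 / 32) = 313
num_items = len(loader.dataset)  # number of samples: 10000

# Average of per-batch accuracies, divided by the number of batches
# (the same pattern as accuracy / len(testloader) in the course code).
batch_mean_acc = sum((p == y).float().mean().item() for p, y in loader) / num_batches

# Per-sample accuracy: count correct predictions, divide by the dataset size.
correct = sum((p == y).sum().item() for p, y in loader)
sample_acc = correct / num_items

print(num_batches, num_items, math.ceil(num_samples / batch_size))
print(batch_mean_acc, sample_acc)
```

Under these assumptions, dividing by len(loader) is consistent as long as the accumulated quantity is a per-batch mean. If you accumulate a count of correct predictions instead, dividing by len(loader.dataset) is the matching choice.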