Model returns the wrong number of predictions

Hello,

When I try to get the predicted values with the following code:

preds = y_val.to('cpu').data
prediction = torch.max(preds, 1)[1].numpy()

it only returns 7 predicted labels. I don't understand why exactly 7, since there should be almost 6000 labels. How can I get the predicted labels for the whole set?

What's the shape of preds? Check preds.shape.

Hey, here's the size of preds:

torch.Size([7, 3])

Yeah, size() and shape are interchangeable.
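To illustrate why the shape matters: torch.max(preds, 1)[1] returns one index per row, so a tensor of shape [7, 3] (7 samples, 3 class scores each) can only ever yield 7 labels. A minimal sketch, with random scores standing in for the real model output:

```python
import torch

torch.manual_seed(0)

# Random scores standing in for one batch of model output: 7 samples x 3 classes.
preds = torch.randn(7, 3)

# size() and .shape return the same torch.Size object.
assert preds.size() == preds.shape == torch.Size([7, 3])

# torch.max over dim 1 returns (values, indices); the indices are the predicted labels.
values, labels = torch.max(preds, 1)
# torch.argmax gives the same indices directly.
same_labels = torch.argmax(preds, dim=1)

prediction = labels.numpy()
print(prediction.shape)  # (7,): one label per row of preds
```

So the number of labels is fixed by the first dimension of preds; the question is why preds has only 7 rows.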

Whatever the issue is, it's further up the line, specifically in whatever is producing y_val. There isn't enough information here to go on. Could you post a snippet of the code that produces y_val?

Here is the evaluation loop:

test_losses = []
with torch.no_grad():
    model.eval()
    for i,(data, label) in enumerate(testLoader):
        data, label = data.to(device), label.to(device)
        y_val = model(data)
        loss = loss_f(y_val, label)
        test_losses.append(loss.to('cpu').data.numpy())
    print(f'Loss: {np.mean(test_losses):.8f}')
print('done')
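The loop above reassigns y_val on every iteration, so once it finishes, y_val holds only the output of the last batch. With shuffle=False and the default drop_last=False, that last batch contains len(dataset) % bs samples (when the remainder is nonzero). The sketch below reproduces the effect; the dataset size of 39 and bs = 16 are hypothetical values chosen so the leftover is 7, and the raw batch stands in for model(data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical validation set: 39 samples with 3 features each.
n_samples, bs = 39, 16
dataset = TensorDataset(torch.randn(n_samples, 3), torch.zeros(n_samples, dtype=torch.long))
loader = DataLoader(dataset, batch_size=bs, shuffle=False)

with torch.no_grad():
    for data, label in loader:
        y_val = data  # stand-in for model(data); rebinds y_val on every iteration

# Only the final, partial batch survives the loop: 16 + 16 + 7 = 39.
print(len(loader), y_val.shape[0])
```

Here len(loader) is ceil(39 / 16) = 3 and y_val ends up with 39 % 16 = 7 rows, the same kind of shape reported above.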

And here is how I build the data that goes into the model to produce y_val:

train_x, val_x, train_y, val_y = train_test_split(dataset_x, dataset_y, test_size=0.3,
                                                  random_state=0,stratify=dataset_y)


class_sample_count = np.array([len(np.where(train_y==t)[0]) for t in np.unique(train_y)])
weight = 1. / class_sample_count
samples_weight = np.array([weight[t] for t in train_y])

samples_weight = torch.from_numpy(samples_weight)
sampler = torch.utils.data.sampler.WeightedRandomSampler(samples_weight.type('torch.DoubleTensor'), len(samples_weight))

trainDataset = torch.utils.data.TensorDataset(torch.FloatTensor(train_x), torch.LongTensor(train_y.astype(int)))
validDataset = torch.utils.data.TensorDataset(torch.FloatTensor(val_x), torch.LongTensor(val_y.astype(int)))

trainLoader = torch.utils.data.DataLoader(dataset = trainDataset, batch_size=bs, num_workers=1, sampler = sampler)
testLoader = torch.utils.data.DataLoader(dataset = validDataset, batch_size=bs, shuffle=False, num_workers=1) 
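For reference, the class_sample_count / weight lines compute inverse-frequency weights: each training sample gets weight 1 / (size of its class), so every class carries the same total weight and WeightedRandomSampler draws classes roughly equally often. Only trainLoader uses the sampler; testLoader walks validDataset in order. A sketch with made-up labels, using np.bincount as a shorter equivalent of the list comprehension (this assumes labels are the integers 0..K-1 with every class present, which the original weight[t] indexing already relies on):

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

# Made-up, imbalanced integer labels (assumed to be 0..K-1, every class present).
train_y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])

# Same counts as the list comprehension over np.unique(train_y).
class_sample_count = np.bincount(train_y)      # [6, 2, 2]
weight = 1.0 / class_sample_count              # rarer classes get larger weights
sample_weights_np = weight[train_y]            # one weight per training sample
samples_weight = torch.from_numpy(sample_weights_np)  # float64 tensor

# Each class now carries the same total weight: 6 * (1/6) == 2 * (1/2) == 1.
per_class_total = np.array([sample_weights_np[train_y == c].sum() for c in range(3)])

# Draws len(samples_weight) indices with replacement (the default).
sampler = WeightedRandomSampler(samples_weight, num_samples=len(samples_weight))
drawn = list(sampler)  # indices into the training set
print(per_class_total, len(drawn))
```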

At first I thought it was because of the random sampler, but I didn't apply it to the validation set.

Update: the label data is split across several columns (multiple tasks), so I'm trying to train my model on several tasks by saving and reusing the weights from previously trained tasks. I still can't see why I only get 7 predictions; if anybody can spot what's missing, I'd highly appreciate any help!

Update #2: Now it's pretty clear: I realized I was only outputting a single batch (batch size 16) instead of the whole dataset, and the 7 rows are the last, partial batch. But I still don't understand how to get an array of all the predictions.

Update #3: Solved the issue. To get the predictions for every batch out of the model, I create a list at the beginning of the function and append each batch's predictions to it. Afterwards I extract the values from the tensors in that list and get an ndarray as output.

import numpy as np
import torch

def evaluate(model, loss_f, testLoader=testLoader):
    test_correct = 0
    test_total = 0
    predos = []        # one tensor of predicted labels per batch
    test_losses = []
    model.eval()
    with torch.no_grad():
        for data, label in testLoader:
            data, label = data.to(device), label.to(device)
            y_val = model(data)
            loss = loss_f(y_val, label)
            test_losses.append(loss.item())
            # index of the highest score in each row = predicted class
            test_predicted = torch.argmax(y_val, dim=1)
            # keep this batch's predictions instead of overwriting them
            predos.append(test_predicted.cpu())
            test_total += label.size(0)
            test_correct += (test_predicted == label).sum().item()
    test_accuracy = (test_correct / test_total) * 100
    print(f'Evaluation loss: {np.mean(test_losses):.6f}')
    print(f'Eval Accuracy: {test_accuracy:.4f}')
    return predos

batch_preds = evaluate(model, loss_f)

# join the per-batch tensors into one flat ndarray of labels
preds = torch.cat(batch_preds).numpy()
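The same accumulate-then-concatenate idea can be checked end to end. In this sketch a hypothetical nn.Linear(4, 3) stands in for the real model and a random 39-sample dataset stands in for validDataset. Per-batch predictions are collected in a list and joined with torch.cat, so the result has one label per validation sample regardless of batch size:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins for the real model and validation data.
model = nn.Linear(4, 3)  # 4 input features -> 3 class scores
valid = TensorDataset(torch.randn(39, 4), torch.randint(0, 3, (39,)))
loader = DataLoader(valid, batch_size=16, shuffle=False)

def predict_all(model, loader):
    """Return the predicted class index for every sample in loader, in order."""
    model.eval()
    batch_preds = []
    with torch.no_grad():
        for data, _ in loader:
            batch_preds.append(model(data).argmax(dim=1).cpu())
    # Join the per-batch 1-D tensors (sizes 16, 16, 7 here) into one array of 39 labels.
    return torch.cat(batch_preds).numpy()

preds = predict_all(model, loader)
print(preds.shape)  # (39,)
```

torch.cat does the flattening in one call, which avoids the nested Python loop over individual elements.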