Tile aggregation by averaging output probability when using Inception V3

I am using Inception V3 pre-trained on ImageNet. I am following the architecture from the paper below for training, and I am doing tile-based training.

After saving the trained model, I load it to aggregate the tile predictions by averaging the output probabilities. Each WSI (whole-slide image) is initially chopped into many tiles, and each tile inherits the weak label of its WSI.

I would like to know what “tile aggregation by averaging output probability” exactly means, and how to get the output probability for each test data point. When I pass an image from the test set to the trained model, I get two raw values (logits), not a probability.

The architecture is the one from this paper (see the figure there):

Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning

Here’s my test code:

test_large_images = {}   # maps WSI name -> list of per-tile predictions
test_acc = 0


with torch.no_grad():

    test_running_corrects = 0
    print(len(dataloaders_dict['test']))
    for i, (inputs, labels) in enumerate(dataloaders_dict['test']):

        print(i)
        test_input = inputs.to(device)
        test_label = labels.to(device)
        test_output = saved_model_ft(test_input)   # raw logits, shape [1, 2]
        _, test_pred = torch.max(test_output, 1)   # index of the larger logit
        print('test pred: ', test_output)          # note: despite the label, this prints the logits
        # indexing .samples[i] only lines up with the batch when batch_size=1 and shuffle=False
        sample_fname, label = dataloaders_dict['test'].dataset.samples[i]
        patch_name = sample_fname.split('/')[-1]
        large_image_name = patch_name.split('_')[0]   # tiles are named <WSI>_<index>
        if large_image_name not in test_large_images:
            test_large_images[large_image_name] = []
        test_large_images[large_image_name].append(test_pred.item())
        test_running_corrects += torch.sum(test_pred == test_label.data)

    test_acc = test_running_corrects / len(dataloaders_dict['test'].dataset)

print(test_acc)

For example, here are some of the results:

0
test pred:  tensor([[ 1.9513, -2.4072]], device='cuda:2')
1
test pred:  tensor([[ 1.0274, -1.0467]], device='cuda:2')
2
test pred:  tensor([[ 0.6868, -0.8948]], device='cuda:2')
3
test pred:  tensor([[ 0.8908, -1.1201]], device='cuda:2')
4
test pred:  tensor([[ 0.7935, -0.9384]], device='cuda:2')
5
test pred:  tensor([[ 1.1609, -1.3650]], device='cuda:2')
6

So, how do I get the output probability, and how exactly does tile aggregation by averaging the output probability work in this scenario?

About probabilities, I found this piece of code, but it still gives two values:
probability = torch.nn.functional.softmax(test_output[0], dim=0)

Is that what we are looking for, or just the higher of the two values?

probability:  tensor([0.9785, 0.0215], device='cuda:2')
test output:  tensor([[ 1.6664, -2.1500]], device='cuda:2')
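To convince myself that the two softmax values really are the class probabilities, I checked the arithmetic in plain Python (no torch) on the logits above:

```python
import math

def softmax(logits):
    # numerically stable softmax over a plain list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.6664, -2.1500])   # the logits printed above
print([round(p, 4) for p in probs])  # [0.9785, 0.0215]
```

So the two values sum to 1, and the first entry is the probability of class 0.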

So, going back to the method in the paper, for each test batch I now do:

test_input = inputs.to(device)
test_label = labels.to(device)
test_output = saved_model_ft(test_input)
probabilities = torch.nn.functional.softmax(test_output[0], dim=0)
print('probabilities: ', probabilities)
probability = torch.max(probabilities)   # confidence of the predicted class
print('probability: ', probability)
_, test_pred = torch.max(test_output, 1)
print('test output: ', test_output)
print('test pred: ', test_pred)

Should I apply torch.max to the probability vector and save one probability value per tile? This is the output for one tile in the test set:

probabilities:  tensor([0.8992, 0.1008], device='cuda:2')
probability:  tensor(0.8992, device='cuda:2')
test output:  tensor([[ 1.0471, -1.1416]], device='cuda:2')
test pred:  tensor([0], device='cuda:2')
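One thing I noticed while checking this in plain Python: the value from torch.max on the softmax vector is just the entry at the predicted index, so saving only that single number loses which class it refers to:

```python
probabilities = [0.8992, 0.1008]   # softmax output for the tile above
pred = 0                           # argmax, i.e. test_pred for this tile

# the max of the vector is exactly the probability of the predicted class
assert max(probabilities) == probabilities[pred]

# a hypothetical tile predicted as class 1 can have the same max value,
# so the max alone does not say which class it belongs to
other_tile = [0.1008, 0.8992]
assert max(other_tile) == max(probabilities)
print(max(probabilities))   # 0.8992
```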

Also, please assume we have the following results for two tiles in the test set that both belong to the same WSI, and that the WSI has only these two tiles.

If I have this for one test data point:

probabilities:  tensor([0.8992, 0.1008], device='cuda:2')
probability:  tensor(0.8992, device='cuda:2')
test output:  tensor([[ 1.0471, -1.1416]], device='cuda:2')
test pred:  tensor([0], device='cuda:2')

and this for another test data point:

probabilities:  tensor([0.7603, 0.2397], device='cuda:2')
probability:  tensor(0.7603, device='cuda:2')
test output:  tensor([[ 0.4782, -0.6760]], device='cuda:2')
test pred:  tensor([0], device='cuda:2')

How do you do the tile aggregation by averaging the output probability here?
By computing (0.8992 * 0 + 0.7603 * 0) / 2?

If that is the case, my concern is that since the majority of the data is class 0, this would almost always result in a label of 0 as well.
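My current reading (an assumption on my part, not confirmed against the paper's code) is that you average the full softmax vector per class across a slide's tiles and then take the argmax of the averaged vector, instead of multiplying by the predicted labels. For the two tiles above, that would look like:

```python
# per-tile softmax vectors for the two tiles of the hypothetical two-tile WSI
tile_probs = [
    [0.8992, 0.1008],
    [0.7603, 0.2397],
]

# average the probabilities per class across the slide's tiles
n = len(tile_probs)
avg = [sum(tile[c] for tile in tile_probs) / n for c in range(2)]
slide_pred = avg.index(max(avg))   # argmax of the averaged vector

print(avg)         # [0.82975, 0.17025]
print(slide_pred)  # 0
```

This way the slide-level score for class 0 is 0.82975, rather than the 0 that the label-weighted average gives.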

I did the following, but I seem to get very small probabilities, since only a small fraction of each large image's tiles are predicted as 1. Please let me know if this makes sense.

import numpy as np

# here each value in test_large_images is a dict with two lists:
# 'pred' (per-tile predicted labels) and 'pred_probability' (per-tile max softmax probability)
image_sum_ones = []
image_num_tiles = []

for key, value in test_large_images.items():
    sum_probs = 0
    image_sum_ones.append(np.sum(value['pred']))   # tiles predicted as 1 in this WSI
    image_num_tiles.append(len(value['pred']))     # total tiles in this WSI
    # sum the probabilities of the tiles predicted as 1, then divide by the total tile count
    for i in range(len(value['pred'])):
        if value['pred'][i] == 1:
            sum_probs += value['pred_probability'][i]
    sum_probs /= len(value['pred'])
    print("probability of label 1: ", sum_probs)

print(sorted(image_sum_ones))
print(sorted(image_num_tiles))
print(image_sum_ones)
print(image_num_tiles)

I get this output:

probability of label 1:  0.09891807485384992
probability of label 1:  0.07276203761998791
probability of label 1:  0.08117216583844777
probability of label 1:  0.14768244485769952
probability of label 1:  0.08982645347714424
probability of label 1:  0.09776548073116668
probability of label 1:  0.08362754519659145
probability of label 1:  0.07905217825918269
probability of label 1:  0.08063625618322007
probability of label 1:  0.0783113234621637
probability of label 1:  0.08250170438056677
probability of label 1:  0.07820508241653443
probability of label 1:  0.09131980835753284
probability of label 1:  0.07993868380616613
probability of label 1:  0.09600267582332965
probability of label 1:  0.12047059082787884
probability of label 1:  0.09631681207161215
probability of label 1:  0.0773005544403453
probability of label 1:  0.11777349737253082
probability of label 1:  0.08263663419277326
probability of label 1:  0.02820326413138438
probability of label 1:  0.09092985522949089
probability of label 1:  0.08395318535790927
probability of label 1:  0.08291178072492282
probability of label 1:  0.06710066102998226
[3, 5, 16, 17, 18, 20, 24, 25, 25, 26, 27, 28, 30, 31, 31, 35, 35, 36, 39, 45, 46, 65, 70, 71, 209]
[32, 59, 89, 112, 118, 121, 125, 151, 177, 177, 192, 194, 207, 229, 243, 262, 266, 272, 283, 296, 377, 431, 467, 481, 1509]
[46, 45, 39, 28, 5, 27, 36, 35, 20, 35, 65, 16, 71, 24, 31, 25, 70, 31, 17, 209, 3, 18, 30, 26, 25]
[283, 377, 296, 112, 32, 177, 262, 266, 151, 272, 481, 125, 467, 177, 194, 121, 431, 243, 89, 1509, 59, 118, 207, 192, 229]