Using Histopathological Image Dataset with Ground Truth


I have a dataset of 58 images (RGB format, row*col*3) with their corresponding ground truth images (binary, row*col). This means every pixel in an image has three color features and one class label, 1 or 0, indicating whether it belongs to a tumor or not.

I’ve looked at lots of tutorials, but unfortunately most of them don’t fit my problem. Datasets like MNIST have row*col features and a single class label per image instead of a pixel-wise ground truth image.

How can I load these images to train a neural network or any other kind of deep learning model? Any help would be appreciated.

You can treat this use case as a per-pixel binary classification (i.e. binary segmentation); you would just have to make sure the output of your model and the corresponding target have the same shape.

E.g. instead of using conv and pooling layers at the beginning of your model, then flattening the activations, and passing them to linear layers, you could write a model using only conv layers, so that the spatial size of the activations stays constant.

Here is a simple example:

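import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # kernel_size=3, stride=1, padding=1 keeps height and width unchanged
        self.conv1 = nn.Conv2d(3, 6, 3, 1, 1)
        self.conv2 = nn.Conv2d(6, 1, 3, 1, 1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.conv2(x)  # raw logits, one per pixel: [N, 1, H, W]
        return x

model = MyModel()

# Random stand-ins for the images and the binary masks
x = torch.randn(10, 3, 24, 24)
y = torch.randint(0, 2, (10, 1, 24, 24)).float()
dataset = TensorDataset(x, y)
# batch_size=2 and shuffle=True are assumed example settings
loader = DataLoader(dataset, batch_size=2, shuffle=True)

# BCEWithLogitsLoss applies the sigmoid internally, so the model returns logits
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

nb_epochs = 10
for epoch in range(nb_epochs):
    for data, target in loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    # loss of the last batch in this epoch
    print('Epoch {}, loss {}'.format(epoch, loss.item()))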
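
To feed real image files instead of random tensors, one option is a small custom Dataset. The following is only a minimal sketch: it assumes the images and masks are stored as files with matching names in two separate folders, and the class name TumorDataset, the folder layout, and the use of PIL and NumPy for reading are all assumptions that are not part of the answer above.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class TumorDataset(Dataset):
    """Pairs each RGB image with its binary mask.

    Hypothetical layout: every file in image_dir has a mask with the
    same file name in mask_dir.
    """

    def __init__(self, image_dir, mask_dir):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.names = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert('RGB')
        mask = Image.open(os.path.join(self.mask_dir, name)).convert('L')

        # [H, W, 3] uint8 -> [3, H, W] float in [0, 1]
        image = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
        # [H, W] -> [1, H, W] float, 1.0 for every non-zero mask pixel
        mask = (torch.from_numpy(np.array(mask)) > 0).float().unsqueeze(0)
        return image, mask
```

An instance of this class can be passed to DataLoader in place of the TensorDataset. Note that the default batching stacks samples into one tensor, so all images in a batch need the same height and width; with differently sized images you would have to crop or resize them first, or use a batch size of 1.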

Also, have a look at this post for some information about how to apply the same random transformations on your input image and mask.
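
One way to keep image and mask aligned (not necessarily the approach taken in the linked post) is to draw each random decision once and apply it to both tensors. Here is a minimal sketch using plain tensor flips; the function name paired_random_flip and the probability parameter p are hypothetical, and other transformations such as crops or rotations would follow the same pattern.

```python
import torch


def paired_random_flip(image, mask, p=0.5):
    """Flip image [3, H, W] and mask [1, H, W] using the same random decisions."""
    if torch.rand(1).item() < p:
        # horizontal flip: reverse the width dimension of both tensors
        image = torch.flip(image, dims=[-1])
        mask = torch.flip(mask, dims=[-1])
    if torch.rand(1).item() < p:
        # vertical flip: reverse the height dimension of both tensors
        image = torch.flip(image, dims=[-2])
        mask = torch.flip(mask, dims=[-2])
    return image, mask
```

Calling this inside the __getitem__ method of your dataset, after both tensors are created, gives a different augmentation on every epoch while the mask still matches the image pixel for pixel.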

To expand on Piotr’s suggestion: the most common type of network used for this is U-Net, but you’ll probably have some difficulty training it on a dataset this small unless you find some auxiliary data (it need not be labeled, but it should come from a similar domain).
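
For orientation, the core idea of U-Net (a downsampling path, an upsampling path, and skip connections that concatenate encoder features into the decoder) can be sketched with a single level. This is a toy illustration and not the full architecture; the class name TinyUNet and the channel sizes are arbitrary assumptions, and the input height and width are assumed to be even.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 8, 3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Conv2d(8, 16, 3, padding=1)
        self.up = nn.ConvTranspose2d(16, 8, 2, stride=2)
        # 16 input channels: 8 upsampled + 8 from the skip connection
        self.dec = nn.Conv2d(16, 8, 3, padding=1)
        self.head = nn.Conv2d(8, 1, 1)

    def forward(self, x):
        e = F.relu(self.enc(x))                    # [N, 8, H, W]
        b = F.relu(self.bottleneck(self.pool(e)))  # [N, 16, H/2, W/2]
        u = self.up(b)                             # [N, 8, H, W]
        # skip connection: concatenate encoder features along channels
        d = F.relu(self.dec(torch.cat([u, e], dim=1)))
        return self.head(d)                        # logits, [N, 1, H, W]


model = TinyUNet()
out = model(torch.randn(2, 3, 24, 24))
print(out.shape)  # torch.Size([2, 1, 24, 24])
```

Since the output has the same [N, 1, H, W] shape as the conv-only model above and is also a logit map, it can be trained with the same nn.BCEWithLogitsLoss loop.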

Best regards