Concatenating images

I want to write PyTorch code that concatenates two images (32 by 32 each) so that the output image is 64 by 32. How should I do that?

Thank you~

torch.cat should do the job:

a = torch.randn(3, 32, 32)
b = torch.randn(3, 32, 32)
c = torch.cat((a, b), 1)
print(c.shape)
> torch.Size([3, 64, 32])
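
For a tensor in [channels, height, width] layout, the dim argument decides the direction of the join: dim 1 stacks the images vertically, dim 2 puts them side by side. A minimal sketch, with random tensors standing in for real images:

```python
import torch

# Two stand-in "images" in [channels, height, width] layout.
a = torch.randn(3, 32, 32)
b = torch.randn(3, 32, 32)

tall = torch.cat((a, b), dim=1)  # join along the height
wide = torch.cat((a, b), dim=2)  # join along the width

print(tall.shape)  # torch.Size([3, 64, 32])
print(wide.shape)  # torch.Size([3, 32, 64])

# The top half of the tall image is `a`, the bottom half is `b`.
assert torch.equal(tall[:, :32, :], a)
assert torch.equal(tall[:, 32:, :], b)
```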

Thank you for your reply. I saw torch.cat but don't know how I can apply it here:

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=5,
                                          shuffle=True, num_workers=1)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

for epoch in range(3):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        # Final_input = image 64 by 32

You are currently using a batch size of 5, which won't work if you would like to concatenate two images.
If you use an even batch size, you could concatenate the images using this code:

inputs = torch.cat((inputs[::2], inputs[1::2]), 2)

Since you are using shuffle=True, I assume the pairs used to create the larger tensors do not matter. Is this correct or would you like to concatenate specific pairs of image tensors?
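
To make the pairing explicit, here is a small sketch on a fake batch (random data in place of CIFAR-10, and a batch size of 6 chosen only for illustration). It checks which two inputs end up in each 64 by 32 output:

```python
import torch

# Stand-in for one batch: [batch, channels, height, width].
inputs = torch.randn(6, 3, 32, 32)

# Even-indexed samples go on top, odd-indexed samples below (dim 2 is the height).
pairs = torch.cat((inputs[::2], inputs[1::2]), 2)
print(pairs.shape)  # torch.Size([3, 3, 64, 32])

# Output k is built from inputs[2k] (top half) and inputs[2k + 1] (bottom half).
assert torch.equal(pairs[1, :, :32], inputs[2])
assert torch.equal(pairs[1, :, 32:], inputs[3])
```

With an odd batch size the two slices have different lengths along dim 0, so torch.cat cannot join them along dim 2.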

Thank you very much! It's working. No, I don't need specific pairs.
If I want to concatenate the labels at the same time, is this correct?

labels = torch.cat((labels, labels))

Not really, as your samples now contain more than a single class and you are now dealing with a multi-label classification use case.
In this setup, you could create multi-hot targets and change the criterion to e.g. nn.BCEWithLogitsLoss:

one_hot_labels = torch.nn.functional.one_hot(labels)
one_hot_labels = one_hot_labels[::2] | one_hot_labels[1::2]
one_hot_labels = one_hot_labels.float()
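
A small worked example of the multi-hot construction, using made-up labels. One assumption added here: num_classes is passed explicitly (nb_classes = 10 for CIFAR-10), because without it F.one_hot infers the width from the largest label in the batch, and that can vary from batch to batch.

```python
import torch
import torch.nn.functional as F

nb_classes = 10  # assumed: CIFAR-10
labels = torch.tensor([3, 5, 7, 7])  # class indices for four images

one_hot = F.one_hot(labels, num_classes=nb_classes)  # [4, 10], dtype long
# Combine the labels of image 2k and image 2k + 1 with a bitwise OR.
multi_hot = (one_hot[::2] | one_hot[1::2]).float()   # [2, 10]

print(multi_hot)
# Row 0 has ones at classes 3 and 5.
# Row 1 has a single one at class 7, since both images share that class.
```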

I tried that, but it gives me this error:

AttributeError: module 'torch.nn.functional' has no attribute 'one_hot'

My torch version is 1.0.1.post2.

I'm using 1.0.0.dev20190312. You could use the nightly build or, if you prefer to stay on 1.0.1.post2, you could use this code to create the one-hot encoded targets:

one_hot_labels = torch.zeros(labels.size(0), nb_classes, dtype=torch.long).scatter_(1, labels.unsqueeze(1), 1)
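
The scatter_ call writes a 1 into column labels[i] of row i. A short sketch with made-up labels, assuming nb_classes = 10:

```python
import torch

nb_classes = 10  # assumed: CIFAR-10
labels = torch.tensor([0, 9, 4])

one_hot_labels = torch.zeros(labels.size(0), nb_classes, dtype=torch.long)
# The index tensor must have the same number of dims as the target, hence unsqueeze(1).
one_hot_labels.scatter_(1, labels.unsqueeze(1), 1)

print(one_hot_labels)
# Each row holds exactly one 1, at the position given by its label.
```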

Thank you! That works, but I noticed that "one_hot_labels" has size [2,10] at the end of the loop. Is it possible for it to have size [2,10] inside the loop as well?
If I want to concatenate more than two images, which parts of inputs and one_hot_labels should be changed?

A more general approach could use torch.split and a better .scatter_ call:

nb_cat = 3  # Number of images to concatenate

for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    batch_size = inputs.size(0)
    split_size = batch_size // nb_cat
    inputs = torch.cat(inputs.split(split_size), 2)

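    # Chunk the labels exactly like the inputs so each row lists the classes in one stacked image
    labels = torch.stack(labels.split(split_size), 1)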
    one_hot_labels = torch.zeros(inputs.size(0), nb_classes, dtype=torch.long).scatter_(1, labels, 1)
    one_hot_labels = one_hot_labels.float()

EDIT: This always assumes the batch size is divisible by nb_cat without a remainder.
I haven't tested edge cases!
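
One way to sanity-check a generalization like this is to run it on a fake batch where image i is filled with the value i and also carries label i. The sketch below assumes the labels are chunked with the same split_size as the images and stacked along dim 1, so that row k of the targets describes the images inside stacked sample k. The batch size of 6 is chosen only for illustration.

```python
import torch

nb_cat = 3       # number of images to concatenate
nb_classes = 10  # assumed: CIFAR-10
batch_size = 6

# Fake batch: image i is filled with the value i and has label i.
inputs = torch.arange(batch_size, dtype=torch.float32).view(-1, 1, 1, 1).expand(-1, 3, 32, 32)
labels = torch.arange(batch_size)

split_size = batch_size // nb_cat
stacked = torch.cat(inputs.split(split_size), 2)    # [2, 3, 96, 32]
grouped = torch.stack(labels.split(split_size), 1)  # [2, 3]

multi_hot = torch.zeros(stacked.size(0), nb_classes).scatter_(1, grouped, 1.0)

# Read back which image sits in each 32-row band of every stacked sample.
image_ids = stacked[:, 0, ::32, 0].long()
assert torch.equal(image_ids, grouped)  # labels line up with the images
print(grouped)  # tensor([[0, 2, 4], [1, 3, 5]])
```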

nb_cat = 3  # Number of images to concatenate
nb_classes = 10

for epoch in range(2):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        batch_size = inputs.size(0)
        split_size = batch_size // nb_cat
        inputs = torch.cat(inputs.split(split_size), 2)
        # Chunk the labels exactly like the inputs so each row lists the classes in one stacked image
        labels = torch.stack(labels.split(split_size), 1)
        one_hot_labels = torch.zeros(inputs.size(0), nb_classes, dtype=torch.long).scatter_(1, labels, 1)
        one_hot_labels = one_hot_labels.float()

This gives me the following error:
RuntimeError: split_size can only be 0 if dimension size is 0, but got dimension size of 2

It seems your batch size is smaller than nb_cat, thus the split_size is 0.
This might be the case if you set a smaller batch_size in your DataLoader or alternatively the last batch in the training loop might be smaller, if the number of samples is not divisible by the batch size without a remainder. You could avoid this issue if you set drop_last=True in your DataLoader (or handle the smaller batch in another way).
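
The effect of drop_last can be seen on a small fake dataset. In the sketch below, the sizes (20 samples, batch_size=6) are made up for illustration. The same arithmetic applies to the 50,000 CIFAR-10 training images: with batch_size=6 the final batch would hold 2 samples, and 2 // 3 gives a split_size of 0.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 20 fake samples; with batch_size=6 the last batch holds only 2.
dataset = TensorDataset(torch.randn(20, 3, 32, 32), torch.randint(0, 10, (20,)))

sizes_keep = [x.size(0) for x, _ in DataLoader(dataset, batch_size=6)]
sizes_drop = [x.size(0) for x, _ in DataLoader(dataset, batch_size=6, drop_last=True)]

print(sizes_keep)  # [6, 6, 6, 2]
print(sizes_drop)  # [6, 6, 6]
```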

I added drop_last=True and it's working now. For batch_size=6, nb_cat=2, nb_classes=10, epoch=3:
print(inputs.shape): torch.Size([3, 3, 64, 32])
print(one_hot_labels.shape): torch.Size([3, 10])

Here the first element of the size (3) is split_size, am I right? Is there any way to make it depend on batch_size instead, like Size([6, 3, 64, 32]) or Size([6, 10])?

When I use these inputs and one_hot_labels in my code, the loss calculation gives me this error:

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target'

Is this because of the float in one_hot_labels? How can I solve it?

If you concatenate the images, you'll get fewer samples, so I'm not sure how you would like to keep the batch size at 6.
Could you explain your use case a bit?

Since you are now dealing with multi-hot encoded targets (i.e. multi-label classification), you could use nn.BCELoss or nn.BCEWithLogitsLoss instead.
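
A minimal sketch of the loss computation with multi-hot targets. The logits are random stand-ins for a model output of shape [batch, nb_classes], and the target values are made up. nn.BCEWithLogitsLoss applies the sigmoid internally and takes float targets of the same shape as the logits, whereas a criterion that expects class indices wants a long tensor of shape [batch], which would explain a Long vs. Float error.

```python
import torch
import torch.nn as nn

# Stand-in for raw model outputs (no sigmoid applied): [batch, nb_classes].
logits = torch.randn(3, 10)

# Made-up multi-hot targets, float, same shape as the logits.
multi_hot = torch.zeros(3, 10)
multi_hot[0, [1, 4]] = 1.0
multi_hot[1, [7]] = 1.0
multi_hot[2, [0, 9]] = 1.0

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, multi_hot)
print(loss.item())
```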

I think you are right regarding batch size.
Thank you very much for your help; with this new loss it's working now.

Hello, how can I convert 16 images of size 128 by 128 into one 512 by 512 image?
That is, 4 images across and 4 down.

You could use view or F.fold to create the larger output tensor (depending on how the current tensor is shaped, one or the other might be easier).
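
Both routes can be sketched as follows, assuming the 16 tiles are stored as one [16, C, 128, 128] tensor in row-major order (tile 0 top-left, tile 15 bottom-right). Note that the view route also needs a permute to bring the tile rows and columns next to the pixel rows and columns.

```python
import torch
import torch.nn.functional as F

imgs = torch.randn(16, 3, 128, 128)  # 16 tiles in [N, C, H, W] layout
n, c, h, w = imgs.shape
rows = cols = 4

# view route: [rows, cols, C, H, W] -> [C, rows, H, cols, W] -> [C, rows*H, cols*W]
grid = imgs.view(rows, cols, c, h, w).permute(2, 0, 3, 1, 4).reshape(c, rows * h, cols * w)
print(grid.shape)  # torch.Size([3, 512, 512])

# F.fold route: every tile becomes one column of length C*H*W.
patches = imgs.reshape(n, c * h * w).t().unsqueeze(0)  # [1, C*H*W, 16]
folded = F.fold(patches, output_size=(rows * h, cols * w),
                kernel_size=(h, w), stride=(h, w))[0]

# Both routes place tile r*cols + col at block (r, col) of the grid.
assert torch.equal(grid, folded)
assert torch.equal(grid[:, :128, 128:256], imgs[1])
```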

Thanks for your response.
In this case, if I want to feed 2 images of size 512 by 512 into the network each time, and each image has a different label, should I set batch_size = 32?
I am a beginner, so please excuse me.

If you convert the 16 images into one large image, I would assume that this large image corresponds to a single target. Could you explain your use case a bit, i.e. how the targets are defined and why you would like to reshape the images?