Training a Network With a Constant Input and a Variable Input

Hi,

I believe I have a rather unusual situation (I searched online for hours but couldn’t find a solution).

I have a network with 2 types of input:
One is an image (say, 254x254), and the other is a pair of integers (coordinates in the image).

Some background:
I’d like to train a binary classification network whose inputs are an image tensor and a coordinates tensor. The way the network is currently designed, the image goes through some convolutions and other “heavy-lifting” layers, while the coordinates go through a shallow network; finally, everything goes through a few fully connected layers.
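To make it concrete, here is a rough sketch of that two-branch layout (all layer sizes here are placeholders I picked for illustration, not the real ones):

import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    # rough sketch; layer sizes are placeholders
    def __init__(self):
        super().__init__()
        self.image_branch = nn.Sequential(  # the "heavy-lifting" convolutions
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.coord_branch = nn.Sequential(  # shallow network for the coordinates
            nn.Linear(2, 16), nn.ReLU(),
        )
        self.classifier = nn.Sequential(  # fully connected head -> binary logit
            nn.Linear(32 + 16, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, image, coord):
        # image: (N, 3, H, W); coord: (N, 2) float tensor
        features = torch.cat([self.image_branch(image), self.coord_branch(coord)], dim=1)
        return self.classifier(features)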

The problem:
I would like to avoid uploading each image tensor to the GPU as part of a batch only to discard it afterwards, when it was used to train just one (image, coordinates) pair. That simply takes too long. I already tried sampling a subset of all 254*254 possible coordinates to reduce training time, but I still can’t find a way to solve the problem mentioned above. I would like to have something like this:
Each batch of N samples is of the form (IMAGE_K, coordinate_i), where IMAGE_K is shared between all the coordinates of the batch and coordinate_i is the i’th coordinate, 0 <= i < N. In other words, I only load one image from disk and only one image occupies the GPU, while many different coordinates are used in the training batch.

Obviously I don’t really care if it’s a single image in a batch or 2 or 3, but my point is that I don’t want every training sample to be some random image+coordinate pair, because that makes no sense for my task. Another way of thinking about it, in pseudo-code, is:

for every Image:
    for every Coordinate in sample:
        train on (Image, Coordinate)
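To illustrate the batch layout I’m after, something like this (the shapes and batch size are just for illustration):

import torch

# one image resident on the GPU, shared by the whole batch
image = torch.randn(1, 3, 254, 254, device='cuda')
# N = 64 coordinates sampled for this image
coords = torch.randint(0, 254, (64, 2), device='cuda').float()

# expand() returns a view, so the image is not physically copied N times;
# the convolutional branch simply sees a batch dimension of N
image_batch = image.expand(coords.size(0), -1, -1, -1)
# output = model(image_batch, coords)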

I also thought about using a hypernetwork to solve it: one network, F, is fed the image and produces the weights for a second network, G, which is fed a coordinate and, using F’s generated weights, predicts 0 or 1. The problem was that implementing it was a nightmare, especially handling batches; I couldn’t get it working.
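For reference, here is a minimal sketch of the hypernetwork idea, restricted to one image per batch (the hidden size and all layer sizes are placeholders of mine):

import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    # the encoder plays the role of F: it reads the image and emits the
    # weights of a tiny classifier G, which is applied to every coordinate
    def __init__(self, hidden=16):
        super().__init__()
        self.hidden = hidden
        self.encoder = nn.Sequential(  # placeholder image encoder
            nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # G consists of layers (2 -> hidden) and (hidden -> 1), with biases
        self.head = nn.Linear(8, 2 * hidden + hidden + hidden + 1)

    def forward(self, image, coords):
        # image: (1, 3, H, W); coords: (N, 2) -> logits: (N, 1)
        h = self.hidden
        p = self.head(self.encoder(image)).squeeze(0)
        w1, b1 = p[:2 * h].view(h, 2), p[2 * h:3 * h]
        w2, b2 = p[3 * h:4 * h].view(1, h), p[4 * h:]
        x = F.relu(F.linear(coords, w1, b1))
        return F.linear(x, w2, b2)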

I’m open to any suggestion, I’m pretty lost.

Thanks

Would splitting the `Dataset` into an `ImageDataset` and a `CoordDataset` work?
You could then use the nested loop approach and keep the image constant in the inner loop.
Pseudo code for the idea:

from torch.utils.data import DataLoader

image_loader = DataLoader(image_dataset)
coord_loader = DataLoader(coord_dataset)

for image in image_loader:
    # if the coordinates depend on the current image, initialize the coord_loader here
    image = image.to('cuda')
    for coord in coord_loader:
        optimizer.zero_grad()
        coord = coord.to('cuda')
        output = model(image, coord)
        loss = ...  # compute the loss from output and the corresponding target
        loss.backward()
        optimizer.step()
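In case it helps, a minimal sketch of what the two datasets could look like (the file loading and the coordinate storage are my assumptions, not part of the suggestion above):

import torch
from PIL import Image
from torch.utils.data import Dataset

class ImageDataset(Dataset):
    # hypothetical: yields one image tensor per index, loaded from disk
    def __init__(self, paths, transform):
        self.paths = paths
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return self.transform(Image.open(self.paths[idx]).convert('RGB'))

class CoordDataset(Dataset):
    # hypothetical: yields one (x, y) coordinate per index
    def __init__(self, coords):
        self.coords = coords  # e.g. a (num_coords, 2) float tensor

    def __len__(self):
        return len(self.coords)

    def __getitem__(self, idx):
        return self.coords[idx]

With e.g. DataLoader(coord_dataset, batch_size=64, shuffle=True), every inner iteration would train 64 coordinates against the single image already resident on the GPU (expanding the image’s batch dimension to match, if needed).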

Let me know if this approach would work or if I misunderstood the use case.


Hi, sorry for the delayed response, we had a holiday here 🙂
That’s a really elegant and simple solution to my problem, I wish I had thought of it myself; I’m a bit embarrassed haha. I’ll give it a try and let you know how it goes, thank you so much for helping me out!

Okay, that seems to do the trick. Thanks again @ptrblck