Hi,

I believe I have a rather unusual situation (I searched online for hours but couldn’t find a solution).

I have a network with 2 types of input:

One is an image (say, 254x254), and one is a pair of integers (coordinates in the image).

Some background:

I’d like to train a binary classification network whose inputs are an image tensor and a coordinates tensor. The way the network is currently designed, the image goes through some convolutions and some other “heavy-lifting” layers, while the coordinates go through a shallow network, and finally both outputs go through a few fully connected layers.

The problem:

I would like to avoid uploading a different image tensor to the GPU for every sample in a batch, only to discard it after it has been used to train on a single (image, coordinates) pair. That just takes too long. I already tried sampling a subset of the 254*254 possible coordinates to reduce training time, but I still can’t find a way to solve the problem mentioned above. I would like to have something like this:

Each batch of N samples is of the form (IMAGE_K, coordinate_i), where IMAGE_K is shared between all the coordinates of the batch, and coordinate_i is the i’th coordinate, with 0 <= i < N. In other words, I only load one image from disk and only one image occupies the GPU, while many different coordinates are used in the training batch.
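To make the batch layout above concrete, here is a minimal sketch in plain Python of a generator that yields one image identifier together with N coordinates sampled without replacement. The name `coordinate_batches`, its parameters, and the (row, col) convention are hypothetical, chosen only for illustration.

```python
import random


def coordinate_batches(image_ids, coords_per_image, image_size=254, seed=None):
    """Yield (image_id, coords) pairs: one image shared by N sampled coordinates."""
    rng = random.Random(seed)
    for image_id in image_ids:
        # Sample N distinct flat pixel positions, then unflatten them.
        flat = rng.sample(range(image_size * image_size), coords_per_image)
        # Assumption: a coordinate is a (row, col) pair inside the image.
        coords = [divmod(position, image_size) for position in flat]
        yield image_id, coords


# Each step loads a single image and pairs it with many coordinates.
for image_id, coords in coordinate_batches(["img_0", "img_1"], coords_per_image=4, seed=0):
    print(image_id, coords)
```

With something like this, the image only needs to be read from disk and moved to the GPU once per batch, while the coordinates form the actual batch dimension.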

Obviously I don’t really care if it’s a single image in a batch or 2 or 3, but my point is that I don’t want every training sample to be some random image+coordinate, because it makes no sense. Another way of thinking about it, in pseudo-code, is:

for every Image:
    for every Coordinate in sample:
        train on (Image, Coordinate)
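One possible way to realize this loop, sketched here assuming PyTorch (the post does not name a framework), is to run the expensive image branch once per image and broadcast its feature vector across all N coordinates. The class name `SharedImageClassifier`, the layer sizes, and the 3-channel input are placeholders, not the actual architecture.

```python
import torch
from torch import nn


class SharedImageClassifier(nn.Module):
    def __init__(self, feat_dim=64, coord_dim=16):
        super().__init__()
        # Stand-in for the "heavy-lifting" image branch (assumed 3-channel input).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Shallow branch for the coordinates.
        self.coord_encoder = nn.Sequential(nn.Linear(2, coord_dim), nn.ReLU())
        # Fully connected layers applied to the concatenated features.
        self.head = nn.Sequential(
            nn.Linear(feat_dim + coord_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, image, coords):
        # image: (C, H, W), a single image; coords: (N, 2)
        img_feat = self.image_encoder(image.unsqueeze(0))   # (1, feat_dim), computed once
        coord_feat = self.coord_encoder(coords.float())     # (N, coord_dim)
        img_feat = img_feat.expand(coords.shape[0], -1)     # (N, feat_dim), a view, not a copy
        fused = torch.cat([img_feat, coord_feat], dim=1)
        return self.head(fused).squeeze(1)                  # (N,) logits


model = SharedImageClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Random stand-ins for one image, 128 sampled coordinates, and their binary labels.
image = torch.rand(3, 254, 254)
coords = torch.randint(0, 254, (128, 2))
labels = torch.randint(0, 2, (128,)).float()

logits = model(image, coords)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Here only one image tensor is needed per step, and the convolutions run once regardless of N. Gradients from all N coordinates still flow back into the image branch through the expanded feature vector.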

I also thought about using a hypernetwork to solve it - one network, F, is fed an image and produces the weights for a network G, where G is fed a coordinate and, using the weights generated by F, predicts 0 or 1. The problem was that implementing it was a nightmare, especially handling batches. I couldn’t get the thing working.
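For the batching difficulty specifically, one option is to have F emit a flat parameter vector per image and apply G with batched matrix multiplies. The following is a rough sketch assuming PyTorch; `HyperClassifier`, the tiny encoder, and the two-layer shape of G are illustrative choices, not a reference design.

```python
import torch
from torch import nn


class HyperClassifier(nn.Module):
    def __init__(self, feat_dim=64, hidden=32):
        super().__init__()
        self.hidden = hidden
        # F, part 1: a small stand-in image encoder (assumed 3-channel input).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # G is a 2 -> hidden -> 1 MLP; its parameters are W1, b1, W2, b2.
        self.sizes = [2 * hidden, hidden, hidden, 1]
        # F, part 2: map image features to all of G's parameters at once.
        self.to_params = nn.Linear(feat_dim, sum(self.sizes))

    def forward(self, images, coords):
        # images: (B, 3, H, W); coords: (B, N, 2), N coordinates per image
        params = self.to_params(self.encoder(images))             # (B, P)
        w1, b1, w2, b2 = torch.split(params, self.sizes, dim=1)
        w1 = w1.view(-1, 2, self.hidden)                          # (B, 2, hidden)
        # Apply each image's own G to that image's coordinates.
        h = torch.relu(torch.bmm(coords.float(), w1) + b1.unsqueeze(1))  # (B, N, hidden)
        logits = torch.bmm(h, w2.unsqueeze(2)).squeeze(2) + b2    # (B, N)
        return logits


hyper = HyperClassifier()
hyper_images = torch.rand(2, 3, 32, 32)
hyper_coords = torch.randint(0, 32, (2, 5, 2))
hyper_logits = hyper(hyper_images, hyper_coords)  # one logit per (image, coordinate) pair
```

Using `torch.bmm` means every image in the batch gets its own generated weights without a Python loop over images; with B = 1 this reduces to the single-image case described above.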

I’m open to any suggestion, I’m pretty lost.

Thanks