I get an error when I run "gpytorch.utils.lanczos.lanczos_tridiag" on the GPU.
The error occurs at this line of code:
sim_i_j = torch.diag(sim, self.batch_size)
where sim is an N x N array of cosine similarities between two batches of samples (N = 2 * self.batch_size). The (i, j)-th entry of sim is the cosine similarity between the i-th feature vector in the first batch and the j-th feature vector in the second batch.
This line is part of my custom loss function, which computes the loss for the SimCLR algorithm.
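To make the intended behaviour of that line concrete, here is a minimal sketch on the CPU. It assumes a small batch size and random unit-normalised features as a stand-in for the real model outputs; the names batch_size and z are illustrative and not taken from the actual loss code.

```python
import torch
import torch.nn.functional as F

batch_size = 4                      # small illustrative value, not the real setting
N = 2 * batch_size

# Hypothetical stand-in for the real features: N unit-normalised vectors
z = F.normalize(torch.randn(N, 8), dim=1)

# N x N cosine similarity matrix (dot products of unit vectors)
sim = z @ z.T

# The diagonal offset by batch_size pairs row i with column i + batch_size
sim_i_j = torch.diag(sim, batch_size)

print(sim.shape, sim_i_j.shape)
```

With a 2-D input of this shape, the offset diagonal should contain batch_size entries, one per positive pair.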
However, when I use device = 'cpu', I do not get the error "RuntimeError: Storage size calculation overflowed with sizes=[-96]".
Note that model.forward() is executed on the GPU regardless of which device I use for the lanczos_tridiag function.
In this run I reduced self.batch_size from 128 to 16. The number inside the "[ ]" changes when I use a different batch size.
I do not understand why this happens. What is the reason behind this error?
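In case it helps with diagnosis, below is a minimal sketch of a check that could be placed just before the failing line to record the layout of sim. The helper name describe_tensor is hypothetical, and the random matrix is only a CPU stand-in for the real similarity matrix.

```python
import torch

def describe_tensor(t):
    """Collect basic layout facts about a tensor (hypothetical helper)."""
    return {
        "shape": tuple(t.shape),
        "device": str(t.device),
        "dtype": str(t.dtype),
        "stride": t.stride(),
        "contiguous": t.is_contiguous(),
    }

batch_size = 16
# Stand-in for the real matrix; in the actual loss this would be the GPU tensor
sim = torch.randn(2 * batch_size, 2 * batch_size)

info = describe_tensor(sim)
print(info)

# For the offset diagonal to be non-empty, sim must be 2-D
# with more than batch_size columns
assert sim.dim() == 2 and sim.shape[1] > batch_size
```

Comparing this output between the CPU and GPU runs might show whether the shape, stride, or device of sim differs from what the loss expects.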