I am currently trying to train a model which includes, among other modules, a torch.nn.Bilinear layer:
```python
self.batch_size = 1
input_len = 100
output_len = 2
self.bilinear = nn.Bilinear(
    input_len,
    input_len,
    output_len,
    bias=False,
)
self.left_indices = [
    index
    for index in range(config["max_seq_len"])
    for _ in range(config["max_seq_len"] - index)
]
self.right_indices = [
    higher_index
    for index in range(config["max_seq_len"])
    for higher_index in range(index, config["max_seq_len"])
]
```
which I use like this during training:
```python
...
output_tensor = torch.zeros(
    self.batch_size,
    self.max_seq_len,
    self.max_seq_len,
    self.output_len,
    device=used_device,
)
output_tensor[:, self.left_indices, self.right_indices] = self.bilinear(
    input_left[:, self.left_indices],
    input_right[:, self.right_indices],
)
...
```
However, I hit the traditional "CUDA out of memory" error (which seems to happen when processing the second batch).
I tried varying the sizes of the two input tensors (from 1 to 100), and the GPU memory usage climbs to as much as 8 GB with a batch size of 1.
Is this normal, or am I doing something wrong (e.g. by assigning the module output to a tensor like that)?
Thank you for your help! =)
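For reference, here is a minimal, self-contained sketch of the pattern above (assumed values: `max_seq_len = 100`, `input_len = 100`, `output_len = 2`, `batch_size = 1`; run on CPU so it works anywhere). The index lists enumerate all (left, right) pairs with right >= left, i.e. n*(n+1)/2 = 5050 pairs for n = 100:

```python
import torch
import torch.nn as nn

# Assumed values for the sketch; the real training code reads them
# from config, self.*, etc.
max_seq_len, input_len, output_len, batch_size = 100, 100, 2, 1

bilinear = nn.Bilinear(input_len, input_len, output_len, bias=False)

# All (left, right) index pairs with right >= left:
# n*(n+1)/2 = 5050 pairs for n = 100.
left_indices = [
    i for i in range(max_seq_len) for _ in range(max_seq_len - i)
]
right_indices = [
    j for i in range(max_seq_len) for j in range(i, max_seq_len)
]

# Dummy inputs standing in for the real batch.
input_left = torch.randn(batch_size, max_seq_len, input_len)
input_right = torch.randn(batch_size, max_seq_len, input_len)

output_tensor = torch.zeros(batch_size, max_seq_len, max_seq_len, output_len)
# Advanced indexing with two equal-length index lists selects 5050
# upper-triangular cells; bilinear maps each gathered pair to output_len values.
output_tensor[:, left_indices, right_indices] = bilinear(
    input_left[:, left_indices],
    input_right[:, right_indices],
)

print(len(left_indices), tuple(output_tensor.shape))
```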