Hello,
I am currently trying to train a model that includes, among other modules, a torch.nn.Bilinear layer:
self.batch_size = 1
self.max_seq_len = config["max_seq_len"]
self.input_len = 100
self.output_len = 2
self.bilinear = nn.Bilinear(
    self.input_len,
    self.input_len,
    self.output_len,
    bias=False,
)
# All (left, right) index pairs with right >= left
self.left_indices = [
    index
    for index in range(self.max_seq_len)
    for _ in range(self.max_seq_len - index)
]
self.right_indices = [
    higher_index
    for index in range(self.max_seq_len)
    for higher_index in range(index, self.max_seq_len)
]
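To make it clear what the two comprehensions produce, here is a quick sanity check on a small sequence length (3 is just a hypothetical value for illustration; my real config uses config["max_seq_len"] = 100):

```python
# Reproduce the two index comprehensions for a small, hypothetical
# sequence length to show which pairs they enumerate.
max_seq_len = 3

left_indices = [
    index
    for index in range(max_seq_len)
    for _ in range(max_seq_len - index)
]
right_indices = [
    higher_index
    for index in range(max_seq_len)
    for higher_index in range(index, max_seq_len)
]

# Together they enumerate every (left, right) pair with right >= left:
print(list(zip(left_indices, right_indices)))
# [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
```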
which I use this way during training:
...
output_tensor = torch.zeros(
    self.batch_size,
    self.max_seq_len,
    self.max_seq_len,
    self.output_len,
    device=used_device,
)
# Score every (left, right) pair in a single bilinear call,
# then scatter the results into the upper triangle of output_tensor
output_tensor[
    :, self.left_indices, self.right_indices
] = self.bilinear(
    input_left[:, self.left_indices],
    input_right[:, self.right_indices],
)
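For scale: indexing with these lists materializes all max_seq_len * (max_seq_len + 1) / 2 pairs at once, so the gathered inputs grow quadratically with the sequence length. Still, if I do the math (assuming float32 and the sizes above), those tensors are small, which is why the 8 GB surprises me:

```python
# Rough size of the tensors fed to nn.Bilinear for one batch element
# (assuming float32 and the sizes from the snippet above).
max_seq_len = 100
input_len = 100
bytes_per_float = 4

# Number of (left, right) pairs with right >= left
num_pairs = max_seq_len * (max_seq_len + 1) // 2

# Both gathered inputs, input_left[:, left] and input_right[:, right]
gathered_bytes = 2 * num_pairs * input_len * bytes_per_float

print(num_pairs, gathered_bytes / 1e6, "MB")
# 5050 4.04 MB
```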
...
However, I hit the classic "CUDA out of memory" error, which seems to happen while processing the second batch.
I tried varying the size of the two input tensors (from 1 to 100), and the GPU memory allocated climbs up to 8 GB even with a batch size of 1.
Is this normal, or am I doing something wrong (e.g. assigning the module output into a tensor like that)?
Thank you for your help! =)