I noticed some odd (and crippling) behaviour in the function
torch.nn.utils.rnn.pack_sequence. See my code:
```python
import torch
from torch.nn.utils.rnn import pack_sequence

inputs = list(self.buffer.values())
print(torch.cuda.memory_allocated() / 1e9)
print(torch.cuda.max_memory_allocated() / 1e9)

# Concatenate each buffered list of chunks into a single sequence tensor
inputs = [torch.cat(input, dim=0) for input in inputs]
print(torch.cuda.memory_allocated() / 1e9)
print(torch.cuda.max_memory_allocated() / 1e9)

# Pack the variable-length sequences
inputs, batch_sizes, sorted_indices, unsorted_indices = pack_sequence(inputs, enforce_sorted=False)
print(torch.cuda.memory_allocated() / 1e9)
print(torch.cuda.max_memory_allocated() / 1e9)
```
The code is structured this way to demonstrate my problem. Below are four examples of what this code prints; each line shows allocated and max-allocated memory, in GB, at the three print points. You can see that the memory actually held by the list of tensors I pass into
pack_sequence is far smaller than the peak allocated memory, and that the peak occurs during the packing itself (before packing, the maximum is much smaller). The four outputs come from different data points, obtained by setting different random seeds (I am using shuffled data loaders).
```
0.05142272 0.05142272 0.101368832 0.101368832 0.10063104 3.67196672
0.055465984 0.055465984 0.105938432 0.105938432 0.104708096 0.995354112
0.061822976 0.061822976 0.111620608 0.111620608 0.111021056 1.934670848
0.052079616 0.052079616 0.101535232 0.101535232 0.101254144 0.52041728
```
Additionally, I would like to note that all tensors inside inputs are already stored on my single CUDA device.
Could anybody explain what is going on and how I can avoid it? I suspect the spike happens because the sequences get padded during packing, so a lot more data is held temporarily. If that is the case, is there no way to run this function in a sparse fashion instead? I would happily trade speed for memory here (without splitting this procedure into smaller batches).
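To illustrate my suspicion in numbers, here is a rough back-of-the-envelope sketch. The lengths and feature size below are made up, not my real data: the point is that when one sequence is much longer than the rest, a padded (max_len, batch, features) intermediate dwarfs the memory the sequences themselves occupy.

```python
# Hypothetical shapes, purely for illustration
lengths = [10_000, 5, 5, 5]   # per-sequence lengths (assumed)
feature_dim = 128             # feature size (assumed)

actual_elems = sum(lengths) * feature_dim                 # elements my tensors really hold
padded_elems = max(lengths) * len(lengths) * feature_dim  # elements of a padded (max_len, batch, feat) tensor

print(actual_elems)  # 1281920
print(padded_elems)  # 5120000 -> roughly 4x larger here, and worse with more sequences
```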
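The only workaround I have come up with so far is a sketch like the one below (untested, and the helper name is my own): move the sequences to the CPU, pack there so any large intermediate lives in host memory, then move the resulting PackedSequence back to the GPU. The extra transfers make it slow, which is exactly the trade-off I am willing to accept, but is there a better way?

```python
import torch
from torch.nn.utils.rnn import pack_sequence

def pack_on_cpu(tensors, device):
    """Pack on the CPU so the packing's temporary allocations stay in host memory,
    then move the resulting PackedSequence to the target device."""
    packed = pack_sequence([t.cpu() for t in tensors], enforce_sorted=False)
    return packed.to(device)

# hypothetical usage with my list of CUDA tensors:
# packed = pack_on_cpu(inputs, torch.device("cuda"))
```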