Hi, I have a question about memory cost in training and validation.
First, let me describe the problem I'm seeing.
Node A has an output of
- shape <max_length, batch_size, dim> during training
- shape <max_valid_length, valid_batch_size, dim> during validation
After validation, additional memory is allocated, which can cause an out-of-memory error.
When I force max_length * batch_size > max_valid_length * valid_batch_size, no additional memory is allocated.
So my assumption is:
- During training, an output buffer with shape <max_length, batch_size, dim> is created for Node A.
- The size of the buffer never changes until a tensor exceeds the space already allocated.
- During validation, the same buffer is used to store the output tensor; if <max_valid_length, valid_batch_size, dim> exceeds the pre-allocated space <max_length, batch_size, dim>, the buffer has to allocate additional space.
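If it helps, here is a small sketch of the grow-only buffer behaviour I'm assuming (pure Python; the GrowOnlyBuffer class and all shapes are made up for illustration and are not the framework's actual allocator):

```python
import math

class GrowOnlyBuffer:
    """Hypothetical buffer that only reallocates when a request
    exceeds its current capacity (element count)."""

    def __init__(self):
        self.capacity = 0  # number of elements currently allocated

    def request(self, shape):
        """Return True if additional memory had to be allocated."""
        needed = math.prod(shape)
        if needed > self.capacity:
            self.capacity = needed  # grow: extra memory appears here
            return True
        return False  # fits in the existing allocation, no growth

buf = GrowOnlyBuffer()
dim = 512

# Training: <max_length, batch_size, dim> = <64, 32, dim>
print(buf.request((64, 32, dim)))    # True: first allocation

# Validation, case 1: 128 * 20 = 2560 > 64 * 32 = 2048 elements
# per dim slice, so the buffer must grow (the OOM risk I observed).
print(buf.request((128, 20, dim)))   # True: reallocation

# Validation, case 2: if max_length * batch_size already covers
# the validation shape, the buffer is reused with no extra memory.
print(buf.request((100, 20, dim)))   # False: 100 * 20 < 128 * 20
```

Under this model, forcing max_length * batch_size >= max_valid_length * valid_batch_size would explain why no additional memory shows up in that case.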
I'm wondering whether this assumption is correct?