Suppose tensor A has gradient information (requires_grad=True). If I run “B[:A.size()[0]] = A”, does B[:A.size()[0]] contain the same gradient information as A?
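For reference, here is a minimal experiment of my own (a sketch, not from any documentation) that checks this on small tensors:

```python
import torch

# Does an in-place slice assignment keep A in the autograd graph?
A = torch.randn(3, 4, requires_grad=True)
B = torch.zeros(5, 4)        # plain tensor, requires_grad=False

B[:A.size()[0]] = A          # recorded by autograd as a CopySlices node

B.sum().backward()
print(A.grad)                # all ones, so gradients do flow through the slice
```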
I am working on implementing a bidirectional GRU layer from scratch. I want it to accept a PackedSequence as its input x. To deal with the variable batch_size at each step, I wrote the following code for the reverseRNN layer:
for batch_size in reversed(batch_sizes):
    # Hidden states of the batch_size sequences still active at this step
    step_state = global_state[:batch_size]
    # Packed rows belonging to this time step
    step_input = input[input_offset - batch_size : input_offset]
    input_offset -= batch_size
    out, step_state = self.cell(step_input, step_state)
    outputs = [out] + outputs
    # Write the updated states back in place
    global_state[:batch_size] = step_state
The global_state tensor stores the current hidden state of every sequence in the batch, and I generate the PackedSequence using pack_sequence with the default settings.
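As a concrete illustration of the packed layout that loop relies on (my own toy example with sequence lengths 3 and 1):

```python
import torch
from torch.nn.utils.rnn import pack_sequence

# Two sequences of lengths 3 and 1, feature size 2
seqs = [torch.randn(3, 2), torch.randn(1, 2)]
packed = pack_sequence(seqs)

print(packed.batch_sizes)   # tensor([2, 1, 1]): active sequences per time step
print(packed.data.shape)    # torch.Size([4, 2]): one row per (step, active seq)
```

The loop's input corresponds to packed.data, input_offset starts at data.size(0), and iterating over reversed(batch_sizes) walks the time steps from last to first.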
My idea is that, as long as global_state always picks up the up-to-date gradient information from the current iteration’s step_state, backpropagation should work correctly.
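To make the question concrete, here is a self-contained variant of the loop I put together (my own sketch: nn.GRUCell stands in for self.cell and returns only the new state, and the in-place write is replaced by an out-of-place torch.cat so that backward() can run). It confirms that gradients reach the cell parameters in this out-of-place variant; whether the in-place write behaves the same is exactly what I am asking:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence

input_size, hidden_size = 2, 5
cell = nn.GRUCell(input_size, hidden_size)   # stand-in for self.cell

seqs = [torch.randn(3, input_size), torch.randn(1, input_size)]
packed = pack_sequence(seqs)
data, batch_sizes = packed.data, packed.batch_sizes.tolist()

global_state = torch.zeros(batch_sizes[0], hidden_size)
outputs = []
input_offset = data.size(0)

for batch_size in reversed(batch_sizes):
    step_input = data[input_offset - batch_size : input_offset]
    input_offset -= batch_size
    step_state = cell(step_input, global_state[:batch_size])
    outputs = [step_state] + outputs
    # Out-of-place stand-in for: global_state[:batch_size] = step_state
    global_state = torch.cat([step_state, global_state[batch_size:]], dim=0)

output = torch.cat(outputs, dim=0)
output.sum().backward()
print(cell.weight_ih.grad is not None)   # True: gradients reach the cell
```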