Suppose tensor A carries gradient information. If I run `B[:A.size(0)] = A`, does `B[:A.size(0)]` then contain the same gradient information as A?
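For concreteness, here is a minimal snippet illustrating the question (shapes are arbitrary; note that the slice bound must be an integer, hence `A.size(0)` rather than `A.size()`):

```python
import torch

A = torch.randn(3, requires_grad=True)
B = torch.zeros(5)               # plain tensor, no gradient history of its own
B[:A.size(0)] = A                # in-place copy; autograd records it as CopySlices
B.sum().backward()
print(A.grad)                    # tensor([1., 1., 1.]) -- gradient flowed back to A
```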
I am working on implementing a bidirectional GRU layer from scratch. I want it to accept a PackedSequence as its input x. To handle the variable batch_size at each step, I wrote the following code for the reverseRNN layer:
```python
for batch_size in reversed(batch_sizes):
    step_state = global_state[:batch_size]
    step_input = input[input_offset - batch_size : input_offset]
    input_offset -= batch_size
    out, step_state = self.cell(step_input, step_state)
    outputs = [out] + outputs
    global_state[:batch_size] = step_state
```
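For context, here is a self-contained sketch of that loop with the surrounding setup filled in. Everything outside the loop body is my assumption: I use nn.GRUCell as a stand-in for self.cell (it returns only the new hidden state, so out and step_state coincide here), with made-up sizes:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence

input_size, hidden_size = 5, 7
cell = nn.GRUCell(input_size, hidden_size)      # stand-in for self.cell

# pack_sequence with defaults: sequences sorted by decreasing length
packed = pack_sequence([torch.randn(3, input_size), torch.randn(2, input_size)])
data, batch_sizes = packed.data, packed.batch_sizes

global_state = torch.zeros(int(batch_sizes[0]), hidden_size)
input_offset = data.size(0)                     # start from the end of the packed data
outputs = []

for batch_size in reversed(batch_sizes.tolist()):
    step_state = global_state[:batch_size]      # states of the currently active sequences
    step_input = data[input_offset - batch_size : input_offset]
    input_offset -= batch_size
    step_state = cell(step_input, step_state)   # GRUCell returns the new hidden state
    outputs = [step_state] + outputs            # prepend to keep forward time order
    # In-place write, as in the question (the forward pass runs;
    # whether backward accepts it is exactly what is being asked):
    global_state[:batch_size] = step_state

print(torch.cat(outputs).shape)                 # torch.Size([5, 7]) == (sum(batch_sizes), hidden)
```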
global_state stores the current hidden state of every sequence in the batch. I generate the PackedSequence using pack_sequence with its default settings.
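As a quick illustration of that default layout (shapes made up): with enforce_sorted=True, pack_sequence expects the sequences ordered by decreasing length, and batch_sizes then lists how many sequences are still active at each time step:

```python
import torch
from torch.nn.utils.rnn import pack_sequence

seqs = [torch.randn(3, 4), torch.randn(2, 4)]   # lengths 3 and 2, sorted descending
packed = pack_sequence(seqs)                    # default: enforce_sorted=True
print(packed.batch_sizes)                       # tensor([2, 2, 1]) -- active sequences per step
print(packed.data.shape)                        # torch.Size([5, 4]) -- time-major, flattened
```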
My idea is that if global_state always receives the up-to-date gradient information from the current iteration's step_state, backpropagation should work correctly.
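One way to sanity-check that assumption is to backprop a scalar loss through the loop and inspect the input's gradient. The sketch below is my own variant, not the original code: it rebuilds global_state with torch.cat instead of the in-place slice write, since whether autograd accepts that in-place write during backward is precisely what the question is about.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence

def reverse_pass(data, batch_sizes, cell, hidden_size):
    # Variant of the loop above that avoids writing into global_state in place.
    global_state = torch.zeros(int(batch_sizes[0]), hidden_size)
    input_offset = data.size(0)
    outputs = []
    for batch_size in reversed(batch_sizes.tolist()):
        step_input = data[input_offset - batch_size : input_offset]
        input_offset -= batch_size
        step_state = cell(step_input, global_state[:batch_size])
        outputs = [step_state] + outputs
        # Rebuild instead of assigning in place: new states for the active
        # sequences, untouched states for the already-finished ones.
        global_state = torch.cat([step_state, global_state[batch_size:]])
    return outputs

cell = nn.GRUCell(4, 6)
packed = pack_sequence([torch.randn(3, 4), torch.randn(2, 4)])
data = packed.data.clone().requires_grad_(True)
loss = torch.cat(reverse_pass(data, packed.batch_sizes, cell, 6)).sum()
loss.backward()
print(data.grad.abs().sum() > 0)   # True if gradient reached the packed input
```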