If I assign tensor A to a part of tensor B, will that part of tensor B contain the same gradient information as tensor A?

Suppose tensor A carries gradient information. If I run `B[:A.size()[0]] = A`, does `B[:A.size()[0]]` contain the same gradient information as A?

I am working on implementing a bidirectional GRU layer from scratch. I want it to accept a PackedSequence as input x. To deal with the variable batch size at each step, I wrote the following code for the reverse-direction RNN layer:

    # iterate over the time steps in reverse; batch_sizes comes from the PackedSequence
    for batch_size in reversed(batch_sizes):
        # slice out the states of the sequences that are active at this step
        step_state = global_state[:batch_size]
        step_input = input[input_offset - batch_size : input_offset]
        input_offset -= batch_size
        out, step_state = self.cell(step_input, step_state)
        outputs = [out] + outputs
        # write the updated states back into the global buffer
        global_state[:batch_size] = step_state

The global_state stores the per-sequence hidden states for every sequence in the batch. I generate the PackedSequence using pack_sequence with its default settings.

My idea is that if global_state always receives the up-to-date gradient information from the current iteration's step_state, backpropagation should work correctly.


No, it won't. It will contain the gradient information associated with how you use this new copy.
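To make this concrete, here is a minimal sketch (tensor names are made up for illustration) of what autograd records when you assign into a slice: the assignment itself is differentiable, so gradients flow back to A through the copy, but what B's slice "carries" afterwards is determined by how B is used downstream, not by A's prior history:

```python
import torch

# A requires grad; B starts as a plain buffer
A = torch.ones(2, requires_grad=True)
B = torch.zeros(4)

# in-place slice assignment: autograd records a CopySlices op,
# so B is now part of the graph through this copy
B[:2] = A

# gradients flow back to A only via this downstream use of B
loss = B.sum()
loss.backward()
print(A.grad)  # each copied element contributes 1 to A's grad
```

The key point matching the answer above: the slice of B is a new node whose gradient comes from how B is used after the copy; it does not inherit A's earlier gradient history directly.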

Why is this global_state necessary here? Why not just pass step_state to the next iteration?

Because, within a batch, there is no guarantee that all sequences are the same length.
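For reference, this is exactly what PackedSequence.batch_sizes encodes: with default settings, pack_sequence sorts sequences by decreasing length, and the number of active sequences shrinks at later time steps. A small check (the tensor shapes here are arbitrary):

```python
import torch
from torch.nn.utils.rnn import pack_sequence

# three sequences of lengths 3, 2, and 1, each with feature size 5
seqs = [torch.ones(3, 5), torch.ones(2, 5), torch.ones(1, 5)]
packed = pack_sequence(seqs)

# batch_sizes[t] = how many sequences are still active at time step t
print(packed.batch_sizes)  # tensor([3, 2, 1])
```

Iterating over reversed(batch_sizes), as in the code above, therefore starts with the smallest active batch and grows it, which is why the step state cannot simply be passed along unchanged between iterations.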

I'm actually confused by your example.
I'm not sure I understand why there is a global state shared across different samples in your batch. Why should the ith entry in one batch be associated with the ith entry in another batch (through your use of global_state)?