Mix/Match GRU Memory and Outputs between steps?

I’m trying to build a GRU model that handles interactions between multiple entities. The GOAL is that the GRU_INPUT will be the PREVIOUS_GRU_OUTPUT of the “right” entity, while the GRU_MEMORY_INPUT will be the PREVIOUS_MEMORY of the “left” entity.
For example:
t0: A --> B (A intersects with B)
t1: C --> D
t2: A --> E
t3: B --> D

t99: A --> C

To attempt to handle the above, I have a single GRU model and at each time-step, the inputs I’m using are:
t0 (A --> B): INPUT_IN = random_array | MEMORY_IN = random_array (random, since there is no history)
t1 (C --> D): same as t0, since there is no history for either entity
t2 (A --> E): INPUT_IN = random_array (no E history) | MEMORY_IN = MEMORY_OUT from t0
t3 (B --> D): INPUT_IN = GRU_OUT from t1 | MEMORY_IN = MEMORY_OUT from t1
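For concreteness, here’s a minimal sketch of how I’m wiring one of these mixed steps (the hidden size, variable names, and the choice of `nn.GRU` with seq_len=1 are just illustrative):

```python
import torch
import torch.nn as nn

H = 8                            # hidden size (arbitrary for this sketch)
gru = nn.GRU(H, H)               # input size == hidden size, so outputs can be re-fed

# t0 (A --> B): no history for either entity, so both inputs are random
x_t0 = torch.randn(1, 1, H)      # (seq_len=1, batch=1, H)
h_t0 = torch.randn(1, 1, H)      # (num_layers=1, batch=1, H)
out_t0, mem_t0 = gru(x_t0, h_t0) # GRU_OUT and MEMORY_OUT for this step

# t2 (A --> E): E has no output history -> random input; A's memory comes from t0
x_t2 = torch.randn(1, 1, H)
out_t2, mem_t2 = gru(x_t2, mem_t0)
```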

I’m currently holding the GRU_OUT and MEMORY_OUT Variables in a dictionary of lists, like this:

ent_hist = {entity_id1: {'out': [first_out, second_out, ...], 'mem': [first_mem, second_mem, ...]}, entity_id2: ...}

At each time step, I check whether I have any history for each entity, and if so, I pull the last entry from each list:

mem_in = ent_hist[tgt_ent_id]['mem'][-1]
gru_in = ent_hist[other_ent_id]['out'][-1]
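Roughly, the bookkeeping looks like this (the `store`/`lookup` helper names are just for illustration; in the no-history case I fall back to a random array):

```python
import torch

H = 8                 # hidden size (arbitrary for this sketch)
ent_hist = {}         # {entity_id: {'out': [...], 'mem': [...]}}

def store(ent_id, out, mem):
    """Append this step's GRU output and memory to the entity's history."""
    hist = ent_hist.setdefault(ent_id, {'out': [], 'mem': []})
    hist['out'].append(out)
    hist['mem'].append(mem)

def lookup(ent_id, key):
    """Most recent 'out'/'mem' tensor for the entity, or random if no history."""
    hist = ent_hist.get(ent_id)
    if hist is not None and hist[key]:
        return hist[key][-1]
    return torch.randn(1, 1, H)

# e.g. at t2 (A --> E):
mem_in = lookup('A', 'mem')   # left entity's previous memory (from t0)
gru_in = lookup('E', 'out')   # right entity has no history -> random array
```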

At t99, the output from the final step is passed through a simple decoder, and I use BCELoss as the loss function. The issue I’m facing is that the gradients do not appear to backpropagate through the GRU. Any thoughts or hints would be greatly appreciated!
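For reference, the loss end looks roughly like this (the decoder shape and names are illustrative), along with the check I’ve been using to see whether gradients reach the GRU parameters:

```python
import torch
import torch.nn as nn

H = 8
gru = nn.GRU(H, H)
decoder = nn.Sequential(nn.Linear(H, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()

out, mem = gru(torch.randn(1, 1, H), torch.randn(1, 1, H))
pred = decoder(out.view(1, H))          # (1, 1) probability
loss = loss_fn(pred, torch.ones(1, 1))
loss.backward()

# If the graph reaches back through the GRU, its parameters get .grad populated
print(all(p.grad is not None for p in gru.parameters()))
```

In an isolated sketch like this the check passes; in my full loop the GRU parameters’ gradients appear to stay empty, which is what makes me think the graph is being broken somewhere between steps.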