Mix/Match GRU Memory and Outputs between steps?

I’m trying to build a GRU model that handles interactions between multiple entities. The GOAL is that the GRU_INPUT will be the PREVIOUS_GRU_OUTPUT of the RIGHT entity, while the GRU_MEMORY_INPUT will be the PREVIOUS_MEMORY of the LEFT entity.
For example:
t0: A --> B (A interacts with B)
t1: C --> D
t2: A --> E
t3: B --> D

t99: A --> C

To attempt to handle the above, I have a single GRU model, and at each time step the inputs I’m using are as follows (a code sketch follows the list):
t0 (A->B): INPUT_IN = random_array | MEMORY_IN = random_array (random, since there is no history yet)
t1 (C->D): same, since neither entity has any history
t2 (A->E): INPUT_IN = random_array (no E history) | MEMORY_IN = MEMORY_OUT from t0 (A’s last step)
t3 (B->D): INPUT_IN = GRU_OUT from t1 (D’s last step) | MEMORY_IN = MEMORY_OUT from t0 (B’s last step)
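
In code, the selection logic looks roughly like this (a minimal sketch, assuming a single-layer nn.GRU with batch size 1; the sizes and the `events` list are placeholders rather than my actual data, and since GRU_OUT is fed back in as INPUT_IN, input_size has to equal hidden_size):

```python
import torch
import torch.nn as nn

H = 16                                   # input_size == hidden_size (placeholder)
gru = nn.GRU(H, H, num_layers=1)

# (left, right) entity pairs, one per time step: t0..t3 from the example above
events = [('A', 'B'), ('C', 'D'), ('A', 'E'), ('B', 'D')]

ent_hist = {}  # entity_id -> {'out': [...], 'mem': [...]}

for left, right in events:
    # INPUT_IN: the RIGHT entity's previous output, else a random array
    if right in ent_hist:
        gru_in = ent_hist[right]['out'][-1]
    else:
        gru_in = torch.randn(1, 1, H)    # (seq_len, batch, input_size)

    # MEMORY_IN: the LEFT entity's previous memory, else a random array
    if left in ent_hist:
        mem_in = ent_hist[left]['mem'][-1]
    else:
        mem_in = torch.randn(1, 1, H)    # (num_layers, batch, hidden_size)

    out, mem = gru(gru_in, mem_in)
    # ...history update shown further down...
```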

I’m currently holding the GRU_OUT and MEMORY_OUT Variables in a dictionary of lists, keyed by entity, like so:

ent_hist = {entity_id1: {'out': [first_out, second_out, ...], 'mem': [first_mem, second_mem, ...]}, entity_id2: ...}

And at each time step, I check to see if I have any entity history, and if so, I pull the last array for each:

mem_in = ent_hist[tgt_ent_id]['mem'][-1]
gru_in = ent_hist[other_ent_id]['out'][-1]
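
After each step I append the new output and memory for both participants, so later steps can look them up (again just a sketch, continuing the snippet above):

```python
# Record this step's output and memory under both entities involved
for ent in (left, right):
    hist = ent_hist.setdefault(ent, {'out': [], 'mem': []})
    hist['out'].append(out)
    hist['mem'].append(mem)
```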

At t99, the outputs from the final step are then passed through a simple decoder and I use a BCELoss function as the loss. The issue I’m facing is that the gradients do not appear to be backpropagating through the GRU. Any thoughts or hints would be greatly appreciated!
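
The final step looks roughly like this (a sketch continuing the snippets above; the decoder layout and the target are placeholders, not my exact code):

```python
# Decode the last GRU output and compute BCE against a binary target
decoder = nn.Sequential(nn.Linear(H, 1), nn.Sigmoid())

final_out = out[-1]            # (batch, hidden_size) from the last time step
target = torch.ones(1, 1)      # placeholder binary label

loss = nn.BCELoss()(decoder(final_out), target)
loss.backward()
```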

Thanks!