Does feeding input one step at a time affect GRU output?

I’m currently implementing scheduled sampling with a GRU.

In my old version, the decoder input has shape (batch_size, max_length, emb_size).

To implement scheduled sampling, I changed the input to (batch_size, 1, emb_size), feeding one time step at a time and concatenating the per-step outputs to get the final output.

My expectation was that both versions should produce exactly the same output.

However, I discovered that the output differs.

I’ve ruled out all other factors that could affect the result, such as dropout.

I’m eager to find out whether this is a problem in my code or whether a GRU is meant to behave differently when fed this way.

My code looks like this.

[image: screenshot of the code]

It turns out that after I rewrote the same code in a simpler fashion, the problem was solved.
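
For anyone who lands here with the same question, here is a minimal sketch of the comparison. It is not the code from the screenshot: it assumes PyTorch's `nn.GRU` with `batch_first=True`, and the sizes and variable names are made up for illustration. It feeds the same tensor once as a whole sequence and once step by step while carrying the hidden state forward, then compares the two outputs. It also shows one common way the step-by-step version can diverge, which is forgetting to pass the hidden state back in. Whether that was the cause in my case is not something this sketch establishes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not taken from the original code)
batch_size, max_length, emb_size, hidden_size = 4, 7, 8, 16

# Single-layer GRU without dropout, so there is no randomness in the forward pass
gru = nn.GRU(emb_size, hidden_size, batch_first=True)
gru.eval()

x = torch.randn(batch_size, max_length, emb_size)

with torch.no_grad():
    # Version 1: the whole sequence in one call
    full_out, full_h = gru(x)

    # Version 2: one time step per call, carrying the hidden state forward
    h = None  # None means "start from a zero hidden state"
    step_outs = []
    for t in range(max_length):
        out_t, h = gru(x[:, t:t + 1, :], h)  # input shape (batch_size, 1, emb_size)
        step_outs.append(out_t)
    step_out = torch.cat(step_outs, dim=1)  # (batch_size, max_length, hidden_size)

    # Version 3 (a common mistake): the hidden state is NOT carried between steps
    reset_outs = [gru(x[:, t:t + 1, :])[0] for t in range(max_length)]
    reset_out = torch.cat(reset_outs, dim=1)

# Compare with a small tolerance to allow for floating-point rounding
matches_with_carry = torch.allclose(full_out, step_out, atol=1e-5)
matches_without_carry = torch.allclose(full_out, reset_out, atol=1e-5)
print("carrying hidden state matches full sequence:", matches_with_carry)
print("resetting hidden state matches full sequence:", matches_without_carry)
```

With the hidden state carried over, the two versions should agree up to floating-point tolerance, since each GRU step depends only on the current input and the previous hidden state. In a real scheduled-sampling loop, the slice `x[:, t:t + 1, :]` would be replaced at some steps by the embedding of the model's own previous prediction.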

Thanks for your time.