I’m currently implementing scheduled sampling with a GRU.
In my old version, the decoder input has shape (batch_size, max_length, emb_size).
To implement scheduled sampling, I changed the decoder to take an input of shape (batch_size, 1, emb_size), one time step at a time, and I concatenate the per-step outputs to get the final output.
I expected both versions to produce exactly the same output.
However, I found that the outputs differ.
I’ve ruled out other factors that could affect the result, such as dropout.
I’d like to find out whether this is a problem in my code or whether a GRU is expected to behave differently when it is fed one time step at a time.
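
Here is a minimal sketch of the comparison. It is not my actual code: it assumes PyTorch's nn.GRU with batch_first=True, a single layer, and made-up sizes, and the variable names are only for illustration. It runs the same GRU three ways: on the whole sequence in one call, one step at a time with the hidden state carried forward, and one step at a time without passing the hidden state on.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, max_length, emb_size, hidden_size = 4, 6, 8, 16

# Assumption: a single-layer, unidirectional GRU with batch-first input.
gru = nn.GRU(emb_size, hidden_size, batch_first=True)
gru.eval()

x = torch.randn(batch_size, max_length, emb_size)
h0 = torch.zeros(1, batch_size, hidden_size)  # (num_layers, batch, hidden)

with torch.no_grad():
    # Old version: the whole sequence in one call.
    full_out, _ = gru(x, h0)

    # New version: one time step per call, feeding the returned hidden
    # state back in as the initial state of the next call.
    h = h0
    steps = []
    for t in range(max_length):
        out_t, h = gru(x[:, t:t + 1, :], h)  # input is (batch, 1, emb)
        steps.append(out_t)
    step_out = torch.cat(steps, dim=1)

    # Same loop, but the hidden state is dropped, so every step
    # starts again from the default all-zero state.
    reset_steps = [gru(x[:, t:t + 1, :])[0] for t in range(max_length)]
    reset_out = torch.cat(reset_steps, dim=1)

# Largest absolute difference from the full-sequence output.
max_diff_carried = (full_out - step_out).abs().max().item()
max_diff_reset = (full_out - reset_out).abs().max().item()
print("carried hidden state:", max_diff_carried)
print("hidden state dropped:", max_diff_reset)
```

Under these assumptions, the step-by-step output should match the full-sequence output up to floating-point tolerance only when the hidden state is carried from one call to the next, so that is one thing worth checking in the stepwise decoder.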