Autoregressive transformer decoder

Has anyone tried to implement an autoregressive transformer decoder without teacher forcing?

Hey buddy, have you got any updates or progress on this?

Seems like all the popular implementations I could find use teacher forcing.
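For anyone landing here, a rough sketch of what "without teacher forcing" means at the loop level: each step is fed the model's own previous predictions rather than the shifted ground-truth sequence. This is plain Python with no framework, and `step_fn`, `toy_step`, `decode_free_running`, and `teacher_forced_inputs` are all hypothetical names made up for illustration, not taken from any library. A real decoder would replace `toy_step`.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: step_fn takes the tokens decoded so far and
# returns one score per vocabulary entry for the next position.
StepFn = Callable[[Sequence[int]], Sequence[float]]


def decode_free_running(step_fn: StepFn, bos_id: int, eos_id: int,
                        max_len: int) -> List[int]:
    """Greedy decoding: every step consumes the model's own earlier outputs."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)
        # Pick the highest-scoring token and feed it back in on the next step.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def teacher_forced_inputs(target: Sequence[int], bos_id: int) -> List[int]:
    """For contrast: teacher forcing feeds the ground truth shifted right by one."""
    return [bos_id] + list(target[:-1])


def toy_step(prefix: Sequence[int]) -> List[float]:
    """Toy stand-in for a decoder (assumption): always predicts last token + 1."""
    vocab_size = 5
    nxt = (prefix[-1] + 1) % vocab_size
    return [1.0 if i == nxt else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(decode_free_running(toy_step, bos_id=0, eos_id=4, max_len=10))
    print(teacher_forced_inputs([1, 2, 3, 4], bos_id=0))
```

One caveat if you use a loop like this for training rather than inference: the argmax that picks `next_id` is not differentiable, so gradients do not flow back through the fed-back tokens, and the steps can no longer be computed in parallel the way they are with teacher forcing.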