I want to perform dropout at the character level, i.e. replace some characters' vectors with the zero vector. Should I do this manually before I convert the one-hot embeddings to char embeddings, or apply dropout directly to inp,
where inp has size [max_sent_len x batch_size x vocab_size]
and inp is the one-hot encoding of the char vocabulary?
For the latter, with p=0.5, I've seen that dropout scales the retained values by 1/(1-p), so the 1s in the one-hot encoding that aren't zeroed become 2s. How does this scaling affect the behavior in the end?
Which one is the correct way?
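To make the two options concrete, here is a small sketch (the shapes, p value, and variable names `chars`/`manual`/`builtin` are made up for illustration). It compares a manual mask that zeroes whole character vectors against `nn.Dropout` applied element-wise to the one-hot input. Note that since each one-hot vector has exactly one nonzero entry, element-wise dropout on inp effectively also zeroes whole characters, and surviving 1s become 1/(1-p) = 2:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

max_sent_len, batch_size, vocab_size = 5, 3, 10
p = 0.5

# Hypothetical character ids, turned into one-hot vectors
# of shape [max_sent_len x batch_size x vocab_size].
chars = torch.randint(0, vocab_size, (max_sent_len, batch_size))
inp = F.one_hot(chars, vocab_size).float()

# Option 1: manual character-level dropout.
# Sample one keep/drop decision per character position and
# broadcast it over the vocab dimension, zeroing whole vectors.
keep = (torch.rand(max_sent_len, batch_size, 1) > p).float()
manual = inp * keep / (1 - p)  # inverted-dropout scaling keeps E[output] == input

# Option 2: built-in nn.Dropout, which drops individual elements
# and scales the survivors by 1/(1-p) during training.
drop = nn.Dropout(p)
drop.train()
builtin = drop(inp)

# With p=0.5, both tensors contain only 0s and 2s:
# the scaling by 1/(1-p) = 2 compensates for the dropped mass,
# so the expected value of each entry matches the original input.
print(sorted(manual.unique().tolist()))
print(sorted(builtin.unique().tolist()))
```

The scaling is what makes dropout transparent at inference time: in `eval()` mode `nn.Dropout` is an identity op, and because training-time activations were already scaled up by 1/(1-p), the downstream layers see the same expected magnitude in both modes.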