I have a few questions about dropout.
Should I use a different dropout mask for each forward pass over a batch of data?
That is, should I generate a new dropout mask for each mini-batch at the same layer?
Should the dropout mask used during backpropagation be the same as the one used during the forward pass?
That is, should I save the dropout masks from the forward pass and apply the same mask to the corresponding layer during backpropagation?
Should I avoid applying dropout to the output layer?
Should I avoid applying dropout to the input?
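
To make the first two questions concrete, here is a rough sketch of the scheme they describe: a new mask is sampled on every forward call, saved, and reused in the matching backward call. This is only an illustration, not a reference implementation. The `Dropout` class, its method names, and the flat-list representation of activations are all hypothetical, and the 1/keep_prob scaling ("inverted dropout") is an assumption about which variant is meant.

```python
import random


class Dropout:
    """Illustrative inverted dropout on a flat list of activations (hypothetical helper)."""

    def __init__(self, drop_prob, seed=None):
        if not 0.0 <= drop_prob < 1.0:
            raise ValueError("drop_prob must be in [0, 1)")
        self.drop_prob = drop_prob
        self.rng = random.Random(seed)
        self.mask = None  # set by forward(), reused by backward()

    def forward(self, activations, training=True):
        if not training:
            # Nothing is dropped outside training.
            self.mask = None
            return list(activations)
        keep_prob = 1.0 - self.drop_prob
        # Sample a fresh mask on every forward call (so each mini-batch gets its own).
        # Kept units are scaled by 1/keep_prob (assumed inverted-dropout variant).
        self.mask = [
            (1.0 / keep_prob) if self.rng.random() < keep_prob else 0.0
            for _ in activations
        ]
        return [a * m for a, m in zip(activations, self.mask)]

    def backward(self, grad_output):
        if self.mask is None:
            # No mask was applied in the forward pass, so gradients pass through.
            return list(grad_output)
        # Reuse the mask saved by the matching forward() call.
        return [g * m for g, m in zip(grad_output, self.mask)]


layer = Dropout(drop_prob=0.5, seed=0)
out = layer.forward([1.0, 2.0, 3.0, 4.0])
grad = layer.backward([1.0, 1.0, 1.0, 1.0])
# Any unit zeroed in `out` also gets a zero gradient in `grad`.
```

In this sketch the mask lives on the layer object between `forward` and `backward`, which is the "save and reapply" idea from the second question.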
Thank you in advance, and have a nice weekend.