Hello everyone, I am currently running into a problem with limited GPU memory in my deep learning project. Right now I train with a batch size of 4, but that only fits on my GPU if I heavily subsample the original data. Because of this, I think I have to switch to a batch size of 1, which amounts to stochastic gradient descent. However, I have read some posts saying that batch normalization does not work well with a batch size of 1. If that is true, what should I do with the BN layers in my model? Do I have to remove them?
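
For context, here is a minimal sketch of the concern, written in PyTorch just for illustration (the layer sizes and tensor shapes are made up, not from my actual model):

```python
import torch
import torch.nn as nn

# In training mode, BatchNorm normalizes each channel using statistics
# computed across the batch, so batch size = 1 is a degenerate case.

# For BatchNorm1d, a single sample gives zero variance per channel,
# and PyTorch refuses to run it in training mode at all:
bn1d = nn.BatchNorm1d(num_features=8)
bn1d.train()
try:
    bn1d(torch.randn(1, 8))  # batch size = 1
except ValueError as err:
    print(err)  # "Expected more than 1 value per channel when training..."

# For BatchNorm2d, the spatial dimensions still contribute statistics,
# so it does run, but the batch statistics come from one single example,
# which seems to be what the posts I read warn about:
bn2d = nn.BatchNorm2d(num_features=3)
bn2d.train()
out = bn2d(torch.randn(1, 3, 8, 8))  # runs, but stats are per-image
print(out.shape)
```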