I’m performing a classification task with time series data, so I designed a 1D CNN-LSTM model. Currently, 1-D batch normalization layers are applied in the CNN part, but I’m not sure whether to use layer normalization in the RNN part.
So my question is: can batch norm layers and layer norm layers be used simultaneously in a single network?
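For reference, here's a minimal sketch of the kind of model I mean (layer sizes, names, and the placement of the LayerNorm are just placeholders, not a final design):

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    def __init__(self, in_channels=8, hidden_size=64, num_classes=5):
        super().__init__()
        # CNN part: 1-D convolutions with batch norm
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
        )
        # RNN part: this is the normalization I'm unsure about
        self.lstm = nn.LSTM(64, hidden_size, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):              # x: (batch, channels, time)
        feats = self.cnn(x)            # (batch, 64, time)
        feats = feats.transpose(1, 2)  # (batch, time, 64) for the LSTM
        out, _ = self.lstm(feats)
        out = self.norm(out[:, -1])    # last time step, layer-normalized
        return self.fc(out)

model = CNNLSTMClassifier()
logits = model(torch.randn(16, 8, 100))  # batch of 16, 100 time steps
print(logits.shape)  # torch.Size([16, 5])
```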
I don’t think batch norm should be used with variable-length time series data (you can Google why, and I’ve had bad experiences in cases very similar to yours). Using layer norm for the LSTM should improve performance in most cases (though sometimes not very significantly), but you must use it correctly; see https://github.com/pytorch/pytorch/blob/master/benchmarks/fastrnns/custom_lstms.py.
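To make "use it correctly" a bit more concrete: the norms belong on the gate pre-activations and the new cell state inside the cell, not just on the LSTM's output. A stripped-down sketch in the spirit of that file (simplified names, naive random init):

```python
import torch
import torch.nn as nn

class LayerNormLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.weight_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size))
        self.weight_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size))
        self.ln_i = nn.LayerNorm(4 * hidden_size)  # norm on input projection
        self.ln_h = nn.LayerNorm(4 * hidden_size)  # norm on hidden projection
        self.ln_c = nn.LayerNorm(hidden_size)      # norm on the new cell state

    def forward(self, x, state):
        hx, cx = state
        # normalize the gate pre-activations before splitting into the 4 gates
        gates = self.ln_i(x @ self.weight_ih.t()) + self.ln_h(hx @ self.weight_hh.t())
        i, f, g, o = gates.chunk(4, dim=1)
        cy = self.ln_c(torch.sigmoid(f) * cx + torch.sigmoid(i) * torch.tanh(g))
        hy = torch.sigmoid(o) * torch.tanh(cy)
        return hy, (hy, cy)

# unroll over time manually, since nn.LSTM can't insert norms per step
cell = LayerNormLSTMCell(64, 64)
h = torch.zeros(16, 64)
c = torch.zeros(16, 64)
for t in range(100):
    out, (h, c) = cell(torch.randn(16, 64), (h, c))
```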
So you mean the 1-D batch norm layer may impair performance? I’ve searched on Google but cannot find the reasons, so could you explain them or share some links?
Thanks for your reply.
The following is only my opinion, pieced together from various articles I’ve seen on the internet plus my own speculation, so please correct me if you find a better explanation:
If the batch size is 1, batch norm is bad because it needs a relatively large batch size to estimate reliable statistics.
If the batch size is bigger, variable-length sequences have to be padded to a common length, and batch norm will include those padding values in its statistics, which will probably degrade performance.
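A quick toy check of the padding point (my own illustration, not from any reference): compare the per-channel mean batch norm would use against the mean over only the real, unpadded time steps.

```python
import torch

# toy batch: 4 sequences, 3 channels, padded to length 10
torch.manual_seed(0)
x = torch.randn(4, 3, 10) + 5.0        # real data centered around 5
lengths = torch.tensor([10, 7, 4, 2])  # true length of each sequence
mask = torch.arange(10)[None, :] < lengths[:, None]  # (4, 10)
x = x * mask[:, None, :]               # zero out the padded steps

# statistics BatchNorm1d would use: over batch AND time, padding included
bn_mean = x.mean(dim=(0, 2))

# statistics over the real time steps only
true_mean = x.sum(dim=(0, 2)) / mask.sum()

print(bn_mean)    # pulled toward 0 by the zero padding
print(true_mean)  # close to 5
```

As for the batch-size-1 case: estimates from a single sample are just noisy, and if I remember right, nn.BatchNorm1d even refuses a (1, C) input in training mode with "Expected more than 1 value per channel when training".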