I am training a DNN in PyTorch with 4 dense layers with ReLU as activation function, batch norm and dropout (0.7) in every hidden layer. I was wondering if it is necessary to save and load the SGD (learning_rate = 0.001, momentum = 0.9) optimizer state for resuming training? Also, what if I only want to train the output layer after loading?
Also, I am using 1-CCC (Concordance Correlation Coefficient, see the Wikipedia article) as the loss function. The data consists of several clips cut from the same audio recordings, and the training, validation and test sets each contain different recordings from different speakers. My question is whether I should shuffle all the sets so that the sentences are not in order, shuffle only the training set, or not shuffle at all. I’m not sure whether it might affect the results.
I would generally recommend saving/loading the `state_dict` of the optimizer as a good practice even if it might not be needed for specific use cases. In your use case, the `momentum_buffers` would be filled since you are using `momentum=0.9`, so restoring it would thus also be needed.
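A minimal sketch of what saving and restoring both state dicts could look like. The layer sizes, the `make_model` and `make_optimizer` helpers, the placeholder MSE loss and the in-memory buffer are assumptions made for illustration (three hidden layers plus an output layer is only one reading of "4 dense layers"). The last lines show one possible way to train only the output layer after loading, by freezing everything else.

```python
import io

import torch
from torch import nn


def make_model():
    # Hypothetical stand-in for the network in the question:
    # 3 hidden dense layers (batch norm, ReLU, dropout 0.7) + 1 output layer.
    return nn.Sequential(
        nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Dropout(0.7),
        nn.Linear(16, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Dropout(0.7),
        nn.Linear(16, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Dropout(0.7),
        nn.Linear(16, 1),
    )


def make_optimizer(model):
    # Same hyperparameters as in the question.
    return torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)


torch.manual_seed(0)
model = make_model()
optimizer = make_optimizer(model)

# One training step on random data so the momentum buffers get filled.
x, y = torch.randn(32, 8), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)  # placeholder loss, not 1-CCC
loss.backward()
optimizer.step()

# Save model and optimizer state together (a file path works the same way).
buffer = io.BytesIO()
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    buffer,
)

# Resume: rebuild the objects first, then load the saved state into them.
buffer.seek(0)
checkpoint = torch.load(buffer)
resumed_model = make_model()
resumed_optimizer = make_optimizer(resumed_model)
resumed_model.load_state_dict(checkpoint["model"])
resumed_optimizer.load_state_dict(checkpoint["optimizer"])

# Optional: train only the output layer by freezing all other parameters.
# SGD skips parameters whose .grad is None, so frozen layers are not updated.
for param in resumed_model.parameters():
    param.requires_grad_(False)
for param in resumed_model[-1].parameters():
    param.requires_grad_(True)
```

Note that freezing parameters does not stop the batch norm layers from updating their running statistics, and dropout stays active, as long as the model is in training mode. Whether the frozen layers should be put into `eval()` mode depends on the use case.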
Thank you. About the loss function part: do you know whether I need to shuffle all the data, only the training set, or none of it? I am afraid that, since sentences that are continuous in time have similar labels, not keeping them in order might defeat the purpose of using the 1-CCC loss.
Sorry, but I’m not deeply familiar with the loss function and your use case.
Based on this, it seems you have also split the datasets by speaker, and I would assume you want to keep that split as it fits your use case best (I thus also assume that only new speakers will be used for inference once the model is deployed).
In this case you could try to shuffle the training dataset as usually shuffling helps in the model training.
However, I don’t know if your data contains some temporal dependency and shuffling could also break it.
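For reference, a minimal sketch of a 1-CCC loss and of shuffling only the training set. The `ccc_loss` function, the random tensors and the batch size are assumptions made for illustration, not the original poster's code. The loss is written from the standard CCC definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population (biased) statistics computed over one batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def ccc_loss(pred, target):
    # 1 - CCC over all elements of the batch (hypothetical helper).
    pred, target = pred.flatten(), target.flatten()
    pred_mean, target_mean = pred.mean(), target.mean()
    pred_var = pred.var(unbiased=False)
    target_var = target.var(unbiased=False)
    covariance = ((pred - pred_mean) * (target - target_mean)).mean()
    ccc = 2 * covariance / (pred_var + target_var + (pred_mean - target_mean) ** 2)
    return 1 - ccc


# Dummy data standing in for the real clips and labels.
features = torch.randn(100, 8)
labels = torch.arange(100, dtype=torch.float32).unsqueeze(1)
dataset = TensorDataset(features, labels)

# Shuffle the training set only; keep validation and test order fixed.
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(dataset, batch_size=32, shuffle=False)
```

Written this way, the loss depends only on which (prediction, label) pairs end up in a batch, not on their order inside it. Shuffling therefore changes the result only through batch composition, for example by mixing clips from different recordings into one batch. Whether that is desirable for this data is a separate question that the sketch does not answer.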
Ok. Thank you for your help! I’ll post a new question asking for this specifically.