Performance is highly dependent on weight initialization

Hi everyone. I am working on a project about matching video and audio features, i.e. telling whether a given pair of video and audio features comes from the same video. The training set contains only 1150 pairs of features and the testing set contains 150.

I can get high accuracy sometimes, but the result is not stable: it is highly dependent on weight initialization, and I have to run the training several times with exactly the same model and hyper-parameters (e.g. learning rate and decay) to reach the highest accuracy. I would like to know whether this is common, and whether it is just because the dataset has too few samples. Also, what can I do to make the performance more stable in this situation? I don't know whether a specific weight initialization scheme would help. Thanks a lot!
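For concreteness, here is a minimal sketch of what I mean by "a specific weight initialization", assuming a PyTorch model. The seed value, layer sizes, and the small `nn.Sequential` stand-in for my actual matching network are all placeholders, and I picked Xavier-uniform only as one common choice:

```python
import random

import numpy as np
import torch
import torch.nn as nn

# Fix all seeds so repeated runs start from identical weights.
# The seed value itself is arbitrary; any fixed integer works.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

def init_weights(m):
    """Xavier-uniform init for linear/conv layers, zeros for biases."""
    if isinstance(m, (nn.Linear, nn.Conv1d, nn.Conv2d)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Placeholder network; my real model takes concatenated
# video/audio feature pairs, but the dimensions here are made up.
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)
model.apply(init_weights)  # applies init_weights to every submodule
```

With the seeds fixed, repeated runs should at least be reproducible and comparable; whether switching to an explicit scheme like Xavier (or Kaiming) actually stabilizes the final accuracy is exactly the part I'm unsure about.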