Sorry I haven’t replied yet; I was busy trying to optimise what I have.
Some first empirical observations:
Transfer learning does work, though it wasn’t clear at first how to apply it. Many blog posts and examples I’ve seen on the PyTorch forums say you have to freeze the convolutional layers. That simply didn’t work for me, most likely because those networks were trained on ImageNet, whose features don’t map directly onto facial expressions. What does work, and reduces training time significantly, is taking a network pretrained on ImageNet and re-training all of it on facial expressions. I imagine a network pretrained on face recognition would converge even faster.
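In case it helps, here’s roughly the setup I mean: load the pretrained weights, replace the head, and fine-tune everything rather than freezing the backbone. This is just a minimal sketch; the 3-class head and the optimiser settings are placeholders, not my exact hyperparameters.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # placeholder: e.g. valence classes (negative / neutral / positive)

# Load a ResNet34 pretrained on ImageNet and replace the final
# fully connected layer to match the number of target classes.
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Fine-tune ALL parameters instead of freezing the conv layers:
# every parameter keeps requires_grad=True by default, so the
# optimiser updates the pretrained backbone as well as the new head.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```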
I got 72% top-1 with ResNet34. ResNet101 did not perform as well, and AlexNet and VGG11/19 were slightly worse. They were all trained multiple times with the same hyperparameters and dataset sizes. There seems to be a trade-off between how much data you train on and how much accuracy you can squeeze out of a given architecture; I’m not sure exactly how that works yet, but I’ll be doing more work on it.
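Since the classifier head lives in a different attribute depending on the architecture (`.fc` for ResNets, `.classifier[-1]` for AlexNet/VGG), swapping backbones for these comparisons looks roughly like this (again just a sketch):

```python
import torch.nn as nn
from torchvision import models

def make_model(arch: str, num_classes: int) -> nn.Module:
    """Load a pretrained torchvision backbone and replace its head."""
    model = getattr(models, arch)(pretrained=True)
    if arch.startswith("resnet"):
        # ResNets expose the head as a single Linear layer at .fc
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    else:
        # AlexNet / VGG end with a Sequential classifier; swap its last layer
        in_features = model.classifier[-1].in_features
        model.classifier[-1] = nn.Linear(in_features, num_classes)
    return model

# e.g. the architectures compared above:
for arch in ["resnet34", "resnet101", "alexnet", "vgg11", "vgg19"]:
    model = make_model(arch, num_classes=3)
```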
I cannot get decent top-1 accuracy when using the full emotion labels, so I’m using valence instead (a score of negative, neutral or positive emotion), which at the moment is 72% accurate. I’ve tested it with a webcam and it appears to work, although the prediction fluctuates somewhat from frame to frame.
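The fluctuation is per-frame noise, and one simple thing I might try is averaging the softmax probabilities over a short window of frames to damp it. A rough sketch of that webcam loop (assuming the fine-tuned `model` from the sketch above; OpenCV capture, with face detection/cropping omitted, and the standard ImageNet normalisation):

```python
import collections
import cv2
import torch
import torch.nn.functional as F
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

VALENCE = ["negative", "neutral", "positive"]
window = collections.deque(maxlen=10)  # probabilities from the last 10 frames

model.eval()
cap = cv2.VideoCapture(0)
with torch.no_grad():
    while True:  # Ctrl+C to stop
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        x = preprocess(rgb).unsqueeze(0)
        probs = F.softmax(model(x), dim=1).squeeze(0)
        window.append(probs)
        # Average over the window to damp frame-to-frame jitter.
        avg = torch.stack(list(window)).mean(dim=0)
        print(VALENCE[avg.argmax().item()])
cap.release()
```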
Once I have the time, I’m going to try using face landmarks with a deep fully connected (not convolutional) network, because Benski’s results are very promising. I’m also thinking of trying other datasets, since I’m under the impression AffectNet is quite noisy or has mislabelled examples.
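The rough plan for that: extract 2D face landmarks (e.g. a 68-point detector like dlib’s) and feed the flattened coordinates into a plain fully connected network. A sketch of what that might look like; the layer sizes are my guesses, not anything from Benski’s setup:

```python
import torch.nn as nn

class LandmarkNet(nn.Module):
    """Plain fully connected network over flattened 2D landmarks."""

    def __init__(self, n_landmarks: int = 68, num_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_landmarks * 2, 256),  # (x, y) per landmark
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, landmarks):
        # landmarks: (batch, n_landmarks, 2), e.g. from dlib's 68-point model
        return self.net(landmarks.flatten(1))
```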
Thanks to everyone who has helped so far; I’ll keep updating this topic as I progress!