Expected Result of SWA in Low Precision Image Classification

Hi All,

I am trying to implement SWA for low-precision training, very similar to what was done in SWALP with block floating point (BFP). Since SWA is now available in PyTorch, I don't have to implement it from scratch, so I used AveragedModel along with the SWALR scheduler.
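Here is a minimal sketch of how I wired it up, following the usual scheme from the PyTorch docs (the model, loader, and hyperparameters below are placeholders, not my actual setup):

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR

model = ...                               # my VGG model (placeholder)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
swa_model = AveragedModel(model)          # keeps the running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.01)
swa_start = 150                           # epoch at which averaging starts (placeholder)

for epoch in range(200):
    for inputs, targets in train_loader:  # train_loader is a placeholder
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)  # fold current weights into the average
        swa_scheduler.step()                # hold the LR at the constant SWA value
```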
As I am a bit of a beginner, I have some doubts that I was hoping someone could clarify:

(a) Can I use the existing SWA utilities in PyTorch for low-precision use cases too, or should I use something along the lines of what was done in the SWALP repo (linked above)?

(b) I incorporated SWA into my QPyTorch code in almost exactly the same way as the example code given by the author of SWA in this repo, but I am having a hard time telling whether SWA is making much of a difference in my case. Could you please tell me what I should check to understand whether it is actually improving things? I am training VGG on CIFAR-10, but the final test accuracy is not significantly higher with SWA than without it, as I was expecting it to be.
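For reference, this is roughly how I compare the two models after training (a sketch; train_loader, test_loader, and device are placeholders for my actual objects). As far as I understand, the update_bn step matters because the averaged weights need fresh BatchNorm statistics before evaluation:

```python
import torch
from torch.optim.swa_utils import update_bn

# Recompute BatchNorm running statistics for the averaged weights;
# without this step the SWA model is evaluated with stale BN stats
# and can easily look no better (or worse) than the base model.
update_bn(train_loader, swa_model, device=device)

def accuracy(net, loader):
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            preds = net(inputs).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.numel()
    return correct / total

print("base model test accuracy:", accuracy(model, test_loader))
print("SWA model test accuracy: ", accuracy(swa_model, test_loader))
```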
To investigate, I am also trying to view the weights of the AveragedModel after update_parameters to see what SWA is doing. For testing purposes, I have a network with a single fully connected layer and an arbitrary 3x3 input matrix. I am unsure how exactly to hand-calculate the expected averaged weights, so I was hoping someone could guide me on how to test my code and confirm that I am using SWA correctly.
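My current understanding is that, by default, AveragedModel keeps a plain running mean, so after calling update_parameters with weight snapshots w1, ..., wn the stored weight should equal (w1 + ... + wn) / n. This is the toy check I am attempting (self-contained sketch with arbitrary random weights standing in for training steps); if my understanding is right, it should print True:

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel

torch.manual_seed(0)

model = nn.Linear(3, 3, bias=False)   # single fully connected layer, 3x3 weight
swa_model = AveragedModel(model)

snapshots = []
for _ in range(3):
    # Stand-in for a training step: overwrite the weights with new values.
    with torch.no_grad():
        model.weight.copy_(torch.randn(3, 3))
    snapshots.append(model.weight.detach().clone())
    swa_model.update_parameters(model)

expected = torch.stack(snapshots).mean(dim=0)   # hand-computed arithmetic mean
actual = swa_model.module.weight                # the averaged copy of the layer
print(torch.allclose(actual, expected, atol=1e-6))
```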

I realise my questions are a bit general and may have obvious answers, but I am having a hard time finding relevant information, so I would really appreciate any help. Thank you!