How to correctly implement Bayes by Backprop layers with EfficientNet?

Hey guys,

I wanted to run a few experiments comparing a Bayesian network trained via Blundell's Bayes by Backprop method, described in the paper "Weight Uncertainty in Neural Networks", against Gal's "Dropout as a Bayesian Approximation", to see which one gives better results.

I'm doing this with the EfficientNet library provided here: lukemelas/EfficientNet-PyTorch on GitHub (a PyTorch implementation of EfficientNet). I extract the features, pass them through two layers (the first with 512 units, the second with 128), and compute the loss with cross-entropy.

This works fine when training the standard model with standard linear layers. When I push the output through my Bayesian layers instead, I use the loss function defined in Blundell's paper. My question is: how do I train EfficientNet as well as my Bayesian layers? Do I simply run the output through the network, calculate the variational free energy (the compression cost to be minimized) for my Bayesian layers, and then add the cross-entropy loss on top of that? Here is a code sample of how I'm currently doing it:

    def extract_efficientNet(self, input):
        output = self.efficientNet_model.extract_features(input)
        output = self.pool(output)
        output = output.view(output.shape[0], -1)

        return output

    def sample_elbo(self, input, target, samples=10, n_classes=8):
        num_batches = 555
        batch_size = input.size()[0]
        input = self.extract_efficientNet(input)

        # make empty tensors to hold outputs
        outputs = torch.zeros(samples, batch_size, n_classes).to(self.device)
        log_priors = torch.zeros(samples).to(self.device)
        log_variational_posteriors = torch.zeros(samples).to(self.device)

        # Run multiple forward passes through the Bayesian layers
        for i in range(samples):
            outputs[i] = self.bayesian_sample(input)

            # Stuff happens when I call bayesian_sample to calculate these
            log_priors[i] = self.log_prior()
            log_variational_posteriors[i] = self.log_variational_posterior()

        log_prior = log_priors.mean()
        log_variational_posterior = log_variational_posteriors.mean()
        outputs = outputs.mean(0)

        # Used once to calculate BbB loss, and then again for efficientNet loss
        negative_log_likelihood = TF.cross_entropy(outputs, target, weight=self.class_weights, reduction='sum')

        KL_divergence = (log_variational_posterior - log_prior)
        # BbB loss
        BBB_loss = KL_divergence / num_batches + negative_log_likelihood

        # EfficientNet loss
        efficientNet_loss = TF.cross_entropy(outputs, target, weight=self.class_weights)

        # Is this correct? Do I add these two together here to train both my weight distributions and the regular EfficientNet?
        loss = efficientNet_loss + BBB_loss

This method seems to work, but training is very slow, so I was wondering whether I'm doing this correctly.
Any information would be greatly appreciated. There are a couple of other threads about using multiple loss functions, but I just want to make sure this is correct in this somewhat unique situation.
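
For reference, here is a minimal sketch of one way to read the paper's minibatch objective, KL / num_batches + negative log-likelihood, as a single loss. Everything in it is an assumption made for illustration: `BayesianLinear` and `elbo_loss` are hypothetical names, the small `nn.Sequential` backbone is only a stand-in for the EfficientNet feature extractor, and the prior is a plain standard normal rather than the scale mixture from the paper. It is not my actual model. It shows that autograd already carries the gradient of the likelihood term back into the deterministic backbone, so the cross-entropy appears only once.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Hypothetical mean-field Gaussian layer with a N(0, 1) prior (no bias)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.rho = nn.Parameter(torch.full((out_features, in_features), -3.0))

    def forward(self, x):
        sigma = F.softplus(self.rho)            # keeps the std-dev positive
        w = self.mu + sigma * torch.randn_like(sigma)  # reparameterisation
        # Log-densities of the sampled weights under posterior and prior
        self.log_q = torch.distributions.Normal(self.mu, sigma).log_prob(w).sum()
        self.log_p = torch.distributions.Normal(0.0, 1.0).log_prob(w).sum()
        return F.linear(x, w)


def elbo_loss(backbone, head, x, target, num_batches, samples=3):
    features = backbone(x)  # deterministic part, computed once per batch
    nll, kl = 0.0, 0.0
    for _ in range(samples):
        logits = head(features)  # fresh weight sample on every call
        # Likelihood is evaluated per weight sample, then averaged below
        nll = nll + F.cross_entropy(logits, target, reduction="sum")
        kl = kl + (head.log_q - head.log_p)
    # Monte Carlo estimate of KL / num_batches + NLL
    return kl / samples / num_batches + nll / samples


torch.manual_seed(0)
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU())  # stand-in, not EfficientNet
head = BayesianLinear(8, 4)
x, target = torch.randn(5, 16), torch.randint(0, 4, (5,))

loss = elbo_loss(backbone, head, x, target, num_batches=10)
loss.backward()  # one backward pass fills gradients for backbone and head
```

After `loss.backward()`, both `backbone[0].weight.grad` and `head.mu.grad` are populated. Unlike my code above, this sketch averages the per-sample cross-entropy rather than the logits and does not add a second cross-entropy term; whether that difference explains the slow training is something I have not verified.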

Not sure whether this repo could help you or not.