EvoNorm-B0 in PyTorch

In this paper: https://arxiv.org/pdf/2004.02967.pdf, the authors use evolutionary algorithms to find better combined activation-normalization layers.

They provide this TensorFlow pseudocode in the appendix (page 11):

def evonorm_b0(x, gamma, beta, nonlinearity, training):
  if nonlinearity:
    v = trainable_variable_ones(shape=gamma.shape)
    _, batch_std = batch_mean_and_std(x, training)
    den = tf.maximum(batch_std, v * x + instance_std(x))
    return x / den * gamma + beta
  else:
    return x * gamma + beta

# Helper functions
def instance_std(x, eps=1e-5):
  _, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
  return tf.sqrt(var + eps)

def group_std(x, groups=32, eps=1e-5):
  N, H, W, C = x.shape
  x = tf.reshape(x, [N, H, W, groups, C // groups])
  _, var = tf.nn.moments(x, [1, 2, 4], keepdims=True)
  return tf.reshape(tf.sqrt(var + eps), [N, H, W, C])

def trainable_variable_ones(shape, name="v"):
  return tf.get_variable(
    name, shape=shape, initializer=tf.ones_initializer()
  )

What would be the PyTorch equivalent of evonorm_b0 (and its helper functions)? I want to make sure I don't mess up my implementation.

Skimming through the code, it seems that tf.nn.moments can be replaced by torch.var, while most of the other functions map to the torch namespace without name changes.
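
For example, the instance_std helper could be translated roughly like this (assuming NCHW layout, which is what PyTorch conv layers use; unbiased=False matches tf.nn.moments, which computes the population variance):

import torch

def instance_std(x, eps=1e-5):
    # x: (N, C, H, W); tf.nn.moments(x, axes=[1, 2]) on NHWC input
    # becomes a reduction over the spatial dims (2, 3) in NCHW
    var = torch.var(x, dim=(2, 3), unbiased=False, keepdim=True)
    return torch.sqrt(var + eps)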

Could you post your current approach and explain where you are stuck at the moment?

There is an implementation of EvoNorm-S0 here: https://gist.github.com/kashif/ff44b17a6da18ec5128678d100c3818f. I adapted it to 1D and it seems to work fine (at least the shapes are correct).

Here is my version:

import torch
import torch.nn as nn
import torch.nn.functional as F


class EvoNorm1ds0(nn.Module):
    __constants__ = ['num_features', 'eps', 'nonlinearity']

    def __init__(self, num_features, eps=1e-5, nonlinearity=True):
        super(EvoNorm1ds0, self).__init__()

        self.num_features = num_features
        self.eps = eps
        self.nonlinearity = nonlinearity

        self.weight = nn.Parameter(torch.Tensor(1, num_features, 1))
        self.bias = nn.Parameter(torch.Tensor(1, num_features, 1))
        if self.nonlinearity:
            self.v = nn.Parameter(torch.Tensor(1, num_features, 1))

        self.reset_parameters()

    def reset_parameters(self):
        nn.init.ones_(self.weight)
        nn.init.zeros_(self.bias)
        if self.nonlinearity:
            nn.init.ones_(self.v)

    def group_std(self, x, groups=8):
        # sqrt(var + eps) over the intra-group channel and length dims,
        # following the paper's group_std helper
        N, C, H = x.shape
        x = torch.reshape(x, (N, groups, C // groups, H))
        var = torch.var(x, dim=(2, 3), unbiased=False, keepdim=True)
        std = torch.sqrt(var + self.eps)
        return torch.reshape(std.expand(N, groups, C // groups, H), (N, C, H))

    def forward(self, x):
        if self.nonlinearity:
            num = x * torch.sigmoid(self.v * x)
            return num / self.group_std(x) * self.weight + self.bias
        else:
            return x * self.weight + self.bias
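
To check the shapes I just run a random tensor through it (the sizes below are arbitrary):

layer = EvoNorm1ds0(num_features=16)
x = torch.randn(4, 16, 100)   # (N, C, L)
print(layer(x).shape)         # torch.Size([4, 16, 100])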

But I am interested in the batch version.

What I have trouble with is handling the training flag and the register_buffer calls needed to keep a running mean and running std. The specific part I'm not sure about is how to update those values.
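
My best guess so far is to mirror what BatchNorm1d does with its running_var buffer, something like the sketch below (the momentum value, the EvoNorm1dB0 name, and the buffer handling are guesses on my part, not from the paper; I also left out the nonlinearity=False branch for brevity):

class EvoNorm1dB0(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.9):
        super(EvoNorm1dB0, self).__init__()
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.ones(1, num_features, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_features, 1))
        self.v = nn.Parameter(torch.ones(1, num_features, 1))
        # buffer: saved in the state_dict and moved with .to(), but not trained
        self.register_buffer('running_var', torch.ones(1, num_features, 1))

    def instance_std(self, x):
        var = torch.var(x, dim=2, unbiased=False, keepdim=True)
        return torch.sqrt(var + self.eps)

    def forward(self, x):
        if self.training:
            # per-channel variance over the batch and length dims
            var = torch.var(x, dim=(0, 2), unbiased=False, keepdim=True)
            with torch.no_grad():
                # exponential moving average, like BatchNorm's running_var update
                self.running_var.mul_(self.momentum).add_(var, alpha=1 - self.momentum)
        else:
            var = self.running_var
        # den = max(batch_std, v * x + instance_std(x)) from the pseudocode
        den = torch.max(torch.sqrt(var + self.eps), self.v * x + self.instance_std(x))
        return x / den * self.weight + self.bias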

Looks like someone has implemented it: https://github.com/digantamisra98/EvoNorm. I haven't tried it yet; I'll post back when I have.