In this paper: https://arxiv.org/pdf/2004.02967.pdf the authors use evolutionary algorithms to find better activation-normalization layers.
They provide this TensorFlow pseudocode in the appendix (page 11):
def evonorm_b0(x, gamma, beta, nonlinearity, training):
  if nonlinearity:
    v = trainable_variable_ones(shape=gamma.shape)
    # batch_mean_and_std (not shown in the excerpt) computes statistics
    # over the batch, falling back to moving averages when training is False
    _, batch_std = batch_mean_and_std(x, training)
    den = tf.maximum(batch_std, v * x + instance_std(x))
    return x / den * gamma + beta
  else:
    return x * gamma + beta

# Helper functions
def instance_std(x, eps=1e-5):
  _, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
  return tf.sqrt(var + eps)

def group_std(x, groups=32, eps=1e-5):
  N, H, W, C = x.shape
  x = tf.reshape(x, [N, H, W, groups, C // groups])
  _, var = tf.nn.moments(x, [1, 2, 4], keepdims=True)
  std = tf.broadcast_to(tf.sqrt(var + eps), x.shape)
  return tf.reshape(std, [N, H, W, C])

def trainable_variable_ones(shape, name="v"):
  return tf.get_variable(name, shape=shape, initializer=tf.ones_initializer())
What would be the PyTorch equivalent of evonorm_b0 (and the helper functions)? I want to make sure I don't mess up my implementation.
Skimming through the code, it seems that tf.nn.moments can be replaced by torch.var (tf.nn.moments returns both the mean and the variance, but only the variance is used here), while the other functions can be mapped to the torch namespace without name changes.
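For example, a rough sketch of the mapping for instance_std (keep in mind the TF pseudocode assumes NHWC layout, while PyTorch conv tensors are NCHW, so the reduction axes shift; unbiased=False matches the biased estimate of tf.nn.moments):

import torch

x = torch.randn(8, 32, 16, 16)  # NCHW feature map, sizes arbitrary

# TF (NHWC):      _, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
# PyTorch (NCHW): reduce over the spatial dims 2 and 3 instead
var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
instance_std = torch.sqrt(var + 1e-5)
print(instance_std.shape)  # torch.Size([8, 32, 1, 1])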
Could you post your current approach and explain where you are stuck at the moment?
There is one implementation of EvoNorm-S0 here: https://gist.github.com/kashif/ff44b17a6da18ec5128678d100c3818f. I adapted it to the 1d case and it seems to work fine (at least the output shapes are correct).
Here is my version:
import torch
import torch.nn as nn

class EvoNormS01D(nn.Module):  # class name arbitrary
    __constants__ = ['num_features', 'eps', 'nonlinearity']

    def __init__(self, num_features, eps=1e-5, nonlinearity=True):
        super().__init__()
        self.num_features = num_features
        self.eps = eps
        self.nonlinearity = nonlinearity
        # initialize the parameters instead of using uninitialized torch.Tensor storage
        self.weight = nn.Parameter(torch.ones(1, num_features, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_features, 1))
        self.v = nn.Parameter(torch.ones(1, num_features, 1))

    def group_std(self, x, groups=8):
        N, C, H = x.shape
        x = torch.reshape(x, (N, groups, C // groups, H))
        # std over each channel group and the length dim; unbiased=False matches tf.nn.moments
        var = torch.var(x, dim=(2, 3), keepdim=True, unbiased=False)
        std = torch.sqrt(var + self.eps)
        return std.expand(N, groups, C // groups, H).reshape(N, C, H)

    def forward(self, x):
        if self.nonlinearity:
            num = x * torch.sigmoid(self.v * x)  # F.sigmoid is deprecated
            return num / self.group_std(x) * self.weight + self.bias
        return x * self.weight + self.bias
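A quick shape check of the module above (sizes arbitrary; num_features must be divisible by groups=8):

layer = EvoNormS01D(num_features=16)
x = torch.randn(4, 16, 100)  # (N, C, L)
out = layer(x)
print(out.shape)  # torch.Size([4, 16, 100]), same shape as the input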
But I am interested in the batch version (evonorm_b0). What I have trouble with is handling the training flag and registering buffers (register_buffer) to keep the running mean and running std. The specific part I'm not sure about is how to update those values.
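Not the official implementation, but here is a minimal sketch of how the batch version could handle both points, following the same recipe nn.BatchNorm1d uses: register the running statistic with register_buffer and update it inside forward() only while self.training is True. The class name, the momentum-based update rule, and the choice to track only the variance (the B0 formula above only needs the std) are my assumptions:

import torch
import torch.nn as nn

class EvoNormB01D(nn.Module):
    # Hypothetical sketch of EvoNorm-B0 for (N, C, L) inputs
    def __init__(self, num_features, eps=1e-5, momentum=0.1, nonlinearity=True):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.nonlinearity = nonlinearity
        self.weight = nn.Parameter(torch.ones(1, num_features, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_features, 1))
        self.v = nn.Parameter(torch.ones(1, num_features, 1))
        # Buffers are saved in the state_dict and moved by .to()/.cuda(),
        # but are not returned by .parameters(), so the optimizer skips them.
        self.register_buffer('running_var', torch.ones(1, num_features, 1))

    def instance_std(self, x):
        # per-sample, per-channel std over the length dim
        return torch.sqrt(x.var(dim=2, keepdim=True, unbiased=False) + self.eps)

    def batch_std(self, x):
        if self.training:
            # statistics over the batch and length dims -> shape (1, C, 1)
            var = x.var(dim=(0, 2), keepdim=True, unbiased=False)
            # exponential moving average, BatchNorm-style (assumed update rule)
            with torch.no_grad():
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            var = self.running_var
        return torch.sqrt(var + self.eps)

    def forward(self, x):
        if self.nonlinearity:
            den = torch.max(self.batch_std(x), self.v * x + self.instance_std(x))
            return x / den * self.weight + self.bias
        return x * self.weight + self.bias

During evaluation the buffer is used as-is, so after model.eval() the outputs are deterministic, just like with BatchNorm.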
Looks like someone implemented it: https://github.com/digantamisra98/EvoNorm. I haven't tried it yet; I'll post back when I have.