Custom Implementation of Batch Normalization

I am updating BatchNormalizationUpdateOutputInference_kernel by adding some custom arguments for the new version.
I have updated the following files:

  1. ./aten/src/THNN/generic/BatchNormalization.c
  2. ./aten/src/THCUNN/BatchNormalization.cu
  3. ./aten/src/THCUNN/generic/BatchNormalization.cu
  4. ./torch/legacy/nn/BatchNormalization.py
  5. ./aten/src/THNN/generic/THNN.h
  6. ./aten/src/THCUNN/generic/THCUNN.h

But python setup.py build still fails with the following error, raised in ./pytorch/aten/src/ATen/nn_parse.py:

RuntimeError: BatchNormalization_updateOutput: can't find binding
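
One way to narrow this down, assuming the error means that an argument name in the C header declaration has no matching parameter name on the binding side, is to list the declared argument names and compare them against the names that are known. The sketch below is only an illustration of that check: the helper names, the example declaration, and the set of known names are all hypothetical and are not taken from THNN.h, nn.yaml, or nn_parse.py.

```python
import re


def declared_arg_names(declaration):
    """Return the argument names from a C-style function declaration string."""
    # Take the text between the first "(" and the last ")".
    inside = declaration[declaration.index("(") + 1 : declaration.rindex(")")]
    names = []
    for arg in inside.split(","):
        arg = arg.strip()
        if not arg:
            continue
        # The name is the trailing identifier, e.g. "THTensor *input" -> "input".
        match = re.search(r"([A-Za-z_]\w*)\s*$", arg)
        if match:
            names.append(match.group(1))
    return names


def unbound_args(declaration, known_names):
    """Return declared argument names that do not appear in known_names."""
    return [n for n in declared_arg_names(declaration) if n not in known_names]


# Hypothetical declaration and parameter names, for illustration only.
example_decl = (
    "void Example_updateOutput(THNNState *state, THTensor *input, "
    "THTensor *output, double my_custom_arg)"
)
example_known = {"state", "input", "output"}

print(unbound_args(example_decl, example_known))
```

Any name this reports is a candidate for the argument that was added to the header but not declared on the other side.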

I have also updated ./aten/src/ATen/nn.yaml, but the build still fails with:
ten/build/src/ATen/ATen/CUDADoubleType.cpp: In member function ‘virtual std::tuple<at::Tensor, at::Tensor, at::Tensor> at::CUDADoubleType::thnn_batch_norm_forward(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, bool, double, double) const’: