Making a custom activation function more memory-efficient

I have the following function, which quantizes its input with a monotonic step function onto the values [-0.1, -0.05, 0.0, 0.05, 0.1]. In the backward pass I simply pretend that the forward pass was the identity (i.e., I pass the gradient through unchanged).

import torch

class Quantize(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input):
        # Quantize in place to the five levels via boolean masks.
        input[0.075 < input] = 0.1
        input[(0.025 < input) & (input <= 0.075)] = 0.05
        input[(-0.025 < input) & (input <= 0.025)] = 0.0
        input[(-0.075 < input) & (input <= -0.025)] = -0.05
        input[input <= -0.075] = -0.1
        return input

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat the forward pass as the identity.
        return grad_output
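
For reference, I call it through the standard Function.apply mechanism, roughly like this (a simplified sketch, not my actual training code, which runs on large CUDA tensors inside a model):

# Simplified usage sketch; tensor shape and device are placeholders.
x = torch.randn(4, 8)   # stand-in for a batch of activations
y = Quantize.apply(x)   # note: forward modifies x in place and returns it
print(torch.unique(y))  # only values from [-0.1, -0.05, 0.0, 0.05, 0.1] remain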

Unfortunately, I get an out-of-memory error on the GPU when using this function. Before resorting to a smaller batch size, I’d like to see whether there are ways to reduce its memory footprint.

  • Is there a built-in function that already performs this quantization?
  • Are there built-in functions that could be combined to achieve the same result?
  • Is there a more memory-efficient way to perform the assignments above?
  • Is there a way to define a function that maps over each entry of a tensor (e.g., like PyTorch’s .exp, .softmax, or similar)?
  • Or is there an entirely different API for writing a custom activation function, rather than a torch.autograd.Function?

Looping over the input entries is not an option: it takes far too long. Ideally I could parallelize the operation like a ‘map’.
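
For clarity, this is roughly the kind of element-wise loop I mean; it is far too slow to be usable (a sketch, assuming the input can be flattened and reshaped back):

def quantize_loop(input):
    # Element-wise Python loop: prohibitively slow on large GPU tensors,
    # since every .item() call synchronizes with the device.
    flat = input.flatten()
    out = torch.empty_like(flat)
    for i in range(flat.numel()):
        v = flat[i].item()
        if v > 0.075:
            out[i] = 0.1
        elif v > 0.025:
            out[i] = 0.05
        elif v > -0.025:
            out[i] = 0.0
        elif v > -0.075:
            out[i] = -0.05
        else:
            out[i] = -0.1
    return out.view_as(input)

What I am hoping for is a way to express exactly this per-entry mapping so that it runs in parallel on the GPU, the way the built-in element-wise functions do.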