I am new to PyTorch. I am trying to create a new activation layer, let's call it topk, that works as follows. It takes a vector x of size n (the previous layer's output multiplied by the weight matrix, plus the bias) and a positive integer k, and outputs a vector topk(x) of size n whose elements are
(topk(x))_i = x_i if x_i is one of the k largest elements of x, and 0 otherwise.
In the backward pass, the gradient of topk(x) with respect to x should be 1 for the top k elements of x and 0 for everything else.
How should I implement this? Can you please provide some sample code?
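One possible approach, sketched below, is to build a 0/1 mask from the indices returned by `torch.topk` and multiply the input by it. Since the mask is a constant as far as autograd is concerned, the gradient of the output with respect to x_i is just the mask value, which should give 1 for the top k elements and 0 elsewhere without writing a custom backward. The class name `TopK` and the choice to apply it along the last dimension are assumptions made for illustration, not an existing PyTorch layer.

```python
import torch
import torch.nn as nn


class TopK(nn.Module):
    """Hypothetical layer: keep the k largest entries along the last
    dimension and set every other entry to zero."""

    def __init__(self, k: int):
        super().__init__()
        if k <= 0:
            raise ValueError("k must be a positive integer")
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Indices of the k largest values along the last dimension.
        _, idx = torch.topk(x, self.k, dim=-1)
        # 0/1 mask with ones at the top-k positions.
        mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
        # The mask carries no gradient, so d(out_i)/d(x_i) = mask_i.
        return x * mask


# Small check on a single vector.
x = torch.tensor([0.5, -1.0, 3.0, 2.0, 0.1], requires_grad=True)
layer = TopK(k=2)
y = layer(x)
y.sum().backward()
print(y)       # only the two largest entries (3.0 and 2.0) are kept
print(x.grad)  # 1 at the kept positions, 0 elsewhere
```

Note that `torch.topk` always returns exactly k indices, so if several entries are tied at the cutoff, only some of them are kept. The same module should also work on a batch of shape (batch, n), since the selection is done per row along the last dimension.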