So, let's say I have a regularisation term that varies with the input. Since PyTorch processes inputs in batches, how do I tell PyTorch that I want the regularisation computed for each example on its own, rather than across all entries of the batched matrix?
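Here is a minimal sketch of the batched setup. It assumes a hypothetical per-example penalty, `per_example_reg`, defined as the squared L2 norm of each row. The penalty reduces only over the feature dimension, so there is one cost per example. Averaging over the batch at the end does not mix examples, because the gradient of a sum is the sum of the per-example gradients.

```python
import torch

torch.manual_seed(0)

batch_size, features = 4, 3
# Hypothetical input batch; each row is one example.
x = torch.randn(batch_size, features, requires_grad=True)

def per_example_reg(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical input-dependent penalty: squared L2 norm of each row.
    # Reduces only over the feature dimension, so the result has shape (batch_size,).
    return (x ** 2).sum(dim=1)

reg = per_example_reg(x)   # shape: (batch_size,)
loss = reg.mean()          # combine the examples only at the very end
loss.backward()

# d(loss)/d(x[i]) = 2 * x[i] / batch_size, so row i's gradient uses only row i.
expected = 2 * x.detach() / batch_size
print(torch.allclose(x.grad, expected))  # True
```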

For example, suppose I have an **input-dependent** co-adaptation regulariser, and I pass its cost as a scalar computed over a `batch_size` × (2D `reg_cost_matrix`) tensor. How do I ensure that co-adaptation is reduced within each 2D matrix (i.e. **intra-matrix** optimisation) and not between the 2D matrices of the batch (i.e. **inter-matrix** optimisation)?
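The following sketch shows what intra-matrix reduction could look like. It makes three assumptions. First, each example's 2D matrix is a hypothetical outer product of that example's hidden activations `h[b]`. Second, the co-adaptation cost is the sum of squared off-diagonal entries. Third, `coadaptation_cost` is an illustrative helper rather than an existing PyTorch API. Summing over `dim=(1, 2)` reduces inside each matrix and keeps the batch dimension. The gradient check confirms that one example's cost does not depend on any other example.

```python
import torch

torch.manual_seed(0)

batch_size, units = 5, 4
# Hypothetical hidden activations, one row per example in the batch.
h = torch.randn(batch_size, units, requires_grad=True)

def coadaptation_cost(h: torch.Tensor) -> torch.Tensor:
    """Hypothetical input-dependent co-adaptation penalty, one scalar per example.

    Builds a (units x units) matrix per example from that example alone,
    then sums the squared off-diagonal entries within each matrix.
    """
    # Batched outer product: shape (batch_size, units, units).
    mats = h.unsqueeze(2) * h.unsqueeze(1)
    # Zero the diagonal so only cross-unit (co-adaptation) terms remain.
    off_diag = mats * (1 - torch.eye(h.shape[1], dtype=h.dtype))
    # Reduce over the two matrix dims only, keeping the batch dim: shape (batch_size,).
    return off_diag.pow(2).sum(dim=(1, 2))

costs = coadaptation_cost(h)  # shape: (batch_size,)

# The cost of example 0 depends only on h[0], so its gradient w.r.t. other rows is zero.
(grad0,) = torch.autograd.grad(costs[0], h, retain_graph=True)
print(grad0[1:].abs().sum().item())  # 0.0

# Reduce over the batch last; each example still contributes only its own gradient.
loss = costs.mean()
loss.backward()
```

In this sketch, coupling between the matrices could only come from the cost itself. A penalty built from statistics taken across the batch dimension, such as a covariance computed over the batch, would mix examples. A reduction that keeps the batch dimension until the final `mean()` keeps each matrix separate.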