Speed of a self-made algorithm layer in PyTorch

Hello everyone,

I have a special requirement for the last layer of the encoder:

The sum of all neurons in this layer must add up to one.

Example: tensor [0, -4, +5] --> 0 - 4 + 5 = 1

To enforce this, I call a separate function inside “forward”, as recommended in another post in this forum.

Here is that function:

import torch

# ASC function (enforces that the abundances sum to 1)
def ASC_function(inputTensor):
    outputTensor = inputTensor.clone().cuda()
    for n in range(inputTensor.size(0)):
        for m in range(inputTensor.size(1)):
            # pylint: disable=E1101 # must be added here because of an error in VSC
            outputTensor[n][m] = torch.div(inputTensor[n][m], torch.sum(inputTensor[n]).cuda()).cuda()
            # pylint: enable=E1101 # must be added here because of an error in VSC
    return outputTensor

And my network.

import torch.nn as nn

# define the autoencoder network
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # encoder
        self.ecf_1 = nn.Sequential(
            nn.Linear(in_features=91, out_features=9*3),
            nn.Linear(in_features=9*3, out_features=6*3),
            nn.Linear(in_features=6*3, out_features=3*3),
            nn.Linear(in_features=3*3, out_features=3),
        )
        # dropout applied to the encoder output
        self.ecf_2 = nn.Dropout(0.5)

        # decoder
        self.dcf_1 = nn.Linear(in_features=3, out_features=91)
        self.dcf_2 = nn.Sigmoid()

    def forward(self, x):
        x = self.ecf_1(x)
        x = ASC_function(x)  # call the ASC (abundance sum-to-one constraint) function
        x = self.ecf_2(x)
        x = self.dcf_1(x)
        y = self.dcf_2(x)
        return y

net = Autoencoder()  # instantiate the autoencoder network defined above

Unfortunately, this slows down the training drastically.

Is there a more elegant solution?

I don’t think you need the nested loop, which will slow down your code.
This should also work:

import torch

# reference result: the original nested loop
inputTensor = torch.randn(10, 10)
outputTensor = inputTensor.clone()
for n in range(inputTensor.size(0)):
    for m in range(inputTensor.size(1)):
        outputTensor[n][m] = torch.div(inputTensor[n][m], torch.sum(inputTensor[n]))

# without the loop: divide every row by its own sum via broadcasting
x = inputTensor.clone()
x = x / x.sum(1, keepdim=True)

print((x == outputTensor).all())
> tensor(True)
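
As a minimal sketch, the loop-free version can be wrapped in a drop-in replacement for the ASC function. The name asc_normalize and the sample values are assumptions made for illustration, not part of the original code. Because the division runs on whatever device the input tensor already lives on, no explicit .cuda() calls should be needed.

```python
import torch


def asc_normalize(input_tensor: torch.Tensor) -> torch.Tensor:
    """Divide each row by its own sum so that every row adds up to 1.

    Hypothetical helper name; assumes a 2-D tensor of shape (batch, features).
    """
    # keepdim=True keeps the row sums as shape (batch, 1) so they broadcast
    # across the feature dimension during the division.
    return input_tensor / input_tensor.sum(dim=1, keepdim=True)


# Illustrative values: the first row is the example from the question.
batch = torch.tensor([[0.0, -4.0, 5.0],
                      [1.0, 2.0, 5.0]])
normalized = asc_normalize(batch)
row_sums = normalized.sum(dim=1)
```

Note that a row whose entries sum to zero would lead to a division by zero, so this sketch assumes that case does not occur in the data.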


Thanks a lot for your help.

I tried your code, and the performance has improved.

However, I have found that I can use the softmax function for this just as well, and the result is the same. So now I just use the softmax function from PyTorch for my problem.

Best regards

Are you applying torch.exp to the inputs beforehand? If not, dividing by the sum alone shouldn’t yield the same output as the softmax.
However, as I’m not familiar with your use case, it’s good to hear you can use built-in methods.
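
To make the difference concrete, here is a small sketch with made-up input values. It compares plain sum normalization with torch.softmax and shows that softmax matches the normalization only when torch.exp is applied first.

```python
import torch

# Illustrative input; any row with a non-zero sum would do.
x = torch.tensor([[1.0, 2.0, 3.0]])

# Plain sum normalization: x_i / sum_j x_j
sum_normalized = x / x.sum(dim=1, keepdim=True)

# Softmax: exp(x_i) / sum_j exp(x_j)
softmax_out = torch.softmax(x, dim=1)
exp_normalized = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)

# Both results sum to 1 per row, but the values themselves differ.
same_as_sum_norm = torch.allclose(softmax_out, sum_normalized)
same_as_exp_norm = torch.allclose(softmax_out, exp_normalized)
print(same_as_sum_norm, same_as_exp_norm)
```

Both variants satisfy the sum-to-one condition, so which one is appropriate depends on whether the relative proportions of the raw activations have to be preserved.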

No, I had not applied the exponential function beforehand.

However, I had constrained the weights so that they cannot become negative, so the only condition I still needed was that each row of the tensor sums to 1.

So the softmax function is perfectly suitable.
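
For reference, a minimal sketch of how the softmax variant could look in the network from the question. The class name SoftmaxAutoencoder, the attribute names, and the encode helper are assumptions for illustration; the layer sizes are taken from the original model. nn.Softmax(dim=1) normalizes across the three encoder outputs of each sample.

```python
import torch
import torch.nn as nn


class SoftmaxAutoencoder(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        # Encoder with the layer sizes from the question; the final softmax
        # makes each sample's 3 outputs non-negative and sum to 1.
        self.encoder = nn.Sequential(
            nn.Linear(in_features=91, out_features=27),
            nn.Linear(in_features=27, out_features=18),
            nn.Linear(in_features=18, out_features=9),
            nn.Linear(in_features=9, out_features=3),
            nn.Softmax(dim=1),
        )
        self.dropout = nn.Dropout(0.5)
        self.decoder = nn.Sequential(
            nn.Linear(in_features=3, out_features=91),
            nn.Sigmoid(),
        )

    def encode(self, x):
        # Returns the constrained encoder output (before dropout).
        return self.encoder(x)

    def forward(self, x):
        return self.decoder(self.dropout(self.encode(x)))


model = SoftmaxAutoencoder()
sample = torch.randn(4, 91)  # random stand-in for real input data
abundances = model.encode(sample)
reconstruction = model(sample)
```

One thing to keep in mind: in training mode, the dropout layer zeroes and rescales entries after the softmax, so the tensor passed to the decoder no longer sums to 1 per row; only the encoder output itself does.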