I’ve recently created a custom PyTorch nn model that uses a combination of linear and non-linear connections between layers, and I am training a vector of weights on this model. With about 700 weights, 20 layers, and a batch size of 100, training takes about 11 seconds per batch. With 10 weights, 20 layers, and the same batch size of 100, it still takes about 6 seconds per batch. I find it strange that using 70 times fewer weights only roughly halves the time per batch. Is this a sign that I am doing something wrong, or does this seem reasonable?

For reference, I am using the weights to build a symmetric Toeplitz matrix. In both cases the Toeplitz matrix is about 700 by 700, but with 10 weights only the first 10 diagonals of the matrix are filled in.
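To make the construction concrete, here is a minimal sketch of one way to build such a matrix from a weight vector in PyTorch. The function name symmetric_toeplitz is hypothetical, and it assumes the matrix is materialized as a dense tensor, with weights[0] on the main diagonal, weights[d] on the diagonals at offset d, and every diagonal beyond len(weights) set to zero. Gradients flow back to the weights through the indexing step.

```python
import torch

def symmetric_toeplitz(weights: torch.Tensor, n: int) -> torch.Tensor:
    """Hypothetical helper: build a dense n x n symmetric Toeplitz matrix.

    weights[d] fills the diagonals at offset +d and -d; diagonals with an
    offset >= len(weights) are zero.
    """
    k = weights.numel()
    if k > n:
        raise ValueError("need len(weights) <= n")
    # One value per diagonal offset 0..n-1; offsets past k are zero.
    diag_values = torch.cat([weights, weights.new_zeros(n - k)])
    idx = torch.arange(n, device=weights.device)
    # |i - j| is the diagonal offset of entry (i, j), which makes T symmetric.
    offsets = (idx[:, None] - idx[None, :]).abs()
    # Advanced indexing is differentiable, so gradients reach `weights`.
    return diag_values[offsets]

# Both matrices are dense 700 x 700 tensors, whatever the number of weights.
T10 = symmetric_toeplitz(torch.randn(10, requires_grad=True), 700)
T700 = symmetric_toeplitz(torch.randn(700, requires_grad=True), 700)
print(T10.shape, T700.shape)
```

With this construction the stored matrix, and any matrix product using it, has the same size whether 10 or 700 weights are used; only the number of nonzero diagonals differs.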