Speeding up the training

I am training an autoencoder with custom (sparse) connections.
My input matrix is 24,000 x 65,000, and my middle layer has 4,000 nodes.
The first and second layers are not fully connected: the allowed connections are specified by a mask built from prior knowledge. I apply the mask and compute the encoding like this:

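    # zero out the weights of connections that the mask disallows
    self.layer1.weight.data = self.layer1.weight.data.mul(self.mask)
    # first encoder block: EL1 -> NEL1 -> dropout -> ReLU
    z = self.relu(self.dropout(self.NEL1(self.EL1(x))))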
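
To make the masking idea concrete, here is a minimal sketch of a masked linear layer, assuming PyTorch. The class name `MaskedLinear`, the tiny sizes, and the random mask are hypothetical stand-ins and not the real model. It applies the mask inside `forward` rather than overwriting `weight.data`, so the gradients of masked weights are zero as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Linear layer whose weight is multiplied by a fixed 0/1 mask (hypothetical helper)."""

    def __init__(self, in_features, out_features, mask):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # mask has the same shape as the weight: (out_features, in_features).
        # A buffer follows .to(device) and is saved in state_dict, but is not trained.
        self.register_buffer("mask", mask.to(self.linear.weight.dtype))

    def forward(self, x):
        # Masked entries contribute nothing to the output and get zero gradient.
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)


torch.manual_seed(0)
# Assumed toy connectivity; the real mask would come from prior knowledge.
mask = (torch.rand(4, 6) < 0.5).float()
layer = MaskedLinear(in_features=6, out_features=4, mask=mask)

x = torch.randn(3, 6)      # batch of 3 samples with 6 features each
out = layer(x)             # shape (3, 4)
out.sum().backward()
masked_grad = layer.linear.weight.grad[mask == 0]
```

Note that this is still a dense matrix multiply, so on its own it would not be expected to reduce the cost per step compared with a fully connected layer of the same size.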

I am already training on a GPU, but training is still very slow. Does anyone have suggestions for how I can speed it up?