I want to create a neural network layer in which the neurons are not fully connected to the neurons in the layer below.
For example, take two adjacent layers with 1000 and 300 neurons. Let's call the first layer A and the second layer B. The output of layer A serves as the input of layer B. Neurons 1:3 in layer B are connected to neurons 1:10 in layer A, neurons 4:6 in layer B are connected to neurons 11:20 in layer A, and so on.
Now, since the two layers are only partially connected, it is possible to compute the output of each group of 3 neurons in layer B in parallel on the GPU: I could compute the outputs of neurons 1:3, 4:6, 7:9, … simultaneously. This would be straightforward to write in CUDA C code.
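To make the computation concrete, here is a sketch of the grouped operation I have in mind, expressed with `torch.einsum` over a block-diagonal weight tensor (the module name `BlockLinear` and all shapes are my own illustration, not an existing PyTorch class):

```python
import torch
import torch.nn as nn

class BlockLinear(nn.Module):
    """Partially connected layer: each group of `in_block` inputs
    feeds only its own group of `out_block` outputs."""
    def __init__(self, in_features=1000, out_features=300,
                 in_block=10, out_block=3):
        super().__init__()
        assert in_features % in_block == 0
        self.groups = in_features // in_block          # 100 groups
        assert out_features == self.groups * out_block
        # one (in_block x out_block) weight matrix per group
        self.weight = nn.Parameter(
            torch.randn(self.groups, in_block, out_block) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        b = x.shape[0]
        x = x.view(b, self.groups, -1)                 # (batch, 100, 10)
        # batched per-group matmul; all groups run in parallel on the GPU
        y = torch.einsum('bgi,gio->bgo', x, self.weight)
        return y.reshape(b, -1) + self.bias            # (batch, 300)

layer = BlockLinear()
out = layer(torch.randn(4, 1000))
print(out.shape)  # torch.Size([4, 300])
```

Since this is built entirely from differentiable tensor ops, autograd should give me the gradients for free.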
Is this possible in PyTorch, either with existing functions or by creating a new C function? Mainly, I want to take advantage of the autograd package so I don't have to compute gradients myself, without losing any speed. I would also like the code to be able to use more than 1 GPU (if any are available to me).
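One existing function I have been wondering about is a grouped 1x1 convolution: if I treat the 1000 inputs as channels, `nn.Conv1d` with `groups=100` seems to give exactly this connectivity (each group of 10 input channels maps only to its own 3 output channels), though I am not sure this is the intended use:

```python
import torch
import torch.nn as nn

# groups=100 splits the 1000 input channels into 100 groups of 10;
# each group is mapped independently to 3 of the 300 output channels.
layer = nn.Conv1d(in_channels=1000, out_channels=300,
                  kernel_size=1, groups=100)

x = torch.randn(4, 1000, 1)   # batch of 4, length-1 "sequence"
y = layer(x).squeeze(-1)      # (4, 300)
print(y.shape)  # torch.Size([4, 300])
```

If this is equivalent, it would also solve the multi-GPU question, since `Conv1d` works with `nn.DataParallel` out of the box.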
Thank you for any guidance.
Sorry, I had intermittent internet which resulted in multiple postings. Is it possible to delete the other post?