I have a `ModuleDict` model like the one shown below:
```
NeuralNetwork(
  (linears): ModuleDict(
    (C): Sequential(
      (0): Linear(in_features=384, out_features=192, bias=True)
      (1): ReLU()
      (2): Linear(in_features=192, out_features=192, bias=True)
      (3): ReLU()
      (4): Linear(in_features=192, out_features=96, bias=True)
      (5): ReLU()
      (6): Linear(in_features=96, out_features=48, bias=True)
      (7): ReLU()
      (8): Linear(in_features=48, out_features=1, bias=True)
    )
    (H): Sequential(
      (0): Linear(in_features=384, out_features=192, bias=True)
      (1): ReLU()
      (2): Linear(in_features=192, out_features=192, bias=True)
      (3): ReLU()
      (4): Linear(in_features=192, out_features=96, bias=True)
      (5): ReLU()
      (6): Linear(in_features=96, out_features=48, bias=True)
      (7): ReLU()
      (8): Linear(in_features=48, out_features=1, bias=True)
    )
    (N): Sequential(
      (0): Linear(in_features=384, out_features=192, bias=True)
      (1): ReLU()
      (2): Linear(in_features=192, out_features=192, bias=True)
      (3): ReLU()
      (4): Linear(in_features=192, out_features=96, bias=True)
      (5): ReLU()
      (6): Linear(in_features=96, out_features=48, bias=True)
      (7): ReLU()
      (8): Linear(in_features=48, out_features=1, bias=True)
    )
    (O): Sequential(
      (0): Linear(in_features=384, out_features=192, bias=True)
      (1): ReLU()
      (2): Linear(in_features=192, out_features=192, bias=True)
      (3): ReLU()
      (4): Linear(in_features=192, out_features=96, bias=True)
      (5): ReLU()
      (6): Linear(in_features=96, out_features=48, bias=True)
      (7): ReLU()
      (8): Linear(in_features=48, out_features=1, bias=True)
    )
    (P): Sequential(
      (0): Linear(in_features=384, out_features=192, bias=True)
      (1): ReLU()
      (2): Linear(in_features=192, out_features=192, bias=True)
      (3): ReLU()
      (4): Linear(in_features=192, out_features=96, bias=True)
      (5): ReLU()
      (6): Linear(in_features=96, out_features=48, bias=True)
      (7): ReLU()
      (8): Linear(in_features=48, out_features=1, bias=True)
    )
    (S): Sequential(
      (0): Linear(in_features=384, out_features=192, bias=True)
      (1): ReLU()
      (2): Linear(in_features=192, out_features=192, bias=True)
      (3): ReLU()
      (4): Linear(in_features=192, out_features=96, bias=True)
      (5): ReLU()
      (6): Linear(in_features=96, out_features=48, bias=True)
      (7): ReLU()
      (8): Linear(in_features=48, out_features=1, bias=True)
    )
  )
)
```
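For reference, a model with this structure could be built roughly as follows. This is only a sketch: the symbol list, hidden sizes, and the initialization of the per-element `slope_*`/`intercept_*` parameters (which my `forward()` below relies on) are assumptions, not my exact constructor.

```python
import torch
import torch.nn as nn

class NeuralNetwork(nn.Module):
    """Sketch of the model above; details of the real constructor may differ."""

    def __init__(self, symbols=("C", "H", "N", "O", "P", "S"), input_dim=384):
        super().__init__()
        # One independent MLP per chemical symbol, matching the printout.
        self.linears = nn.ModuleDict({
            symbol: nn.Sequential(
                nn.Linear(input_dim, 192), nn.ReLU(),
                nn.Linear(192, 192), nn.ReLU(),
                nn.Linear(192, 96), nn.ReLU(),
                nn.Linear(96, 48), nn.ReLU(),
                nn.Linear(48, 1),
            )
            for symbol in symbols
        })
        # Per-element affine rescaling used in forward(); initial values assumed.
        for symbol in symbols:
            setattr(self, "slope_" + symbol, nn.Parameter(torch.ones(1)))
            setattr(self, "intercept_" + symbol, nn.Parameter(torch.zeros(1)))
```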
The `forward()` method I use to train it looks like this:
```python
def forward(self, X, device=None):
    """Forward propagation.

    Predicts atomic energies and returns the total energy per image.

    Parameters
    ----------
    X : dict
        Dictionary of inputs in the feature space.

    Returns
    -------
    outputs : tensor
        A tensor with the total energy of each image.
    """
    outputs = []
    for hash in X:
        image = X[hash]
        atomic_energies = []
        for symbol, x in image:
            if isinstance(symbol, bytes):
                symbol = symbol.decode("utf-8")
            try:
                x = self.linears[symbol](x)
            except RuntimeError:
                # Retry with the features moved to the target device.
                x = self.linears[symbol](x.to(device))
            # Per-element affine rescaling of the predicted atomic energy.
            slope = getattr(self, "slope_" + symbol)
            intercept = getattr(self, "intercept_" + symbol)
            x = (slope * x) + intercept
            atomic_energies.append(x)
        atomic_energies = torch.cat(atomic_energies)
        image_energy = torch.sum(atomic_energies)
        outputs.append(image_energy)
    outputs = torch.stack(outputs)
    return outputs
```
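For context, `forward()` assumes `X` maps an image hash to an iterable of `(symbol, features)` pairs. A toy example of what that looks like (the hashes and symbols are made up; the 384-entry feature vectors are inferred from the first `Linear` layer):

```python
import torch

# Illustrative input structure for the forward() above.
X = {
    "image-hash-1": [
        ("C", torch.rand(384)),
        ("H", torch.rand(384)),
        ("O", torch.rand(384)),
    ],
    "image-hash-2": [
        ("C", torch.rand(384)),
        ("N", torch.rand(384)),
    ],
}
```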
Running that `forward()` on a single CPU takes 1.74 seconds, while on a single GPU it takes 38.33 seconds. Why is there such a difference? I have some hypotheses:
- The structure of `X` should be changed to improve efficiency and avoid the `for` loop (see the batched sketch after this list).
- Moving tensors to the CUDA device should be avoided inside the `forward()` function and instead done only once before calling `forward()`.
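To make the first hypothesis concrete, here is the kind of refactor I have in mind: group all atoms of the same element across all images, so each sub-network runs one batched call instead of one call per atom, and move each batch to the device once. This is an untested sketch, not a drop-in replacement I have verified; the name `forward_batched` and the `index_add` scatter-back are my own inventions, and it assumes per-atom feature vectors of shape `(384,)` as in the toy `X` above.

```python
import torch
from collections import defaultdict

def forward_batched(self, X, device=None):
    """Sketch of a batched alternative to the per-atom loop above."""
    feats = defaultdict(list)  # symbol -> list of per-atom feature tensors
    where = defaultdict(list)  # symbol -> image index of each atom
    hashes = list(X)
    for i, hash in enumerate(hashes):
        for symbol, x in X[hash]:
            if isinstance(symbol, bytes):
                symbol = symbol.decode("utf-8")
            feats[symbol].append(x)
            where[symbol].append(i)

    outputs = torch.zeros(len(hashes), device=device)
    for symbol, xs in feats.items():
        # One device transfer and one batched forward pass per symbol.
        batch = torch.stack(xs).to(device)                  # (n_atoms, 384)
        energies = self.linears[symbol](batch).squeeze(-1)  # (n_atoms,)
        slope = getattr(self, "slope_" + symbol)
        intercept = getattr(self, "intercept_" + symbol)
        energies = slope * energies + intercept
        # Accumulate each atom's energy into its image's total.
        idx = torch.tensor(where[symbol], device=device)
        outputs = outputs.index_add(0, idx, energies)
    return outputs
```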
I would appreciate any advice. Thanks.