Hi!
So I have a question. I have a recursive algorithm that trains multiple neural networks on different domains, all with the same architecture. Every time the algorithm goes one level deeper, there is less training data, so training should get faster the deeper you go.
I tested my algorithm with a small training set (2,000 samples) to make sure it worked, and it does work! Now I want to try it with a more realistic training set of 262,144 samples (512x512). Since I knew I would be working on the GPU, I tried to carefully move variables to the GPU only when I actually needed them.
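To make the structure a bit more concrete, the recursion looks roughly like this. This is a heavily simplified sketch: the name train_levels and the "first 60%" slice are just placeholders for my real domain-splitting logic, and fit_one_level is a placeholder for the step I show after my train function further down.

    # Rough sketch only: the subset selection here is a placeholder,
    # my real code splits the data by domain instead of slicing.
    def train_levels(x, y, depth):
        if depth == 0 or len(x) == 0:
            return
        fit_one_level(x, y)                    # trains a fresh MLPflat on this level's data
        keep = int(0.6 * len(x))               # each child level sees roughly 60% of the samples
        train_levels(x[:keep], y[:keep], depth - 1)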
When I ran it, I saw that the first network (trained on the full 512x512 set, so the biggest training set) trains quite fast, about 130 s. However, the next one, which is trained on only 60% of that data, takes significantly longer (I haven't let it run to completion to see how long it actually takes, because it takes so long). But if I train a network on that 60% of the data by itself, as a separate test, it takes approximately 50 s.
Now that you have the context, my question is the following: is there any way that the first network I trained could be slowing down the second one?
I will show you the NN and my training function:
class MLPflat(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, N, H):
        super().__init__()
        assert N > 0
        assert H > 0
        # first hidden block, then N-1 more, so N hidden layers in total
        net = [nn.Linear(in_dim, H), nn.BatchNorm1d(H), nn.LeakyReLU()]
        for _ in range(N - 1):
            net += [nn.Linear(H, H), nn.BatchNorm1d(H), nn.LeakyReLU()]
        net += [nn.Linear(H, out_dim, bias=False)]
        self.model = nn.Sequential(*net)

    def forward(self, x):
        return self.model(x)
I use N (hidden layers) = 5 and H = 64.
def train(phi, train_loader, epochs, criterion, optimizer):
    fit_start_time = time.time()
    for epoch in range(epochs):
        batch = 0
        for x_batch, y_batch in train_loader:
            optimizer.zero_grad()
            print(x_batch.shape)  # debug print
            y_pred = phi(x_batch.to(device))
            loss = criterion(y_pred.squeeze(), y_batch.to(device).squeeze())
            loss.backward(retain_graph=True)  # I retain the graph because I get an error if I don't.
            optimizer.step()
            batch += 1
    fit_end_time = time.time()
    print("Total time = %f" % (fit_end_time - fit_start_time))
What I already tried is getting rid of any tensor on the GPU that I don't need. However, it didn't seem to help.
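By "getting rid of tensors" I mean doing something like this between two levels (I'm not sure this is even the right approach):

    # Between two networks, to try to free GPU memory:
    del phi                    # drop the last reference to the previous network
    torch.cuda.empty_cache()   # release unused memory held by the caching allocator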
I think this problem is not really apparent with very small training sets, and only shows up when I train with large numbers of samples.
I would love to hear if this sounds like anything you have encountered before. If this is not enough information, let me know and I can elaborate more on my algorithm.