Hi, I have the following module:
import time

import torch
import torch.nn as nn


class ToyNetwork(nn.Module):
    def __init__(self, embedding_dim, hidden_dim):
        super(ToyNetwork, self).__init__()
        # Third positional argument is num_layers: each LSTM has 70 stacked layers.
        self.lstm1 = nn.LSTM(embedding_dim, hidden_dim, 70)
        self.lstm2 = nn.LSTM(embedding_dim, hidden_dim, 70)

    def forward(self, inputs, hidden1, hidden2):
        start = time.monotonic()
        self.lstm1(inputs, hidden1)
        torch.cuda.synchronize()  # wait for the GPU before reading the clock
        mid = time.monotonic()
        self.lstm2(inputs, hidden2)
        torch.cuda.synchronize()
        print(f"second network time: {time.monotonic()-mid}, first network time: {mid-start}")
Output:
second network time: 0.010582592338323593, first network time: 0.02081933245062828
I'm wondering why there is such a large difference between the time taken by the first LSTM and the second one, even though both are constructed with the same arguments and receive the same inputs.
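To check whether the gap is just one-time start-up cost, each LSTM could be timed over several runs after a warm-up call, synchronizing before the clock starts as well as after. This is only a minimal sketch, not a definitive benchmark. `time_module` and `sync` are hypothetical helpers I made up for illustration, the tensor sizes are arbitrary (and much smaller than the 70 layers above so it runs quickly), and it falls back to CPU when no GPU is available.

```python
import time

import torch
import torch.nn as nn


def sync(device):
    # CUDA kernels launch asynchronously; wait for them before reading the clock.
    if device.type == "cuda":
        torch.cuda.synchronize()


def time_module(module, inputs, hidden, device, warmup=2, runs=5):
    # Hypothetical helper: returns per-call wall-clock times in seconds.
    with torch.no_grad():
        # Warm-up calls absorb one-time initialization cost and are not timed.
        for _ in range(warmup):
            module(inputs, hidden)
        times = []
        for _ in range(runs):
            sync(device)  # make sure no earlier work is still pending
            start = time.monotonic()
            module(inputs, hidden)
            sync(device)
            times.append(time.monotonic() - start)
    return times


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Illustrative sizes only (assumption, not taken from the original post).
embedding_dim, hidden_dim, num_layers, seq_len, batch = 8, 16, 2, 5, 3
lstm1 = nn.LSTM(embedding_dim, hidden_dim, num_layers).to(device)
lstm2 = nn.LSTM(embedding_dim, hidden_dim, num_layers).to(device)

inputs = torch.randn(seq_len, batch, embedding_dim, device=device)
hidden = (
    torch.zeros(num_layers, batch, hidden_dim, device=device),
    torch.zeros(num_layers, batch, hidden_dim, device=device),
)

times1 = time_module(lstm1, inputs, hidden, device)
times2 = time_module(lstm2, inputs, hidden, device)
print(f"first network: {min(times1):.6f}s, second network: {min(times2):.6f}s")
```

If the two numbers come out close once the warm-up calls are excluded, the difference in my original measurement would most likely be initialization overhead on the first call rather than anything specific to `lstm1`.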
Thank you!