Hi, I am new to PyTorch. I recently implemented the SDNE model in PyTorch, but I got some weird results.

The input is one row of a graph’s adjacency list. Here I post only the model and the strange results.

```
import torch
import torch.sparse as ts
import torch.nn as nn
import numpy as np
import time


class SDNE(nn.Module):
    def __init__(self, num_units, k, d):
        super(SDNE, self).__init__()
        # Encoder: num_units[0] -> ... -> num_units[k] -> d
        auto_encoder = list()
        auto_encoder.append(nn.Linear(num_units[0], num_units[1]))
        for i in range(1, k):
            auto_encoder.append(nn.Linear(num_units[i], num_units[i + 1]))
        auto_encoder.append(nn.Linear(num_units[k], d))
        self.auto_encoder = nn.Sequential(*auto_encoder)
        # Decoder: d -> num_units[k] -> ... -> num_units[0]
        auto_decoder = list()
        auto_decoder.append(nn.Linear(d, num_units[k]))
        for i in range(0, k):
            auto_decoder.append(nn.Linear(num_units[k - i], num_units[k - i - 1]))
        self.auto_decoder = nn.Sequential(*auto_decoder)

    def forward(self, x):
        start = time.time()
        y = self.auto_encoder(x)
        torch.cuda.synchronize()  # wait for the GPU before reading the clock
        end_time = time.time()
        print("encoder time : " + str(end_time - start))
        x_hat = self.auto_decoder(y)  # reconstruction; computed but not returned here
        torch.cuda.synchronize()
        print("decoder time: " + str(time.time() - end_time))
        return y
```

This model has 5 layers. The number of features in each layer is 4841716, 48, 20, 48, 4841716.
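
To give a sense of scale, here is a rough sketch that counts the parameters implied by those layer sizes. It assumes every layer is a plain fully connected layer with a bias, as in the model above, and that the weights are stored as float32. The helper name `linear_param_count` is made up for this illustration. With the constructor above, these sizes would presumably correspond to `num_units=[4841716, 48]`, `k=1`, `d=20`.

```python
# Layer widths as listed in the post.
layer_sizes = [4841716, 48, 20, 48, 4841716]


def linear_param_count(sizes):
    """Weights plus biases for a stack of fully connected layers (hypothetical helper)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


n_params = linear_param_count(layer_sizes)
bytes_fp32 = 4 * n_params  # assumption: 4 bytes per float32 parameter

print(f"parameters: {n_params:,}")
print(f"float32 size: {bytes_fp32 / 2**30:.2f} GiB")
```

Almost all of the parameters sit in the first and last layers, because they are the only ones that touch the 4841716-wide input.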

When I feed it dense tensors with about 5 million elements each, the result is:

```
encoder time : 0.14040541648864746
decoder time: 0.007679462432861328
encoder time : 0.042407989501953125
decoder time: 0.004119396209716797
backward time = 0.11002826690673828
encoder time : 0.037224769592285156
decoder time: 0.00419163703918457
encoder time : 0.0371246337890625
decoder time: 0.004086017608642578
backward time = 0.1155850887298584
encoder time : 0.037232398986816406
decoder time: 0.004139423370361328
encoder time : 0.03712105751037598
decoder time: 0.0040395259857177734
backward time = 0.11581850051879883
```

Then I feed it a sparse input that has only 100 non-zero elements but the same size as before, which gives the following results:

```
encoder time is 0.113931
decoder time is 0.001139
encoder time is 0.046264
decoder time is 0.000139
```
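
For context, this is roughly what I mean by a sparse input. It is only a minimal sketch, not my exact code: it runs on the CPU, uses a much smaller feature count than 4841716, and stores the weight as `(in_features, out_features)` so it can be multiplied directly with `torch.sparse.mm`. The names `x_sparse`, `weight` and `bias` are made up for the example.

```python
import torch

torch.manual_seed(0)

# Small illustrative sizes (assumption), not the 4841716 features from the post.
n_features, n_hidden, nnz = 10_000, 48, 100

# One input row with `nnz` non-zero entries at distinct random columns.
cols = torch.randperm(n_features)[:nnz]
rows = torch.zeros(nnz, dtype=torch.long)
indices = torch.stack([rows, cols])  # shape (2, nnz)
values = torch.ones(nnz)
x_sparse = torch.sparse_coo_tensor(indices, values, size=(1, n_features))

# Dense weight laid out as (in_features, out_features).
weight = torch.randn(n_features, n_hidden)
bias = torch.randn(n_hidden)

# Sparse-times-dense product for the first layer, plus the bias.
y_sparse = torch.sparse.mm(x_sparse, weight) + bias

# Same computation with the input converted to a dense tensor.
y_dense = x_sparse.to_dense() @ weight + bias

print(torch.allclose(y_sparse, y_dense, atol=1e-3))
```

Both paths compute the same first-layer output. The sparse product only needs the 100 rows of `weight` selected by the non-zero columns, which is why I expected it to be faster.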

My questions are:

- Why does the first round of training always take more time than the rest?
- Using a sparse input should perform better because it avoids unnecessary operations, but the results don’t show that. Why?
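
To check whether the slow first round is just one-time start-up cost, the timings could be taken with a few untimed warm-up calls first. Below is a minimal sketch of such a helper. `time_calls` and its `sync` hook are hypothetical names, not part of PyTorch; on a GPU I would presumably pass `torch.cuda.synchronize` as `sync` so that the clock is only read once the queued kernels have finished.

```python
import time


def time_calls(fn, n_warmup=1, n_runs=5, sync=None):
    """Time `fn` over `n_runs` calls after `n_warmup` untimed calls.

    `sync` is an optional zero-argument callable that is run right before
    the timer starts and right before it stops (hypothetical hook).
    """
    def _sync():
        if sync is not None:
            sync()

    # Warm-up calls are executed but not measured.
    for _ in range(n_warmup):
        fn()

    timings = []
    for _ in range(n_runs):
        _sync()
        start = time.perf_counter()
        fn()
        _sync()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload; in my case this would be one forward pass.
    work = lambda: sum(i * i for i in range(100_000))
    print(time_calls(work, n_warmup=2, n_runs=3))
```

If the first timed call is still much slower than the rest even after a warm-up, the extra cost is probably not just initialization.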

Thanks in advance to anyone who can reply.