Always getting the same numbers in output

Hi everyone!
I’m really new to deep learning. I’m currently working with a dataset from a Kaggle competition (https://www.kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i/overview), where I try to predict the latitude and longitude of taxi trips with a neural network.
But I can’t figure out what is going wrong in my model, since I always get the same values in the output even though the loss is decreasing.

Here is my code.

I preprocess data as follows:

import json
import zipfile

import numpy as np
import pandas as pd
import torch
import torch.utils.data as Data
from sklearn.model_selection import train_test_split
from torch.autograd import Variable

# keep only the last point of each trajectory
zf = zipfile.ZipFile('train.csv.zip')
df = pd.read_csv(zf.open('train.csv'), converters={'POLYLINE': lambda x: json.loads(x)[-1:]})
df1 = df.dropna(subset=['POLYLINE'])
df1 = df1.fillna(1)
df_train, df_test = train_test_split(df1, train_size=0.1, random_state=0)

df_train['CALL_TYPE'] = pd.factorize(df_train['CALL_TYPE'])[0]
df_train['DAY_TYPE'] = pd.factorize(df_train['DAY_TYPE'])[0]
df_train['MISSING_DATA'] = pd.factorize(df_train['MISSING_DATA'])[0]
df_train['TAXI_ID'] = pd.factorize(df_train['TAXI_ID'])[0]

df_test['CALL_TYPE'] = pd.factorize(df_test['CALL_TYPE'])[0]
df_test['DAY_TYPE'] = pd.factorize(df_test['DAY_TYPE'])[0]
df_test['MISSING_DATA'] = pd.factorize(df_test['MISSING_DATA'])[0]
df_test['TAXI_ID'] = pd.factorize(df_test['TAXI_ID'])[0]

features = df_train.drop('POLYLINE', axis=1).values  # this is the final train set
target = df_train['POLYLINE'].values

# pad the ragged POLYLINE lists to a uniform (rows x cols) shape
max_cols = max(len(row) for batch in target for row in batch)
max_rows = max(len(batch) for batch in target)
padded = [batch + [[0] * max_cols] * (max_rows - len(batch)) for batch in target]
padded = torch.tensor([row + [0] * (max_cols - len(row)) for batch in padded for row in batch])
padded = padded.view(-1, max_rows, max_cols)

lat = padded[:, 0, 0]
long = padded[:, 0, 1]
targ = pd.DataFrame()  # this is the final target variable
targ['lat'] = lat
targ['long'] = long

X = torch.from_numpy(features.astype(np.float32))
y = torch.from_numpy(np.array(targ).astype(np.float32))

Then I build the model the following way:

torch.manual_seed(1)
x, y = Variable(X), Variable(y)

net = torch.nn.Sequential(
    torch.nn.Linear(8, 20),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(20, 10),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(10, 2),
)

optimizer = torch.optim.Adadelta(net.parameters(), lr=0.01)
loss_func = torch.nn.SmoothL1Loss()

BATCH_SIZE = 64
EPOCH = 100

torch_dataset = Data.TensorDataset(x, y)

loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2,
)

for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(loader):
        b_x = Variable(batch_x)
        b_y = Variable(batch_y)
        prediction = net(b_x)
        loss = loss_func(prediction, b_y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if epoch % 10 == 0:
        print('epoch {}, loss {}'.format(epoch, loss.data))

Will really appreciate any help!

What values does your model predict?
I’ve seen some issues in the past where the model predicted only the mean value(s) of the target and thus didn’t learn to generalize.
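
A quick way to check this is to compare the spread of the predictions with the target mean (a rough sketch reusing the net, x, and y from your post):

with torch.no_grad():
    preds = net(x)
print(preds.std(dim=0))   # near-zero values mean the outputs are (almost) constant
print(preds.mean(dim=0))  # compare these against ...
print(y.mean(dim=0))      # ... the per-column target mean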

If that’s the case, I would recommend trying to overfit a small data sample first and making sure your current architecture is able to learn it.
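
For the overfit test, something like this should work (a sketch that trains on just the first 10 samples with your existing net, loss_func, and optimizer; you may need a higher learning rate than 0.01 for Adadelta to make visible progress):

small_x, small_y = x[:10], y[:10]
for i in range(1000):
    optimizer.zero_grad()
    pred = net(small_x)
    loss = loss_func(pred, small_y)
    loss.backward()
    optimizer.step()
print(loss.item())  # should approach zero if the architecture can fit the sample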

PS: Variables are deprecated since PyTorch 0.4.0, so in newer versions you can use tensors directly.
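
For example, the wrapper calls in your training loop can simply be dropped, since tensors track gradients themselves now:

for step, (batch_x, batch_y) in enumerate(loader):
    prediction = net(batch_x)  # no Variable(...) wrapper needed
    loss = loss_func(prediction, batch_y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()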

Thanks for the reply!
I tried with the following example:

import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')

targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')
X = torch.from_numpy(inputs)
y = torch.from_numpy(targets)

class MyModel(nn.Module):
    def __init__(self, input_dim, h, output_dim):
        super(MyModel, self).__init__()
        self.linear1 = nn.Linear(input_dim, h)
        self.act1 = nn.ReLU()
        self.linear2 = nn.Linear(h, output_dim)

    def forward(self, x):
        x = self.linear1(x)
        x = self.act1(x)
        x = self.linear2(x)
        return x

input_dim = 3
h = 10
output_dim = 2
model = MyModel(input_dim, h, output_dim)

mse = nn.MSELoss()
learning_rate = 0.005
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

loss_list = []
iteration_number = 100
for iteration in range(iteration_number):
    results = model(X)
    print(results)
    loss = mse(results, y)
    loss.backward()
    optimizer.zero_grad()
    optimizer.step()

    loss_list.append(loss.data)
    if iteration % 100 == 0:
        print('epoch {}, loss {}'.format(iteration, loss.data))

plt.plot(range(iteration_number), loss_list)
plt.xlabel("Number of Iterations")
plt.ylabel("Loss")
plt.show()

It appears my model doesn’t really learn, since I get something like this every time:

tensor([[-3.4838,  2.7143],
        [-3.8473,  4.5520],
        [-1.1620,  7.0547],
        [-7.8916, -2.1579],
        [-0.9408,  6.9475]])

The loss doesn’t change either. I can’t figure out what is going wrong in the model.

You are zeroing out the gradients after the backward() call but before the step() operation, so the optimizer updates the parameters with all-zero gradients and nothing changes.
Move optimizer.zero_grad() before the loss.backward() call (or after optimizer.step()) and your model will be able to overfit the sample.
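
Applied to your small example, the loop would look like this:

for iteration in range(iteration_number):
    optimizer.zero_grad()   # clear stale gradients first
    results = model(X)
    loss = mse(results, y)
    loss.backward()         # compute fresh gradients
    optimizer.step()        # update using those gradients

    loss_list.append(loss.item())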

That solved the problem! Thank you very much!