PyTorch results differ a lot from Keras

I am a beginner in PyTorch. I am trying to implement a network in PyTorch that I had built earlier in Keras. I am not sure what's wrong in my implementation.

train_length --> 25000
train_vecs --> (25000, 10, 10); train_labels --> (25000, 1)

Here is my code:

import torch
import torch.nn as nn
import torch.nn.functional as F

train_tensor = torch.from_numpy(train_vecs)
train_labels_tensor = torch.from_numpy(train_labels)

class FirstNN(nn.Module):
    def __init__(self):
        super(FirstNN, self).__init__()
        self.conv1 = nn.Conv1d(10, 10, 3)
        self.conv2 = nn.Conv1d(10, 20, 3)
        self.dropout = nn.Dropout(0.5)
        self.dense_l1 = nn.Linear(120, 40)
        self.dense_l2 = nn.Linear(40, 8)
        self.dense_l3 = nn.Linear(8, 1)
        self.soft = nn.Sigmoid()

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.dropout(out)
        out = F.relu(self.conv2(out))
        out = self.dropout(out)
        out = out.view(out.size(0), -1)
        out = self.dropout(F.relu(self.dense_l1(out)))
        out = self.dropout(F.relu(self.dense_l2(out)))
        out = self.dropout(F.relu(self.dense_l3(out)))
        return self.soft(out)

batchsize = 25
numepochs = 25
n_batches = int(train_length / batchsize)

train_tensor = train_tensor.float()

model = FirstNN()
optimizer = torch.optim.Adam(model.parameters())  # assuming Adam, to match the Keras model

def find_correct(yp, yp1):
    correct = 0
    for i in range(len(yp)):
        if yp[i] == yp1[i]:
            correct += 1
    return correct

for epoch in range(3):
    for i in range(0, n_batches):
        Xtrain = train_tensor[i*batchsize:(i+1)*batchsize, :, :]
        ytrain = train_labels_tensor[i*batchsize:(i+1)*batchsize]
        ytrain = ytrain.view(-1, 1).float()
        optimizer.zero_grad()
        output = model(Xtrain)
        loss = F.binary_cross_entropy(output, ytrain)
        loss.backward()
        optimizer.step()

    output = model(train_tensor)
    output = (output >= 0.5).double()
    correct = find_correct(output, train_labels_tensor)
    print("Epoch {} : Loss {} : Accuracy {}".format(epoch, loss.item(), correct/25000))

I am getting losses of 0.67, 0.74, 0.73, whereas in Keras I get 0.60, 0.42, 0.38.

It looks like you are dealing with 25000 grayscale images with a spatial size of 10x10. If that's correct, you would have to pass your data as [batch_size, channels, height, width] into your model. Your first conv layer, however, uses 10 input channels instead of 1 (for a grayscale image). So this might be one reason for the difference. Could you print the shape of a batch and explain your data shapes?
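For reference, if the data really were 25000 grayscale images of size 10x10, a minimal sketch of the expected layout (using nn.Conv2d; shapes here are illustrative) would be:

import torch
import torch.nn as nn

images = torch.randn(25, 1, 10, 10)  # [batch_size, channels, height, width]
conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3)
print(conv(images).shape)  # torch.Size([25, 10, 8, 8])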

I am not dealing with images; I am dealing with text.

25000 is the number of sentences. I converted each sentence into a 100-dimensional vector using the doc2vec algorithm.

Then I reshaped every vector into a 10x10 tensor, hence (25000, 10, 10). Therefore the first conv layer has an input channel value of 10. I then use a batch size of 25.
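Roughly, the preprocessing looks like this (a sketch; doc_vecs stands in for the actual doc2vec output):

import numpy as np

doc_vecs = np.random.randn(25000, 100).astype(np.float32)  # one 100-dim doc2vec vector per sentence
train_vecs = doc_vecs.reshape(25000, 10, 10)               # each sentence reshaped to 10x10
# nn.Conv1d then sees each sample as (channels=10, length=10)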

I see, thanks for the clarification. I didn’t realize you are using nn.Conv1d as this would be an argument against my assumption. :wink:

Could you post your Keras code so that we can compare both?

from keras.models import Sequential
from keras.layers import Conv1D, Dense, Dropout, Flatten

m1 = Sequential()
m1.add(Conv1D(10, 3, activation='relu', input_shape=(10, 10)))
m1.add(Dropout(0.5))
m1.add(Conv1D(20, 3, activation='relu'))
m1.add(Dropout(0.5))
m1.add(Flatten())
m1.add(Dense(40, activation='relu'))
m1.add(Dropout(0.5))
m1.add(Dense(8, activation='relu'))
m1.add(Dropout(0.5))
m1.add(Dense(1, activation='sigmoid'))

m1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
m1.fit(train_vecs, train_labels, batch_size=25, epochs=25)

Is there any chance that taking consecutive (unshuffled) batches might be the reason for this?

Your models differ a bit at the end. In your Keras model, the last dense layer has a single output followed directly by a sigmoid non-linearity.
In your PyTorch model you add a relu and dropout before the sigmoid:

    out = self.dropout(F.relu(self.dense_l3(out)))
    return self.soft(out)

Could you remove the relu and dropout and run it again?
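That is, the end of forward would become something like this (just a sketch of the suggested change):

    out = self.dropout(F.relu(self.dense_l2(out)))
    out = self.dense_l3(out)   # no relu/dropout on the final layer
    return self.soft(out)      # sigmoid applied directly, matching the Keras model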


Yes, it is working now. Thanks a lot!