Here is a simple numerical regression example with random data.
Input shape: (10000, 300), output shape: (10000, 3); input and output have a simple quadratic relationship. So the slowdown is not caused by the data distribution; I first hit the same problem on a real dataset of mine. The model is a 3-layer fully-connected network with batch normalization.
I tried to use the same hyperparameters for Keras and PyTorch, running on CPU since the dataset is relatively small. It turns out Keras is almost 3x as fast.
colab script: https://colab.research.google.com/drive/1BQTCbIUOv-afuRbSn2chA4ae1bfD0bDk
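For reference, the data can be generated like this (a minimal sketch; the exact quadratic function and any noise in the notebook may differ):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 300)).astype('float32')  # input: (10000, 300)
W = rng.standard_normal((300, 3)).astype('float32')
Y = (X ** 2) @ W  # assumed form of the quadratic relationship, output: (10000, 3)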
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.models import Model

def build_keras_model(optimizer='adam'):
    # 300 -> 100 -> 50 -> 3, with batch norm after each hidden activation
    inputs = Input(shape=(300,))
    x = Dense(100, activation='tanh')(inputs)
    x = BatchNormalization()(x)
    x = Dense(50, activation='tanh')(x)
    x = BatchNormalization()(x)
    outputs = Dense(3, activation='linear')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=optimizer, loss='mse', metrics=['mae'])
    return model

keras_model = build_keras_model()
keras_model.fit(X, Y, batch_size=512, epochs=200)
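The "almost 3x" figure comes from simply timing the fit calls; something like this is enough (a sketch, the notebook may measure it slightly differently):

import time

start = time.time()
keras_model.fit(X, Y, batch_size=512, epochs=200, verbose=0)
print(f'Keras fit: {time.time() - start:.1f} s')
# the PyTorch run is timed the same way around fit_torch(...)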
import torch
import torch.nn as nn

def build_pytorch_model():
    # same architecture as the Keras model: 300 -> 100 -> 50 -> 3
    model = nn.Sequential(
        nn.Linear(300, 100),
        nn.Tanh(),
        # momentum=0.01 mirrors Keras's default momentum=0.99: PyTorch's momentum
        # weights the new batch statistic, Keras's weights the running average
        nn.BatchNorm1d(100, momentum=0.01),
        nn.Linear(100, 50),
        nn.Tanh(),
        nn.BatchNorm1d(50, momentum=0.01),
        nn.Linear(50, 3),
    )
    return model
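Training on the PyTorch side goes through a DataLoader, which I build with the same batch size as the Keras fit (a minimal sketch, assuming X and Y are the NumPy arrays from above):

from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.from_numpy(X), torch.from_numpy(Y))
dl = DataLoader(ds, batch_size=512, shuffle=True)  # batch size matches the Keras run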
loss_func = nn.MSELoss()
torch_model = build_pytorch_model()
lr = 1e-3      # Keras's Adam default, so both runs use the same learning rate
epochs = 200   # same as the Keras run
optimizer = torch.optim.Adam(torch_model.parameters(), lr=lr)
fit_torch(torch_model, optimizer, loss_func, epochs, dl)
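fit_torch is defined in the notebook; a minimal training loop with that call signature would look like this (a sketch, not necessarily the exact notebook code):

def fit_torch(model, optimizer, loss_func, epochs, dl):
    model.train()  # keep BatchNorm layers in training mode, as Keras fit does
    for _ in range(epochs):
        for xb, yb in dl:
            optimizer.zero_grad()
            pred = model(xb)
            loss = loss_func(pred, yb)
            loss.backward()
            optimizer.step()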
The full script and results are shared in the Colab notebook above. Thanks to anyone who can point out the reason or where I went wrong.