What are the best practices for MLPs in 2019 (or 2020)?

I want to use several available features to predict a variable. The problem is not related to vision or NLP. I have good reason to believe that the target variable is a nonlinear function of these features.

Currently, my code looks like this:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(53, 200)
        self.fc2 = nn.Linear(200, 100)
        self.fc3 = nn.Linear(100, 36)
        self.fc4 = nn.Linear(36, 1)

    def forward(self, x):
        x = F.leaky_relu(self.fc1(x))
        x = F.leaky_relu(self.fc2(x))
        x = F.leaky_relu(self.fc3(x))
        x = self.fc4(x)
        return x
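
# Assumed device selection; `device` was not defined in the snippet as posted.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = Net().to(device)
loss_function = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-6)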


def train_normal(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.to(device)
        target = target.to(device)
        optimizer.zero_grad()
        output = model(data)
        # output has shape (batch, 1); target is assumed to have the same shape
        loss = loss_function(output, target)
        loss.backward()
        # Clip the total gradient norm to at most 100
        torch.nn.utils.clip_grad_norm_(model.parameters(), 100)
        optimizer.step()
        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
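
The loop above only reports the training loss. To judge whether a bigger network would help, it is probably useful to compare it with the loss on held-out data. Below is a minimal sketch of such an evaluation pass. The `evaluate` helper and the idea of a separate validation loader are assumptions for illustration, not part of my current code, and the averaging assumes a loss with mean reduction such as the default `nn.MSELoss()`.

```python
import torch
import torch.nn as nn


def evaluate(model, device, loader, loss_function):
    """Return the mean per-sample loss of `model` over `loader` (hypothetical helper)."""
    model.eval()  # inference mode; matters if dropout or batch norm is added later
    total_loss, total_samples = 0.0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # The loss is averaged over the batch, so weight it by the batch size
            total_loss += loss_function(output, target).item() * len(data)
            total_samples += len(data)
    return total_loss / total_samples
```

If the held-out loss is close to the training loss, the model is likely underfitting and more capacity may help; if it is much higher, regularization or more data is probably the better lever.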

The model is a typical MLP. After training for 100 epochs, the results show that it has learned something but is still not very accurate:

Train Epoch: 99 [268800/276316 (97%)] Loss: 0.217219
Train Epoch: 99 [275200/276316 (100%)] Loss: 0.234965
predicted  actual   diff
    -1.18   -1.11  -0.08
     0.15   -0.15   0.31
     0.19    0.27  -0.08
    -0.49   -0.48  -0.01
    -0.05    0.08  -0.14
     0.44    0.50  -0.06
    -0.17   -0.05  -0.12
     1.81    1.92  -0.12
     1.55    0.76   0.79
    -0.05   -0.30   0.26
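
To make "not very accurate" more concrete, the ten rows above can be summarized with a few error statistics. This is only a sketch over the printed sample: the values are copied from the rounded output, so the recomputed differences can deviate by about 0.01 from the diff column, and ten rows say little about the full dataset.

```python
# (predicted, actual) pairs copied from the ten rows printed above
rows = [
    (-1.18, -1.11), (0.15, -0.15), (0.19, 0.27), (-0.49, -0.48), (-0.05, 0.08),
    (0.44, 0.50), (-0.17, -0.05), (1.81, 1.92), (1.55, 0.76), (-0.05, -0.30),
]

errors = [predicted - actual for predicted, actual in rows]
mse = sum(e * e for e in errors) / len(errors)   # mean squared error
mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
worst = max(errors, key=abs)                     # largest single miss, with sign

print(f"MSE={mse:.4f}  MAE={mae:.4f}  worst error={worst:+.2f}")
```

Because the MSE squares each error, a single large miss can dominate the average, so looking at the MAE and the worst case alongside it gives a fuller picture.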

How can I improve the accuracy? Should I add more hidden layers or increase the number of neurons per layer? What is the current best practice?
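
To try both options without rewriting the class each time, the network could be built from a list of hidden-layer sizes. The sketch below is one way to do that. `make_mlp` and `count_parameters` are hypothetical helpers, and the "wider" and "deeper" sizes are arbitrary illustrations, not recommendations.

```python
import torch
import torch.nn as nn


def make_mlp(in_features, hidden_sizes, out_features=1):
    """Stack Linear + LeakyReLU blocks, ending in a linear output (hypothetical helper)."""
    layers = []
    previous = in_features
    for size in hidden_sizes:
        layers.append(nn.Linear(previous, size))
        layers.append(nn.LeakyReLU())
        previous = size
    layers.append(nn.Linear(previous, out_features))  # no activation: regression output
    return nn.Sequential(*layers)


def count_parameters(model):
    """Number of trainable parameters, as a rough measure of capacity."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


current = make_mlp(53, [200, 100, 36])       # same layer sizes as Net above
wider = make_mlp(53, [400, 200, 72])         # illustrative sizes only
deeper = make_mlp(53, [200, 100, 100, 36])   # illustrative sizes only

for name, model in [("current", current), ("wider", wider), ("deeper", deeper)]:
    print(name, count_parameters(model))
```

Each variant can be passed to the same `train_normal` loop with its own optimizer, which makes it straightforward to compare depth against width under otherwise identical settings.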