How to tailor an LSTM-NN to react to exogenous variables?

I’m an undergraduate student doing my research project.
I’m not a native English speaker, so apologies for my weird grammar and vocabulary.

My research topic is wind power prediction using an LSTM-NN and its application in power trading.
As wind power sellers, we care not only about forecast accuracy but also about the power price. I want to design a flexible predictor that reacts to fluctuations in the power price. That is, a less accurate forecast is acceptable while the power price is low, but a more accurate forecast is expected when the price is high.

In order to achieve that goal, I have done the following:
1. Weight the loss function. During training, I multiplied the squared error directly by the power price, so the loss becomes stricter when the price is high. My custom loss function looks like this:

def weighted_mse_loss(input, target, weight):
    # Price-weighted squared error; .mean() reduces it to the scalar that backward() needs
    return (weight * (input - target) ** 2).mean()

where input, target, and weight denote the predicted value, the actual value, and the power price respectively.
This simple and intuitive method performed well. The RMSE indeed became lower when the power price was high, compared with the case where the loss function was not weighted.
But I am rather confused by its surprisingly good performance, because I never told the predictor when a stricter loss function would apply and when it would not. In this approach, I predicted wind power production from past observed production alone, so the predictor should not have known when a loss with a larger weight would apply.
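
One way to check the comparison described above is to compute the plain (unweighted) RMSE separately for high-price and low-price samples. The following is a minimal sketch, not the evaluation code used for the results in this post. The helper name `rmse_by_price_regime`, the `threshold` parameter, and the toy numbers are all hypothetical.

```python
import torch

def rmse_by_price_regime(pred, target, price, threshold):
    """Unweighted RMSE computed separately for high- and low-price samples."""
    high = price >= threshold          # boolean mask of high-price samples

    def rmse(mask):
        return torch.sqrt(((pred[mask] - target[mask]) ** 2).mean()).item()

    return {"high": rmse(high), "low": rmse(~high)}

# Toy numbers, made up purely for illustration.
pred = torch.tensor([1.0, 2.0, 3.0, 4.0])
target = torch.tensor([1.5, 2.0, 2.0, 4.0])
price = torch.tensor([10.0, 50.0, 12.0, 60.0])

regimes = rmse_by_price_regime(pred, target, price, threshold=30.0)
```

Running the same function on the forecasts of a weighted and an unweighted model, over the same test set, gives the two pairs of numbers being compared. If one of the two groups is empty, its RMSE comes out as NaN.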

So I tried another method.
2. Input time-series data of both wind power and power price as multiple variables. I built an LSTM-NN-based predictor whose input was the sequence data of power as well as power price, and whose output was the forecast value of wind power. The price sequence was fed in solely to teach the model when the weight of the loss function would be larger, not to predict wind power from the power price.
Unfortunately, for reasons I do not understand, this did not work well. It failed to produce a better forecast when the power price was high, and it was even worse than a normal unweighted model.
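
The two-variable input described in this method can be sketched as follows. This is only an illustration under stated assumptions, not the exact pipeline behind the results above: the random series, the window length of 24, the hidden size of 32, and the helper `make_windows` are all hypothetical.

```python
import torch
import torch.nn as nn

# Made-up series for illustration: 200 time steps of power and price.
power = torch.rand(200)
price = torch.rand(200)

def make_windows(power, price, seq_len):
    # Stack the two series into features of shape (T, 2), then slice
    # overlapping windows; the target is the next power value only.
    features = torch.stack((power, price), dim=1)
    n = len(power) - seq_len
    xs = [features[i:i + seq_len] for i in range(n)]
    ys = [power[i + seq_len] for i in range(n)]
    return torch.stack(xs), torch.stack(ys).unsqueeze(1)

x, y = make_windows(power, price, seq_len=24)   # x: (176, 24, 2), y: (176, 1)

lstm = nn.LSTM(input_size=2, hidden_size=32, num_layers=1, batch_first=True)
head = nn.Linear(32, 1)

out, _ = lstm(x)              # initial states default to zeros when omitted
pred = head(out[:, -1, :])    # forecast from the last time step, shape (176, 1)
```

Note that the price appears only as an input feature here. The target `y` is still wind power alone, which matches the intent stated above.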
Then I tried another method.

3. Customize an LSTM-NN-based model.
It was still a univariate predictor whose input only contained the sequence data of wind power, but I tried to “inform” the predictor of the current power price just before it outputs the eventual forecast value.
The structure of my customized network is like this:

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.output_size = output_size
        self.batch_size = batch_size
        self.num_directions = 1  # unidirectional LSTM; forward() uses this to size h_0 and c_0
        self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers, batch_first=True)
        self.linear1 = nn.Linear(self.hidden_size, self.output_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(self.output_size + 1, self.output_size)

    def forward(self, input_seq, price):
        batch_size, seq_len = input_seq.shape[0], input_seq.shape[1]
        # Random initial states, created on the same device as the input
        h_0 = torch.randn(self.num_directions * self.num_layers, batch_size, self.hidden_size, device=input_seq.device)
        c_0 = torch.randn(self.num_directions * self.num_layers, batch_size, self.hidden_size, device=input_seq.device)
        output, _ = self.lstm(input_seq, (h_0, c_0)) 
        output = self.linear1(output)  # Fully connected layer
        pred = output[:, -1, :] # The output of prediction
        pred_withprice = torch.cat((pred, price), dim=1) # "Inform the predictor of the current power price by attaching the price to it" 
        pred_withprice = self.relu(pred_withprice) # Add some non-linearity
        pred = self.linear2(pred_withprice) # Output the eventual forecast value
        return pred

However, its performance turned out to be terrible, to say the least. The error did not even converge during training.
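
To make the data flow in the last steps of `forward` concrete, here is a standalone sketch with made-up tensors. The batch size of 4 and `output_size` of 1 are assumptions, and the random values stand in for real forecasts and prices. It shows two things that follow directly from the code above: `price` has to be 2-D with shape (batch, 1) for `torch.cat` to work, and the ReLU is applied to both the preliminary forecast and the price, so any negative entry of either is set to zero before `linear2`. Whether that clipping matters depends on how the data are scaled, which is not stated here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, output_size = 4, 1
pred = torch.randn(batch_size, output_size)   # stands in for output[:, -1, :]
price = torch.randn(batch_size, 1)            # must be 2-D: (batch, 1)

relu = nn.ReLU()
linear2 = nn.Linear(output_size + 1, output_size)

pred_withprice = torch.cat((pred, price), dim=1)   # shape (4, 2)
activated = relu(pred_withprice)                   # negative entries become 0
final = linear2(activated)                         # shape (4, 1)
```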
I would be grateful if someone could help me with these issues. I know this is technically a research problem rather than a coding problem, but I have run out of ideas about how to design an LSTM-NN predictor that can react to exogenous variables (power price in my case).
I am really poor at explaining and describing my problem, so please forgive my rambling.
Thanks in advance!