Hello!
I have an issue where, when I try to predict the Q-values by feeding my inputs to the network, I seem to get the same output no matter what I feed in. I am assuming that this is because of the very small changes in the input vector?
This post will first show my network structure, then the function that feeds the inputs to the network, followed by the function call and how I construct the model. At the end I will show the input that was fed and the predicted values from the function.
My network structure:
class Model :public torch::nn::Module {
public:
    // Layers start as nullptr and are constructed in the ctor; register_module
    // is what makes their weights visible to parameters()/optimizers.
    torch::nn::Linear input_layer{ nullptr }, first_hidden_layer{ nullptr }, output_layer{ nullptr };

    /// Builds a 3-layer MLP:
    /// (k_input_size_ * k_amount_of_states_) -> 128 -> 256 -> k_output_size_.
    Model() {
        // Construct and register the layers we want.
        input_layer = register_module("input_layer", torch::nn::Linear(k_input_size_ * k_amount_of_states_, 128));
        first_hidden_layer = register_module("first_hidden_layer", torch::nn::Linear(128, 256));
        output_layer = register_module("output_layer", torch::nn::Linear(256, k_output_size_));
    }
    // No user-defined destructor: torch::nn::Module already has a virtual
    // destructor, so the implicitly-defaulted one is correct (Rule of Zero).

    /// Feed-forward pass: ReLU after each hidden layer, raw (linear) Q-values out.
    /// @param tensor_x flat input of length k_input_size_ * k_amount_of_states_.
    /// @return tensor of k_output_size_ Q-values (no activation on the output).
    torch::Tensor forward(torch::Tensor tensor_x) {
        tensor_x = torch::relu(input_layer->forward(tensor_x));
        tensor_x = torch::relu(first_hidden_layer->forward(tensor_x));
        return output_layer->forward(tensor_x);
    }
};
My prediction function:
/// Greedy action selection: converts the state vector to a float tensor,
/// runs the network in inference mode, and returns the arg-max Q-value index.
/// @param model_        network to evaluate (switched to eval mode here).
/// @param input_vector  flattened state; expected length k_input_size_ * k_amount_of_states_.
/// @return action index in [0, k_output_size_).
int PerceivedActions::getAction(PerceivedActions::Model &model_, const std::vector<double> &input_vector) const {
    // Network weights are float, so narrow the double inputs first.
    std::vector<float> float_vec(input_vector.begin(), input_vector.end());
    // from_blob does NOT copy or take ownership of float_vec's memory, so
    // clone() to move the data into a tensor-owned buffer. Size the tensor
    // from the actual input (instead of the constants) so a mismatched state
    // vector cannot be silently misread, and state the dtype explicitly.
    torch::Tensor input_tensor =
        torch::from_blob(float_vec.data(),
                         { static_cast<int64_t>(float_vec.size()) },
                         torch::kFloat).clone();
    std::cout << "input_tensor: " << input_tensor << std::endl;
    torch::NoGradGuard no_grad;  // inference only: skip autograd bookkeeping
    model_.eval();               // disable any training-mode behavior (dropout/BN)
    auto DNN_out = model_.forward(input_tensor);
    std::cout << DNN_out << std::endl;
    // Output is 1-D (k_output_size_), so argmax over dim 0 picks the action.
    int action = DNN_out.argmax(0).item().toInt();
    return action;
}
Function call and constructing the model:
// Construct the model (freshly constructed => randomly initialised weights).
// NOTE(review): if this make_unique runs before every prediction rather than
// once at startup, the weights are re-randomised each call — confirm against
// the surrounding code.
model_ = std::make_unique<PerceivedActions::Model>();
action = getAction(*model_, last_states_);
The cout output from the function:
input_tensor: 3.4798e-02
-7.2078e-05
1.3180e-02
-9.9746e-01
-2.0106e-04
-2.8676e-03
6.9767e-06
4.3532e-04
-3.5193e-06
1.7581e-04
-9.9486e-01
8.4208e-03
-5.6824e-06
4.0093e-04
1.0000e+00
-1.0000e+00
1.0000e+00
3.4785e-02
-2.6299e-04
1.3100e-02
-9.9477e-01
-2.0134e-04
-2.8676e-03
1.0840e-05
2.1770e-03
-2.9073e-05
8.1203e-04
-9.9119e-01
3.0049e-02
-5.5876e-04
1.4589e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4772e-02
-4.5389e-04
1.3066e-02
-9.9144e-01
-2.0145e-04
-2.8674e-03
9.5414e-06
4.7711e-03
-9.5219e-05
1.7521e-03
-9.8700e-01
4.0107e-02
-1.0612e-03
1.9446e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4759e-02
-6.4478e-04
1.3047e-02
-9.8802e-01
-2.0139e-04
-2.8673e-03
8.3816e-06
7.4450e-03
-1.6590e-04
2.7179e-03
-9.8552e-01
4.0108e-02
-1.0608e-03
1.9390e-03
1.0000e+00
-1.0000e+00
1.0000e+00
[ CPUFloatType{68} ]
12.0482
-0.2093
7.3530
-0.5870
13.6391
[ CPUFloatType{5} ]
input_tensor: 3.4785e-02
-2.6299e-04
1.3100e-02
-9.9477e-01
-2.0134e-04
-2.8676e-03
1.0840e-05
2.1770e-03
-2.9073e-05
8.1203e-04
-9.9119e-01
3.0049e-02
-5.5876e-04
1.4589e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4772e-02
-4.5389e-04
1.3066e-02
-9.9144e-01
-2.0145e-04
-2.8674e-03
9.5414e-06
4.7711e-03
-9.5219e-05
1.7521e-03
-9.8700e-01
4.0107e-02
-1.0612e-03
1.9446e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4759e-02
-6.4478e-04
1.3047e-02
-9.8802e-01
-2.0139e-04
-2.8673e-03
8.3816e-06
7.4450e-03
-1.6590e-04
2.7179e-03
-9.8552e-01
4.0108e-02
-1.0608e-03
1.9390e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4746e-02
-8.3567e-04
1.3033e-02
-9.8456e-01
-2.0135e-04
-2.8672e-03
7.0877e-06
1.0119e-02
-2.3652e-04
3.6809e-03
-9.8481e-01
4.0108e-02
-1.0603e-03
1.9346e-03
1.0000e+00
-1.0000e+00
1.0000e+00
[ CPUFloatType{68} ]
12.0482
-0.2093
7.3530
-0.5870
13.6391
[ CPUFloatType{5} ]
input_tensor: 3.4772e-02
-4.5389e-04
1.3066e-02
-9.9144e-01
-2.0145e-04
-2.8674e-03
9.5414e-06
4.7711e-03
-9.5219e-05
1.7521e-03
-9.8700e-01
4.0107e-02
-1.0612e-03
1.9446e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4759e-02
-6.4478e-04
1.3047e-02
-9.8802e-01
-2.0139e-04
-2.8673e-03
8.3816e-06
7.4450e-03
-1.6590e-04
2.7179e-03
-9.8552e-01
4.0108e-02
-1.0608e-03
1.9390e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4746e-02
-8.3567e-04
1.3033e-02
-9.8456e-01
-2.0135e-04
-2.8672e-03
7.0877e-06
1.0119e-02
-2.3652e-04
3.6809e-03
-9.8481e-01
4.0108e-02
-1.0603e-03
1.9346e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4732e-02
-1.0265e-03
1.3022e-02
-9.8109e-01
-2.0130e-04
-2.8670e-03
6.5288e-06
1.2793e-02
-3.0710e-04
4.6419e-03
-9.8440e-01
4.0108e-02
-1.0599e-03
1.9310e-03
1.0000e+00
-1.0000e+00
1.0000e+00
[ CPUFloatType{68} ]
12.0482
-0.2093
7.3530
-0.5870
13.6391
[ CPUFloatType{5} ]
input_tensor: 3.4759e-02
-6.4478e-04
1.3047e-02
-9.8802e-01
-2.0139e-04
-2.8673e-03
8.3816e-06
7.4450e-03
-1.6590e-04
2.7179e-03
-9.8552e-01
4.0108e-02
-1.0608e-03
1.9390e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4746e-02
-8.3567e-04
1.3033e-02
-9.8456e-01
-2.0135e-04
-2.8672e-03
7.0877e-06
1.0119e-02
-2.3652e-04
3.6809e-03
-9.8481e-01
4.0108e-02
-1.0603e-03
1.9346e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4732e-02
-1.0265e-03
1.3022e-02
-9.8109e-01
-2.0130e-04
-2.8670e-03
6.5288e-06
1.2793e-02
-3.0710e-04
4.6419e-03
-9.8440e-01
4.0108e-02
-1.0599e-03
1.9310e-03
1.0000e+00
-1.0000e+00
1.0000e+00
3.4719e-02
-1.2174e-03
1.3012e-02
-9.7760e-01
-2.0125e-04
-2.8669e-03
5.7976e-06
2.3993e-02
-9.9653e-04
8.8582e-03
-9.7347e-01
8.2793e-02
-4.1984e-03
4.1589e-03
1.0000e+00
-1.0000e+00
1.0000e+00
[ CPUFloatType{68} ]
12.0482
-0.2093
7.3530
-0.5870
13.6391
[ CPUFloatType{5} ]