Value, reward and done states

0%| | 0/20000 [1:09:45<?, ?it/s]

RuntimeError                              Traceback (most recent call last)
in <cell line: 7>()
     11 # We re-compute it at each epoch as its value depends on the value
     12 # network which is updated in the inner loop.
---> 13 advantage_module(tensordict_data)
     14 data_view = tensordict_data.reshape(-1)
     15 replay_buffer.extend(data_view.cpu())

7 frames
/usr/local/lib/python3.10/dist-packages/torchrl/objectives/value/functional.py in vec_generalized_advantage_estimate(gamma, lmbda, state_value, next_state_value, reward, done, terminated, time_dim)
    299         == terminated.shape
    300     ):
--> 301         raise RuntimeError(SHAPE_ERR)
    302     dtype = state_value.dtype
    303     *batch_size, time_steps, lastdim = terminated.shape

RuntimeError: All input tensors (value, reward and done states) must share a unique shape.
time: 2.52 s (started: 2024-04-24 04:20:38 +00:00)

How can I edit the tensordict so that the value, reward and done states all share the same shape?
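For reference, the failing check in vec_generalized_advantage_estimate requires state_value, reward, done and terminated to share one shape, typically [*batch, time, 1]. A quick way to see which tensor is off is to print all four. This is only a sketch: it assumes TorchRL's default key names, reuses tensordict_data from the code above, and value_module is a placeholder for whatever value network was passed to GAE (the "state_value" entry only exists after that network has run):

```python
import torch

# Sketch: run the value net, then compare the shapes GAE will check.
# Assumes default TorchRL keys; "value_module" is a placeholder name.
with torch.no_grad():
    value_module(tensordict_data)                    # writes "state_value"
print(tensordict_data["state_value"].shape)          # e.g. torch.Size([B, T, 1])
print(tensordict_data["next", "reward"].shape)       # all four must match
print(tensordict_data["next", "done"].shape)
print(tensordict_data["next", "terminated"].shape)
```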

I found the shapes for all of them: the value and done states sit at the top level of tensordict_data, but the reward lives in a nested entry, i.e. tensordict_data['next', 'reward']. How can the advantage_module read all of them?

It seems it is not reading all of them.
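For what it's worth, GAE reads the reward from the nested ('next', 'reward') entry by default, so the nesting itself is not the problem; the check that fails only compares shapes. If the reward is merely missing the trailing singleton dimension that the value and done tensors carry, it can be patched in place. A minimal sketch (the [B, T] vs. [B, T, 1] mismatch is an assumption about your data):

```python
# Sketch: add the trailing singleton dim if reward lacks it, assuming it was
# collected as [B, T] while done/value are [B, T, 1].
reward = tensordict_data["next", "reward"]
done = tensordict_data["next", "done"]
if reward.ndim == done.ndim - 1:
    tensordict_data["next", "reward"] = reward.unsqueeze(-1)
```

If your value network writes under a non-default key, recent TorchRL versions also let you remap what the estimator reads via advantage_module.set_keys(), though the shape check still applies afterwards.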

Solved: I had to flatten the tensors in the neural network so that the value output matched the reward and done shapes.
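For future readers, the kind of fix described above looks roughly like this: flatten the per-step feature dimensions inside the value network so it emits one scalar per step, giving state_value the same [*batch, time, 1] shape as reward and done. The observation key, the image-shaped input and the layer sizes here are all assumptions for illustration:

```python
import torch.nn as nn
from tensordict.nn import TensorDictModule

# Sketch of a value net whose per-step feature dims are flattened so the
# output is [*batch, time, 1]. The three trailing image dims (C, H, W) and
# the layer sizes are assumptions for illustration.
value_net = nn.Sequential(
    nn.Flatten(start_dim=-3),  # collapse C, H, W into one feature dim per step
    nn.LazyLinear(256),
    nn.Tanh(),
    nn.LazyLinear(1),          # one value per step -> trailing dim of 1
)
value_module = TensorDictModule(
    value_net, in_keys=["observation"], out_keys=["state_value"]
)
```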