Can the timesteps T in deep reinforcement learning be trained?

Recently I’ve been trying to implement a deep reinforcement learning project that requires a variable number of timesteps. I want to train a network to output a parameter T, and use T as the episode length (number of timesteps) for a policy gradient or DQN method. Is that implementable? I mean, when we back-propagate, can we back-propagate through the timestep count T?

Discrete values don’t have gradients. However, you can either soften the discrete choice somehow or treat T itself as an action and train it with RL. I personally feel the second approach is more promising.
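One common way to "soften" a discrete quantity is the straight-through estimator: use the rounded integer in the forward pass, but let gradients flow as if rounding were the identity. A minimal sketch (the horizon parameter `t` and the toy objective are made up for illustration):

```python
import torch

# Continuous parameter underlying the discrete horizon T.
t = torch.tensor(5.7, requires_grad=True)

# Straight-through trick: forward value equals round(t), but the
# detach() means the backward pass sees only the identity on t.
t_int = t + (torch.round(t) - t).detach()

# Toy objective that prefers a horizon of 10 steps.
loss = (t_int - 10.0) ** 2
loss.backward()

print(t_int.item())   # 6.0 — the integer used in the forward pass
print(t.grad.item())  # -8.0 — a nonzero gradient reaches t
```

This is biased (the backward pass ignores the rounding), but it at least lets gradient descent move T; the alternative is to make T a discrete action and optimize it with a policy gradient.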

Basically, I thought functions like torch.ceil() could handle the discreteness. But I don’t know whether there is a way to use the trainable parameter T to control the episode length of a DRL method.

No, stepwise functions are never a solution to such problems. ceil is flat between integers, so its gradient is zero almost everywhere and no useful learning signal flows back to the input.
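You can verify this directly: differentiating through torch.ceil() yields a zero gradient, so the parameter underneath never receives an update.

```python
import torch

# Demo: torch.ceil has zero gradient, so it blocks learning signal.
x = torch.tensor(3.2, requires_grad=True)
y = torch.ceil(x)
y.backward()

print(y.item())       # 4.0
print(x.grad.item())  # 0.0 — the function is flat between integers
```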