This may be a strange question. I have always used clip_grad_norm with recurrent units to prevent exploding gradients.
Could training a CNN with gradient clipping (max norm 0.5) help it converge better, or would the clipping instead act as a limitation?
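
To make clear what I mean by clipping at 0.5, here is a minimal pure-Python sketch of norm clipping as I understand it: all gradients are rescaled together so that their joint L2 norm does not exceed the threshold, and gradients already below the threshold are left alone. The function name `clip_by_global_norm` and the example gradient values are hypothetical, chosen only for illustration. In PyTorch the equivalent step would be `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)`, called between `loss.backward()` and `optimizer.step()`.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale per-layer gradients so their joint L2 norm is at most max_norm.

    grads is a list of layers, each a list of floats (a hypothetical layout).
    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    # The scale is capped at 1.0, so small gradients are never amplified.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in layer] for layer in grads]
    return clipped, total_norm


def global_norm(grads):
    """Joint L2 norm over all layers."""
    return math.sqrt(sum(g * g for layer in grads for g in layer))


# Hypothetical gradients for two layers, joint norm = 5.0
large = [[3.0, 0.0], [0.0, 4.0]]
clipped_large, norm_before = clip_by_global_norm(large, max_norm=0.5)

# Hypothetical gradients already below the threshold
small = [[0.1, 0.2]]
clipped_small, _ = clip_by_global_norm(small, max_norm=0.5)
```

The direction of the update is preserved and only its length changes, so my question is really whether capping the step length this way helps or hurts a CNN.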
Best,
Nico