This is difficult to do unless you use a hack like the one in "Caching parameters, and randomly using one of them for computing gradients".
Even so, what you did is wrong, because it only reassigns the local variable s1
rather than changing the tensor's contents.
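
To illustrate the difference, here is a minimal sketch. It assumes s1 is a tensor passed into a function; the names `rebind`, `overwrite` and `state` are made up for the example and are not from your code.

```python
import torch

def rebind(s1, new_value):
    # Only points the local name s1 at another tensor.
    # The tensor the caller passed in is left untouched.
    s1 = new_value

def overwrite(s1, new_value):
    # Copies new_value into the existing tensor in place,
    # so the caller sees the change. no_grad() keeps this
    # valid if s1 is a leaf tensor that requires grad.
    with torch.no_grad():
        s1.copy_(new_value)

state = torch.zeros(3)          # hypothetical stand-in for your tensor

rebind(state, torch.ones(3))
after_rebind = state.clone()    # still all zeros

overwrite(state, torch.ones(3))
after_overwrite = state.clone() # now all ones
```

So if you want the change to be visible outside the function, write into the tensor (for example with `copy_` or `s1[...] = new_value`) instead of assigning to the name.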