Hi everyone. I want to implement a gradient-based meta-learning algorithm in PyTorch, and I found out that there is a library called *higher*, built on PyTorch, which can be used to implement such algorithms where the inner loop takes several gradient-descent steps. So I decided to go through the paper published for the library here:

https://anucvml.github.io/ddn-cvprw2020/papers/Grefenstette_et_al_cvprw2020.pdf

However, there are a couple of things in the paper that I don't understand. For example, the **Obstacles** section mentions that models in PyTorch and Keras are stateful, i.e. the model encapsulates its parameters. Can someone explain intuitively, with an example, what the difference between stateful and stateless models is, and how you can implement each in PyTorch?
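To make my question concrete, here is my rough understanding as a sketch (the class and function names are just my own illustration, not anything from *higher*): a stateful `nn.Module` stores its parameters as attributes, so calling it implicitly uses whatever is stored inside, while a stateless version would take the parameters as explicit arguments on every call.

```python
import torch
import torch.nn.functional as F

# Stateful: the module owns its parameters (self.weight, self.bias),
# so model(x) implicitly reads the values currently stored inside it.
class StatefulLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)

# Stateless: the parameters are explicit arguments, so the caller decides
# which version of the parameters each call uses -- which seems to be what
# you need when differentiating through several inner-loop updates.
def stateless_linear(x, weight, bias):
    return F.linear(x, weight, bias)

x = torch.randn(4, 3)
model = StatefulLinear(3, 2)
y_stateful = model(x)                                            # parameters come from the module
y_stateless = stateless_linear(x, model.weight, model.bias)      # same parameters, passed explicitly
assert torch.allclose(y_stateful, y_stateless)
```

Is this distinction what the paper is getting at, and is the stateless style the reason *higher* exists?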

Thanks