Is there any metric provided to assess the robustness / stability of an ML module? As far as I understand, the ones provided are more related to functionality.
I understand robustness as the ability to perform the intended function in the presence of abnormal or unknown inputs, which is closely related to ‘reliability’.

You could check out TorchDrift, created by @tom, which might be useful for your use case.
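
To make the idea a bit more concrete: as far as I understand, drift detection of this kind boils down to a two-sample test that compares features of incoming inputs against features of a reference set, and flags batches that look "abnormal or unknown". Below is a minimal pure-Python sketch of one such test (kernel MMD with a permutation p-value). This is not TorchDrift's API; the names `mmd2` and `drift_p_value`, the RBF bandwidth, and the toy 2-D features are all assumptions made for illustration.

```python
import math
import random


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples."""
    def mean_k(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def drift_p_value(reference, batch, n_perm=100, bandwidth=1.0, seed=0):
    """Permutation p-value: a small value suggests the batch has drifted."""
    rng = random.Random(seed)
    observed = mmd2(reference, batch, bandwidth)
    pooled = list(reference) + list(batch)
    n = len(reference)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Shuffle the pooled samples and re-split them into two groups.
        rng.shuffle(pooled)
        if mmd2(pooled[:n], pooled[n:], bandwidth) >= observed:
            at_least_as_large += 1
    # The +1 terms keep the p-value strictly positive.
    return (at_least_as_large + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy stand-ins for features extracted by a model (hypothetical data).
    reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
    shifted = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]
    print("p-value for shifted batch:", drift_p_value(reference, shifted))
```

In practice you would run this on features from your own model rather than raw inputs, and use the p-value (or the raw score) as a monitoring signal. It tells you the inputs have moved away from what the model saw, not how much accuracy you lose, so it complements rather than replaces the functional metrics.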

