You could estimate the memory usage by adding up the memory needed for the parameters, forward activations, gradients, etc., as described here. You could also use the dispatch mode as described here.
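
As a rough illustration of the first approach, here is a minimal back-of-the-envelope sketch in plain Python. The function names, the float32 default (4 bytes per element), and the optimizer-state term are my own assumptions for illustration and are not taken from the linked posts. It ignores allocator overhead, temporary buffers, and fragmentation, so treat the result as an approximate lower bound rather than an exact number.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of parameters in a fully connected layer (weight plus optional bias)."""
    return in_features * out_features + (out_features if bias else 0)


def estimate_training_memory_bytes(num_params, activation_elems,
                                   bytes_per_elem=4,
                                   optimizer_states_per_param=0):
    """Rough estimate of training memory in bytes.

    Assumptions (hypothetical, adjust for your setup):
    - every tensor uses `bytes_per_elem` bytes per element (4 for float32)
    - one gradient value is stored per parameter
    - `activation_elems` is the total number of activation values kept
      for the backward pass (this scales with the batch size)
    - `optimizer_states_per_param` is 0 for plain SGD, 2 for Adam-style
      optimizers that keep two running averages per parameter
    """
    parameters = num_params * bytes_per_elem
    gradients = num_params * bytes_per_elem
    optimizer_state = num_params * optimizer_states_per_param * bytes_per_elem
    activations = activation_elems * bytes_per_elem
    return {
        "parameters": parameters,
        "gradients": gradients,
        "optimizer_state": optimizer_state,
        "activations": activations,
        "total": parameters + gradients + optimizer_state + activations,
    }


if __name__ == "__main__":
    # Hypothetical example: one 1000 -> 1000 linear layer, batch size 64,
    # with an Adam-style optimizer.
    n_params = linear_param_count(1000, 1000)
    estimate = estimate_training_memory_bytes(
        num_params=n_params,
        activation_elems=64 * 1000,
        optimizer_states_per_param=2,
    )
    for name, size in estimate.items():
        print(f"{name}: {size / 1024**2:.2f} MiB")
```

For a real model you would sum the parameter counts over all layers and count the activations each layer stores for the backward pass. Comparing the estimate against the measured usage is a good sanity check.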