Bug in Data Parallel?

Thanks, the data parallel issue seems to be solved and the code is working. However, it is very slow (slower than on a single GPU), and I am getting the warning discussed here: How to flatten parameters?

Any hints on how to fix this? I assume the LSTM's flatten_parameters() must be called on every call, but even when I do that, I still get the linked warning and the code is still very slow.
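For reference, a minimal sketch of the pattern described above: calling `flatten_parameters()` inside `forward` so that it runs on every replica that `nn.DataParallel` creates. The `Encoder` class, the layer sizes, and the input shape are hypothetical and only there to make the snippet runnable; this is not the original model, and it is not guaranteed to remove the warning or the slowdown.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Hypothetical stand-in for the real model (names and sizes are assumptions)."""

    def __init__(self, input_size=8, hidden_size=16):
        super().__init__()
        # batch_first=True so DataParallel splits the input along the batch dimension (dim 0)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # Called on every forward pass, i.e. once per replica per step under DataParallel
        self.lstm.flatten_parameters()
        out, _ = self.lstm(x)
        return out


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(Encoder()).to(device)

x = torch.randn(4, 5, 8, device=device)  # (batch, seq_len, input_size)
out = model(x)                           # (batch, seq_len, hidden_size)
```

Without a GPU, `nn.DataParallel` simply calls the wrapped module, so the snippet also runs on CPU.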