Why and how to flatten LSTM parameters?

This is related to the post here: Bug in Data Parallel?