Same program runs slower in 0.4 than in 0.3

My program runs much slower (~50%) in 0.4.0 than in 0.3.0 with the same set of parameters.
What could be the most important reasons?

Are there any general tips I should follow when migrating from 0.3.0 to 0.4.0 to keep my program equally fast?

Thanks very much in advance!!!

Could you post a small code sample that reproduces the issue?
That way we can debug it and look at the operations involved.
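A small, self-contained timing harness can make such a reproduction easier to compare across versions. You run the same script under 0.3.0 and 0.4.0 and report the median time per step. This is only a sketch: the `benchmark` function and the placeholder `step` workload are hypothetical names for illustration. The only real PyTorch call mentioned is `torch.cuda.synchronize`, which is needed only for GPU code, because CUDA kernel launches are asynchronous and the clock would otherwise stop before the work finishes.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              warmup: int = 3,
              repeats: int = 10,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time in seconds of calling fn().

    Hypothetical helper. For GPU code, pass sync=torch.cuda.synchronize so
    that queued CUDA work has finished before each clock read.
    """
    # Warm-up calls absorb one-off costs (e.g. lazy initialization,
    # allocator growth) so they do not skew the measured runs.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        times.append(time.perf_counter() - start)
    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(times)


if __name__ == "__main__":
    # Placeholder workload (hypothetical): replace with one training or
    # inference step of the model being compared.
    def step():
        return sum(i * i for i in range(100_000))

    print(f"median step time: {benchmark(step):.4f} s")
```

Reporting the median of several runs after a warm-up makes the numbers from the two versions more comparable than timing the whole program once.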

Sorry for being slow.

More details can be found here:

Impressed by how responsive and quick the PyTorch team is.