To answer your three questions:
-
We wrote the examples to reflect best practices. We don't recommend Sequential except as a basic convenience, because it becomes inflexible very quickly.
-
You can use the recently added function http://pytorch.org/docs/nn.html#torch.nn.Module.named_parameters to filter out the ELU parameters by name, so that they are not passed to the optimizer.
-
You can use the environment variable CUDA_VISIBLE_DEVICES="device_id" to control which GPU is used. Device IDs are zero-indexed. For example:
CUDA_VISIBLE_DEVICES=2 python main.py # uses the third GPU (device index 2)
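
For the second point, here is a minimal sketch of the name-based filtering. It assumes a hypothetical model (`Net`) whose learnable activation is registered under the attribute name `act`; `nn.PReLU` is used as a stand-in because it has a learnable weight, and the layer sizes and learning rate are made up for illustration. Adjust the name prefix to whatever your own module is called.

```python
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.act = nn.PReLU()  # stand-in for the activation whose parameters we exclude
        self.fc2 = nn.Linear(20, 2)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


model = Net()

# named_parameters() yields (name, parameter) pairs. Each name is prefixed
# with the attribute name of the owning submodule, e.g. "act.weight".
trainable = [
    param
    for name, param in model.named_parameters()
    if not name.startswith("act.")  # assumed prefix; depends on your attribute name
]

# Only the filtered parameters are handed to the optimizer.
optimizer = optim.SGD(trainable, lr=0.01)
```

Parameters left out this way still receive gradients during backward; they are just never updated by the optimizer. If you also want to skip the gradient computation, setting `requires_grad = False` on them should do it.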