Why does self.parameters() become empty when running the model?

My BERT model raises a StopIteration error when running the code next(self.parameters()).dtype.
I first checked the model simply with:

print(list(self.parameters())) 
# []
# []
# [][]
# []

That means the parameters are empty.
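For what it's worth, an empty parameters() iterator is enough to reproduce the error on its own; a minimal sketch, separate from my actual model:

import torch.nn as nn

m = nn.Module()            # a bare module with nothing registered
params = m.parameters()    # empty generator
next(params).dtype         # raises StopIteration, just like inside my model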

My code looks like this:

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, hyper_param):
        super(MyModel, self).__init__()
        self.bert = Model(hyper_param)

    def forward(self, batch):
        # check attributes -- (2)
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        cx = batch["cx"]
        cy = batch["cy"]
        height = batch["height"]
        # next(self.parameters()).dtype is in self.bert
        output = self.bert(input_ids, attention_mask, cx, cy, height)
        return output

model = MyModel(hyper_param)
batch = next(dataset_iter)
# check attributes -- (1)
loss = model(batch)

I further checked all the protected attributes of the model at positions (1) and (2), marked by the comments in the code above, and got the following results:

At position (1):

_backward_hooks = {OrderedDict: 0} OrderedDict()
_buffers = {OrderedDict: 0} OrderedDict()
_forward_hooks = {OrderedDict: 0} OrderedDict()
_forward_pre_hooks = {OrderedDict: 0} OrderedDict()
_is_full_backward_hook = {NoneType} None
_load_state_dict_pre_hooks = {OrderedDict: 0} OrderedDict()
_modules = {OrderedDict: 0} OrderedDict()
_non_persistent_buffers_set = {set: 0} set()
_parameters = {OrderedDict: 1} OrderedDict([('weight', Parameter containing:\ntensor([[-0.0394,  0.0065,  0.0195,  ..., -0.0082, -0.0145,  0.0046],\n        [ 0.0124,  0.0060,  0.0094,  ..., -0.0219,  0.0053, -0.0220],\n        [ 0.0086, -0.0022,  0.0252,  ..., -0.0312, -0.0307,  0.0214],\n        ...,\n        [ 0.0200,  0.0115,  0.0103,  ..., -0.0153,  0.0163, -0.0371],\n        [ 0.0301, -0.0143,  0.0047,  ..., -0.0138, -0.0130,  0.0120],\n        [-0.0115,  0.0102, -0.0111,  ..., -0.0081, -0.0122, -0.0312]],\n       device='cuda:0', requires_grad=True))])

	 'weight' = {Parameter: (1000, 768)} Parameter containing:\ntensor([[-0.0394,  0.0065,  0.0195,  ..., -0.0082, -0.0145,  0.0046],\n        [ 0.0124,  0.0060,  0.0094,  ..., -0.0219,  0.0053, -0.0220],\n        [ 0.0086, -0.0022,  0.0252,  ..., -0.0312, -0.0307,  0.0214],\n        ...,\n        [ 0.0200,  0.0115,  0.0103,  ..., -0.0153,  0.0163, -0.0371],\n        [ 0.0301, -0.0143,  0.0047,  ..., -0.0138, -0.0130,  0.0120],\n        [-0.0115,  0.0102, -0.0111,  ..., -0.0081, -0.0122, -0.0312]],\n       device='cuda:0', requires_grad=True)

	 __len__ = {int} 1
_state_dict_hooks = {OrderedDict: 0} OrderedDict()
_version = {int} 1

At position (1), the parameters look normal.

And at position (2):

_backward_hooks = {OrderedDict: 0} OrderedDict()
_buffers = {OrderedDict: 0} OrderedDict()
_former_parameters = {OrderedDict: 1} OrderedDict([('weight', tensor([[-0.0394,  0.0065,  0.0195,  ..., -0.0082, -0.0145,  0.0046],\n        [ 0.0124,  0.0060,  0.0094,  ..., -0.0219,  0.0053, -0.0220],\n        [ 0.0086, -0.0022,  0.0252,  ..., -0.0312, -0.0307,  0.0214],\n        ...,\n        [

	 'weight' = {Tensor: (1000, 768)} tensor([[-0.0394,  0.0065,  0.0195,  ..., -0.0082, -0.0145,  0.0046],\n        [ 0.0124,  0.0060,  0.0094,  ..., -0.0219,  0.0053, -0.0220],\n        [ 0.0086, -0.0022,  0.0252,  ..., -0.0312, -0.0307,  0.0214],\n        ...,\n        [ 0.0200,  0.0115,  0.0103,  ..., -0.0153,  0.0163, -0.0371],\n        [ 0.0301, -0.0143,  0.0047,  ..., -0.0138, -0.0130,  0.0120],\n        [-0.0115,  0.0102, -0.0111,  ..., -0.0081, -0.0122, -0.0312]],\n       device='cuda:7', grad_fn=<BroadcastBackward>)

	 __len__ = {int} 1
_forward_hooks = {OrderedDict: 0} OrderedDict()
_forward_pre_hooks = {OrderedDict: 0} OrderedDict()
_is_replica = {bool} True
_is_full_backward_hook = {NoneType} None
_load_state_dict_pre_hooks = {OrderedDict: 0} OrderedDict()
_modules = {OrderedDict: 0} OrderedDict()
_non_persistent_buffers_set = {set: 0} set()
_parameters = {OrderedDict: 0} OrderedDict()
	__len__ = {int} 0
_state_dict_hooks = {OrderedDict: 0} OrderedDict()
_version = {int} 1

What I can see is that _parameters simply becomes empty after the call to forward(), and an extra _former_parameters attribute appears, but I can't find this attribute anywhere in the nn.Module source code. How can this be explained?

Are you using nn.DataParallel? If so, I think the replica modules move the parameters into _former_parameters, since the broadcast copies are no longer leaf tensors.
However, the recommendation is to use DistributedDataParallel for better performance, and I haven't taken a look at nn.DataParallel in a while.
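If you want to confirm it, here is a minimal sketch (my assumptions: at least two visible GPUs, and an nn.Embedding standing in for your Model class) that should reproduce the empty parameters() inside a replica's forward:

import torch
import torch.nn as nn

class Probe(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(1000, 768)  # stand-in for the real submodule

    def forward(self, input_ids):
        # On a DataParallel replica the parameter iterator is empty, and the
        # broadcast weight shows up under _former_parameters instead.
        print("parameters:", list(self.parameters()))
        print("embed._parameters:", dict(self.embed._parameters))
        print("embed._former_parameters:",
              list(getattr(self.embed, "_former_parameters", {}).keys()))
        return self.embed(input_ids)

if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(Probe().cuda())
    input_ids = torch.randint(0, 1000, (4, 16), device="cuda")
    model(input_ids)  # each replica should print [] for parameters()

If you really need the dtype inside forward, it is probably safer to cache it once at the end of __init__ (e.g. self.param_dtype = next(self.parameters()).dtype, which I believe survives replication since plain attributes are copied to the replicas) or to read it off a tensor you know exists, rather than calling next(self.parameters()) on a replica.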