When I train my neural network using PyTorch's DataParallel, this error appears:

terminate called without an active exception
Aborted (core dumped)

The instruction that causes the error is:

network = DataParallel(network, chunk_sizes=list_chunk_sizes)
list_chunk_sizes = [4, 5, 5, 5, 5, 5, 5, 5, 5, 5]
batch_size = 49

Can anyone help me?
Are you using nn.DataParallel? The chunk_sizes argument isn't defined for it. Could you explain what this argument does (in this custom implementation)?
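For reference, a minimal sketch of the native API (the model and device setup here are placeholders, not taken from your code):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model
if torch.cuda.device_count() > 1:
    # nn.DataParallel accepts device_ids, output_device, and dim,
    # but no chunk_sizes; it splits the batch evenly across the GPUs
    model = nn.DataParallel(model.cuda(), device_ids=list(range(torch.cuda.device_count())))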
Yes, I am using torch.nn.DataParallel to train the network in parallel.
First, forgive my language, because I don't speak English well.
I use a batch size of 49, then divide it into 10 sub-batches whose sizes follow this list: [4, 5, 5, 5, 5, 5, 5, 5, 5, 5].
When the model trains, the images are fed to the network based on the list of chunk sizes.
It's for doing parallelism if you have more than one GPU card.
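A minimal sketch of that uneven split, using torch.split, which accepts a list of chunk sizes along a dimension (the tensor shape here is illustrative):

import torch

batch = torch.randn(49, 3, 511, 511)           # batch_size = 49; image shape is illustrative
chunk_sizes = [4, 5, 5, 5, 5, 5, 5, 5, 5, 5]   # one chunk per GPU
chunks = torch.split(batch, chunk_sizes, dim=0)
print([c.size(0) for c in chunks])             # [4, 5, 5, 5, 5, 5, 5, 5, 5, 5]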
Thanks for the explanation.
Did you write the DataParallel class yourself, or are you using a specific implementation from another repository? The native nn.DataParallel class shouldn't take chunk_sizes as an argument, so I'm just wondering which implementation you are using.
Ah yes, it is an implementation from CornerNet, an algorithm for object detection in images.
The problem with this error is that sometimes it doesn't appear.
And where should I put chunk_sizes, if this is not the correct place?
I'm not familiar with this implementation, so could you post a link to it, please?
Also, could you use nn.DataParallel and check if you are running into the same error?
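For example, something like this (assuming network is your model before wrapping):

import torch.nn as nn

# the native wrapper takes no chunk_sizes and splits the batch evenly across GPUs
network = nn.DataParallel(network)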
The link to CornerNet is:
https://github.com/princeton-vl/CornerNet
You can find this implementation in CornerNet-master/nnet/py_factory.py.
Here is part of that code:
import importlib
import torch

# system_configs, DummyModule, Network, and DataParallel are defined elsewhere in the repo
class NetworkFactory(object):
    def __init__(self, db):
        super(NetworkFactory, self).__init__()

        # dynamically import the model module named in the config
        module_file = "models.{}".format(system_configs.snapshot_name)
        nnet_module = importlib.import_module(module_file)

        self.model = DummyModule(nnet_module.model(db))
        self.loss = nnet_module.loss
        self.network = Network(self.model, self.loss)
        # CornerNet's custom DataParallel: chunk_sizes controls how the
        # batch is split across the GPUs (one entry per device)
        self.network = DataParallel(self.network, chunk_sizes=system_configs.chunk_sizes)

        # count the total number of model parameters
        total_params = 0
        for params in self.model.parameters():
            num_params = 1
            for x in params.size():
                num_params *= x
            total_params += num_params
        print("total parameters: {}".format(total_params))

        if system_configs.opt_algo == "adam":
            self.optimizer = torch.optim.Adam(
                filter(lambda p: p.requires_grad, self.model.parameters())
            )
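As a hypothetical sanity check (not part of the CornerNet repo), the chunk sizes should sum to the batch size, with at most one entry per visible GPU:

import torch

list_chunk_sizes = [4, 5, 5, 5, 5, 5, 5, 5, 5, 5]
batch_size = 49

assert sum(list_chunk_sizes) == batch_size, "chunk_sizes must sum to batch_size"
assert len(list_chunk_sizes) <= torch.cuda.device_count(), "at most one chunk per visible GPU"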