Runaway gradient?

I am playing around with autoencoders, and my network has this structure:
Encoder:

self.main = nn.Sequential(
        # each AvgPool2d(2, 2) below halves the spatial resolution (eight pools in total)
        nn.Conv2d(3, 500, 3, 1, padding=1),
        nn.ReLU(),
        nn.Conv2d(500, 450, 3, 1, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2, 2, 0),
        nn.Conv2d(450, 400, 3, 1, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2, 2, 0),
        nn.Conv2d(400, 350, 3, 1, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2, 2, 0),
        nn.Conv2d(350, 300, 3, 1, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2, 2, 0),
        nn.Conv2d(300, 250, 3, 1, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2, 2, 0),
        nn.Conv2d(250, 200, 3, 1, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2, 2, 0),
        nn.Conv2d(200, 150, 3, 1, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2, 2, 0),
        nn.Conv2d(150, 125, 3, 1, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(2, 2, 0),
        nn.Conv2d(125, 100, 3, 1, padding=1),
        nn.ReLU(),
    )

Decoder:

self.main = nn.Sequential(
        # each Upsample(scale_factor=2) below doubles the spatial resolution (eight in total)
        nn.Upsample(scale_factor=2),
        nn.Conv2d(100, 125, 3, 1, padding=1),
        nn.ReLU(),
        nn.Upsample(scale_factor=2),
        nn.Conv2d(125, 150, 3, 1, padding=1),
        nn.ReLU(),
        nn.Upsample(scale_factor=2),
        nn.Conv2d(150, 200, 3, 1, padding=1),
        nn.ReLU(),
        nn.Upsample(scale_factor=2),
        nn.Conv2d(200, 250, 3, 1, padding=1),
        nn.ReLU(),
        nn.Upsample(scale_factor=2),
        nn.Conv2d(250, 300, 3, 1, padding=1),
        nn.ReLU(),
        nn.Upsample(scale_factor=2),
        nn.Conv2d(300, 350, 3, 1, padding=1),
        nn.ReLU(),
        nn.Upsample(scale_factor=2),
        nn.Conv2d(350, 400, 3, 1, padding=1),
        nn.ReLU(),
        nn.Upsample(scale_factor=2),
        nn.Conv2d(400, 450, 3, 1, padding=1),
        nn.ReLU(),
        nn.Conv2d(450, 500, 3, 1, padding=1),
        nn.ReLU(),
        nn.Conv2d(500, 3, 3, 1, padding=1)  # back to 3 output channels, no final activation
    )
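
For reference, the eight pooling stages shrink the input by a factor of 2**8 = 256 per spatial dimension, so the input needs to be at least 256×256 for the forward pass to go through; the eight Upsample stages then restore that size. A quick shape check along these lines (the compact loop below is a hypothetical rebuild of the encoder stack above, not the original code) confirms the sizes:

    import torch
    import torch.nn as nn

    # hypothetical compact rebuild of the encoder Sequential above, just to check shapes
    channels = [3, 500, 450, 400, 350, 300, 250, 200, 150, 125, 100]
    layers = []
    for i, (c_in, c_out) in enumerate(zip(channels[:-1], channels[1:])):
        layers += [nn.Conv2d(c_in, c_out, 3, 1, padding=1), nn.ReLU()]
        if 1 <= i <= 8:  # a pool follows every conv block except the first and last
            layers.append(nn.AvgPool2d(2, 2, 0))
    encoder = nn.Sequential(*layers)

    x = torch.randn(1, 3, 256, 256)  # eight 2x pools need an input of at least 256 px
    print(encoder(x).shape)          # torch.Size([1, 100, 1, 1])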

but I've noticed that after some training time the loss suddenly skyrockets from around 0.0X to the hundreds and beyond. What might be the culprit?

Optimizer settings are these:

optimizer = optim.Adam(model.parameters(), lr=0.002, betas=(0.5, 0.999))

Have you tried playing around with the optimizer's hyperparameters, or perhaps using another optimizer such as SGD?
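
A minimal sketch of that kind of change (the learning rate, betas, and clipping threshold below are purely illustrative, and `model`/`loss` refer to the same objects as in your post). Clipping the gradient norm is an additional common guard against this sort of sudden blow-up:

    import torch
    import torch.optim as optim

    # illustrative settings, not tuned values: Adam with its default betas and a
    # smaller learning rate, or plain SGD with momentum as an alternative
    optimizer = optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    # optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    # inside the training loop: clip the gradient norm before stepping, which often
    # keeps a single bad batch from blowing the loss up
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()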