Could you explain your use case and, in particular, why you are using retain_graph=True, as this usually yields this kind of error (and is often used as a workaround for another error)?
Hi ptrblck, many thanks for your help here. I have solved this bug as follows:
else:
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
    train_batch.grad.zero_()
    loss.backward()
    grads = train_batch.grad
Hi guys. I ran into a problem with loss.backward(), as you can see here:
  File "train.py", line 360, in train
    loss_adv.backward(retain_graph=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 7]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
My code is
I am using PyTorch 1.12.1 in Google Colab.
Can anyone help me solve this problem? Thank you very much.
@ptrblck @albanD can you help me
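For readers unfamiliar with the "is at version 2; expected version 1" wording in the error above: autograd remembers a version counter for every tensor it saves for the backward pass, and each in-place operation on that tensor bumps the counter. A minimal sketch of this, with tensor names made up for illustration (`_version` is an internal attribute, read here only for inspection):

```python
import torch

w = torch.ones(3, requires_grad=True)
x = torch.ones(3)

y = (w * x).sum()            # autograd saves x to compute the gradient w.r.t. w
saved_version = x._version   # version of x at the time it was saved

x.add_(1.0)                  # in-place op: bumps the version counter of x
bumped_version = x._version

try:
    y.backward()             # saved x no longer matches its recorded version
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(saved_version, bumped_version)
print(error_message)
```

An optimizer step updates parameters in place, so calling `optimizer.step()` between two backward passes over the same graph triggers the same check.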
Could you also check why retain_graph is used in your code?
When I don’t use retain_graph=True, I get this error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
In that case, try to fix this issue, as it seems your computation graph is growing in each iteration such that the backward pass would try to compute the gradient for multiple iterations.
This could happen e.g. if the input to your model depends somehow on the output from the previous iteration.
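A minimal sketch of the situation described above, assuming a loop in which the state fed into the model is the output of the previous iteration (the `nn.Linear` model and all variable names are hypothetical). Without the `detach()` call at the end of the loop body, the second `backward()` would try to walk back through the first iteration's already freed graph. Detaching the carried state keeps each iteration's graph separate, so `retain_graph=True` is not needed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

state = torch.zeros(1, 4)    # carried from one iteration to the next
losses = []
for step in range(3):
    optimizer.zero_grad()
    out = model(state + torch.randn(1, 4))
    loss = out.pow(2).mean()
    loss.backward()          # plain backward, no retain_graph
    optimizer.step()
    losses.append(loss.item())
    # Cut the link to this iteration's graph before reusing the output.
    state = out.detach()

print(losses)
```

Whether truncating the gradient at the iteration boundary is acceptable depends on the model; for recurrent models this is the usual truncated backpropagation through time.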
Try moving all the optimizer steps to the very end, after all the backward passes have completed.
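A minimal sketch of that ordering, assuming two small modules with separate optimizers and two losses that share one graph (the `actor`/`critic` names and the losses are made up for illustration). Calling `opt_critic.step()` between the two `backward()` calls would update `critic.weight` in place while the second backward still needs its old value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
actor = nn.Linear(3, 2)
critic = nn.Linear(2, 1)
opt_actor = torch.optim.SGD(actor.parameters(), lr=0.1)
opt_critic = torch.optim.SGD(critic.parameters(), lr=0.1)

obs = torch.randn(8, 3)
value = critic(actor(obs))               # one shared graph

critic_loss = (value - 1.0).pow(2).mean()
actor_loss = -value.mean()

opt_actor.zero_grad()
opt_critic.zero_grad()

# All backward passes first ...
critic_loss.backward(retain_graph=True)  # graph is reused by actor_loss
actor_loss.backward()

# ... and only then the in-place parameter updates.
critic_before = critic.weight.detach().clone()
opt_critic.step()
opt_actor.step()
```

Note that in this sketch both losses accumulate gradients into both modules; whether that is intended depends on the algorithm.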
See these two similar issues:
- MobileFSGAN - One of the variables needed for gradient computation has been modified by an inplace operation
- RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 2; expected version 1 instead - #14 by soulitzer
Now it works well after I moved all the optimizer steps after all the backward calls. Thanks for your suggestion.
Hi raharth,
In case you or anyone else is still struggling with this problem, I would like to post the solution I just figured out: disable the version checking intentionally by adding saved_tensors_hooks. This could work because, according to the source of the version checking, the check is implemented as an unpack hook and is skipped if any other hooks are defined.
A minimal demo would be:
import torch

# Leaf tensor that requires gradients (torch.autograd.Variable is deprecated).
a = torch.randn([3, 4], requires_grad=True)
b = torch.randn([3, 1])

def pack_hook(x):
    print("Packing")
    return x

def unpack_hook(x):
    print("Unpacking")
    return x

# if True:
with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    c = a * b
    d = c.sum()
    b[0][0] = 1.  # in-place change to a tensor saved for backward
    d.backward()
print(a.grad)
In my case, I have one thread that is using the model being trained by another thread. I want to avoid copying the model between threads since it’s time consuming for my task and only one thread needs to be accurate.
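Note that skipping the version check does not make the gradient correct: backward then uses whatever values the saved tensor holds at that moment. Where an extra copy is affordable, a more conservative alternative is to leave the saved tensor untouched and write into a clone. A minimal sketch with the same `a` and `b` as in the demo (`b_updated` is a made-up name):

```python
import torch

a = torch.randn(3, 4, requires_grad=True)
b = torch.randn(3, 1)

c = a * b                  # autograd saves b to compute d(c)/d(a)
d = c.sum()

b_updated = b.clone()      # copy instead of mutating the saved tensor
b_updated[0, 0] = 1.0

d.backward()               # version check passes; the original b is used
expected = b.expand(3, 4)  # d(sum(a * b))/d(a) is b broadcast to a's shape
print(torch.allclose(a.grad, expected))
```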
I’m facing the same problem. I tried the solutions suggested above, but they didn’t work. I have a number “N” of agents, and each agent owns an independent actor and critic. Each agent has different states according to the label given to each agent.
###############
all_agents = []
all_agents.append(Agent(actor_dims, critic_dims))
for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label
    critic_value_ = agent.target_critic.forward(states_[i], new_actions_cluster[i]).flatten()
    critic_value = agent.critic.forward(states[i], old_actions_cluster[i]).flatten()
    target = rewards[:, agent_idx] + agent.gamma * critic_value_
    critic_loss = F.mse_loss(critic_value.float(), target.float())
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    actor_loss = agent.critic.forward(states[i], mu_cluster[i]).flatten()
    actor_loss = -(T.mean(actor_loss))
    agent.actor.optimizer.zero_grad()
    actor_loss.backward()
    agent.critic.optimizer.step()
    agent.actor.optimizer.step()
#################################
[W …\torch\csrc\autograd\python_anomaly_mode.cpp:85] Warning: Error detected in AddmmBackward. No forward pass information available. Enable detect anomaly during forward pass for more information. (function _print_stack)
Traceback (most recent call last):
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 3]], which is output 0 of TBackward, is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Could you explain why retain_graph=True is used?
When I remove retain_graph=True, it gives another error:
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
I modified the code as follows, and it is working, but I’m not sure whether this approach is correct:
all_agents = []
all_agents.append(Agent(actor_dims, critic_dims))
for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label
    critic_value_ = agent.target_critic.forward(states_[i], new_actions_cluster[i]).flatten()
    critic_value = agent.critic.forward(states[i], old_actions_cluster[i]).flatten()
    target = rewards[:, agent_idx] + agent.gamma * critic_value_
    agent.critic_loss = F.mse_loss(critic_value.float(), target.float())
    agent.critic_loss.backward(retain_graph=True)
for agent_idx, agent in enumerate(all_agents):
    agent.critic.optimizer.zero_grad()
for agent_idx, agent in enumerate(all_agents):
    agent.critic.optimizer.zero_grad()
for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label
    agent.actor_loss = agent.critic.forward(states[i], mu_cluster[i], typ).flatten()
    agent.actor_loss = -T.mean(agent.actor_loss)
    agent.actor_loss.backward(retain_graph=True)
for agent_idx, agent in enumerate(all_agents):
    agent.actor.optimizer.step()
    # agent.actor.optimizer.zero_grad()
for agent_idx, agent in enumerate(all_agents):
    agent.actor.optimizer.zero_grad()
I hit this error while implementing PPO (Proximal Policy Optimization). I solved it by defining a target network and a main network. At the beginning, the target network has the same parameter values as the main network. During training, the parameters of the two networks are synchronized at a fixed interval of time steps. The details can be found in the code: https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb
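A minimal sketch of the two-network pattern, not taken from the linked notebook: the toy `nn.Linear` policy, the loss, and `sync_every` are hypothetical, and the direction of the copy (trained network into frozen copy) is an assumption. The frozen copy is only evaluated under `torch.no_grad()`, so it never becomes part of a graph that a later optimizer step could invalidate:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
policy = nn.Linear(4, 2)               # network being trained
policy_old = copy.deepcopy(policy)     # frozen copy, starts with identical weights
optimizer = torch.optim.SGD(policy.parameters(), lr=0.1)
sync_every = 5                         # hypothetical sync interval

for step in range(1, 11):
    x = torch.randn(16, 4)
    with torch.no_grad():
        old_out = policy_old(x)        # no graph is built for the frozen copy
    out = policy(x)
    loss = (out - old_out).pow(2).mean() + out.pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % sync_every == 0:
        # Copy the trained weights into the frozen network.
        policy_old.load_state_dict(policy.state_dict())
```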
I am facing the same problem: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1024]] is at version 12; expected version 11 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
for epoch_idx in range(n_epochs):
    len_dataloader = min(len(Source_loader), len(train_loader))
    dl_source_iter = iter(Source_loader)
    dl_target_iter = iter(train_loader)
    model_G.train()
    model_c.train()
    train_loss1 = 0.0
    train_loss2 = 0.0
    train_loss3 = 0.0
    Source_accuracy = 0.0
    Domain_accuracy = 0.0
    for batch_idx in range(len_dataloader):
        img_s, y_s = next(dl_source_iter)
        img_t, y_t = next(dl_target_iter)
        batch_size = len(y_s)
        batch_size1 = len(y_t)
        feat_h = feat_l[0:batch_size, :, :, :]
        feat_h_kl = feat_h.reshape(-1, 1024)
        y_s = y_s.unsqueeze(1)
        y_t = y_t.unsqueeze(1)
        #a=np.random.normal(0,1,(len(y_s)*14,256)).astype("float32")
        #zn=Variable(torch.tensor(np.random.normal(0,1,(len(y_s)*14,1024)).astype("float32")))
        img_s = Variable(img_s)
        img_t = Variable(img_t)
        #UDA by Backpropagation
        opt_g.zero_grad()
        opt_c.zero_grad()
        feat_s = model_G(img_s)
        output_s = model_c.classifier(feat_s)
        output_loss = loss_fn_class(output_s, y_s)
        feat_s_kl = feat_s.view(-1, 1024)
        loss_kld_s = F.kl_div(F.log_softmax(feat_s_kl), F.softmax(feat_h_kl))
        loss1 = output_loss + loss_kld_s
        loss1.backward(retain_graph=True)
        opt_g.step()
        opt_c.step()
        opt_g.zero_grad()
        opt_c.zero_grad()
        # #loss_s=loss_fn_class(output_s,y_s)
        # X_t_train, X_t_domain,y_t_train, y_t_domain1 = train_test_split(img_t,y_t ,
        #                                    random_state=104,
        #                                    test_size=0.70,
        #                                    shuffle=True)
        # feat_t_output=model_G(X_t_train)
        feat_t_output = model_G(img_t)
        output_t = model_c.classifier(feat_t_output)
        loss_output = loss_fn_class(output_t, y_t)
        feat_t_kl = feat_t_output.view(-1, 1024)
        loss_kld_t = F.kl_div(F.log_softmax(feat_t_kl), F.softmax(feat_h_kl))
        #feat_zn_recon=model_G.decode(feat_h)
        # feat_t_recon=model_G(img_t,is_deconv=True)
        # feat_h1=model_G(zn)
        # loss_dal=criterionDAL(feat_t_recon,feat_zn_recon)
        loss2 = loss_output + loss_kld_t
        loss2.backward()
        opt_g.step()
        opt_c.step()
        opt_g.zero_grad()
        opt_c.zero_grad()
@ptrblck and @albanD can you please help me with 2 step loss calculation here?
As already asked in this thread: could you explain why retain_graph=True is used in your code? It is often applied as a workaround for another error, which in turn causes the invalid inplace operation.
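One pattern that produces this combination of errors is a tensor that is computed once with a graph attached and then reused in two losses with an optimizer step in between. A minimal sketch of a two-step update without retain_graph, assuming the shared features are meant to act as a fixed target (the toy `encoder`, the inputs, and `kl_to_ref` are hypothetical and not taken from the code above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
encoder = nn.Linear(6, 4)
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1)

x_ref = torch.randn(5, 6)
x_src = torch.randn(5, 6)
x_tgt = torch.randn(5, 6)

# Reference distribution used as a fixed target: detach it so that neither
# loss backpropagates through this forward pass.
ref_prob = F.softmax(encoder(x_ref), dim=1).detach()

def kl_to_ref(x):
    # Fresh forward pass on every call, so each loss owns its graph.
    log_prob = F.log_softmax(encoder(x), dim=1)
    return F.kl_div(log_prob, ref_prob, reduction="batchmean")

# Step 1: source batch.
optimizer.zero_grad()
loss1 = kl_to_ref(x_src)
loss1.backward()
optimizer.step()

# Step 2: target batch, computed after the parameters were updated.
optimizer.zero_grad()
loss2 = kl_to_ref(x_tgt)
loss2.backward()
optimizer.step()
```

If the shared features should receive gradients from both losses instead, the alternative is to sum the two losses and call `backward()` and `step()` once.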
I have a similar issue and haven’t been able to debug it. I need retain_graph=True because I am training a MobileVOD model. It combines a MobileNet base network with a bottleneck LSTM and basically introduces a temporal element to object detection. I’ll paste my code below; any help is much appreciated.
Training…
"""Script for training the MobileVOD with 1 Bottleneck LSTM layer. As in MobileNet, here we use depthwise separable convolutions
for reducing the computation without affecting accuracy much. Model is trained on Imagenet VID 2015 dataset.
Here we unroll the LSTM for 10 steps and give 10 consecutive frames of video as input.
Few global variables defined here are explained:
Global Variables
----------------
args : dict
Has all the options for changing various variables of the model as well as hyper-parameters for training.
dataset : VIDDataset (torch.utils.data.Dataset, For more info see datasets/vid_dataset.py)
optimizer : optim.RMSprop
scheduler : CosineAnnealingLR, MultiStepLR (torch.optim.lr_scheduler)
config : mobilenetv1_ssd_config (See config/mobilenetv1_ssd_config.py for more info, where you can change input size and ssd priors)
loss : MultiboxLoss (See network/multibox_loss.py for more info)
"""
import argparse
import os
import logging
import sys
import itertools
import torch
from torch.utils.data import DataLoader, ConcatDataset
from torch.optim.lr_scheduler import CosineAnnealingLR, MultiStepLR
from torch.utils.tensorboard import SummaryWriter
from utils.misc import str2bool, Timer, store_labels
from network.mvod_bottleneck_lstm1 import MobileVOD, SSD, MobileNetV1, MatchPrior
from datasets.vid_dataset_new import VIDDataset
from network.multibox_loss import MultiboxLoss
from config import mobilenetv1_ssd_config
from dataloaders.data_preprocessing import TrainAugmentation, TestTransform
parser = argparse.ArgumentParser(
description='Mobile Video Object Detection (Bottleneck LSTM) Training With Pytorch')
parser.add_argument('--datasets', help='Dataset directory path')
parser.add_argument('--cache_path', help='Cache directory path')
parser.add_argument('--freeze_net', action='store_true',
help="Freeze all the layers except the prediction head.")
parser.add_argument('--width_mult', default=1.0, type=float,
help='Width multiplier')
# Params for SGD
parser.add_argument('--lr', '--learning-rate', default=0.0003, type=float,
help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float,
help='Momentum value for optim')
parser.add_argument('--weight_decay', default=5e-4, type=float,
help='Weight decay for SGD')
parser.add_argument('--gamma', default=0.1, type=float,
help='Gamma update for SGD')
parser.add_argument('--base_net_lr', default=None, type=float,
help='initial learning rate for base net.')
parser.add_argument('--ssd_lr', default=None, type=float,
help='initial learning rate for the layers not in base net and prediction heads.')
# Params for loading pretrained basenet or checkpoints.
parser.add_argument('--pretrained', help='Pre-trained model')
parser.add_argument('--resume', default=None, type=str,
help='Checkpoint state_dict file to resume training from')
# Scheduler
parser.add_argument('--scheduler', default="multi-step", type=str,
help="Scheduler for SGD. It can be one of multi-step and cosine")
# Params for Multi-step Scheduler
parser.add_argument('--milestones', default="80,100", type=str,
help="milestones for MultiStepLR")
# Params for Cosine Annealing
parser.add_argument('--t_max', default=120, type=float,
help='T_max value for Cosine Annealing Scheduler.')
# Train params
parser.add_argument('--batch_size', default=1, type=int,
help='Batch size for training')
parser.add_argument('--num_epochs', default=200, type=int,
help='the number epochs')
# this was originally 4, set to 0 - https://stackoverflow.com/questions/64772335/pytorch-w-parallelnative-cpp206
parser.add_argument('--num_workers', default=0, type=int,
help='Number of workers used in dataloading')
parser.add_argument('--validation_epochs', default=5, type=int,
help='the number epochs')
parser.add_argument('--debug_steps', default=100, type=int,
help='Set the debug log output frequency.')
parser.add_argument('--sequence_length', default=10, type=int,
help='sequence_length of video to unfold')
parser.add_argument('--use_cuda', default=True, type=str2bool,
help='Use CUDA to train model')
parser.add_argument('--checkpoint_folder', default='models/',
help='Directory for saving checkpoint models')
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
args = parser.parse_args()
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() and args.use_cuda else "cpu")
print('DEVICE',DEVICE)
# tensorboard
writer = SummaryWriter()
# RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 256, 1, 1]] is at version 5; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
torch.autograd.set_detect_anomaly(True)
if args.use_cuda and torch.cuda.is_available():
torch.backends.cudnn.benchmark = True
logging.info("Use Cuda.")
def train(loader, net, criterion, optimizer, device, debug_steps=100, epoch=-1, sequence_length=10):
    """ Train model
    Arguments:
        net : object of MobileVOD class
        loader : training data loader object
        criterion : Loss function to use
        device : device on which computation is done
        optimizer : optimizer to optimize model
        debug_steps : number of steps after which model needs to debug
        sequence_length : unroll length of model
        epoch : current epoch number
    """
    net.train(True)
    running_loss = 0.0
    running_regression_loss = 0.0
    running_classification_loss = 0.0
    for i, data in enumerate(loader):
        images, boxes, labels = data
        for image, box, label in zip(images, boxes, labels):
            image = image.to(device)
            box = box.to(device)
            label = label.to(device)
            optimizer.zero_grad()
            confidence, locations = net(image)
            regression_loss, classification_loss = criterion(confidence, locations, label, box) # TODO CHANGE BOXES
            loss = regression_loss + classification_loss
            loss.backward(retain_graph=True)
            optimizer.step()
            running_loss += loss.item()
            running_regression_loss += regression_loss.item()
            running_classification_loss += classification_loss.item()
        net.detach_hidden()
        if i and i % debug_steps == 0:
            avg_loss = running_loss / (debug_steps*sequence_length)
            avg_reg_loss = running_regression_loss / (debug_steps*sequence_length)
            avg_clf_loss = running_classification_loss / (debug_steps*sequence_length)
            logging.info(
                f"Epoch: {epoch}, Step: {i}, " +
                f"Average Loss: {avg_loss:.4f}, " +
                f"Average Regression Loss {avg_reg_loss:.4f}, " +
                f"Average Classification Loss: {avg_clf_loss:.4f}"
            )
            running_loss = 0.0
            running_regression_loss = 0.0
            running_classification_loss = 0.0
    net.detach_hidden()
def val(loader, net, criterion, device):
""" Validate model
Arguments:
net : object of MobileVOD class
loader : validation data loader object
criterion : Loss function to use
device : device on which computation is done
Returns:
loss, regression loss, classification loss
"""
net.eval()
running_loss = 0.0
running_regression_loss = 0.0
running_classification_loss = 0.0
num = 0
for _, data in enumerate(loader):
images, boxes, labels = data
for image, box, label in zip (images, boxes, labels):
image = image.to(device)
box = box.to(device)
label = label.to(device)
num += 1
with torch.no_grad():
confidence, locations = net(image)
regression_loss, classification_loss = criterion(confidence, locations, label, box)
loss = regression_loss + classification_loss
running_loss += loss.item()
running_regression_loss += regression_loss.item()
running_classification_loss += classification_loss.item()
net.detach_hidden()
return running_loss / num, running_regression_loss / num, running_classification_loss / num
def initialize_model(net):
""" Loads learned weights from pretrained checkpoint model
Arguments:
net : object of MobileVOD
"""
if args.pretrained:
logging.info("Loading weights from pretrained network")
pretrained_net_dict = torch.load(args.pretrained)
model_dict = net.state_dict()
# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_net_dict.items() if k in model_dict and model_dict[k].shape == pretrained_net_dict[k].shape}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict)
net.load_state_dict(model_dict)
if __name__ == '__main__':
timer = Timer()
logging.info(args)
config = mobilenetv1_ssd_config #config file for priors etc.
train_transform = TrainAugmentation(config.image_size, config.image_mean, config.image_std)
target_transform = MatchPrior(config.priors, config.center_variance,
config.size_variance, 0.5)
test_transform = TestTransform(config.image_size, config.image_mean, config.image_std)
logging.info("Prepare training datasets.")
train_dataset = VIDDataset(args.datasets, args.cache_path, transform=train_transform,
target_transform=target_transform, batch_size=args.batch_size)
label_file = os.path.join("models/", "vid-model-labels.txt")
store_labels(label_file, train_dataset._classes_names)
num_classes = len(train_dataset._classes_names)
logging.info(f"Stored labels into file {label_file}.")
logging.info("Train dataset size: {}".format(len(train_dataset)))
train_loader = DataLoader(train_dataset, args.batch_size,
num_workers=args.num_workers,
shuffle=True)
# logging.info("Prepare Validation datasets.")
# val_dataset = VIDDataset(args.datasets, args.cache_path, transform=test_transform,
# target_transform=target_transform, is_val=True)
# logging.info(val_dataset)
# logging.info("validation dataset size: {}".format(len(val_dataset)))
# val_loader = DataLoader(val_dataset, args.batch_size,
# num_workers=args.num_workers,
# shuffle=False)
logging.info("Build network.")
pred_enc = MobileNetV1(num_classes=num_classes, alpha = args.width_mult)
pred_dec = SSD(num_classes=num_classes, batch_size = args.batch_size, alpha = args.width_mult, is_test=False)
if args.resume is None:
net = MobileVOD(pred_enc, pred_dec)
initialize_model(net)
else:
net = MobileVOD(pred_enc, pred_dec)
print("Updating weights from resume model")
net.load_state_dict(
torch.load(args.resume,
map_location=lambda storage, loc: storage))
min_loss = -10000.0
last_epoch = -1
base_net_lr = args.base_net_lr if args.base_net_lr is not None else args.lr
ssd_lr = args.ssd_lr if args.ssd_lr is not None else args.lr
if args.freeze_net:
logging.info("Freeze net.")
for param in pred_enc.parameters():
param.requires_grad = False
net.pred_decoder.conv13.requires_grad = False
net.to(DEVICE)
criterion = MultiboxLoss(config.priors, iou_threshold=0.5, neg_pos_ratio=10,
center_variance=0.1, size_variance=0.2, device=DEVICE)
optimizer = torch.optim.RMSprop([{'params': [param for name, param in net.pred_encoder.named_parameters()], 'lr': base_net_lr},
{'params': [param for name, param in net.pred_decoder.named_parameters()], 'lr': ssd_lr},], lr=args.lr,
weight_decay=args.weight_decay, momentum=args.momentum)
logging.info(f"Learning rate: {args.lr}, Base net learning rate: {base_net_lr}, "
+ f"Extra Layers learning rate: {ssd_lr}.")
# if args.scheduler == 'multi-step':
# logging.info("Uses MultiStepLR scheduler.")
# milestones = [int(v.strip()) for v in args.milestones.split(",")]
# scheduler = MultiStepLR(optimizer, milestones=milestones,
# gamma=0.1, last_epoch=last_epoch)
# elif args.scheduler == 'cosine':
# logging.info("Uses CosineAnnealingLR scheduler.")
# scheduler = CosineAnnealingLR(optimizer, args.t_max, last_epoch=last_epoch)
# else:
# logging.fatal(f"Unsupported Scheduler: {args.scheduler}.")
# parser.print_help(sys.stderr)
# sys.exit(1)
print('net', net)
output_path = os.path.join(args.checkpoint_folder, f"lstm1")
if not os.path.exists(output_path):
os.makedirs(os.path.join(output_path))
logging.info(f"Start training from epoch {last_epoch + 1}.")
for epoch in range(last_epoch + 1, args.num_epochs):
#scheduler.step()
train(train_loader, net, criterion, optimizer,
device=DEVICE, debug_steps=args.debug_steps, epoch=epoch, sequence_length=args.sequence_length)
if epoch % args.validation_epochs == 0 or epoch == args.num_epochs - 1:
val_loss, val_regression_loss, val_classification_loss = val(val_loader, net, criterion, DEVICE)
logging.info(
f"Epoch: {epoch}, " +
f"Validation Loss: {val_loss:.4f}, " +
f"Validation Regression Loss {val_regression_loss:.4f}, " +
f"Validation Classification Loss: {val_classification_loss:.4f}"
)
model_path = os.path.join(output_path, f"WM-{args.width_mult}-Epoch-{epoch}.pth")
torch.save(net.state_dict(), model_path)
logging.info(f"Saved model {model_path}")
# log to tensorboard
writer.add_scalar("val_loss/train", val_loss, epoch)
writer.add_scalar("val_regression_loss/train", val_regression_loss, epoch)
writer.add_scalar("val_classification_loss/train", val_classification_loss, epoch)
writer.add_scalar("Learning rate", args.lr, epoch)
writer.add_scalar("Base net learning rate", base_net_lr, epoch)
writer.add_scalar("Extra Layers learning rate", ssd_lr, epoch)
Network…
#!/usr/bin/python3
"""Script for creating basenet with one Bottleneck LSTM layer after conv 13.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from typing import List, Tuple
from utils import box_utils
from collections import namedtuple
from collections import OrderedDict
from torch.autograd import Variable
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import numpy as np
import logging
def SeperableConv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0):
"""Replace Conv2d with a depthwise Conv2d and Pointwise Conv2d.
Arguments:
in_channels : number of channels of input
out_channels : number of channels of output
kernel_size : kernel size for depthwise convolution
stride : stride for depthwise convolution
padding : padding for depthwise convolution
Returns:
object of class torch.nn.Sequential
"""
return nn.Sequential(
nn.Conv2d(in_channels=int(in_channels), out_channels=int(in_channels), kernel_size=kernel_size,
groups=int(in_channels), stride=stride, padding=padding),
nn.ReLU6(),
nn.Conv2d(in_channels=int(in_channels), out_channels=int(out_channels), kernel_size=1),
)
def conv_bn(inp, oup, stride):
"""3x3 conv with batchnorm and relu
Arguments:
inp : number of channels of input
oup : number of channels of output
stride : stride for depthwise convolution
Returns:
object of class torch.nn.Sequential
"""
return nn.Sequential(
nn.Conv2d(int(inp), int(oup), 3, stride, 1, bias=False),
nn.BatchNorm2d(int(oup)),
nn.ReLU6(inplace=True)
)
def conv_dw(inp, oup, stride):
"""Replace Conv2d with a depthwise Conv2d and Pointwise Conv2d having batchnorm and relu layers in between.
Here kernel size is fixed at 3.
Arguments:
inp : number of channels of input
oup : number of channels of output
stride : stride for depthwise convolution
Returns:
object of class torch.nn.Sequential
"""
return nn.Sequential(
nn.Conv2d(int(inp), int(inp), 3, stride, 1, groups=int(inp), bias=False),
nn.BatchNorm2d(int(inp)),
nn.ReLU6(inplace=True),
nn.Conv2d(int(inp), int(oup), 1, 1, 0, bias=False),
nn.BatchNorm2d(int(oup)),
nn.ReLU6(inplace=True),
)
class MatchPrior(object):
"""Matches priors based on the SSD prior config
Arguments:
center_form_priors : priors generated based on specs and image size in config file
center_variance : a float used to change the scale of center
size_variance : a float used to change the scale of size
iou_threshold : a float value of threshold of IoU
"""
def __init__(self, center_form_priors, center_variance, size_variance, iou_threshold):
self.center_form_priors = center_form_priors
self.corner_form_priors = box_utils.center_form_to_corner_form(center_form_priors)
self.center_variance = center_variance
self.size_variance = size_variance
self.iou_threshold = iou_threshold
def __call__(self, gt_boxes, gt_labels):
"""
Arguments:
gt_boxes : ground truth boxes
gt_labels : ground truth labels
Returns:
locations of form (batch_size, num_priors, 4) and labels
"""
if type(gt_boxes) is np.ndarray:
gt_boxes = torch.from_numpy(gt_boxes)
if type(gt_labels) is np.ndarray:
gt_labels = torch.from_numpy(gt_labels)
boxes, labels = box_utils.assign_priors(gt_boxes, gt_labels,
self.corner_form_priors, self.iou_threshold)
boxes = box_utils.corner_form_to_center_form(boxes)
locations = box_utils.convert_boxes_to_locations(boxes, self.center_form_priors, self.center_variance, self.size_variance)
return locations, labels
class BottleneckLSTMCell(nn.Module):
""" Creates a LSTM layer cell
Arguments:
input_channels : variable used to contain value of number of channels in input
hidden_channels : variable used to contain value of number of channels in the hidden state of LSTM cell
"""
def __init__(self, input_channels, hidden_channels):
super(BottleneckLSTMCell, self).__init__()
assert hidden_channels % 2 == 0
self.input_channels = int(input_channels)
self.hidden_channels = int(hidden_channels)
self.num_features = 4
self.W = nn.Conv2d(in_channels=self.input_channels, out_channels=self.input_channels, kernel_size=3, groups=self.input_channels, stride=1, padding=1)
self.Wy = nn.Conv2d(int(self.input_channels+self.hidden_channels), self.hidden_channels, kernel_size=1)
self.Wi = nn.Conv2d(self.hidden_channels, self.hidden_channels, 3, 1, 1, groups=self.hidden_channels, bias=False)
self.Wbi = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
self.Wbf = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
self.Wbc = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
self.Wbo = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
self.relu = nn.ReLU6()
# self.Wci = None
# self.Wcf = None
# self.Wco = None
logging.info("Initializing weights of lstm")
self._initialize_weights()
def _initialize_weights(self):
"""
Returns:
initialized weights of the model
"""
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.xavier_uniform_(m.weight)
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
def forward(self, x, h, c): #implemented as mentioned in paper here the only difference is Wbi, Wbf, Wbc & Wbo are commuted all together in paper
"""
Arguments:
x : input tensor
h : hidden state tensor
c : cell state tensor
Returns:
output tensor after LSTM cell
"""
x = self.W(x)
y = torch.cat((x, h),1) #concatenate input and hidden layers
i = self.Wy(y) #reduce to hidden layer size
b = self.Wi(i) #depth wise 3*3
ci = torch.sigmoid(self.Wbi(b))
cf = torch.sigmoid(self.Wbf(b))
cc = cf * c + ci * self.relu(self.Wbc(b))
co = torch.sigmoid(self.Wbo(b))
ch = co * self.relu(cc)
return ch, cc
def init_hidden(self, batch_size, hidden, shape):
"""
Arguments:
batch_size : an int variable having value of batch size while training
hidden : an int variable having value of number of channels in hidden state
shape : an array containing shape of the hidden and cell state
Returns:
cell state and hidden state
"""
# if self.Wci is None:
# self.Wci = Variable(torch.zeros(1, hidden, shape[0], shape[1])).cuda()
# self.Wcf = Variable(torch.zeros(1, hidden, shape[0], shape[1])).cuda()
# self.Wco = Variable(torch.zeros(1, hidden, shape[0], shape[1])).cuda()
# else:
# assert shape[0] == self.Wci.size()[2], 'Input Height Mismatched!'
# assert shape[1] == self.Wci.size()[3], 'Input Width Mismatched!'
return (Variable(torch.zeros(batch_size, hidden, shape[0], shape[1])).cuda(),
Variable(torch.zeros(batch_size, hidden, shape[0], shape[1])).cuda()
)

class BottleneckLSTM(nn.Module):
    def __init__(self, input_channels, hidden_channels, height, width, batch_size):
        """ Creates Bottleneck LSTM layer
        Arguments:
            input_channels : variable having value of number of channels of input to this layer
            hidden_channels : variable having value of number of channels of hidden state of this layer
            height : an int variable having value of height of the input
            width : an int variable having value of width of the input
            batch_size : an int variable having value of batch_size of the input
        Returns:
            Output tensor of LSTM layer
        """
        super(BottleneckLSTM, self).__init__()
        self.input_channels = int(input_channels)
        self.hidden_channels = int(hidden_channels)
        self.cell = BottleneckLSTMCell(self.input_channels, self.hidden_channels)
        (h, c) = self.cell.init_hidden(batch_size, hidden=self.hidden_channels, shape=(height, width))
        self.hidden_state = h
        self.cell_state = c

    def forward(self, input):
        new_h, new_c = self.cell(input, self.hidden_state, self.cell_state)
        self.hidden_state = new_h
        self.cell_state = new_c
        return self.hidden_state

def crop_like(x, target):
    """
    Arguments:
        x : a tensor whose spatial shape has to be cropped
        target : a tensor whose spatial shape x should match
    Returns:
        x having same height and width as target
    """
    if x.size()[2:] == target.size()[2:]:
        return x
    else:
        height = target.size()[2]
        width = target.size()[3]
        crop_h = torch.FloatTensor([x.size()[2]]).sub(height).div(-2)
        crop_w = torch.FloatTensor([x.size()[3]]).sub(width).div(-2)
        # fixed indexing for PyTorch 0.4
        return F.pad(x, [int(crop_w.ceil()[0]), int(crop_w.floor()[0]), int(crop_h.ceil()[0]), int(crop_h.floor()[0])])

class MobileNetV1(nn.Module):
    def __init__(self, num_classes=1024, alpha=1):
        """torch.nn.Module for MobileNetV1 up to conv12
        Arguments:
            num_classes : an int variable having value of total number of classes
            alpha : a float used as width multiplier for channels of model
        """
        super(MobileNetV1, self).__init__()
        # up to conv 12
        self.model = nn.Sequential(
            conv_bn(3, 32*alpha, 2),
            conv_dw(32*alpha, 64*alpha, 1),
            conv_dw(64*alpha, 128*alpha, 2),
            conv_dw(128*alpha, 128*alpha, 1),
            conv_dw(128*alpha, 256*alpha, 2),
            conv_dw(256*alpha, 256*alpha, 1),
            conv_dw(256*alpha, 512*alpha, 2),
            conv_dw(512*alpha, 512*alpha, 1),
            conv_dw(512*alpha, 512*alpha, 1),
            conv_dw(512*alpha, 512*alpha, 1),
            conv_dw(512*alpha, 512*alpha, 1),
            conv_dw(512*alpha, 512*alpha, 1),
        )
        logging.info("Initializing weights of base net")
        self._initialize_weights()
        # self.fc = nn.Linear(1024, num_classes)

    def _initialize_weights(self):
        """
        Initializes the weights of the model in place.
        """
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def forward(self, x):
        """
        Arguments:
            x : a tensor which is used as input for the model
        Returns:
            a tensor which is output of the model
        """
        x = self.model(x)
        return x

class SSD(nn.Module):
    def __init__(self, num_classes, batch_size, alpha=1, is_test=False, config=None, device=None):
        """
        Arguments:
            num_classes : an int variable having value of total number of classes
            batch_size : an int variable having value of batch size
            alpha : a float used as width multiplier for channels of model
            is_test : a bool used to make model ready for testing
            config : a dict containing all the configuration parameters
            device : device the priors are moved to (defaults to cuda:0 if available, else cpu)
        """
        super(SSD, self).__init__()
        # Decoder
        self.is_test = is_test
        self.config = config
        self.num_classes = num_classes
        if device:
            self.device = device
        else:
            self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        if is_test:
            self.config = config
            self.priors = config.priors.to(self.device)
        self.conv13 = conv_dw(512*alpha, 1024*alpha, 2)  # not using conv14 as mentioned in paper
        self.bottleneck_lstm1 = BottleneckLSTM(input_channels=1024*alpha, hidden_channels=256*alpha, height=10, width=10, batch_size=batch_size)
        self.fmaps_1 = nn.Sequential(
            nn.Conv2d(in_channels=int(256*alpha), out_channels=int(128*alpha), kernel_size=1),
            nn.ReLU6(inplace=True),
            SeperableConv2d(in_channels=128*alpha, out_channels=256*alpha, kernel_size=3, stride=2, padding=1),
        )
        self.fmaps_2 = nn.Sequential(
            nn.Conv2d(in_channels=int(256*alpha), out_channels=int(64*alpha), kernel_size=1),
            nn.ReLU6(inplace=True),
            SeperableConv2d(in_channels=64*alpha, out_channels=128*alpha, kernel_size=3, stride=2, padding=1),
        )
        self.fmaps_3 = nn.Sequential(
            nn.Conv2d(in_channels=int(128*alpha), out_channels=int(64*alpha), kernel_size=1),
            nn.ReLU6(inplace=True),
            SeperableConv2d(in_channels=64*alpha, out_channels=128*alpha, kernel_size=3, stride=2, padding=1),
        )
        self.fmaps_4 = nn.Sequential(
            nn.Conv2d(in_channels=int(128*alpha), out_channels=int(32*alpha), kernel_size=1),
            nn.ReLU6(inplace=True),
            SeperableConv2d(in_channels=32*alpha, out_channels=64*alpha, kernel_size=3, stride=2, padding=1),
        )
        self.regression_headers = nn.ModuleList([
            SeperableConv2d(in_channels=512*alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=256*alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=256*alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=128*alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=128*alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            nn.Conv2d(in_channels=int(64*alpha), out_channels=6 * 4, kernel_size=1),
        ])
        self.classification_headers = nn.ModuleList([
            SeperableConv2d(in_channels=512*alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=256*alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=256*alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=128*alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=128*alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            nn.Conv2d(in_channels=int(64*alpha), out_channels=6 * num_classes, kernel_size=1),
        ])
        logging.info("Initializing weights of SSD")
        self._initialize_weights()

    def _initialize_weights(self):
        """
        Returns:
            initialized weights of the model
        """
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def compute_header(self, i, x):  # SSD method to calculate headers
        """
        Arguments:
            i : an int used to use particular classification and regression layer
            x : a tensor used as input to layers
        Returns:
            locations and confidences of the predictions
        """
        confidence = self.classification_headers[i](x)
        confidence = confidence.permute(0, 2, 3, 1).contiguous()
        confidence = confidence.view(confidence.size(0), -1, self.num_classes)
        location = self.regression_headers[i](x)
        location = location.permute(0, 2, 3, 1).contiguous()
        location = location.view(location.size(0), -1, 4)
        return confidence, location

    def forward(self, x):
        """
        Arguments:
            x : a tensor which is used as input for the model
        Returns:
            confidences and locations of predictions made by model during training
            or
            confidences and boxes of predictions made by model during testing
        """
        confidences = []
        locations = []
        header_index = 0
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.conv13(x)
        x = self.bottleneck_lstm1(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.fmaps_1(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.fmaps_2(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.fmaps_3(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.fmaps_4(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        confidences = torch.cat(confidences, 1)
        locations = torch.cat(locations, 1)
        if self.is_test:  # while testing convert locations to boxes
            confidences = F.softmax(confidences, dim=2)
            boxes = box_utils.convert_locations_to_boxes(
                locations, self.priors, self.config.center_variance, self.config.size_variance
            )
            boxes = box_utils.center_form_to_corner_form(boxes)
            return confidences, boxes
        else:
            return confidences, locations

class MobileVOD(nn.Module):
    """
    Module to join encoder and decoder of predictor model
    """
    def __init__(self, pred_enc, pred_dec):
        """
        Arguments:
            pred_enc : an object of MobilenetV1 class
            pred_dec : an object of SSD class
        """
        super(MobileVOD, self).__init__()
        self.pred_encoder = pred_enc
        self.pred_decoder = pred_dec

    def forward(self, seq):
        """
        Arguments:
            seq : a tensor used as input to the model
        Returns:
            confidences and locations of predictions made by model
        """
        x = self.pred_encoder(seq)
        confidences, locations = self.pred_decoder(x)
        return confidences, locations

    def detach_hidden(self):
        """
        Detaches hidden state and cell state of all the LSTM layers from the graph
        """
        self.pred_decoder.bottleneck_lstm1.hidden_state.detach_()
        self.pred_decoder.bottleneck_lstm1.cell_state.detach_()
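
For context on why `detach_hidden` matters: `BottleneckLSTM` keeps `hidden_state` and `cell_state` on the module between forward calls, so unless they are detached, each new iteration's graph still links back to the previous iteration's graph, which has already been freed. Below is a minimal CPU sketch of that effect. `TinyStatefulLayer` and `train` are hypothetical stand-ins made up for illustration, not the model above, and the state is created with `torch.zeros(..., device=...)` instead of `Variable(...).cuda()`.

```python
import torch
import torch.nn as nn


class TinyStatefulLayer(nn.Module):
    """Hypothetical stand-in for a layer that stores its recurrent state on the module."""

    def __init__(self, features, batch_size, device="cpu"):
        super().__init__()
        self.proj = nn.Linear(2 * features, features)
        # Device-agnostic replacement for the older Variable(...).cuda() pattern.
        self.hidden_state = torch.zeros(batch_size, features, device=device)

    def forward(self, x):
        new_h = torch.tanh(self.proj(torch.cat((x, self.hidden_state), dim=1)))
        self.hidden_state = new_h  # still attached to this iteration's graph
        return new_h

    def detach_hidden(self):
        # Cut the link to the previous iteration's graph, in place.
        self.hidden_state.detach_()


def train(num_steps, detach):
    """Return None if every step succeeds, otherwise the RuntimeError message."""
    torch.manual_seed(0)
    layer = TinyStatefulLayer(features=4, batch_size=2)
    optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
    for _ in range(num_steps):
        if detach:
            layer.detach_hidden()
        optimizer.zero_grad()
        loss = layer(torch.randn(2, 4)).sum()
        try:
            loss.backward()
        except RuntimeError as err:
            return str(err)
        optimizer.step()
    return None


print(train(num_steps=3, detach=True))   # prints None
print(train(num_steps=3, detach=False))  # prints the "backward ... a second time" error
```

In this sketch, calling the detach method once per iteration before the forward pass is enough to avoid needing `retain_graph=True`; whether the same holds for the full model depends on the rest of the training loop.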
My error…
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 256, 1, 1]] is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Process finished with exit code 1
Which references…
2023-04-18 12:22:57,045 - root - INFO - Start training from epoch 0.
/home/steven/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnConvolutionBackward0. Traceback of forward call that caused the error:
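
The "is at version 5; expected version 4" part of the message refers to the version counter autograd keeps on each tensor: every in-place write bumps it, and backward checks that a saved tensor is still at the version it had during the forward pass. A `[256, 256, 1, 1]` tensor looks like a 1x1 convolution weight, so one plausible (not confirmed) cause is an `optimizer.step()` running between the forward pass and a later `backward()`. A small sketch of the counter, assuming the private `_version` attribute (an implementation detail that may change) is acceptable for debugging:

```python
import torch

w = torch.ones(3, requires_grad=True)
before = w._version

# optimizer.step() updates parameters in place under no_grad, much like this.
with torch.no_grad():
    w.add_(1.0)

after = w._version
print(before, after)  # the counter goes up by one for this in-place write
```

Printing `_version` for the suspect parameter before and after each step of the training loop is one way to find which call modifies it.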
Thank you so much, it works
Thank you very much for providing the solution.
I was having the same issue and tried several fixes: replacing the in-place operations with out-of-place ones, and even enabling anomaly detection (no exceptions were raised).
It turned out the issue was the order of the calls in my backward function.
I rearranged it from this:
def backward(self, unet_loss, dis_loss):
    dis_loss.backward(retain_graph=True)
    self.dis_optimizer.step()
    unet_loss.backward()
    self.unet_optimizer.step()
To this, which solved the issue:
def backward(self, unet_loss, dis_loss):
    dis_loss.backward(retain_graph=True)
    unet_loss.backward()
    self.dis_optimizer.step()
    self.unet_optimizer.step()
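
A minimal sketch of why the order matters: `optimizer.step()` writes to the weights in place, so calling it between two backward passes that share a graph invalidates tensors the second pass still needs. The two `nn.Linear` modules and the `train_step` helper are hypothetical stand-ins for the U-Net and discriminator, not the actual models from this post.

```python
import torch
import torch.nn as nn


def train_step(step_between_backwards):
    """Run one update; return None on success or the RuntimeError message."""
    torch.manual_seed(0)
    unet = nn.Linear(4, 4)  # hypothetical stand-in for the U-Net
    dis = nn.Linear(4, 1)   # hypothetical stand-in for the discriminator
    unet_optimizer = torch.optim.SGD(unet.parameters(), lr=0.1)
    dis_optimizer = torch.optim.SGD(dis.parameters(), lr=0.1)

    score = dis(unet(torch.randn(8, 4)))
    dis_loss = score.mean()           # both losses share one graph
    unet_loss = (1.0 - score).mean()

    dis_loss.backward(retain_graph=True)
    if step_between_backwards:
        dis_optimizer.step()          # in-place write to dis.weight
    try:
        unet_loss.backward()          # needs the dis.weight saved during forward
    except RuntimeError as err:
        return str(err)
    if not step_between_backwards:
        dis_optimizer.step()
    unet_optimizer.step()
    return None


print(train_step(step_between_backwards=True))   # prints the in-place error message
print(train_step(step_between_backwards=False))  # prints None
```

One thing to keep in mind with the reordered version: if `unet_loss` is computed through the discriminator, both losses accumulate into the discriminator's `.grad` before its step, which may or may not be what the training setup intends.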