Cannot implement a Faster R-CNN model

Hi, I am trying to detect objects in technical drawings using a Faster R-CNN. As I am new to PyTorch and do not fully know the API, I am having trouble training a deep neural network for my project. I am initializing the model without weights and instead training it from scratch on my own data, which I labelled with Label Studio.

Below is the code I have tried, adapted from the documentation. num_classes is 6: the 5 object classes plus the background. I am demonstrating with randomly generated data. I am getting errors that make me question my implementation, specifically “Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [5, 6]”.

Thank you in advance for your help.

```python
import time

import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# Example with random data with 500 images and 5 categories

images = torch.rand(500, 1, 1000, 1000, dtype=torch.double)
boxes = torch.rand(5, 5, 4, dtype=torch.double) # 500 images each with 5 bounding boxes
boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4] # Converts (x, y, w, h) to (xmin, ymin, xmax, ymax)
labels = torch.randint(low=1, high=5, size=(5, 5), dtype=torch.uint8)
images = list(image for image in images) # Creates list of tensors

targets = []
for n in range(len(images)):
    d = {}
    d["boxes"] = boxes[n]
    d["labels"] = labels[n]
    targets.append(d)  # one target dict per image

# Create model

# Load a pre-trained ResNet model without specifying num_classes to use as backbone
resnet = torchvision.models.resnet50(pretrained=True)

# Modify the final classification layer for your specific number of classes
num_classes = 6  # 5 plus background
in_features = resnet.fc.in_features
resnet.fc = torch.nn.Linear(in_features, num_classes)
resnet.out_channels = 1280

# Define an anchor generator
rpn_anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5
)

# Create a Faster R-CNN model with the modified ResNet backbone
model = FasterRCNN(

# Train model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-7, momentum=0)

# Store losses to plot loss in train and validation
loss_progress = []

start_time = time.time()

for epoch in range(NUM_EPOCHS):
    print(f"Beginning epoch: {epoch}")
    total_loss = 0.0

    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

    total_loss += losses.item()

    print(f"Epoch [{epoch+1}/{NUM_EPOCHS}], Loss: {losses.item():.4f}")

end_time = time.time()
elapsed_time_seconds = end_time - start_time
elapsed_time_hours = elapsed_time_seconds / 3600
```

Your code fails in:

```python
d["boxes"] = boxes[n]
```

```
IndexError: index 5 is out of bounds for dimension 0 with size 5
```

since n comes from range(len(images)), i.e. 0 to 499, while boxes (and labels) only have 5 entries along dimension 0.
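A cheap way to catch this kind of mismatch before the model is called is to check the targets against what torchvision documents for detection models in training mode: one dict per image, `boxes` as a float `[K, 4]` tensor in `(xmin, ymin, xmax, ymax)` format with positive width and height, and `labels` as an int64 `[K]` tensor. The helper `check_targets` below is hypothetical, written for this thread, and not part of torchvision.

```python
import torch


def check_targets(images, targets, num_classes):
    """Hypothetical sanity check for Faster R-CNN training inputs."""
    if len(images) != len(targets):
        raise ValueError(f"{len(images)} images but {len(targets)} targets")
    for i, (image, t) in enumerate(zip(images, targets)):
        if image.ndim != 3:
            raise ValueError(f"image {i}: expected [C, H, W], got {list(image.shape)}")
        boxes, labels = t["boxes"], t["labels"]
        if boxes.ndim != 2 or boxes.shape[1] != 4:
            raise ValueError(f"target {i}: boxes must be [K, 4], got {list(boxes.shape)}")
        if not boxes.is_floating_point():
            raise TypeError(f"target {i}: boxes must be a floating-point tensor")
        if labels.dtype != torch.int64 or labels.shape != (boxes.shape[0],):
            raise TypeError(f"target {i}: labels must be int64 with shape [{boxes.shape[0]}]")
        # Degenerate boxes (xmax <= xmin or ymax <= ymin) are rejected by the model.
        if (boxes[:, 2:] <= boxes[:, :2]).any():
            raise ValueError(f"target {i}: every box needs xmax > xmin and ymax > ymin")
        # Label 0 is reserved for the background class.
        if labels.numel() and (labels.min() < 1 or labels.max() >= num_classes):
            raise ValueError(f"target {i}: labels must lie in [1, {num_classes - 1}]")
```

Calling `check_targets(images, targets, num_classes=6)` right before `model(images, targets)` turns shape and dtype problems into one readable error.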


Adding to the previous comment, @compvision-learner, maybe you could do something like this:

```python
nDatapoints = 500
images = torch.rand(nDatapoints, 1, 1000, 1000, dtype=torch.double)
# same for boxes

# don't need this now
# images = list(image for image in images) # Creates list of tensors

targets = []
for n in range(nDatapoints):
```

Also, it seems to me that torch.double should rather be torch.float32, but you may have your reasons for it.
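Putting both suggestions together, a minimal sketch of the random-data setup could derive every shape from one image count and keep float32 images and boxes (matching the model's float32 parameters) plus int64 labels (as documented for detection targets). Two further details: torchvision expects boxes in absolute pixel coordinates, whereas `torch.rand` boxes lie in [0, 1]; and `torch.randint`'s `high` is exclusive, so `high=num_classes` yields labels 1 to 5 with 0 left for background, while `high=5` never produces class 5. The function `make_random_data` and its parameters are hypothetical, and the images are 3-channel because ResNet-50's first convolution expects 3 input channels.

```python
import torch


def make_random_data(n_images, boxes_per_image, num_classes, height, width):
    """Hypothetical generator of random images and Faster R-CNN targets."""
    images, targets = [], []
    for _ in range(n_images):
        # float32, 3-channel image with values in [0, 1)
        images.append(torch.rand(3, height, width, dtype=torch.float32))
        # Top-left corner in the upper-left quadrant, in pixel units
        xy = torch.rand(boxes_per_image, 2) * torch.tensor([width / 2, height / 2])
        # Width/height of at least 1 pixel, so xmax > xmin and ymax > ymin
        wh = 1 + torch.rand(boxes_per_image, 2) * torch.tensor([width / 2 - 1, height / 2 - 1])
        boxes = torch.cat([xy, xy + wh], dim=1)  # (xmin, ymin, xmax, ymax)
        # high is exclusive: labels in 1..num_classes-1, 0 stays background
        labels = torch.randint(1, num_classes, (boxes_per_image,), dtype=torch.int64)
        targets.append({"boxes": boxes, "labels": labels})
    return images, targets
```

For example, `images, targets = make_random_data(500, 5, 6, 1000, 1000)` reproduces the question's setup with consistent shapes, at the cost of about 6 GB of float32 images, so a smaller count is easier for a first test.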

Thank you @ptrblck and @Mah_Neh for your comments and help. Fixing the size creation as suggested solved the issue with the dimensions. I also revised the dtypes, using torch.float64, and torch.int64 for the labels.
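One caveat about `torch.float64`: torchvision models are created with float32 parameters, so float64 images make the first convolution raise a dtype-mismatch `RuntimeError`. Either keep the inputs in float32 (the usual choice) or convert the whole model with `model.double()`. A minimal sketch, with a single `Conv2d` standing in for the backbone's first layer:

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3)  # parameters default to float32
image64 = torch.rand(1, 3, 16, 16, dtype=torch.float64)

try:
    conv(image64)  # float64 input vs. float32 weights
    mismatch_raised = False
except RuntimeError:
    # The exact message wording differs between PyTorch versions.
    mismatch_raised = True

# Option 1: cast the input to float32 to match the parameters.
out32 = conv(image64.float())

# Option 2: cast a module's parameters to float64 instead.
conv64 = torch.nn.Conv2d(3, 8, kernel_size=3).double()
out64 = conv64(image64)
```

Float32 is generally preferable here: it halves memory use and is much faster on GPUs.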

On a second note, do you think the model definition and training loop are correct? I am just beginning to understand the PyTorch API and have been relying on the documentation, but I am unsure whether the training process is correct.
