Hi, I am trying to detect objects in technical drawings using a Faster R-CNN. As I am new to PyTorch and do not fully know the API, I am having trouble training the network for my project. I am initializing the model without pre-trained weights and training it from scratch on my own data, which was labelled with Label Studio.

Below is the code I have tried, which I have been adapting based on the documentation. num_classes is 6: the 5 classes of my objects plus the background. I am showcasing it with randomly generated data. I am receiving errors that make me question my implementation, specifically: “Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [5, 6]”

Thank you in advance for your help.

```
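import time

import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Example with random data: 5 images, each with 5 boxes from 5 object categories
NUM_IMAGES, BOXES_PER_IMAGE, IMAGE_SIZE = 5, 5, 1000
images = torch.rand(NUM_IMAGES, 1, IMAGE_SIZE, IMAGE_SIZE)  # float32 grayscale images in [0, 1]
# Random boxes as (xmin, ymin, w, h) in pixels, then converted to (xmin, ymin, xmax, ymax)
boxes = torch.rand(NUM_IMAGES, BOXES_PER_IMAGE, 4) * (IMAGE_SIZE / 2)
boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
# Labels 1..5 (0 is reserved for background); torchvision expects int64 labels
labels = torch.randint(low=1, high=6, size=(NUM_IMAGES, BOXES_PER_IMAGE), dtype=torch.int64)
images = list(images)  # list of [C, H, W] tensors
# One target dict per image
targets = []
for n in range(len(images)):
    targets.append({"boxes": boxes[n], "labels": labels[n]})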
# Create model
# Load a pre-trained ResNet-50 to use as the backbone
resnet = torchvision.models.resnet50(pretrained=True)
# Replace the final classification layer for the specific number of classes;
# the backbone's forward pass now returns [N, num_classes] logits
num_classes = 6  # 5 object classes plus background
in_features = resnet.fc.in_features
resnet.fc = torch.nn.Linear(in_features, num_classes)
resnet.out_channels = 1280  # FasterRCNN requires the backbone to declare out_channels
# Define an anchor generator for a single feature map:
# one tuple of sizes and one matching tuple of aspect ratios
rpn_anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),),
)
# Create a Faster R-CNN model with the modified ResNet backbone
model = FasterRCNN(
    backbone=resnet,
    num_classes=num_classes,
    rpn_anchor_generator=rpn_anchor_generator,
)
# Train model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-7, momentum=0)
NUM_EPOCHS = 6
# Store losses to plot loss in train and validation
loss_progress = []
start_time = time.time()
for epoch in range(NUM_EPOCHS):
    print(f"Beginning epoch: {epoch}")
    model.train()
    optimizer.zero_grad()
    total_loss = 0.0
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())
    losses.backward()
    optimizer.step()
    total_loss += losses.item()
    print(f"Epoch [{epoch+1}/{NUM_EPOCHS}], Loss: {losses.item():.4f}")
    loss_progress.append(losses.item())
end_time = time.time()
elapsed_time_seconds = end_time - start_time
elapsed_time_hours = elapsed_time_seconds / 3600
```