Region Proposal Network implementation loses all information in the feature map?

Hello, I’m currently implementing a version of the Region Proposal Network (RPN) on musical audio transformed into autocorrelated frequency spectra, to try to draw bounds around the different musical sections (chorus, verse, intro, etc.).

When I try to run the system, the feature extractor quickly collapses the input to nothing, after only 2 epochs on a 550-song dataset. I’ve included the feature maps below (a sketch for quantifying the collapse follows the images). Any tips on how to fix this? I’m quite desperate at this point!

with no training:

after 1 epoch:

after 2 epochs:
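To quantify the collapse rather than eyeballing the images, one option is to log activation statistics of the extractor output after each epoch. A minimal sketch, assuming the backbone is exposed as rpn.extractor (as in the fine-tuning code further down) and sampleimg is one preprocessed input like in the training loop:

    with torch.no_grad():
        fmap = rpn.extractor(sampleimg.to(device))  # backbone feature map for one sample
        print('mean |act|:', fmap.abs().mean().item(),
              'std:', fmap.std().item(),
              'fraction zero:', (fmap == 0).float().mean().item())

If the mean absolute activation and the standard deviation both shrink toward zero across epochs (or the fraction of exact zeros climbs toward 1 with ReLUs), the extractor really is collapsing rather than just rendering poorly.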

I train following the RPN paper's scheme (training the backbone + RPN first and then fine-tuning the RPN layers). Here are my parameters:

For backbone + RPN:
epochs = 10
lr = .001
optimizer = optim.Adam(rpn.parameters(), lr=lr)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=550, gamma=0.7)

For RPN fine-tuning:
epochs = 10
lr = .0001
optimizer = optim.Adam(rpn.parameters(), lr=lr)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=550, gamma=0.6)
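One thing worth noting about these settings: since scheduler.step() is called once per iteration in the loop below, step_size=550 decays the learning rate every 550 iterations (i.e. every epoch on the 550-song set), so at the start of epoch k the rate is lr * gamma ** (k - 1). A quick sketch of that trajectory for the first stage:

    lr, gamma = 0.001, 0.7
    for epoch in range(1, 11):
        print('epoch', epoch, 'lr =', lr * gamma ** (epoch - 1))

Starting from 1e-12 instead (as in the logs below), the rate is effectively zero throughout.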

My weights also seem to stay random, but I’m not sure of a good way to check that. Looking for any help!

I would generally try to overfit a small dataset (e.g. just 10 samples) by playing around with some hyperparameters, and make sure your model is able to do so. If that's not possible, your training script might have some issue which would need to be debugged (e.g. forgetting to call optimizer.zero_grad(), etc.). Once this is working, you could try to scale the use case up again.
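A minimal sketch of that overfitting check, assuming the same rpn, rpn_loss, img, labels, and rpn_lambda as in the training loop below (the subset size and epoch count are arbitrary choices):

    import torch.optim as optim

    tiny = range(10)  # overfit just 10 samples; the loss should approach zero
    optimizer = optim.Adam(rpn.parameters(), lr=1e-3)

    for epoch in range(200):
        epoch_loss = 0.0
        for j in tiny:
            optimizer.zero_grad()
            sampleimg = img[j].reshape(3, 800, 800).float().unsqueeze(0)
            rpn_loc, rpn_score, gt_rpn_loc, gt_rpn_score = rpn(
                image=sampleimg.to(device),
                bbox=torch.from_numpy(labels[j]).to(device),
                img_size=[800, 800], device=device)
            _, _, loss = rpn_loss(rpn_loc, rpn_score, gt_rpn_loc, gt_rpn_score,
                                  rpn_lambda=rpn_lambda)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        print('epoch', epoch + 1, 'mean loss', epoch_loss / 10)

As a side note, a binary classification loss stuck near ln(2) ≈ 0.693 (as in the logs below) means the objectness head is outputting chance, which points to a wiring problem rather than mere under-tuning.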

This is my current training loop.
After the backbone + RPN stage is trained, I freeze the backbone layers and repeat the loop for just the RPN.
Even on a smaller dataset it doesn’t seem to be converging on anything at all. I’ve provided some of the loss measures below as well. Does it look like the loop needs debugging?

    for i in range(epochs):
        for j in range(len(labels)):

            ################## Forward
            optimizer.zero_grad()
            sampleimg = img[j].reshape(3, 800, 800).float().unsqueeze(0)
            rpn_loc, rpn_score, gt_rpn_loc, gt_rpn_score = rpn(image=sampleimg.to(device), bbox=torch.from_numpy(labels[j]).to(device), img_size=[800, 800], device=device)
            cls_loss, loc_loss, loss = rpn_loss(rpn_loc, rpn_score, gt_rpn_loc, gt_rpn_score, rpn_lambda=rpn_lambda)

            ################# Log
            iter = i * len(labels) + j + 1  # derived from len(labels) instead of hardcoding 550; note this shadows the builtin iter()
            total_loss = total_loss + loss.item()
            total_cls_loss = total_cls_loss + cls_loss.item()
            total_loc_loss = total_loc_loss + loc_loss.item()  # bug fix: previously accumulated total_cls_loss + loc_loss.item(), corrupting this running sum

            print('iter', iter, 'cls_loss', cls_loss.item(), 'loc_loss:', loc_loss.item(), 'loss:', loss.item(), "lr:", optimizer.param_groups[0]['lr'])

            if not (iter % 50):
                print('iter', iter, 'cls_loss', cls_loss.item(), 'loc_loss:', loc_loss.item(), 'loss:', loss.item(), 'total_loss:', total_loss / 50, "lr:", optimizer.param_groups[0]['lr'])
                prf3, prf3trim, prf5, prf5trim, ious = predict_set(imgval, lblval, sclval, rpn)
                writerP.writerow({'p3trim': prf3trim[0], 'p5': prf5[0], 'p5trim': prf5trim[0], 'r3trim': prf3trim[1], 'r5': prf5[1], 'r5trim': prf5trim[1], 'f13trim': prf3trim[2], 'f15': prf5[2], 'f15trim': prf5trim[2]})
                writerT.writerow({'epoch': i + 1, 'iter': iter, 'total_loss': total_loss / 50, 'cls_loss': total_cls_loss / 50, 'loc_loss': total_loc_loss / 50, 'lr': optimizer.param_groups[0]['lr'], 'precision': prf3[0], 'recall': prf3[1], 'f1': prf3[2], 'average_iou': ious})
                total_loss = 0
                total_cls_loss = 0
                total_loc_loss = 0

            ################# Backward
            loss.backward()
            optimizer.step()
            scheduler.step()  # stepped every iteration, so step_size=550 counts iterations (one decay per epoch here)

        # checkpoint every epoch; the original guard `if not ((i+1)%1):` was always true
        torch.save(rpn.state_dict(), model_path + "state_dict_epoch_" + str(i + 1))

    t1 = time.time()  # t0 is set before the loop (not shown here)
    print("trained in", t1 - t0, "seconds")

    # after stage 1 (backbone + RPN) finishes, freeze the backbone and fine-tune only the RPN layers
    for param in rpn.extractor.parameters():
        param.requires_grad = False
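One thing to double-check at the hand-off to fine-tuning: the stage-1 Adam optimizer was built over all of rpn.parameters(), so for stage 2 it is cleaner to rebuild the optimizer over the still-trainable parameters only. A sketch, assuming the freeze above has already run:

    import torch.optim as optim

    # Adam keeps per-parameter moment estimates, so rebuilding avoids carrying
    # stale state for the frozen extractor weights into the fine-tuning stage.
    trainable = [p for p in rpn.parameters() if p.requires_grad]
    optimizer = optim.Adam(trainable, lr=0.0001)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=550, gamma=0.6)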

Training loss (ignore the iter values; I had changed the dataset size):
There’s no real convergence that I can see happening, so I’m not sure what the next steps might be!

iter 1 cls_loss 0.6921350955963135 loc_loss: 1.1748329508509792 loss: 1.8669680464472926 lr: 1e-12
iter 2 cls_loss 0.6928329467773438 loc_loss: 1.568880252229795 loss: 2.261713199007139 lr: 1e-12
iter 3 cls_loss 0.6933569312095642 loc_loss: 0.9199517550794022 loss: 1.6133086862889665 lr: 1e-12
iter 4 cls_loss 0.6931822896003723 loc_loss: 0.8987026671931192 loss: 1.5918849567934914 lr: 1e-12
iter 5 cls_loss 0.6922513246536255 loc_loss: 1.3166803167574663 loss: 2.008931641411092 lr: 1e-12
iter 6 cls_loss 0.692212700843811 loc_loss: 6.931117958693436 loss: 7.623330659537247 lr: 1e-12
iter 7 cls_loss 0.6921691298484802 loc_loss: 1.2809807530990347 loss: 1.973149882947515 lr: 1e-12
iter 8 cls_loss 0.6930963397026062 loc_loss: 0.7075548610951412 loss: 1.4006512007977474 lr: 1e-12
iter 9 cls_loss 0.6953533291816711 loc_loss: 0.19093948026708718 loss: 0.8862928094487583 lr: 1e-12
iter 10 cls_loss 0.6927700042724609 loc_loss: 1.3882384633475766 loss: 2.0810084676200375 lr: 1e-12
iter 551 cls_loss 0.6923218965530396 loc_loss: 0.9676931799696118 loss: 1.6600150765226513 lr: 5e-13
iter 552 cls_loss 0.6921415328979492 loc_loss: 1.0131206493734641 loss: 1.7052621822714134 lr: 5e-13
iter 553 cls_loss 0.6928712725639343 loc_loss: 1.0163711804627467 loss: 1.709242453026681 lr: 5e-13
iter 554 cls_loss 0.6927764415740967 loc_loss: 0.8245435606038931 loss: 1.5173200021779898 lr: 5e-13
iter 555 cls_loss 0.693758487701416 loc_loss: 0.9173505390809124 loss: 1.6111090267823283 lr: 5e-13
iter 556 cls_loss 0.691859781742096 loc_loss: 5.259640444727239 loss: 5.951500226469335 lr: 5e-13
iter 557 cls_loss 0.6934787034988403 loc_loss: 1.1197831431068284 loss: 1.8132618466056687 lr: 5e-13
iter 558 cls_loss 0.6920543313026428 loc_loss: 0.8964721930378023 loss: 1.5885265243404452 lr: 5e-13
iter 559 cls_loss 0.6941965222358704 loc_loss: 0.14748407200748212 loss: 0.8416805942433525 lr: 5e-13
iter 560 cls_loss 0.6919677257537842 loc_loss: 1.3859773841414285 loss: 2.0779451098952126 lr: 5e-13
iter 1101 cls_loss 0.6920477151870728 loc_loss: 1.2645855750314907 loss: 1.9566332902185635 lr: 2.5e-13
iter 1102 cls_loss 0.6925402879714966 loc_loss: 0.999127364905909 loss: 1.6916676528774057 lr: 2.5e-13
iter 1103 cls_loss 0.6931787133216858 loc_loss: 0.9238206075899552 loss: 1.6169993209116411 lr: 2.5e-13
iter 1104 cls_loss 0.6921634674072266 loc_loss: 1.29035310201729 loss: 1.9825165694245166 lr: 2.5e-13
iter 1105 cls_loss 0.6932936906814575 loc_loss: 1.0621549060023268 loss: 1.7554485966837843 lr: 2.5e-13
iter 1106 cls_loss 0.6920458078384399 loc_loss: 6.428231034660062 loss: 7.120276842498502 lr: 2.5e-13
iter 1107 cls_loss 0.693122923374176 loc_loss: 1.239628441853722 loss: 1.932751365227898 lr: 2.5e-13
iter 1108 cls_loss 0.6926088333129883 loc_loss: 0.7282878707060372 loss: 1.4208967040190255 lr: 2.5e-13
iter 1109 cls_loss 0.6952726244926453 loc_loss: 0.1984593226853661 loss: 0.8937319471780114 lr: 2.5e-13
iter 1110 cls_loss 0.692840576171875 loc_loss: 1.119079295402332 loss: 1.811919871574207 lr: 2.5e-13
iter 1651 cls_loss 0.6919476985931396 loc_loss: 1.1444164066744387 loss: 1.8363641052675783 lr: 1.25e-13
iter 1652 cls_loss 0.6924058794975281 loc_loss: 1.5081326828609696 loss: 2.2005385623584974 lr: 1.25e-13
iter 1653 cls_loss 0.691190779209137 loc_loss: 0.9688563152428173 loss: 1.6600470944519543 lr: 1.25e-13
iter 1654 cls_loss 0.6926459670066833 loc_loss: 0.865931390420291 loss: 1.5585773574269743 lr: 1.25e-13
iter 1655 cls_loss 0.6927735805511475 loc_loss: 1.0157101590998752 loss: 1.7084837396510226 lr: 1.25e-13
iter 1656 cls_loss 0.6926401853561401 loc_loss: 5.479507388795178 loss: 6.172147574151318 lr: 1.25e-13
iter 1657 cls_loss 0.6929158568382263 loc_loss: 1.1307573162410507 loss: 1.823673173079277 lr: 1.25e-13
iter 1658 cls_loss 0.6937358975410461 loc_loss: 0.8427326207180696 loss: 1.5364685182591158 lr: 1.25e-13
iter 1659 cls_loss 0.6950052380561829 loc_loss: 0.19093948364407795 loss: 0.8859447217002608 lr: 1.25e-13
iter 1660 cls_loss 0.691266655921936 loc_loss: 1.6677223698661747 loss: 2.3589890257881105 lr: 1.25e-13
iter 2201 cls_loss 0.6921338438987732 loc_loss: 0.8173786652846236 loss: 1.5095125091833967 lr: 6.25e-14
iter 2202 cls_loss 0.6926960349082947 loc_loss: 1.1058941637139175 loss: 1.7985901986222121 lr: 6.25e-14
iter 2203 cls_loss 0.6926490068435669 loc_loss: 0.9512539266621713 loss: 1.6439029335057382 lr: 6.25e-14
iter 2204 cls_loss 0.6914896368980408 loc_loss: 1.0961379017771076 loss: 1.7876275386751483 lr: 6.25e-14
iter 2205 cls_loss 0.6928700804710388 loc_loss: 1.1408305911908403 loss: 1.8337006716618791 lr: 6.25e-14
iter 2206 cls_loss 0.691948413848877 loc_loss: 6.541317086033898 loss: 7.233265499882775 lr: 6.25e-14
iter 2207 cls_loss 0.6928762197494507 loc_loss: 0.9450663026222494 loss: 1.6379425223717001 lr: 6.25e-14
iter 2208 cls_loss 0.6932937502861023 loc_loss: 0.6348001757078745 loss: 1.3280939259939768 lr: 6.25e-14
iter 2209 cls_loss 0.6948574781417847 loc_loss: 0.19552330549320443 loss: 0.8903807836349891 lr: 6.25e-14
iter 2210 cls_loss 0.6921445727348328 loc_loss: 1.0368458827238227 loss: 1.7289904554586555 lr: 6.25e-14
iter 2751 cls_loss 0.6924130320549011 loc_loss: 0.9761594394298212 loss: 1.6685724714847223 lr: 3.125e-14
iter 2752 cls_loss 0.6922179460525513 loc_loss: 0.9999993759318215 loss: 1.6922173219843728 lr: 3.125e-14
iter 2753 cls_loss 0.692672610282898 loc_loss: 0.9623432596494101 loss: 1.655015869932308 lr: 3.125e-14
iter 2754 cls_loss 0.692301332950592 loc_loss: 1.4034487788255199 loss: 2.0957501117761117 lr: 3.125e-14
iter 2755 cls_loss 0.6926637291908264 loc_loss: 1.0187941746228057 loss: 1.7114579038136322 lr: 3.125e-14
iter 2756 cls_loss 0.6915877461433411 loc_loss: 5.657801986236839 loss: 6.34938973238018 lr: 3.125e-14
iter 2757 cls_loss 0.6936756372451782 loc_loss: 1.123884685556105 loss: 1.8175603228012833 lr: 3.125e-14
iter 2758 cls_loss 0.692488431930542 loc_loss: 0.6893397690239533 loss: 1.3818282009544953 lr: 3.125e-14
iter 2759 cls_loss 0.6943942308425903 loc_loss: 0.18651234479375112 loss: 0.8809065756363414 lr: 3.125e-14
iter 2760 cls_loss 0.6912609934806824 loc_loss: 1.7567043651126357 loss: 2.447965358593318 lr: 3.125e-14
iter 3301 cls_loss 0.6920527815818787 loc_loss: 1.4109532155978333 loss: 2.1030059971797117 lr: 1.5625e-14
iter 3302 cls_loss 0.6930571794509888 loc_loss: 1.4379827243044963 loss: 2.131039903755485 lr: 1.5625e-14
iter 3303 cls_loss 0.6919953227043152 loc_loss: 0.9442011426974493 loss: 1.6361964654017644 lr: 1.5625e-14
iter 3304 cls_loss 0.6922560334205627 loc_loss: 0.9508633485780702 loss: 1.643119381998633 lr: 1.5625e-14
iter 3305 cls_loss 0.692486047744751 loc_loss: 1.2154362902909046 loss: 1.9079223380356556 lr: 1.5625e-14
iter 3306 cls_loss 0.6925199627876282 loc_loss: 5.0682570310276995 loss: 5.760776993815328 lr: 1.5625e-14
iter 3307 cls_loss 0.6927266716957092 loc_loss: 1.0641486991086833 loss: 1.7568753708043925 lr: 1.5625e-14
iter 3308 cls_loss 0.6916861534118652 loc_loss: 0.7790250950284503 loss: 1.4707112484403155 lr: 1.5625e-14
iter 3309 cls_loss 0.6947327852249146 loc_loss: 0.1865123422850028 loss: 0.8812451275099173 lr: 1.5625e-14
iter 3310 cls_loss 0.690968930721283 loc_loss: 1.533132083432568 loss: 2.2241010141538506 lr: 1.5625e-14
iter 3851 cls_loss 0.6919827461242676 loc_loss: 1.0037790637070885 loss: 1.695761809831356 lr: 7.8125e-15
iter 3852 cls_loss 0.6923617720603943 loc_loss: 1.048074671271653 loss: 1.7404364433320474 lr: 7.8125e-15
iter 3853 cls_loss 0.6926763653755188 loc_loss: 0.8896646751897661 loss: 1.582341040565285 lr: 7.8125e-15
iter 3854 cls_loss 0.6930360198020935 loc_loss: 0.9371028014168634 loss: 1.630138821218957 lr: 7.8125e-15
iter 3855 cls_loss 0.6935461759567261 loc_loss: 1.0578630604796235 loss: 1.7514092364363496 lr: 7.8125e-15
iter 3856 cls_loss 0.6928619742393494 loc_loss: 5.509555762562881 loss: 6.20241773680223 lr: 7.8125e-15
iter 3857 cls_loss 0.692651629447937 loc_loss: 0.9407338708410223 loss: 1.6333855002889592 lr: 7.8125e-15
iter 3858 cls_loss 0.6928041577339172 loc_loss: 0.7508190573856177 loss: 1.443623215119535 lr: 7.8125e-15
iter 3859 cls_loss 0.6942407488822937 loc_loss: 0.14748406997999505 loss: 0.8417248188622888 lr: 7.8125e-15
iter 3860 cls_loss 0.6908990740776062 loc_loss: 1.358448611364286 loss: 2.049347685441892 lr: 7.8125e-15
iter 4401 cls_loss 0.6928650736808777 loc_loss: 0.5995509127304399 loss: 1.2924159864113176 lr: 3.90625e-15
iter 4402 cls_loss 0.6927019357681274 loc_loss: 1.1400527712639443 loss: 1.8327547070320718 lr: 3.90625e-15
iter 4403 cls_loss 0.6926475763320923 loc_loss: 0.9474380431879652 loss: 1.6400856195200575 lr: 3.90625e-15
iter 4404 cls_loss 0.6924384832382202 loc_loss: 0.7837999163695561 loss: 1.4762383996077764 lr: 3.90625e-15
iter 4405 cls_loss 0.6929620504379272 loc_loss: 1.046277092922411 loss: 1.7392391433603382 lr: 3.90625e-15
iter 4406 cls_loss 0.6923022866249084 loc_loss: 5.9394521815746675 loss: 6.631754468199576 lr: 3.90625e-15
iter 4407 cls_loss 0.6920892596244812 loc_loss: 1.267811104205232 loss: 1.9599003638297132 lr: 3.90625e-15
iter 4408 cls_loss 0.6928983330726624 loc_loss: 0.8110985994828603 loss: 1.5039969325555227 lr: 3.90625e-15
iter 4409 cls_loss 0.6946788430213928 loc_loss: 0.19093948386148768 loss: 0.8856183268828806 lr: 3.90625e-15
iter 4410 cls_loss 0.6907579898834229 loc_loss: 1.6043845423542182 loss: 2.295142532237641 lr: 3.90625e-15
iter 4951 cls_loss 0.6920508742332458 loc_loss: 0.8892209813595755 loss: 1.5812718555928214 lr: 1.953125e-15
iter 4952 cls_loss 0.6929208636283875 loc_loss: 1.193566349824582 loss: 1.8864872134529695 lr: 1.953125e-15
iter 4953 cls_loss 0.6916599869728088 loc_loss: 0.9482600737886857 loss: 1.6399200607614945 lr: 1.953125e-15
iter 4954 cls_loss 0.6921523213386536 loc_loss: 0.7513913492502884 loss: 1.443543670588942 lr: 1.953125e-15
iter 4955 cls_loss 0.6920864582061768 loc_loss: 0.9950579197282083 loss: 1.687144377934385 lr: 1.953125e-15
iter 4956 cls_loss 0.6923679709434509 loc_loss: 5.302933221442688 loss: 5.995301192386139 lr: 1.953125e-15
iter 4957 cls_loss 0.6931087970733643 loc_loss: 0.8956661914294739 loss: 1.5887749885028382 lr: 1.953125e-15
iter 4958 cls_loss 0.693481981754303 loc_loss: 0.6879967055198455 loss: 1.3814786872741485 lr: 1.953125e-15
iter 4959 cls_loss 0.6949952840805054 loc_loss: 0.19459222492953449 loss: 0.8895875090100398 lr: 1.953125e-15
iter 4960 cls_loss 0.6917133331298828 loc_loss: 1.7399966406498788 loss: 2.4317099737797614 lr: 1.953125e-15
iter 5501 cls_loss 0.6923636794090271 loc_loss: 0.8851741128975726 loss: 1.5775377923065999 lr: 9.765625e-16
iter 5502 cls_loss 0.6936227679252625 loc_loss: 1.5263122007665983 loss: 2.219934968691861 lr: 9.765625e-16
iter 5503 cls_loss 0.6926396489143372 loc_loss: 0.8316610601203935 loss: 1.5243007090347307 lr: 9.765625e-16
iter 5504 cls_loss 0.6925647854804993 loc_loss: 0.6322720093905526 loss: 1.324836794871052 lr: 9.765625e-16
iter 5505 cls_loss 0.6929649710655212 loc_loss: 1.1216831374429832 loss: 1.8146481085085044 lr: 9.765625e-16
iter 5506 cls_loss 0.6935408711433411 loc_loss: 6.687089848673985 loss: 7.380630719817326 lr: 9.765625e-16
iter 5507 cls_loss 0.6937468647956848 loc_loss: 1.099996933513208 loss: 1.7937437983088929 lr: 9.765625e-16
iter 5508 cls_loss 0.6925682425498962 loc_loss: 0.7090216981109515 loss: 1.4015899406608479 lr: 9.765625e-16
iter 5509 cls_loss 0.6949331164360046 loc_loss: 0.19140825159823205 loss: 0.8863413680342367 lr: 9.765625e-16
iter 5510 cls_loss 0.6922085285186768 loc_loss: 1.5595740705949672 loss: 2.251782599113644 lr: 9.765625e-16
iter 6051 cls_loss 0.6920532584190369 loc_loss: 0.9025874425829314 loss: 1.5946407010019683 lr: 4.8828125e-16
iter 6052 cls_loss 0.6922112107276917 loc_loss: 1.5329537629363965 loss: 2.2251649736640884 lr: 4.8828125e-16
iter 6053 cls_loss 0.6917952299118042 loc_loss: 0.9254640288849891 loss: 1.6172592587967933 lr: 4.8828125e-16
iter 6054 cls_loss 0.6936994791030884 loc_loss: 0.7055303111592317 loss: 1.39922979026232 lr: 4.8828125e-16
iter 6055 cls_loss 0.6931062340736389 loc_loss: 1.0419786525012027 loss: 1.7350848865748416 lr: 4.8828125e-16
iter 6056 cls_loss 0.6922838687896729 loc_loss: 5.416918140442859 loss: 6.109202009232532 lr: 4.8828125e-16
iter 6057 cls_loss 0.6918630003929138 loc_loss: 1.3007669598747391 loss: 1.992629960267653 lr: 4.8828125e-16
iter 6058 cls_loss 0.691718339920044 loc_loss: 0.9242010493985654 loss: 1.6159193893186092 lr: 4.8828125e-16
iter 6059 cls_loss 0.6952003240585327 loc_loss: 0.14063162065249293 loss: 0.8358319447110256 lr: 4.8828125e-16
iter 6060 cls_loss 0.6917744278907776 loc_loss: 1.1505936693508132 loss: 1.8423680972415908 lr: 4.8828125e-16
iter 6601 cls_loss 0.6924407482147217 loc_loss: 1.0261920301649123 loss: 1.718632778379634 lr: 2.44140625e-16
iter 6602 cls_loss 0.692135751247406 loc_loss: 1.0563355548029094 loss: 1.7484713060503154 lr: 2.44140625e-16
iter 6603 cls_loss 0.6924816966056824 loc_loss: 1.082326420665792 loss: 1.7748081172714745 lr: 2.44140625e-16
iter 6604 cls_loss 0.69303959608078 loc_loss: 0.7692946941293841 loss: 1.4623342902101641 lr: 2.44140625e-16
iter 6605 cls_loss 0.6927696466445923 loc_loss: 0.9614978191346297 loss: 1.654267465779222 lr: 2.44140625e-16
iter 6606 cls_loss 0.6925919055938721 loc_loss: 5.599679212881476 loss: 6.292271118475348 lr: 2.44140625e-16
iter 6607 cls_loss 0.6932664513587952 loc_loss: 0.9579423679470316 loss: 1.6512088193058267 lr: 2.44140625e-16
iter 6608 cls_loss 0.6944498419761658 loc_loss: 0.6431677454731299 loss: 1.3376175874492957 lr: 2.44140625e-16
iter 6609 cls_loss 0.695354163646698 loc_loss: 0.1909394828307 loss: 0.8862936464773981 lr: 2.44140625e-16
iter 6610 cls_loss 0.6905269622802734 loc_loss: 1.8115178058927488 loss: 2.5020447681730222 lr: 2.44140625e-16
iter 7151 cls_loss 0.6921229362487793 loc_loss: 0.8979542059231803 loss: 1.5900771421719595 lr: 1.220703125e-16
iter 7152 cls_loss 0.6923902034759521 loc_loss: 1.3973217595765262 loss: 2.089711963052478 lr: 1.220703125e-16
iter 7153 cls_loss 0.6919090151786804 loc_loss: 1.0465415001403933 loss: 1.7384505153190737 lr: 1.220703125e-16
iter 7154 cls_loss 0.6922047138214111 loc_loss: 0.998557391993635 loss: 1.6907621058150462 lr: 1.220703125e-16
iter 7155 cls_loss 0.6929003000259399 loc_loss: 1.190819025882228 loss: 1.883719325908168 lr: 1.220703125e-16
iter 7156 cls_loss 0.693310022354126 loc_loss: 5.9454044528736 loss: 6.638714475227726 lr: 1.220703125e-16
iter 7157 cls_loss 0.6928622126579285 loc_loss: 1.0082305879126536 loss: 1.701092800570582 lr: 1.220703125e-16
iter 7158 cls_loss 0.6937702894210815 loc_loss: 0.9474234357484318 loss: 1.6411937251695132 lr: 1.220703125e-16
iter 7159 cls_loss 0.6951891183853149 loc_loss: 0.18651233840617176 loss: 0.8817014567914867 lr: 1.220703125e-16
iter 7160 cls_loss 0.6921799182891846 loc_loss: 1.1941112050720615 loss: 1.886291123361246 lr: 1.220703125e-16
iter 7701 cls_loss 0.6922093033790588 loc_loss: 0.8009008301937827 loss: 1.4931101335728414 lr: 6.103515625e-17
iter 7702 cls_loss 0.6925268769264221 loc_loss: 1.2672057135784782 loss: 1.9597325905049003 lr: 6.103515625e-17
iter 7703 cls_loss 0.6919161081314087 loc_loss: 0.9420613635369042 loss: 1.6339774716683129 lr: 6.103515625e-17
iter 7704 cls_loss 0.6924375295639038 loc_loss: 0.6907406189837084 loss: 1.3831781485476122 lr: 6.103515625e-17
iter 7705 cls_loss 0.6921209096908569 loc_loss: 1.3650254277979954 loss: 2.0571463374888523 lr: 6.103515625e-17
iter 7706 cls_loss 0.6928850412368774 loc_loss: 5.914775133446575 loss: 6.607660174683453 lr: 6.103515625e-17
iter 7707 cls_loss 0.6935403347015381 loc_loss: 1.3341921191340345 loss: 2.0277324538355725 lr: 6.103515625e-17
iter 7708 cls_loss 0.6933984160423279 loc_loss: 0.6508947582551122 loss: 1.34429317429744 lr: 6.103515625e-17

The learning rate seems to start at 1e-12, which is tiny. Did you play around with the learning rate and try other values (e.g. 1e-3 etc.)?

Yep, I varied it from about 0.001 down to 1e-12. The feature maps went to nothing quicker when I was using the larger learning rates.

I also reduced the number of training samples down to 2. No luck there either. Are there good ways to debug the learning?

You could verify that the model parameters are indeed being updated, and e.g. also check the gradients. If you cannot spot any issues, you could try to start from a working example (i.e. change the dataset to images and targets which the model is known to be able to learn) and then check the differences between the working example and your custom workflow.
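A minimal sketch of both checks, assuming a freshly computed loss from one training step as above (the names printed come from rpn.named_parameters()):

    # 1) gradients: after loss.backward(), every trainable parameter should have one
    loss.backward()
    for name, p in rpn.named_parameters():
        if p.requires_grad:
            norm = p.grad.norm().item() if p.grad is not None else None
            print(name, 'grad norm:', norm)  # None or ~0 everywhere is a red flag

    # 2) updates: snapshot parameters, step, and confirm they moved
    before = {name: p.detach().clone() for name, p in rpn.named_parameters()}
    optimizer.step()
    for name, p in rpn.named_parameters():
        print(name, 'max update:', (p.detach() - before[name]).abs().max().item())

With the learning rates shown in the logs (1e-12 and below), the updates will be vanishingly small even if everything is wired correctly, so it is worth rerunning this check at e.g. lr = 1e-3.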