# How to set Learning Rate for a Neural Network?

I have been reading many posts about learning rates. Most of them say to keep it between 0.1 and 0.001. I have a dataset of 1000 images of 4 classes. Can anyone tell me what a proper learning rate would be? I have fixed it at 0.003, and the accuracy starts at 50-53%, peaks at 78% and ends at 68%. Is that right, or should I optimize more? I have tried a smaller rate, but it gives me the same results. Thanks.

There are a few approaches that use test runs to determine the learning rate. One that I know of is used by the fastai library: start with a low learning rate and increase it until the loss diverges. At the point where it diverges, the learning rate was too high, so dial it back down a bit.

That is easy to implement yourself in vanilla PyTorch with one of the learning rate schedulers. If you tried a smaller learning rate and got the same result, there might be other things you can spend your time on to improve the model.
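
A minimal, framework-agnostic sketch of the range test described above. The names `lr_range_test`, `train_step` and `blowup` are illustrative assumptions, not fastai's API. In PyTorch, `train_step(lr)` would set `lr` on each entry of `optimizer.param_groups`, run one mini-batch, and return `loss.item()`. The toy quadratic below only stands in for a real model.

```python
import math

def lr_range_test(train_step, lr_start=1e-6, lr_end=1.0, num_steps=100, blowup=4.0):
    """Raise the learning rate geometrically and record the loss at each step.

    train_step(lr) must run ONE optimisation step at that learning rate and
    return the loss as a float. Stops early once the loss diverges.
    """
    growth = (lr_end / lr_start) ** (1.0 / (num_steps - 1))
    history = []                      # (lr, loss) pairs seen before divergence
    best_loss = math.inf
    lr = lr_start
    for _ in range(num_steps):
        loss = train_step(lr)
        # Divergence: loss is NaN/inf or far above the best loss seen so far
        if not math.isfinite(loss) or loss > blowup * best_loss:
            break
        history.append((lr, loss))
        best_loss = min(best_loss, loss)
        lr *= growth
    return history

def make_toy_step(w0=5.0):
    """Toy stand-in for a model: gradient descent on loss(w) = w**2."""
    state = {"w": w0}
    def step(lr):
        loss = state["w"] ** 2
        state["w"] -= lr * 2.0 * state["w"]   # gradient of w**2 is 2w
        return loss
    return step

history = lr_range_test(make_toy_step(), lr_start=1e-3, lr_end=10.0, num_steps=50)
last_good_lr = history[-1][0]
suggested_lr = last_good_lr / 10.0    # "dial it back"; the factor 10 is an assumption
print(f"diverged after lr={last_good_lr:.3g}, try around {suggested_lr:.3g}")
```

Plotting `history` (loss against learning rate on a log axis) usually makes the usable range easier to see than a single suggested number.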

Thanks, but what would an ideal train loss be? I am training for 30 epochs and the loss goes from 1.35 to 0.83 after 30 epochs. Is that okay, or should I train for more epochs or make the learning rate smaller? This is my model:

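```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# ResNet-101 pretrained on ImageNet
model = models.resnet101(pretrained=True)

# Freeze the backbone. The loop body was missing in the post; this line is an
# assumption based on the optimizer only receiving model.fc.parameters().
for param in model.parameters():
    param.requires_grad = False

# New classifier head for the 4 classes
model.fc = nn.Sequential(nn.Linear(2048, 1024),
                         nn.ReLU(),
                         nn.Dropout(0.4),
                         nn.Linear(1024, 512),
                         nn.ReLU(),
                         nn.Dropout(0.4),
                         nn.Linear(512, 4),
                         nn.LogSoftmax(dim=1))

# NLLLoss expects log-probabilities, which LogSoftmax provides
criterion = nn.NLLLoss()

# Only the new head is optimized
optimizer = optim.Adam(model.fc.parameters(), lr=0.000005)
```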

You’re asking the unanswerable questions of ML, my friend. It’s hard to say how low a loss should be before it’s ‘good’. Try creating an evaluation set and check the loss/accuracy on that one. There are a few tricks for evaluating your models explained around the internet.

Yeah, I know. Actually, I am pretty confused here. I have a validation set alongside my training set. As I said, I have set it to 30 epochs. Here is the result:

Epoch: 1/30
Train loss: 1.3596…
Valid loss: 1.3222…
Accuracy: 0.2812…
Epoch: 2/30
Train loss: 1.2370…
Valid loss: 1.1968…
Accuracy: 0.5000…
Epoch: 3/30
Train loss: 1.0712…
Valid loss: 1.0764…
Accuracy: 0.5625…
Epoch: 4/30
Train loss: 0.9976…
Valid loss: 0.9001…
Accuracy: 0.5938…
Epoch: 5/30
Train loss: 0.9390…
Valid loss: 1.2284…
Accuracy: 0.5625…
Epoch: 6/30
Train loss: 0.9035…
Valid loss: 1.3175…
Accuracy: 0.5000…
Epoch: 7/30
Train loss: 0.8761…
Valid loss: 1.0062…
Accuracy: 0.6250…
Epoch: 8/30
Train loss: 0.8465…
Valid loss: 1.0808…
Accuracy: 0.6562…
Epoch: 9/30
Train loss: 0.8496…
Valid loss: 0.7764…
Accuracy: 0.7812…
Epoch: 10/30
Train loss: 0.8181…
Valid loss: 1.0402…
Accuracy: 0.5938…
Epoch: 11/30
Train loss: 0.8326…
Valid loss: 1.1302…
Accuracy: 0.5625…
Epoch: 12/30
Train loss: 0.7943…
Valid loss: 1.0503…
Accuracy: 0.6250…
Epoch: 13/30
Train loss: 0.7826…
Valid loss: 1.1598…
Accuracy: 0.6562…
Epoch: 14/30
Train loss: 0.7651…
Valid loss: 1.2678…
Accuracy: 0.5625…
Epoch: 15/30
Train loss: 0.7624…
Valid loss: 1.2643…
Accuracy: 0.5625…
Epoch: 16/30
Train loss: 0.7763…
Valid loss: 0.7775…
Accuracy: 0.7188…
Epoch: 17/30
Train loss: 0.7566…
Valid loss: 1.0853…
Accuracy: 0.6562…
Epoch: 18/30
Train loss: 0.7655…
Valid loss: 0.6543…
Accuracy: 0.7188…
Epoch: 19/30
Train loss: 0.7474…
Valid loss: 0.7556…
Accuracy: 0.6875…
Epoch: 20/30
Train loss: 0.7582…
Valid loss: 1.4987…
Accuracy: 0.4375…
Epoch: 21/30
Train loss: 0.7262…
Valid loss: 0.9582…
Accuracy: 0.7500…
Epoch: 22/30
Train loss: 0.7269…
Valid loss: 1.0767…
Accuracy: 0.5312…
Epoch: 23/30
Train loss: 0.7332…
Valid loss: 0.7451…
Accuracy: 0.7188…
Epoch: 24/30
Train loss: 0.7295…
Valid loss: 0.6882…
Accuracy: 0.7500…
Epoch: 25/30
Train loss: 0.7382…
Valid loss: 1.2635…
Accuracy: 0.6250…
Epoch: 26/30
Train loss: 0.7229…
Valid loss: 0.8446…
Accuracy: 0.6562…
Epoch: 27/30
Train loss: 0.7186…
Valid loss: 1.0239…
Accuracy: 0.6562…
Epoch: 28/30
Train loss: 0.7231…
Valid loss: 0.8888…
Accuracy: 0.7188…
Epoch: 29/30
Train loss: 0.7156…
Valid loss: 1.1398…
Accuracy: 0.6875…
Epoch: 30/30
Train loss: 0.7100…
Valid loss: 0.6227…
Accuracy: 0.7188…

Can you please tell me if my result is good or not? I know we can’t really say whether this model is good or bad, but even a little clue would help me decide whether I should tweak it or not. Thank you for your help.

Sure thing

• The training loss goes down. This is good as it means you implemented the training correctly and that the dataset is learnable
• Validation loss mostly goes down, great! It is a bit unstable though. Do you validate on the whole dataset? How big is the validation dataset?
• Accuracy goes up, peaks at epoch 9 and goes down - but not that bad. I’d say completely normal
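
A quick back-of-the-envelope sketch of why the size of the validation set matters for stability. The logged accuracies are all multiples of 1/32 (0.2812 ≈ 9/32, 0.7812 ≈ 25/32), which would be consistent with accuracy being measured on 32 images; that is an inference from the log, not something the log states. The helper name `accuracy_std_error` is made up for this example, and the binomial formula assumes independent images.

```python
import math

def accuracy_std_error(accuracy, n_images):
    """Binomial standard error of an accuracy measured on n_images samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_images)

# Compare a tiny validation set with larger ones at the same true accuracy
for n in (32, 320, 800):
    se = accuracy_std_error(0.70, n)
    print(f"n={n:4d}  one image = {1.0 / n:.4f}  std error = {se:.3f}")
```

With 32 images the standard error at 70% accuracy is roughly 0.08, so swings of several points between epochs can be pure noise.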

To improve

• Run the experiment several times (10-15 if you have the time) and create/use a tool to see if the trainings follow this pattern repeatably
• Inspect the validation data that is classified incorrectly and compare it to the correctly classified data. Can you find any pattern in where/why it goes wrong? If you are doing classification, try visualizing a confusion matrix
• Has anyone attempted to solve your problem before? Can you look at their solution to inspire you further?
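
A small sketch of the "run it several times and compare" idea from the first bullet above: collect one metric per epoch for each run, then look at the mean and spread per epoch. The function name `summarize_runs` and the accuracy values are made up for illustration.

```python
from statistics import mean, stdev

def summarize_runs(runs):
    """runs: list of per-run metric lists, one value per epoch, equal lengths.

    Returns a list of (epoch, mean, std) tuples computed across runs.
    """
    summary = []
    for epoch, values in enumerate(zip(*runs), start=1):
        spread = stdev(values) if len(values) > 1 else 0.0
        summary.append((epoch, mean(values), spread))
    return summary

# Hypothetical validation accuracies from three repeated runs (made-up numbers)
runs = [
    [0.28, 0.50, 0.56, 0.59],
    [0.31, 0.47, 0.59, 0.62],
    [0.25, 0.53, 0.53, 0.66],
]
for epoch, avg, spread in summarize_runs(runs):
    print(f"epoch {epoch}: mean={avg:.3f} std={spread:.3f}")
```

If the spread across runs is as large as the epoch-to-epoch changes, the pattern in a single run should not be over-interpreted.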

Just some tricks. Let me know if you want to discuss further.

Hey, I will run it 10 times, but what tool should I use? I really do not know about that. I have almost 1000 images in each class, with 4 classes. In the validation set, I have 8 images in each of the four classes. I now have the confusion matrix for each class over 30 epochs, and I have also calculated sensitivity and specificity for each class for each epoch. I honestly do not know whether someone has already done this before. How can I know where this thing goes wrong? It is a little confusing.

Okay, so many people like TensorBoard (or tensorboardX) for plotting loss and accuracy. I prefer Visdom, as it lets you plot richer data, but it is a bit more complex. Look into those!

8 images per class is simply too few. Either get more validation images, or take some images out of the training set and put them in the validation set. A common split is 80% training and 20% validation (it is more complicated than that, but let's not get into it).
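
A minimal sketch of such an 80/20 split done class by class, so each class keeps the same share in both sets. The function name `stratified_split` and the file names are hypothetical; the 4 classes with 1000 images each follow the numbers mentioned in this thread.

```python
import random
from collections import defaultdict

def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs into train/val lists, class by class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    rng = random.Random(seed)          # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_val = int(round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Hypothetical file names: 4 classes x 1000 images
samples = [(f"class{c}/img_{i:04d}.jpg", c) for c in range(4) for i in range(1000)]
train, val = stratified_split(samples, val_fraction=0.2)
print(len(train), len(val))   # 3200 800
```

Keeping the paths in the split also makes it possible to look up exactly which images were misclassified later.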

Good that you did the calculations. What do you mean by “thing goes wrong”? Which images get misclassified? That’s some old-school programming, you know: you have to keep track of the image IDs / paths. But don’t worry too much about that for now if it’s difficult. There are a TON of image classification projects out there. Simply google “image classification deep learning medium” for more info.

Very sorry for the late reply, I have been busy with other studies. So, I have 10,000 images in four folders/classes. From those I have put 1000 per class into the training set; the remaining 9000 are still unseen. So, can I move 150 images from the train set to the validation set? Will that be okay?
Also, I need some bigger help from you. I have got the confusion matrix, but I am unable to interpret it. I have read many blogs, but few of them cover multi-class confusion matrices. Can you please help? Here is the confusion matrix I got:

``````
Class 0
TP 819.0, TN 602.0, FP 1691.0, FN 181.0
Sensitivity = 0.8190000057220459
Specificity = 0.2625381648540497
Class 1
TP 1.0, TN 2593.0, FP 1.0, FN 698.0
Sensitivity = 0.0014306152006611228
Specificity = 0.9996144771575928
Class 2
TP 124.0, TN 2126.0, FP 373.0, FN 670.0
Sensitivity = 0.1561712920665741
Specificity = 0.8507403135299683
Class 3
TP 118.0, TN 2327.0, FP 166.0, FN 682.0
Sensitivity = 0.14749999344348907
Specificity = 0.933413565158844
Class 0
TP 827.0, TN 609.0, FP 1708.0, FN 181.0
Sensitivity = 0.8204365372657776
Specificity = 0.2628398835659027
Class 1
TP 1.0, TN 2617.0, FP 1.0, FN 706.0
Sensitivity = 0.0014144271844998002
Specificity = 0.9996180534362793
Class 2
TP 124.0, TN 2150.0, FP 373.0, FN 678.0
Sensitivity = 0.1546134650707245
Specificity = 0.8521601557731628
Class 3
TP 124.0, TN 2350.0, FP 167.0, FN 684.0
Sensitivity = 0.15346534550189972
Specificity = 0.9336511492729187
Epoch: 1/30
Train loss: 1.3624..
Valid loss: 1.3209..
Accuracy: 0.4375..
Class 0
TP 1696.0, TN 1853.0, FP 2757.0, FN 312.0
Sensitivity = 0.8446215391159058
Specificity = 0.40195226669311523
Class 1
TP 86.0, TN 5140.0, FP 72.0, FN 1320.0
Sensitivity = 0.0611664280295372
Specificity = 0.9861857295036316
Class 2
TP 245.0, TN 4481.0, FP 541.0, FN 1351.0
Sensitivity = 0.1535087674856186
Specificity = 0.892274022102356
Class 3
TP 605.0, TN 4394.0, FP 616.0, FN 1003.0
Sensitivity = 0.3762437701225281
Specificity = 0.8770459294319153
Class 0
TP 1704.0, TN 1871.0, FP 2763.0, FN 312.0
Sensitivity = 0.8452380895614624
Specificity = 0.403754860162735
Class 1
TP 87.0, TN 5164.0, FP 72.0, FN 1327.0
Sensitivity = 0.06152758002281189
Specificity = 0.9862490296363831
Class 2
TP 248.0, TN 4505.0, FP 541.0, FN 1356.0
Sensitivity = 0.1546134650707245
Specificity = 0.8927863836288452
Class 3
TP 613.0, TN 4412.0, FP 622.0, FN 1003.0
Sensitivity = 0.37933167815208435
Specificity = 0.8764402270317078
Epoch: 2/30
Train loss: 1.2294..
Valid loss: 0.9779..
Accuracy: 0.6250..
Class 0
TP 2490.0, TN 3558.0, FP 3369.0, FN 526.0
Sensitivity = 0.825596809387207
Specificity = 0.5136422514915466
Class 1
TP 278.0, TN 7599.0, FP 231.0, FN 1835.0
Sensitivity = 0.13156649470329285
Specificity = 0.9704980850219727
Class 2
TP 473.0, TN 6738.0, FP 807.0, FN 1925.0
Sensitivity = 0.197247713804245
Specificity = 0.893041729927063
Class 3
TP 1214.0, TN 6446.0, FP 1081.0, FN 1202.0
Sensitivity = 0.5024834275245667
Specificity = 0.8563836812973022
Class 0
TP 2497.0, TN 3575.0, FP 3376.0, FN 527.0
Sensitivity = 0.8257275223731995
Specificity = 0.5143144726753235
Class 1
TP 283.0, TN 7621.0, FP 233.0, FN 1838.0
Sensitivity = 0.13342763483524323
Specificity = 0.9703335762023926
Class 2
TP 474.0, TN 6760.0, FP 809.0, FN 1932.0
Sensitivity = 0.19700747728347778
Specificity = 0.8931166529655457
Class 3
TP 1220.0, TN 6468.0, FP 1083.0, FN 1204.0
Sensitivity = 0.5033003091812134
Specificity = 0.8565753102302551
Epoch: 3/30
Train loss: 1.0757..
Valid loss: 0.9976..
Accuracy: 0.5938..
Class 0
TP 3284.0, TN 5414.0, FP 3830.0, FN 740.0
Sensitivity = 0.8161033987998962
Specificity = 0.5856772065162659
Class 1
TP 572.0, TN 10012.0, FP 436.0, FN 2248.0
Sensitivity = 0.20283688604831696
Specificity = 0.9582695364952087
Class 2
TP 780.0, TN 8991.0, FP 1077.0, FN 2420.0
Sensitivity = 0.24375000596046448
Specificity = 0.8930274248123169
Class 3
TP 1802.0, TN 8557.0, FP 1487.0, FN 1422.0
Sensitivity = 0.5589330196380615
Specificity = 0.8519514203071594
Class 0
TP 3291.0, TN 5432.0, FP 3836.0, FN 741.0
Sensitivity = 0.816220223903656
Specificity = 0.5861027240753174
Class 1
TP 577.0, TN 10031.0, FP 441.0, FN 2251.0
Sensitivity = 0.20403112471103668
Specificity = 0.9578877091407776
Class 2
TP 782.0, TN 9015.0, FP 1077.0, FN 2426.0
Sensitivity = 0.24376559257507324
Specificity = 0.8932818174362183
Class 3
TP 1808.0, TN 8580.0, FP 1488.0, FN 1424.0
Sensitivity = 0.5594059228897095
Specificity = 0.8522049784660339
Epoch: 4/30
Train loss: 0.9810..
Valid loss: 0.9842..
Accuracy: 0.6250..
Class 0
TP 4080.0, TN 7306.0, FP 4255.0, FN 952.0
Sensitivity = 0.8108108043670654
Specificity = 0.6319522261619568
Class 1
TP 902.0, TN 12416.0, FP 650.0, FN 2625.0
Sensitivity = 0.2557414174079895
Specificity = 0.9502525925636292
Class 2
TP 1129.0, TN 11243.0, FP 1348.0, FN 2873.0
Sensitivity = 0.28210893273353577
Specificity = 0.8929393887519836
Class 3
TP 2389.0, TN 10721.0, FP 1840.0, FN 1643.0
Sensitivity = 0.592509925365448
Specificity = 0.8535148501396179
Class 0
TP 4088.0, TN 7327.0, FP 4258.0, FN 952.0
Sensitivity = 0.8111110925674438
Specificity = 0.6324557662010193
Class 1
TP 910.0, TN 12435.0, FP 655.0, FN 2625.0
Sensitivity = 0.2574257552623749
Specificity = 0.94996178150177
Class 2
TP 1132.0, TN 11267.0, FP 1348.0, FN 2878.0
Sensitivity = 0.28229427337646484
Specificity = 0.8931430578231812
Class 3
TP 2394.0, TN 10745.0, FP 1840.0, FN 1646.0
Sensitivity = 0.5925742387771606
Specificity = 0.8537942171096802
Epoch: 5/30
Train loss: 0.9155..
Valid loss: 0.7378..
Accuracy: 0.7500..
Class 0
TP 4849.0, TN 9283.0, FP 4595.0, FN 1191.0
Sensitivity = 0.8028145432472229
Specificity = 0.6689004302024841
Class 1
TP 1278.0, TN 14812.0, FP 872.0, FN 2956.0
Sensitivity = 0.30184224247932434
Specificity = 0.9444019198417664
Class 2
TP 1534.0, TN 13496.0, FP 1618.0, FN 3270.0
Sensitivity = 0.3193172216415405
Specificity = 0.8929469585418701
Class 3
TP 2978.0, TN 12884.0, FP 2194.0, FN 1862.0
Sensitivity = 0.6152892708778381
Specificity = 0.8544899821281433
Class 0
TP 4851.0, TN 9306.0, FP 4596.0, FN 1197.0
Sensitivity = 0.8020833134651184
Specificity = 0.6694000959396362
Class 1
TP 1281.0, TN 14825.0, FP 883.0, FN 2961.0
Sensitivity = 0.301980197429657
Specificity = 0.94378662109375
Class 2
TP 1536.0, TN 13517.0, FP 1621.0, FN 3276.0
Sensitivity = 0.3192020058631897
Specificity = 0.8929184675216675
Class 3
TP 2983.0, TN 12903.0, FP 2199.0, FN 1865.0
Sensitivity = 0.6153053045272827
Specificity = 0.8543901443481445
Epoch: 6/30
Train loss: 0.8958..
Valid loss: 1.1498..
Accuracy: 0.3750..
Class 0
TP 5618.0, TN 11267.0, FP 4928.0, FN 1430.0
Sensitivity = 0.7971055507659912
Specificity = 0.6957085728645325
Class 1
TP 1651.0, TN 17184.0, FP 1118.0, FN 3290.0
Sensitivity = 0.33414289355278015
Specificity = 0.9389137625694275
Class 2
TP 1945.0, TN 15749.0, FP 1888.0, FN 3661.0
Sensitivity = 0.3469496965408325
Specificity = 0.8929523229598999
Class 3
TP 3563.0, TN 15063.0, FP 2532.0, FN 2085.0
Sensitivity = 0.6308428049087524
Specificity = 0.8560954928398132
Class 0
TP 5624.0, TN 11280.0, FP 4939.0, FN 1432.0
Sensitivity = 0.7970521450042725
Specificity = 0.6954805850982666
Class 1
TP 1655.0, TN 17206.0, FP 1120.0, FN 3294.0
Sensitivity = 0.33441099524497986
Specificity = 0.9388846158981323
Class 2
TP 1946.0, TN 15773.0, FP 1888.0, FN 3668.0
Sensitivity = 0.3466334044933319
Specificity = 0.8930977582931519
Class 3
TP 3570.0, TN 15086.0, FP 2533.0, FN 2086.0
Sensitivity = 0.6311880946159363
Specificity = 0.8562347292900085
Epoch: 7/30
Train loss: 0.8790..
Valid loss: 1.1372..
Accuracy: 0.5625..
Class 0
TP 6398.0, TN 13242.0, FP 5270.0, FN 1658.0
Sensitivity = 0.7941906452178955
Specificity = 0.715319812297821
Class 1
TP 2054.0, TN 19579.0, FP 1341.0, FN 3594.0
Sensitivity = 0.3636685609817505
Specificity = 0.9358986616134644
Class 2
TP 2364.0, TN 18020.0, FP 2140.0, FN 4044.0
Sensitivity = 0.368913859128952
Specificity = 0.8938491940498352
Class 3
TP 4153.0, TN 17264.0, FP 2848.0, FN 2303.0
Sensitivity = 0.6432775855064392
Specificity = 0.8583930134773254
Class 0
TP 6405.0, TN 13258.0, FP 5278.0, FN 1659.0
Sensitivity = 0.7942708134651184
Specificity = 0.7152568101882935
Class 1
TP 2061.0, TN 19599.0, FP 1345.0, FN 3595.0
Sensitivity = 0.3643918037414551
Specificity = 0.9357811212539673
Class 2
TP 2366.0, TN 18044.0, FP 2140.0, FN 4050.0
Sensitivity = 0.36876559257507324
Specificity = 0.8939754366874695
Class 3
TP 4157.0, TN 17288.0, FP 2848.0, FN 2307.0
Sensitivity = 0.6431002616882324
Specificity = 0.8585617542266846
Epoch: 8/30
Train loss: 0.8523..
Valid loss: 1.0555..
Accuracy: 0.6250..
Class 0
TP 7177.0, TN 15244.0, FP 5585.0, FN 1887.0
Sensitivity = 0.7918137907981873
Specificity = 0.7318642139434814
Class 1
TP 2460.0, TN 21964.0, FP 1574.0, FN 3895.0
Sensitivity = 0.3870967626571655
Specificity = 0.9331294298171997
Class 2
TP 2814.0, TN 20297.0, FP 2386.0, FN 4396.0
Sensitivity = 0.3902912735939026
Specificity = 0.8948110938072205
Class 3
TP 4746.0, TN 19478.0, FP 3151.0, FN 2518.0
Sensitivity = 0.653359055519104
Specificity = 0.8607538938522339
Class 0
TP 7185.0, TN 15255.0, FP 5598.0, FN 1887.0
Sensitivity = 0.7919973731040955
Specificity = 0.7315494418144226
Class 1
TP 2462.0, TN 21987.0, FP 1575.0, FN 3901.0
Sensitivity = 0.3869244158267975
Specificity = 0.9331550598144531
Class 2
TP 2817.0, TN 20321.0, FP 2386.0, FN 4401.0
Sensitivity = 0.390274316072464
Specificity = 0.8949222564697266
Class 3
TP 4749.0, TN 19500.0, FP 3153.0, FN 2523.0
Sensitivity = 0.653052806854248
Specificity = 0.8608131408691406
Epoch: 9/30
Train loss: 0.8113..
Valid loss: 1.1657..
Accuracy: 0.5000..
Class 0
TP 7949.0, TN 17243.0, FP 5903.0, FN 2123.0
Sensitivity = 0.7892176508903503
Specificity = 0.7449667453765869
Class 1
TP 2864.0, TN 24376.0, FP 1780.0, FN 4198.0
Sensitivity = 0.40555083751678467
Specificity = 0.9319467544555664
Class 2
TP 3262.0, TN 22549.0, FP 2657.0, FN 4750.0
Sensitivity = 0.40713930130004883
Specificity = 0.8945885896682739
Class 3
TP 5360.0, TN 21703.0, FP 3443.0, FN 2712.0
Sensitivity = 0.664023756980896
Specificity = 0.8630796074867249
Class 0
TP 7955.0, TN 17262.0, FP 5908.0, FN 2125.0
Sensitivity = 0.7891865372657776
Specificity = 0.7450150847434998
Class 1
TP 2871.0, TN 24394.0, FP 1786.0, FN 4199.0
Sensitivity = 0.40608203411102295
Specificity = 0.9317799806594849
Class 2
TP 3264.0, TN 22571.0, FP 2659.0, FN 4756.0
Sensitivity = 0.40698254108428955
Specificity = 0.8946095705032349
Class 3
TP 5364.0, TN 21727.0, FP 3443.0, FN 2716.0
Sensitivity = 0.6638613939285278
Specificity = 0.863210141658783
Epoch: 10/30
Train loss: 0.8250..
Valid loss: 0.9164..
Accuracy: 0.5938..
``````

I just want to know why TP, TN, FP and FN are 819, 602, 1691 and 181 (for the first example). If the true negative count is 602, then what about the other examples? I have 1000 images in training class 0, i.e. the first class. Where does this 1691 come from? I am pretty confused now.

Hey, I am having a bit of a tough time understanding you. Please answer the following:

• How many classes do you have?
• Do you want to pair up one image to one class? Otherwise, explain what you want to do
• How many images do you have in total? How many of them are currently used as training images?
• What do you mean by multi-class conf_matrix? Edit: Ah, I see now, all the Medium articles are for binary problems. Here is one example for multi-class

If possible, I’d use a graph plotting tool like plotly or matplotlib to plot the confusion matrix as heatmaps. You can use the sklearn implementation of a confusion matrix to get this to work quite easily
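
To show what such a matrix contains, here is a plain-Python sketch with made-up labels. It uses the same orientation as `sklearn.metrics.confusion_matrix` (rows are the true class, columns the predicted class), but it is a stand-in written for illustration, not the sklearn implementation. The resulting nested list can be handed to a heatmap plot.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions per (true class, predicted class) pair.

    Rows are the true class, columns are the predicted class.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Made-up labels for 4 classes, only to show the layout
y_true = [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
y_pred = [0, 0, 1, 1, 0, 2, 0, 3, 3, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=4)
for true_class, row in enumerate(cm):
    print(f"true {true_class}: {row}")
```

The diagonal holds the correct predictions; every off-diagonal cell says which class was mistaken for which.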

Well,
First, I have four classes.
Second, I really don’t know what you mean by one image to one class. What I am trying to build is a classifier that assigns images to classes. I have four training folders with 1000 images each.
Third, I have nearly 10,000 images in each class for the train set, 8 images for validation and 250 in the test set for each class/folder. Currently, I am using 1000 pictures from each train set folder, because uploading even 5,000 images to Google Drive is a hassle.
Fourth, this is what I mean by multi-class confusion matrix. Well, my mistake: I want to plot the ROC curve, so I used this confusion matrix to get the sensitivity and specificity. Now I am confused by the matrix itself.

How can you have 1000 images in each folder AND 10,000 images in each class, plus 8 images for validation and 250 for test in each class? I don’t understand this.

The one image to one class means that each image corresponds to one class. An image can be an image of a cat or an image of a dog.

For multi-label classification problems, an image can be both a cat image and a dog image at the same time. That’s what your link talks about.

So, I have downloaded a file that has all the images. I am doing my coding in Colab, so I have to upload the pictures to Google Drive. I have 10,000 images for each of the 4 classes, which means 40,000 images in total, right? I have created four folders in Google Drive (what we are calling classes) and uploaded 1000 images into each as the train set. Got it? Then I uploaded 8 images per class as the validation set. Now I am doing this multi-class stuff. Got it now, bro?

Yes, I believe I understand your data now. To answer your question about the confusion matrix:

You can’t just think about one class in isolation here. You have to consider them all. The reason you have 1691 as FP is that, out of all the images shown to the model, 1691 were incorrectly classified as class 0 even though they belong to one of the other classes. There were also a lot of images correctly classified as class 0 (TP=819).
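
A short sketch of how the four per-class counts follow from a multi-class confusion matrix (rows = true class, columns = predicted class), checked against the class 0 numbers posted above. The helper names are made up for this example. For any class, TP + TN + FP + FN equals the total number of images counted. In the posted log that sum is 3293 in the first block, 3325 in the second and 6618 in the third, which suggests the counters keep accumulating across the validation pass and across epochs instead of being reset. That is an inference from the numbers, since the counting code is not shown.

```python
def one_vs_rest_counts(cm, k):
    """TP, TN, FP, FN for class k from a confusion matrix (rows = true class)."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # class k images predicted as another class
    fp = sum(row[k] for row in cm) - tp   # other classes' images predicted as class k
    tn = total - tp - fn - fp             # everything that does not involve class k
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    return tp / (tp + fn)                 # share of class k images that were found

def specificity(tn, fp):
    return tn / (tn + fp)                 # share of other images correctly rejected

# Tiny made-up matrix to show the derivation
toy_cm = [[2, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 2]]
print(one_vs_rest_counts(toy_cm, 0))

# Counts posted for class 0 in the first block of the log
tp, tn, fp, fn = 819, 602, 1691, 181
print(tp + tn + fp + fn)        # 3293 images counted so far
print(sensitivity(tp, fn))      # 0.819, matches the log
print(specificity(tn, fp))      # about 0.2625, matches the log
```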

That’s it, I’m out. Good luck, bro.

It’s best to split your dataset 60-20-20 (train-validation-test). Use the 20% validation set for early stopping and for choosing the learning rate. Once you have the best model, use the 20% test set to compute the final precision, recall and F1 scores.
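
For reference, a small sketch of how precision, recall and F1 are computed per class from one-vs-rest counts. The function name is made up. The class 0 counts from the log posted earlier in the thread are used only to illustrate the arithmetic; for a final score the counts would come from the held-out test set.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 from one-vs-rest counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Class 0 counts from the first block of the posted log (illustration only)
precision, recall, f1 = precision_recall_f1(tp=819, fp=1691, fn=181)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here recall is high (0.819) but precision is low (about 0.33), which is what a model that predicts class 0 far too often looks like.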

One way to choose the learning rate: start high, and gradually decrease it if your loss stops decreasing for a certain number of epochs. You also need to play with your other hyperparameters, like momentum.
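
A plain-Python sketch of that "decrease when the loss stalls" rule. The class name `PlateauDecay`, its defaults and the loss values are made up for illustration. PyTorch ships a built-in scheduler for the same idea, `torch.optim.lr_scheduler.ReduceLROnPlateau`, which is probably the better choice in a real training loop.

```python
class PlateauDecay:
    """Cut the learning rate by `factor` once the loss has failed to improve
    for more than `patience` consecutive epochs."""

    def __init__(self, lr, factor=0.1, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the new lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr

# Made-up validation losses: steady improvement, then a plateau
schedule = PlateauDecay(lr=0.1, factor=0.1, patience=2)
for epoch, loss in enumerate([1.30, 1.10, 1.00, 1.02, 1.01, 1.05, 0.98], start=1):
    print(epoch, loss, schedule.step(loss))
```

Feeding it the validation loss rather than the training loss ties the decay to generalization, though a very small validation set makes that signal noisy.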

So, I have almost 10,000 training images per class and have only uploaded 1000 of them to Google Drive. Can I use 200-250 of the remaining 9000 images for each class?
For the learning rate, as you said, I think I can start with 0.1 and then decrease it (e.g. 0.01, 0.001) along with the loss. Right?
Next, I have calculated sensitivity (which is also called recall) and specificity. Do I need to compute precision as well? I need to plot the ROC curve from those values.
Also, I do not know about momentum. I have not taken that course yet, but I am about to.
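
On the ROC question: a ROC curve plots sensitivity against 1 - specificity, so precision is not needed for it. One sensitivity/specificity pair gives only a single point, though. The curve comes from sweeping a threshold over the predicted probability of a class, one class against the rest. Below is a minimal sketch with made-up scores; the function names are hypothetical. With the `LogSoftmax` output of the model above, `torch.exp(output)[:, k]` would presumably provide the scores for class `k`.

```python
def roc_points(scores, is_positive):
    """ROC points (FPR, TPR) for one class, sweeping the threshold over the scores.

    scores: predicted probability of the class for each image.
    is_positive: True where the image really belongs to that class.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, pos in zip(scores, is_positive) if s >= threshold and pos)
        fp = sum(1 for s, pos in zip(scores, is_positive) if s >= threshold and not pos)
        points.append((fp / n_neg, tp / n_pos))   # (1 - specificity, sensitivity)
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up class 0 probabilities for six images
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_positive = [True, True, False, True, False, False]
points = roc_points(scores, is_positive)
print(points)
print(auc(points))
```

For four classes this would be repeated once per class, giving four one-vs-rest curves.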