EDIT: [Before reading this].
- I replaced F.interpolate with a constant matrix of ones. This leads to 100% deterministic behavior.
- The documentation indicates that all functionals that upsample/interpolate tensors may lead to non-deterministic results:
torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None):
…
Note: When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward that is not easily switched off. Please see the notes on Reproducibility for background.
- The above note appears only in torch.nn.functional.* (the upsample and interpolate functions). This suggests that torch.nn.* (Upsample) may be deterministic. I tried it, and it is non-deterministic as well.
Is there a way to turn off the non-deterministic behavior of the upsample/interpolate functionals/classes?
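One way to isolate the operation is a small check that runs the same forward and backward pass through F.interpolate several times and compares the input gradients bit for bit. The sketch below is only an illustration under stated assumptions, not the code of the model discussed here: the helper names (interpolate_grad, backward_is_reproducible), the bilinear mode, the scale factor of 2, and the tensor shape are all hypothetical choices.

```python
import torch
import torch.nn.functional as F


def interpolate_grad(x, seed=0, device="cpu"):
    """Return d(loss)/d(input) for one pass through F.interpolate.

    Assumptions (not from the original post): bilinear mode, scale factor 2.
    """
    inp = x.detach().clone().to(device).requires_grad_(True)
    out = F.interpolate(inp, scale_factor=2, mode="bilinear", align_corners=False)
    # Fixed pseudo-random weights, generated on the CPU so that they are
    # identical on every call, make the gradient non-trivial.
    gen = torch.Generator().manual_seed(seed)
    weights = torch.rand(out.shape, generator=gen).to(device)
    (out * weights).sum().backward()
    return inp.grad.detach().cpu()


def backward_is_reproducible(x, runs=5, device="cpu"):
    """True if repeated backward passes give bit-identical input gradients."""
    reference = interpolate_grad(x, device=device)
    return all(
        torch.equal(reference, interpolate_grad(x, device=device))
        for _ in range(runs)
    )


x = torch.rand(1, 3, 8, 8)  # hypothetical input shape
print(backward_is_reproducible(x, device="cpu"))
```

On a machine with a GPU, passing device="cuda" would exercise the backend that the documentation note refers to; a False result there would point at this operation rather than at the rest of the network.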
Hi,
I am seeing small errors/randomness that accumulate over training iterations and lead to large non-determinism after many iterations.
Sorry for the repetition; the subject of determinism in PyTorch has been raised many times.
I use the following configuration:
- PyTorch version: 1.0.0
- Device: CUDA (version: 10)
- Optimizer: SGD + Momentum.
- Batch size: 1
- Type of model: Fully convolutional network.
- Type of layers: Convolution (some of them with padding), max-pooling, another spatial pooling, batch-norm2d, 2d interpolation.
- Activation: ReLU, Sigmoid.
- Weight decay: Used.
- Number of workers: 0
- Shuffle training: True
- Reproducibility config. (before running anything):
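import random
import numpy as np
import torch
seed = 0
# Seed every random number generator in use (PyTorch CPU and CUDA, NumPy, Python).
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
# Disable cuDNN autotuning and request deterministic cuDNN kernels.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True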
Now,
- Where might the accumulated non-determinism come from?
- Is F.interpolate sensitive to infinitesimal variations?
Training for many epochs leads to extremely different models every time. This makes the model extremely unstable, and hyper-parameter search is hopeless.
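One commonly cited source of such drift, offered here as an assumption rather than a confirmed diagnosis for this model, is that floating-point addition is not associative. If a GPU kernel accumulates the same terms in a different order from run to run, the result can differ in the last bits, and later training steps then build on that difference. The pure-Python sketch below only illustrates the order dependence; sum_in_order is a hypothetical helper.

```python
def sum_in_order(values, order):
    """Add the entries of `values` one by one, visiting them in `order`."""
    total = 0.0
    for idx in order:
        total += values[idx]
    return total


values = [0.1, 0.2, 0.3]
left_to_right = sum_in_order(values, [0, 1, 2])  # (0.1 + 0.2) + 0.3
right_to_left = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

# Same terms, different accumulation order, different floating-point result.
print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))
```

The gap is tiny for a single sum, which would be consistent with the first iterations of two runs agreeing to many digits before they drift apart.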
Below is the training output for one epoch, for two runs on the same device (CUDA=1), with two lines per mini-batch:
- Line 1: "Dadaloader Index 58 i 394 j 861". This comes from the dataloader. From the sample with index 58, we take a random crop at position (i=394, j=861).
- Line 2: "1.3078937530517578 0.6137349605560303 0.6941587924957275 0.0". It contains four losses (measured right before calling total_loss.backward()):
2.1 Total loss: 1.3078937530517578 (sum of losses 1, 2, and 3)
2.2 Loss 1: 0.6137349605560303 (cross entropy)
2.3 Loss 2: 0.6941587924957275 (another cross entropy)
2.4 Loss 3: 0.0 (another loss, not used for now and set to zero manually)
As you can see below, with identical samples, only the first forward pass gives exactly the same losses (up to the displayed precision). After that, the two runs diverge gradually until the difference reaches the first two digits after the decimal point.
Let me know if you need more info.
Thank you!
Run 1:
Dadaloader Index 58 i 394 j 861
1.3078937530517578 0.6137349605560303 0.6941587924957275 0.0
Dadaloader Index 17 i 516 j 285
1.4302682876586914 0.7349525094032288 0.6953157186508179 0.0
Dadaloader Index 40 i 350 j 338
1.3341482877731323 0.6400192975997925 0.6941289901733398 0.0
Dadaloader Index 39 i 63 j 28
1.313957691192627 0.6189037561416626 0.6950539350509644 0.0
Dadaloader Index 34 i 333 j 128
1.246232032775879 0.5525572299957275 0.6936748027801514 0.0
Dadaloader Index 66 i 501 j 223
1.2932825088500977 0.5938845872879028 0.6993979215621948 0.0
Dadaloader Index 33 i 394 j 649
1.4306211471557617 0.7335571050643921 0.6970640420913696 0.0
Dadaloader Index 28 i 153 j 79
1.2893931865692139 0.5962274670600891 0.69316565990448 0.0
Dadaloader Index 43 i 220 j 858
1.3622307777404785 0.6630300283432007 0.6992007493972778 0.0
Dadaloader Index 2 i 343 j 389
1.432244896888733 0.7384765148162842 0.6937683820724487 0.0
Dadaloader Index 4 i 149 j 448
1.6827927827835083 0.9871603846549988 0.6956323981285095 0.0
Dadaloader Index 6 i 122 j 801
1.2698733806610107 0.5732445120811462 0.6966289281845093 0.0
Dadaloader Index 54 i 62 j 46
1.597459077835083 0.9041099548339844 0.6933490633964539 0.0
Dadaloader Index 53 i 515 j 956
1.4035171270370483 0.7025971412658691 0.7009199857711792 0.0
Dadaloader Index 35 i 173 j 416
1.1874909400939941 0.4915717840194702 0.6959190964698792 0.0
Dadaloader Index 16 i 179 j 27
1.4753220081329346 0.7813417315483093 0.69398033618927 0.0
Dadaloader Index 37 i 157 j 25
1.438446044921875 0.7451226115226746 0.6933234930038452 0.0
Dadaloader Index 59 i 367 j 271
1.4589791297912598 0.765472948551178 0.693506121635437 0.0
Dadaloader Index 55 i 228 j 917
1.357609748840332 0.6637986898422241 0.6938110589981079 0.0
Dadaloader Index 41 i 169 j 912
1.4523104429244995 0.7468318939208984 0.7054785490036011 0.0
Dadaloader Index 52 i 51 j 850
1.2322102785110474 0.5377039909362793 0.6945062875747681 0.0
Dadaloader Index 42 i 3 j 437
1.6166377067565918 0.9193493127822876 0.697288453578949 0.0
Dadaloader Index 24 i 309 j 573
1.2319157123565674 0.532400369644165 0.6995154023170471 0.0
Dadaloader Index 44 i 118 j 525
1.5829644203186035 0.8816967010498047 0.701267659664154 0.0
Dadaloader Index 21 i 265 j 644
1.3201860189437866 0.6264278888702393 0.6937581300735474 0.0
Dadaloader Index 13 i 550 j 365
1.3417432308197021 0.640053391456604 0.7016898989677429 0.0
Dadaloader Index 56 i 36 j 84
1.3279290199279785 0.6328123807907104 0.6951166391372681 0.0
Dadaloader Index 26 i 49 j 630
1.4093023538589478 0.7159905433654785 0.6933118104934692 0.0
Dadaloader Index 32 i 492 j 970
1.3224120140075684 0.6276995539665222 0.6947124004364014 0.0
Dadaloader Index 65 i 159 j 341
1.4683253765106201 0.7739512920379639 0.6943740844726562 0.0
Dadaloader Index 38 i 4 j 100
1.4151089191436768 0.7180051803588867 0.69710373878479 0.0
Dadaloader Index 46 i 267 j 341
1.4146380424499512 0.7189313173294067 0.6957067251205444 0.0
Dadaloader Index 5 i 509 j 815
1.4373635053634644 0.7409232258796692 0.6964402794837952 0.0
Dadaloader Index 51 i 154 j 214
1.4572081565856934 0.7623822689056396 0.6948258876800537 0.0
Dadaloader Index 0 i 510 j 1021
1.4267393350601196 0.7281390428543091 0.6986002922058105 0.0
Dadaloader Index 36 i 151 j 526
1.2620775699615479 0.5684042572975159 0.693673312664032 0.0
Dadaloader Index 19 i 67 j 460
1.3120753765106201 0.6127147674560547 0.6993606686592102 0.0
Dadaloader Index 30 i 542 j 1028
1.3695738315582275 0.6763384938240051 0.6932352781295776 0.0
Dadaloader Index 48 i 548 j 855
1.3978221416473389 0.7043324112892151 0.6934897899627686 0.0
Dadaloader Index 25 i 260 j 681
1.3923063278198242 0.6989061236381531 0.6934001445770264 0.0
Dadaloader Index 57 i 130 j 489
1.456763505935669 0.7636163830757141 0.6931471824645996 0.0
Dadaloader Index 23 i 83 j 310
1.4143562316894531 0.721157431602478 0.6931988000869751 0.0
Dadaloader Index 15 i 33 j 271
1.3383533954620361 0.6422185897827148 0.6961347460746765 0.0
Dadaloader Index 62 i 478 j 691
1.3755115270614624 0.6822086572647095 0.6933028697967529 0.0
Dadaloader Index 22 i 348 j 124
1.3877379894256592 0.6908612251281738 0.6968767642974854 0.0
Dadaloader Index 12 i 77 j 91
1.5546791553497314 0.8563271760940552 0.6983519792556763 0.0
Dadaloader Index 49 i 514 j 860
1.561444878578186 0.8678901195526123 0.6935547590255737 0.0
Dadaloader Index 27 i 159 j 62
1.2833820581436157 0.5873216390609741 0.6960604190826416 0.0
Dadaloader Index 63 i 367 j 60
1.3991279602050781 0.7058144807815552 0.693313479423523 0.0
Dadaloader Index 7 i 126 j 14
1.3928130865097046 0.6996296644210815 0.693183422088623 0.0
Dadaloader Index 11 i 264 j 266
1.3304831981658936 0.6312242150306702 0.6992589235305786 0.0
Dadaloader Index 14 i 105 j 630
1.443231463432312 0.7446146011352539 0.6986168622970581 0.0
Dadaloader Index 61 i 381 j 859
1.581858515739441 0.8874715566635132 0.6943869590759277 0.0
Dadaloader Index 18 i 145 j 864
1.3805955648422241 0.6874476671218872 0.6931478977203369 0.0
Dadaloader Index 8 i 328 j 903
1.4571764469146729 0.7447873950004578 0.7123889923095703 0.0
Dadaloader Index 47 i 407 j 456
1.3445788621902466 0.6445275545120239 0.7000513076782227 0.0
Dadaloader Index 3 i 88 j 989
1.5132746696472168 0.812066376209259 0.7012083530426025 0.0
Dadaloader Index 31 i 308 j 319
1.3421146869659424 0.6405842304229736 0.701530396938324 0.0
Dadaloader Index 9 i 123 j 553
1.3412201404571533 0.6383508443832397 0.7028692960739136 0.0
Dadaloader Index 10 i 200 j 174
1.3003937005996704 0.6061668395996094 0.694226861000061 0.0
Dadaloader Index 50 i 532 j 275
1.3342442512512207 0.6409924030303955 0.6932517886161804 0.0
Dadaloader Index 64 i 141 j 321
1.3601045608520508 0.6662247180938721 0.6938799023628235 0.0
Dadaloader Index 60 i 500 j 1027
1.535063624382019 0.8380688428878784 0.6969947814941406 0.0
Dadaloader Index 1 i 10 j 136
1.3268404006958008 0.6336882710456848 0.6931520700454712 0.0
Dadaloader Index 20 i 276 j 599
1.2224817276000977 0.5285671353340149 0.6939146518707275 0.0
Dadaloader Index 45 i 338 j 863
1.341081142425537 0.6468881368637085 0.6941929459571838 0.0
Dadaloader Index 29 i 454 j 715
1.5284583568572998 0.8332671523094177 0.6951912641525269 0.0
Run 2:
Dadaloader Index 58 i 394 j 861
1.3078937530517578 0.6137349605560303 0.6941587924957275 0.0
Dadaloader Index 17 i 516 j 285
1.4302685260772705 0.7349528670310974 0.6953156590461731 0.0
Dadaloader Index 40 i 350 j 338
1.3341515064239502 0.6400255560874939 0.6941258907318115 0.0
Dadaloader Index 39 i 63 j 28
1.3138806819915771 0.6188313961029053 0.6950492858886719 0.0
Dadaloader Index 34 i 333 j 128
1.2462828159332275 0.552611768245697 0.6936709880828857 0.0
Dadaloader Index 66 i 501 j 223
1.2930963039398193 0.5936810970306396 0.6994152665138245 0.0
Dadaloader Index 33 i 394 j 649
1.4302959442138672 0.7332066893577576 0.6970892548561096 0.0
Dadaloader Index 28 i 153 j 79
1.2886989116668701 0.5955342054367065 0.6931647062301636 0.0
Dadaloader Index 43 i 220 j 858
1.3621113300323486 0.6629502177238464 0.699161171913147 0.0
Dadaloader Index 2 i 343 j 389
1.4280494451522827 0.7342602014541626 0.6937892436981201 0.0
Dadaloader Index 4 i 149 j 448
1.6840848922729492 0.9882199764251709 0.6958649754524231 0.0
Dadaloader Index 6 i 122 j 801
1.269580364227295 0.5725433826446533 0.6970369815826416 0.0
Dadaloader Index 54 i 62 j 46
1.5900547504425049 0.8966010808944702 0.6934536099433899 0.0
Dadaloader Index 53 i 515 j 956
1.4087860584259033 0.7076217532157898 0.7011643648147583 0.0
Dadaloader Index 35 i 173 j 416
1.1813137531280518 0.4848504066467285 0.696463406085968 0.0
Dadaloader Index 16 i 179 j 27
1.4908199310302734 0.7964862585067749 0.6943336725234985 0.0
Dadaloader Index 37 i 157 j 25
1.430418610572815 0.7366296052932739 0.693789005279541 0.0
Dadaloader Index 59 i 367 j 271
1.455644130706787 0.7620043754577637 0.6936398148536682 0.0
Dadaloader Index 55 i 228 j 917
1.3617483377456665 0.6683894395828247 0.6933588981628418 0.0
Dadaloader Index 41 i 169 j 912
1.44582998752594 0.7392634153366089 0.706566572189331 0.0
Dadaloader Index 52 i 51 j 850
1.2269784212112427 0.5328167676925659 0.6941616535186768 0.0
Dadaloader Index 42 i 3 j 437
1.6447217464447021 0.9446018934249878 0.7001197934150696 0.0
Dadaloader Index 24 i 309 j 573
1.2213356494903564 0.5238724946975708 0.6974631547927856 0.0
Dadaloader Index 44 i 118 j 525
1.559009075164795 0.8618038892745972 0.6972051858901978 0.0
Dadaloader Index 21 i 265 j 644
1.3367997407913208 0.6425700783729553 0.6942296624183655 0.0
Dadaloader Index 13 i 550 j 365
1.3158639669418335 0.6140563488006592 0.7018076181411743 0.0
Dadaloader Index 56 i 36 j 84
1.351355791091919 0.6538291573524475 0.6975266933441162 0.0
Dadaloader Index 26 i 49 j 630
1.4205737113952637 0.7271271347999573 0.6934466361999512 0.0
Dadaloader Index 32 i 492 j 970
1.3131057024002075 0.6165637969970703 0.6965419054031372 0.0
Dadaloader Index 65 i 159 j 341
1.4874322414398193 0.7942564487457275 0.6931758522987366 0.0
Dadaloader Index 38 i 4 j 100
1.3745040893554688 0.6714707016944885 0.703033447265625 0.0
Dadaloader Index 46 i 267 j 341
1.4010605812072754 0.7062500715255737 0.6948105096817017 0.0
Dadaloader Index 5 i 509 j 815
1.3979568481445312 0.6986576318740845 0.6992992162704468 0.0
Dadaloader Index 51 i 154 j 214
1.4489946365356445 0.7555997967720032 0.6933948397636414 0.0
Dadaloader Index 0 i 510 j 1021
1.4158250093460083 0.7206200957298279 0.6952049136161804 0.0
Dadaloader Index 36 i 151 j 526
1.2976839542388916 0.604112446308136 0.6935714483261108 0.0
Dadaloader Index 19 i 67 j 460
1.2455717325210571 0.5450658798217773 0.7005058526992798 0.0
Dadaloader Index 30 i 542 j 1028
1.3993200063705444 0.7056968212127686 0.6936231851577759 0.0
Dadaloader Index 48 i 548 j 855
1.3980698585510254 0.7036033272743225 0.6944665908813477 0.0
Dadaloader Index 25 i 260 j 681
1.375881314277649 0.6819276809692383 0.6939536333084106 0.0
Dadaloader Index 57 i 130 j 489
1.4764268398284912 0.7831727266311646 0.6932541131973267 0.0
Dadaloader Index 23 i 83 j 310
1.3798577785491943 0.6867095828056335 0.693148136138916 0.0
Dadaloader Index 15 i 33 j 271
1.3060274124145508 0.6084781289100647 0.6975493431091309 0.0
Dadaloader Index 62 i 478 j 691
1.3667536973953247 0.6725990176200867 0.694154679775238 0.0
Dadaloader Index 22 i 348 j 124
1.4308826923370361 0.7318511009216309 0.69903165102005 0.0
Dadaloader Index 12 i 77 j 91
1.4788453578948975 0.7820360660552979 0.6968092918395996 0.0
Dadaloader Index 49 i 514 j 860
1.5342035293579102 0.8410226702690125 0.6931809186935425 0.0
Dadaloader Index 27 i 159 j 62
1.306290626525879 0.6106433868408203 0.6956472396850586 0.0
Dadaloader Index 63 i 367 j 60
1.3206387758255005 0.6242587566375732 0.6963800191879272 0.0
Dadaloader Index 7 i 126 j 14
1.3802521228790283 0.68690025806427 0.6933518648147583 0.0
Dadaloader Index 11 i 264 j 266
1.3990780115127563 0.693006157875061 0.7060718536376953 0.0
Dadaloader Index 14 i 105 j 630
1.4142067432403564 0.7153005599975586 0.6989062428474426 0.0
Dadaloader Index 61 i 381 j 859
1.5843031406402588 0.88603276014328 0.698270320892334 0.0
Dadaloader Index 18 i 145 j 864
1.3847625255584717 0.6911967396736145 0.6935657262802124 0.0
Dadaloader Index 8 i 328 j 903
1.4557201862335205 0.7504081726074219 0.7053120136260986 0.0
Dadaloader Index 47 i 407 j 456
1.2635979652404785 0.5608187913894653 0.7027791738510132 0.0
Dadaloader Index 3 i 88 j 989
1.4313454627990723 0.7353203892707825 0.696025013923645 0.0
Dadaloader Index 31 i 308 j 319
1.3706622123718262 0.6764122843742371 0.6942499876022339 0.0
Dadaloader Index 9 i 123 j 553
1.3362730741500854 0.6407691240310669 0.6955039501190186 0.0
Dadaloader Index 10 i 200 j 174
1.3735915422439575 0.6650934219360352 0.7084981203079224 0.0
Dadaloader Index 50 i 532 j 275
1.3059853315353394 0.6126150488853455 0.6933702826499939 0.0
Dadaloader Index 64 i 141 j 321
1.3388330936431885 0.6456806659698486 0.6931523680686951 0.0
Dadaloader Index 60 i 500 j 1027
1.4855607748031616 0.7867655754089355 0.6987951993942261 0.0
Dadaloader Index 1 i 10 j 136
1.3076568841934204 0.6124106645584106 0.6952462196350098 0.0
Dadaloader Index 20 i 276 j 599
1.2486283779144287 0.5525542497634888 0.6960740685462952 0.0
Dadaloader Index 45 i 338 j 863
1.4487152099609375 0.7551159262657166 0.6935993432998657 0.0
Dadaloader Index 29 i 454 j 715
1.5891010761260986 0.8944867849349976 0.6946142911911011 0.0
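The divergence between the two runs above can be quantified with a short script that pairs up the mini-batches and reports the absolute difference in total loss per step. This is a minimal sketch that assumes the two-line-per-mini-batch log format described earlier; parse_losses and compare_runs are hypothetical helper names, and only the first three mini-batches of each run are embedded here.

```python
def parse_losses(log_text):
    """Return a list of (sample_index, total_loss) pairs from a training log."""
    steps = []
    sample_index = None
    for line in log_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Dadaloader":
            # Dataloader line: "Dadaloader Index <idx> i <i> j <j>"
            sample_index = int(parts[2])
        elif sample_index is not None:
            # Loss line: "<total> <loss1> <loss2> <loss3>"
            steps.append((sample_index, float(parts[0])))
            sample_index = None
    return steps


def compare_runs(log_a, log_b):
    """Return (step, sample_index, |total_a - total_b|) for each mini-batch."""
    rows = []
    pairs = zip(parse_losses(log_a), parse_losses(log_b))
    for step, ((idx_a, loss_a), (idx_b, loss_b)) in enumerate(pairs):
        if idx_a != idx_b:
            raise ValueError(f"runs see different samples at step {step}")
        rows.append((step, idx_a, abs(loss_a - loss_b)))
    return rows


RUN_1 = """Dadaloader Index 58 i 394 j 861
1.3078937530517578 0.6137349605560303 0.6941587924957275 0.0
Dadaloader Index 17 i 516 j 285
1.4302682876586914 0.7349525094032288 0.6953157186508179 0.0
Dadaloader Index 40 i 350 j 338
1.3341482877731323 0.6400192975997925 0.6941289901733398 0.0"""

RUN_2 = """Dadaloader Index 58 i 394 j 861
1.3078937530517578 0.6137349605560303 0.6941587924957275 0.0
Dadaloader Index 17 i 516 j 285
1.4302685260772705 0.7349528670310974 0.6953156590461731 0.0
Dadaloader Index 40 i 350 j 338
1.3341515064239502 0.6400255560874939 0.6941258907318115 0.0"""

for step, idx, diff in compare_runs(RUN_1, RUN_2):
    print(f"step {step} sample {idx} |delta total loss| = {diff:.3e}")
```

Feeding the full logs of both runs to compare_runs would show at which step the difference first becomes non-zero and how quickly it grows.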