Have a look at this thread. FairSeq uses different approaches when it runs into an OOM (out of memory) issue.
Maybe you could adapt them to your use case.
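
As a rough starting point, here is a minimal sketch of one common pattern: catch the OOM error, clean up, and skip the batch instead of crashing. This is not taken from FairSeq's code; `run_step_skipping_oom`, `step_fn` and `on_oom` are hypothetical names used for illustration, and it assumes the OOM surfaces as a `RuntimeError` whose message contains "out of memory", which is how PyTorch reports CUDA OOMs.

```python
def run_step_skipping_oom(step_fn, batch, on_oom=None):
    """Run step_fn(batch) and skip the batch if it runs out of memory.

    Returns (result, True) on success and (None, False) if the batch
    was skipped. All names here are hypothetical, for illustration only.
    """
    try:
        return step_fn(batch), True
    except RuntimeError as err:
        # Assumption: OOMs are RuntimeErrors mentioning "out of memory".
        # Anything else is a real bug and should not be swallowed.
        if "out of memory" not in str(err):
            raise
        if on_oom is not None:
            # Cleanup hook, e.g. clear gradients and call
            # torch.cuda.empty_cache() before moving on.
            on_oom()
        return None, False
```

In a training loop you would call this once per batch, pass your forward/backward/optimizer step as `step_fn`, and count how many batches were skipped so that frequent OOMs do not go unnoticed.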
              
              