Since I am not able to adjust the shared memory size on the remote server, can we disable shared memory usage in PyTorch? The same experiment runs with TensorFlow without any shm size problem, so I just want to find a solution for this.
Hi,
I’m not a specialist on shared memory, but from what I remember it is only used when you explicitly send tensors across processes. So reducing how many tensors you send between processes would solve your problem. I don’t think we support any other way to transfer tensors in multiprocessing. I guess you could save to disk and load from there?
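For example, the worker could write its result to a file and only send the path through the queue, so nothing large ever goes through shared memory. A minimal sketch of that idea (the names producer and path_queue are made up for illustration, this is not DataLoader code):

import os
import tempfile

import torch
import torch.multiprocessing as mp


def producer(path_queue):
    # Putting a tensor on the queue would move its storage into shared memory;
    # instead, save it to a temporary file and send only the (small) path string.
    t = torch.randn(4, 3, 224, 224)
    fd, path = tempfile.mkstemp(suffix=".pt")
    os.close(fd)
    torch.save(t, path)
    path_queue.put(path)


if __name__ == "__main__":
    q = mp.Queue()                      # only carries short strings here
    p = mp.Process(target=producer, args=(q,))
    p.start()
    path = q.get()
    t = torch.load(path)                # read the tensor back from disk
    os.remove(path)                     # clean up the temporary file
    p.join()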
I am using a distributed job with DistributedDataParallel, and I don't quite understand what you mean. As far as I know, PyTorch uses shared memory in the DataLoader?
@albanD yes, I am sure the DataLoader uses shared memory by default for multiprocessing data loading.
This is what I found in the PyTorch source code, in torch/utils/data/dataloader.py:
def _worker_loop(dataset, index_queue, data_queue, done_event, collate_fn, seed, init_fn, worker_id):
    # See NOTE [ Data Loader Multiprocessing Shutdown Logic ] for details on the
    # logic of this function.

    try:
        global _use_shared_memory
        _use_shared_memory = True

        # Intialize C side signal handlers for SIGBUS and SIGSEGV. Python signal
        # module's handlers are executed after Python returns from C low-level
        # handlers, likely when the same fatal signal happened again already.
        # https://docs.python.org/3/library/signal.html Sec. 18.8.1.1
        _set_worker_signal_handlers()

        torch.set_num_threads(1)
        random.seed(seed)
        torch.manual_seed(seed)

        data_queue.cancel_join_thread()

        if init_fn is not None:
            init_fn(worker_id)

        watchdog = ManagerWatchdog()

        while watchdog.is_alive():
            try:
                r = index_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
            except queue.Empty:
                continue
            if r is None:
                # Received the final signal
                assert done_event.is_set()
                return
            elif done_event.is_set():
                # Done event is set. But I haven't received the final signal
                # (None) yet. I will keep continuing until get it, and skip the
                # processing steps.
                continue
            idx, batch_indices = r
            try:
                samples = collate_fn([dataset[i] for i in batch_indices])
            except Exception:
                # It is important that we don't store exc_info in a variable,
                # see NOTE [ Python Traceback Reference Cycle Problem ]
                data_queue.put((idx, ExceptionWrapper(sys.exc_info())))
            else:
                data_queue.put((idx, samples))
                del samples
    except KeyboardInterrupt:
        # Main process will raise KeyboardInterrupt anyways.
        pass
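The _use_shared_memory flag set at the top is what makes the default collate function build the batch tensor directly in shared memory inside the worker, so that data_queue.put((idx, samples)) can hand it to the main process without copying the data. As a rough illustration of what that means in terms of the public tensor API (a sketch, not the collate code itself):

import torch

batch = torch.randn(8, 3, 224, 224)   # a collated float32 batch, ~4.8 MB
print(batch.is_shared())              # False: ordinary process-private memory

batch.share_memory_()                 # move the storage into shared memory
print(batch.is_shared())              # True: now counts against the shm limit
                                      # (typically /dev/shm on Linux)

So every batch that a worker has produced but the main process has not yet consumed sits in shared memory, which is why the shm size limit matters.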
Does setting the number of workers to 0 for the DataLoader fix the error?
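Something like this, just as a sanity check (the dataset below is a stand-in; keep your own dataset and batch size):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 3, 224, 224))   # stand-in for your dataset

# num_workers=0 loads batches in the main process, so no tensors are passed
# between processes and the DataLoader does not touch shared memory.
loader = DataLoader(dataset, batch_size=8, num_workers=0)

for (batch,) in loader:
    pass   # if the shm error disappears here, the worker shared memory was the cause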
I am not sure, but I cannot set num_workers to 0, because data loading is the bottleneck in our video task. We need at least 8 workers per GPU to make sure data loading does not increase the training time too much.
I meant as a test, to confirm that this is where the error is coming from!
I have tested setting num_workers to 1, and the problem disappears. I can also confirm that this problem is related to at least batch_size, image_size, and num_workers.
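That matches a rough back-of-the-envelope estimate: each worker keeps about one to two collated batches in shared memory at a time (the exact number of in-flight batches depends on the DataLoader internals), so the usage grows with all three of those settings. For example, assuming float32 data and one batch per worker (the shapes here are example values, not ours):

# Rough estimate of DataLoader shared-memory pressure.
num_workers = 8
batch_size = 32
frames, c, h, w = 16, 3, 224, 224      # per-sample video clip shape
bytes_per_elem = 4                     # float32

per_batch = batch_size * frames * c * h * w * bytes_per_elem
total = num_workers * per_batch
print(f"~{per_batch / 2**20:.0f} MiB per batch, ~{total / 2**30:.1f} GiB across workers")

Numbers in that range easily exceed a small /dev/shm (containers often default to 64 MB), while num_workers=1 with a small batch can stay under the limit.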
I’m not sure there is any other way to pass data between the loading processes and the main one that could replace shared memory.
You might have to reduce the number of workers if you cannot increase the shared memory.
Maybe @smth has some other ideas to overcome this?