Run inference script crashes #79

@JMBokhorst

Description

Hi all,

I have tried the PyTorch version after initially trying the TensorFlow version. I ran the inference script in wsi mode on an .ndpi image. It starts correctly, but midway through the process I get this error:

Process Chunk 48/99:  61%|#############5        | 35/57 [02:19<01:11,  3.23s/it]|2021-01-06|13:06:15.182| [ERROR] Crash
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/local/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/local/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
    return recvfds(s, 1)[0]
  File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 803, in _try_get_data
    fs = [tempfile.NamedTemporaryFile() for i in range(fds_limit_margin)]
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 803, in <listcomp>
    fs = [tempfile.NamedTemporaryFile() for i in range(fds_limit_margin)]
  File "/usr/local/lib/python3.7/tempfile.py", line 547, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/usr/local/lib/python3.7/tempfile.py", line 258, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
OSError: [Errno 24] Too many open files: '/tmp/tmpxrmts9vn'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/netcache/pathology/projects/colon-budding-he/nuclei_detection/hover_pytorch/hover_net-master/infer/wsi.py", line 746, in process_wsi_list
    self.process_single_file(wsi_path, msk_path, self.output_dir)
  File "/mnt/netcache/pathology/projects/colon-budding-he/nuclei_detection/hover_pytorch/hover_net-master/infer/wsi.py", line 550, in process_single_file
    self.__get_raw_prediction(chunk_info_list, patch_info_list)
  File "/mnt/netcache/pathology/projects/colon-budding-he/nuclei_detection/hover_pytorch/hover_net-master/infer/wsi.py", line 374, in __get_raw_prediction
    chunk_patch_info_list[:, 0, 0], pbar_desc
  File "/mnt/netcache/pathology/projects/colon-budding-he/nuclei_detection/hover_pytorch/hover_net-master/infer/wsi.py", line 287, in __run_model
    for batch_idx, batch_data in enumerate(dataloader):
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 941, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 807, in _try_get_data
    "Too many open files. Communication with the"
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code
Process Chunk 48/99:  61%|#############5        | 35/57 [02:19<01:27,  4.00s/it]
/usr/local/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
  len(cache))

Do you know why this error might occur?

I'm running this on an Ubuntu 20 machine with a conda environment that has the requirements installed.
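For reference, the error message itself suggests two workarounds. This is just a minimal sketch of how I understand they would be applied; I have not yet verified that either one actually resolves the crash here (the 4096 value is only an example):

    # in the shell, before launching the inference script
    ulimit -n 4096

    # or, at the very top of the entry script, before any DataLoader is created:
    import torch.multiprocessing
    # share worker data through the filesystem instead of passing file
    # descriptors, which is what appears to exhaust the open-file limit
    torch.multiprocessing.set_sharing_strategy('file_system')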
