Add timeouts to remote downloads#418
Conversation
|
|
||
| def down_model_file(url, save_path): | ||
| resp = requests.get(url, stream=True) | ||
| resp = requests.get(url, stream=True, timeout=REQUEST_TIMEOUT_SECONDS) |
There was a problem hiding this comment.
Self-review: The shared timeout constant keeps model file and zip downloads consistent, and raise_for_status prevents streaming an error page into the model cache.
|
哈罗,邮件收到了,我会尽快查看,回复,谢谢!
|
There was a problem hiding this comment.
PR Summary:
- Adds
timeout=30(via a named constant) to bothrequests.get()calls in the model download helpers. - Adds
raise_for_status()after each request to fail fast on HTTP errors before streaming begins. - Adds
timeout=10tourllib.request.urlopen()in the sample image loader.
Review Summary:
The changes are a solid step forward — bounded timeouts and explicit HTTP error checking are the right call for any network I/O path. Two issues worth addressing: (1) if a mid-stream ReadTimeout fires during down_model_file or down_model_zip, the already-opened output file is left as corrupt partial content on disk, which could cause silent failures on the next run (file exists but is invalid); a try/except that removes the partial file on failure is needed. (2) In url_to_image, the bare except Exception swallows the timeout silently and returns None with no diagnostic — now that timeouts are a real code path, at least a logger.warning is warranted, and urllib.error must be imported explicitly (not covered by import urllib alone).
Suggestions
|
|
||
| def down_model_file(url, save_path): | ||
| resp = requests.get(url, stream=True) | ||
| resp = requests.get(url, stream=True, timeout=REQUEST_TIMEOUT_SECONDS) |
There was a problem hiding this comment.
The timeout parameter in requests.get() sets both the connect timeout and the read timeout between chunks, not a total download time limit. For large model files over slow connections, 30 seconds between chunks is probably fine, but it's worth documenting this distinction — callers might expect a wall-clock cap.
More importantly, if the server accepts the connection and starts sending but stalls mid-transfer (e.g. after the first few chunks), the 30s read timeout will eventually fire and raise requests.exceptions.ReadTimeout, but the partial file at save_path will already be written and left on disk in a corrupt state. Consider wrapping the download in a try/except that cleans up the partial file on failure:
def down_model_file(url, save_path):
resp = requests.get(url, stream=True, timeout=REQUEST_TIMEOUT_SECONDS)
resp.raise_for_status()
total = int(resp.headers.get('content-length', 0))
try:
with open(save_path, 'wb') as file, tqdm(...) as bar:
for data in resp.iter_content(chunk_size=1024):
size = file.write(data)
bar.update(size)
except Exception:
if os.path.exists(save_path):
os.remove(save_path)
raiseSame applies to down_model_zip.
actions
Feedback: Rate this comment to help me improve future code reviews:
- 👍 Good - Helpful and accurate
- 👎 Poor - Wrong, unclear, or unhelpful
- Skip if you don't have any strong opinions either way.
There was a problem hiding this comment.
Thanks, accepted in 8323c2b by removing partially written model files if streaming fails before re-raising the download error.
| resp = urllib.request.urlopen(url, timeout=10) | ||
| image = np.asarray(bytearray(resp.read()), dtype="uint8") | ||
| image = cv2.imdecode(image, cv2.IMREAD_COLOR) | ||
| except Exception as err: |
There was a problem hiding this comment.
The broad except Exception silently swallows the new urllib.error.URLError (raised on timeout) and returns None, giving the caller no indication of why it failed. Now that a timeout is a real code path, at least log a warning:
import urllib.error # add at top of file alongside existing `import urllib`
def url_to_image(url):
try:
resp = urllib.request.urlopen(url, timeout=10)
image = np.asarray(bytearray(resp.read()), dtype="uint8")
image = cv2.imdecode(image, cv2.IMREAD_COLOR)
except urllib.error.URLError as err:
logger.warning(f"Failed to fetch image from URL ({err}): {url}")
return None
except Exception as err:
logger.warning(f"Unexpected error reading image from URL: {err}")
return None
return imageNote: import urllib alone does not make urllib.error available — it must be imported explicitly.
actions
Feedback: Rate this comment to help me improve future code reviews:
- 👍 Good - Helpful and accurate
- 👎 Poor - Wrong, unclear, or unhelpful
- Skip if you don't have any strong opinions either way.
There was a problem hiding this comment.
Thanks, accepted in 8323c2b by importing urllib.request/urllib.error explicitly and logging URL fetch failures before returning None.
Summary
Why
The download helpers currently wait indefinitely if a remote server stalls. Adding bounded timeouts keeps CLI/model initialization paths from hanging on network sockets, while raise_for_status makes failed downloads explicit before writing partial content.
Testing