Commit 4c0d903

Fix some bandit findings
Everything at the medium severity level and higher:
* Adds a timeout to the SPOT call (30 seconds)
* Uses Python's `tempfile` to create the temporary PDF that is written to
  * Even though we don't read from the file, we don't want to write to a different / injected file.
1 parent 08ba6a0 commit 4c0d903

1 file changed: +12 -10 lines changed

formfyxer/lit_explorer.py

Lines changed: 12 additions & 10 deletions
@@ -235,6 +235,7 @@ def spot(
         "https://spot.suffolklitlab.org/v0/entities-nested/",
         headers=headers,
         data=json.dumps(body),
+        timeout=30,
     )
     output_ = r.json()
     try:
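
The hunk above bounds how long the SPOT call may block. A minimal sketch of the same idea, assuming a hypothetical helper `post_json_with_timeout` that is not part of FormFyxer: the `post` argument stands in for `requests.post`, so the block runs without the third-party package, and returning `None` on failure is an illustrative choice rather than what `spot()` does.

```python
import json
from typing import Any, Callable, Dict, Optional


def post_json_with_timeout(
    post: Callable[..., Any],
    url: str,
    body: Dict[str, Any],
    headers: Dict[str, str],
    timeout: float = 30.0,
) -> Optional[Any]:
    """Hypothetical helper: POST `body` as JSON, giving up after `timeout` seconds.

    `post` is expected to accept the same keyword arguments as `requests.post`.
    """
    try:
        response = post(url, headers=headers, data=json.dumps(body), timeout=timeout)
    except OSError:
        # requests' exceptions (including Timeout) derive from IOError/OSError,
        # so a slow or unreachable server ends up here instead of hanging.
        return None
    return response.json()
```

In real use the first argument would be `requests.post`. Without an explicit `timeout`, `requests` waits indefinitely, which is what bandit flags.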
@@ -1630,16 +1631,17 @@ def parse_form(
         or readability > 30
     ):
         # We do not care what the PDF output is, doesn't add that much time
-        ocr_p = [
-            "ocrmypdf",
-            "--force-ocr",
-            "--rotate-pages",
-            "--sidecar",
-            "-",
-            in_file,
-            "/tmp/test.pdf",
-        ]
-        process = subprocess.run(ocr_p, timeout=60, check=False, capture_output=True)
+        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=True) as temp_pdf:
+            ocr_p = [
+                "ocrmypdf",
+                "--force-ocr",
+                "--rotate-pages",
+                "--sidecar",
+                "-",
+                in_file,
+                temp_pdf.name,
+            ]
+            process = subprocess.run(ocr_p, timeout=60, check=False, capture_output=True)
         if process.returncode == 0:
             original_text = process.stdout.decode()
             text = cleanup_text(original_text)
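
The second hunk replaces a fixed, predictable path under `/tmp` with a file that `tempfile` creates under a unique name. A minimal sketch of that pattern, assuming a hypothetical helper `run_with_scratch_output` (not part of FormFyxer) and using the Python interpreter as a stand-in for `ocrmypdf` so the block runs anywhere:

```python
import subprocess
import sys
import tempfile
from typing import List, Optional


def run_with_scratch_output(
    command_prefix: List[str], suffix: str = ".pdf", timeout: float = 60.0
) -> Optional[str]:
    """Hypothetical helper: run a command whose last argument is a throwaway output path.

    Returns the command's stdout as text, or None on timeout or non-zero exit.
    """
    # The file is created with an unpredictable name and removed when the
    # `with` block exits, so no other user can pre-create or redirect the path.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=True) as scratch:
        try:
            process = subprocess.run(
                command_prefix + [scratch.name],
                timeout=timeout,
                check=False,
                capture_output=True,
            )
        except subprocess.TimeoutExpired:
            return None
    if process.returncode != 0:
        return None
    return process.stdout.decode()


if __name__ == "__main__":
    # Stand-in command: prints the path it was handed instead of running OCR.
    print(run_with_scratch_output([sys.executable, "-c", "import sys; print(sys.argv[1])"]))
```

One caveat: on Windows a file opened by `NamedTemporaryFile` generally cannot be reopened by another process while it is still open, so this sketch assumes a POSIX-like system.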
