Open
Labels
enhancement: New feature or request
Description
PDFs also support editing markup such as highlighting, strikethroughs, and text insertion. These could also be captured and would be useful, for example, when providing lists of corrections for large documents such as PhD theses.
(The code below assumes that the offset suggested in #5 is also implemented.)
SEVERITY_NAMES = {0: "Minor comments", 1: "Major comments", 2: "Edits"}
# Edits to be reported
EDITS = ("Cross-Out", "Inserted Text")
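# Yields (edit type, text) pairs; needs `Tuple` from `typing` for the annotation
def iter_edit_contents(page: PageObject) -> Iterator[Tuple[str, str]]:
    try:
        edit_indirects = page["/Annots"]
    except KeyError:
        # page has no annotations
        return
    for edit_indirect in edit_indirects:
        edit = edit_indirect.getObject()
        try:
            if edit["/Subj"] in EDITS:
                if edit["/Subj"] == "Cross-Out":
                    # cross-outs are reported with "-" as their text
                    yield edit["/Subj"], "-"
                else:
                    yield edit["/Subj"], edit["/Contents"]
        except KeyError:
            # annotation lacks /Subj or /Contents
            continue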
def load_comments(filename: str, offset: int) -> SeverityDict:
    res: SeverityDict = defaultdict(list)
    reader = PdfFileReader(filename, STRICT)
    for page_num, page in enumerate(reader.pages, 1):
        for contents in iter_annot_contents(page):
            m_stars = re_stars.match(contents)
            assert m_stars is not None  # should always match
            stars = m_stars["stars"]
            comment = m_stars["comment"]
            # number of stars
            severity = len(stars)
            res[severity].append(f"p{page_num - offset}: {comment}")
        # edits are collected under severity 2 ("Edits")
        for edit_type, edit in iter_edit_contents(page):
            res[2].append(f"p{page_num - offset}: {edit_type} ({edit})")
    return res
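
To try the filtering logic without a PDF at hand, here is a minimal sketch that runs on plain dictionaries standing in for resolved annotation objects. The names `iter_edits_from_annots` and `format_edits` and the sample annotations are hypothetical and only for illustration; the proposed code would read the annotations from `page["/Annots"]` instead.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Same edit subjects as in the proposal above.
EDITS = ("Cross-Out", "Inserted Text")


def iter_edits_from_annots(
    annots: Iterable[Dict[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Yield (edit_type, text) pairs from already-resolved annotation dicts.

    Hypothetical stand-in for iter_edit_contents: it takes plain dicts
    instead of a PDF page so that it can run without a PDF library.
    """
    for annot in annots:
        subj = annot.get("/Subj")
        if subj not in EDITS:
            continue  # not an edit, or no /Subj at all
        if subj == "Cross-Out":
            yield subj, "-"  # the proposal reports cross-outs as "-"
        elif "/Contents" in annot:
            yield subj, annot["/Contents"]
        # an insertion without /Contents is skipped, as in the proposal


def format_edits(
    page_num: int, offset: int, annots: Iterable[Dict[str, str]]
) -> List[str]:
    """Format edits the way load_comments does for the "Edits" severity."""
    return [
        f"p{page_num - offset}: {edit_type} ({text})"
        for edit_type, text in iter_edits_from_annots(annots)
    ]


# Made-up annotations for a single page.
sample_annots = [
    {"/Subj": "Cross-Out"},
    {"/Subj": "Inserted Text", "/Contents": "the"},
    {"/Subj": "Highlight", "/Contents": "* unclear"},
    {"/Subj": "Inserted Text"},  # no /Contents, so it is skipped
]

for line in format_edits(page_num=12, offset=2, annots=sample_annots):
    print(line)
```

With these sample annotations only the cross-out and the first insertion are reported, both labelled with the offset-adjusted page number.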