Add support for editing markup in PDFs #6

PDFs also support editing markup such as highlights, strikethroughs, and inserted text. Capturing these as well would be useful, for example, when compiling lists of corrections for large documents such as PhD theses.

(The code below assumes that the offset suggested in #5 is also implemented.)

```python
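from typing import Iterator, Tuple  # Tuple is needed for the return annotation below

SEVERITY_NAMES = {0: "Minor comments", 1: "Major comments", 2: "Edits"}

# Annotation subjects (the "/Subj" entry) that are reported as edits
EDITS = ("Cross-Out", "Inserted Text")


def iter_edit_contents(page: PageObject) -> Iterator[Tuple[str, str]]:
    """Yield (edit type, text) pairs for the editing markup on a page."""
    try:
        edit_indirects = page["/Annots"]
    except KeyError:
        # Page has no annotations at all
        return

    for edit_indirect in edit_indirects:
        edit = edit_indirect.getObject()

        try:
            subject = edit["/Subj"]
            if subject not in EDITS:
                continue
            if subject == "Cross-Out":
                # A cross-out has no replacement text, so report a placeholder
                yield subject, "-"
            else:
                yield subject, edit["/Contents"]
        except KeyError:
            # Skip annotations without a subject or contents
            continue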


def load_comments(filename: str, offset: int) -> SeverityDict:
    res: SeverityDict = defaultdict(list)

    reader = PdfFileReader(filename, STRICT)
    for page_num, page in enumerate(reader.pages, 1):
        for contents in iter_annot_contents(page):
            m_stars = re_stars.match(contents)
            assert m_stars is not None  # should always match

            stars = m_stars["stars"]
            comment = m_stars["comment"]

            # number of stars
            severity = len(stars)

            res[severity].append(f"p{page_num - offset}: {comment}")

        # Edits go into their own bucket (key 2 in SEVERITY_NAMES)
        for edit_type, edit in iter_edit_contents(page):
            res[2].append(f"p{page_num - offset}: {edit_type} ({edit})")

    return res
```
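
The new `"Edits"` bucket could then be rendered alongside the existing severities roughly as follows. This is only a minimal sketch, not the project's actual output code: `format_report` is a hypothetical helper, the `SeverityDict` alias is an assumed shape, and the sample entries are made up for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List

# Same mapping as in the snippet above; key 2 is the new "Edits" bucket.
SEVERITY_NAMES = {0: "Minor comments", 1: "Major comments", 2: "Edits"}

# Assumed shape of the project's SeverityDict alias
SeverityDict = DefaultDict[int, List[str]]


def format_report(comments: SeverityDict) -> str:
    """Render one headed section per severity level that has entries."""
    sections = []
    for severity, name in sorted(SEVERITY_NAMES.items()):
        entries = comments.get(severity)
        if not entries:
            continue  # skip empty buckets rather than printing a bare heading
        lines = [name] + [f"  {entry}" for entry in entries]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


# Made-up entries in the format produced by load_comments
sample: SeverityDict = defaultdict(list)
sample[0].append("p3: typo in caption")
sample[2].append("p3: Cross-Out (-)")
sample[2].append("p7: Inserted Text (the)")

print(format_report(sample))
```

Keeping edits under their own key means they never mix with the star-rated comments, and a report with no edits simply omits the section.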
