Add support for editing markup in PDFs #6

@widdowquinn

PDFs also support editing markup such as highlighting, strikethroughs, and text insertion. Capturing these as well would be useful, for example, when providing lists of corrections for large documents such as PhD theses.

(the code below assumes that the offset suggested in #5 is also implemented)

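from typing import Iterator, Tuple

SEVERITY_NAMES = {0: "Minor comments", 1: "Major comments", 2: "Edits"}

# Annotation subjects ("/Subj" values) to be reported as edits
EDITS = ("Cross-Out", "Inserted Text")

def iter_edit_contents(page: PageObject) -> Iterator[Tuple[str, str]]:
    """Yield (edit type, text) pairs for the editing markup on a page."""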
    try:
        edit_indirects = page["/Annots"]
    except KeyError:
        return

    for edit_indirect in edit_indirects:
        edit = edit_indirect.getObject()

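        try:
            subject = edit["/Subj"]
            if subject not in EDITS:
                continue
            # Cross-outs are reported with a "-" placeholder instead of their contents
            text = "-" if subject == "Cross-Out" else edit["/Contents"]
            yield subject, text
        except KeyError:
            # annotation has no "/Subj" or no "/Contents": skip it
            continue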


def load_comments(filename: str, offset: int) -> SeverityDict:
    res: SeverityDict = defaultdict(list)

    reader = PdfFileReader(filename, STRICT)
    for page_num, page in enumerate(reader.pages, 1):
        for contents in iter_annot_contents(page):
            m_stars = re_stars.match(contents)
            assert m_stars is not None  # should always match

            stars = m_stars["stars"]
            comment = m_stars["comment"]

            # number of stars
            severity = len(stars)

            res[severity].append(f"p{page_num - offset}: {comment}")
        # editing markup is collected under key 2 ("Edits" in SEVERITY_NAMES)
        for edit_type, edit in iter_edit_contents(page):
            res[2].append(f"p{page_num - offset}: {edit_type} ({edit})")

    return res
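
To try the edit handling without a PDF to hand, here is a minimal, self-contained sketch of the same filtering and page-offset logic run on plain dictionaries. `FakeIndirect`, `collect_edits`, and the sample pages are hypothetical stand-ins invented for illustration; they are not part of the script or of PyPDF2. Whether a given PDF editor writes exactly these `/Subj` strings is not checked here.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

SEVERITY_NAMES = {0: "Minor comments", 1: "Major comments", 2: "Edits"}
EDITS = ("Cross-Out", "Inserted Text")


class FakeIndirect:
    """Hypothetical stand-in for an indirect object reference."""

    def __init__(self, annotation: dict) -> None:
        self._annotation = annotation

    def getObject(self) -> dict:
        return self._annotation


def iter_edit_contents(page: dict) -> Iterator[Tuple[str, str]]:
    # Same filtering as the proposal, applied to plain dicts (assumption:
    # a page behaves like a mapping with an optional "/Annots" entry).
    for indirect in page.get("/Annots", []):
        edit = indirect.getObject()
        subject = edit.get("/Subj")
        if subject not in EDITS:
            continue
        if subject == "Cross-Out":
            yield subject, "-"
        elif "/Contents" in edit:
            yield subject, edit["/Contents"]


def collect_edits(pages: List[dict], offset: int) -> Dict[int, List[str]]:
    # Only the edit branch of load_comments; star-rated comments are omitted.
    res: Dict[int, List[str]] = defaultdict(list)
    for page_num, page in enumerate(pages, 1):
        for edit_type, edit in iter_edit_contents(page):
            res[2].append(f"p{page_num - offset}: {edit_type} ({edit})")
    return res


# Made-up sample data: three pages, the first one without any annotations.
pages = [
    {},
    {"/Annots": [
        FakeIndirect({"/Subj": "Cross-Out", "/Contents": ""}),
        FakeIndirect({"/Subj": "Highlight", "/Contents": "ignored"}),
    ]},
    {"/Annots": [
        FakeIndirect({"/Subj": "Inserted Text", "/Contents": "the"}),
        FakeIndirect({"/Contents": "no subject, skipped"}),
    ]},
]

report = collect_edits(pages, offset=1)
for severity, lines in report.items():
    print(SEVERITY_NAMES[severity])
    for line in lines:
        print(f"  {line}")
```

With `offset=1`, the second physical page is reported as `p1` and the third as `p2`; the highlight and the annotation without a subject are dropped.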

Labels: enhancement
