
ENH: Images embedded in cells. The DISPIMG function of WPS #61888

@lunavexxx

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Hi!
I found an issue with images in WPS workbooks. WPS allows images to be embedded directly into cells; such a cell contains a formula like =DISPIMG("ID5BA4F81A0D674C7AA8849A79AC5645C8", 1).
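To connect a cell to its image, the ID has to be pulled out of the formula text. Below is a minimal sketch using a hypothetical helper, parse_dispimg. It assumes the formula is available as a string (for example, the cell value openpyxl returns when formulas are not evaluated). As a precaution, it also accepts an optional _xlfn. prefix in case the stored formula carries one.

```python
import re

# Hypothetical pattern for formulas such as
#   =DISPIMG("ID5BA4F81A0D674C7AA8849A79AC5645C8", 1)
# The leading "=", an optional "_xlfn." prefix and extra spaces are tolerated.
_DISPIMG_RE = re.compile(
    r'^=?\s*(?:_xlfn\.)?DISPIMG\s*\(\s*"([^"]+)"\s*,\s*(\d+)\s*\)$',
    re.IGNORECASE,
)


def parse_dispimg(formula: str):
    """Return (image_id, mode) for a DISPIMG formula, or None otherwise."""
    match = _DISPIMG_RE.match(formula.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

The returned ID is the name to look up in a mapping like the one built by the function in this issue.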


Therefore, these images cannot be accessed through worksheet._images, where openpyxl keeps ordinary (floating) images.

If we unzip the Excel file, all the images are under xl/media, and the image index is stored in xl/_rels/cellimages.xml.rels and xl/cellimages.xml.
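Because an .xlsx file is a zip archive, this layout can be checked without extracting anything. The following is a small sketch. describe_wps_cell_images is a hypothetical name, and the part paths are the ones observed in this issue, not taken from a published WPS specification.

```python
import zipfile


def describe_wps_cell_images(file_path) -> dict:
    """Report which WPS cell-image parts an .xlsx archive contains."""
    # ZipFile accepts either a path or a binary file-like object.
    with zipfile.ZipFile(file_path) as zf:
        names = zf.namelist()
    return {
        # Index of cell images (name -> relationship Id).
        "has_cellimages": "xl/cellimages.xml" in names,
        # Relationships (relationship Id -> media file).
        "has_cellimages_rels": "xl/_rels/cellimages.xml.rels" in names,
        # The image files themselves.
        "media": sorted(n for n in names if n.startswith("xl/media/")),
    }
```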

This appears to be unique to WPS; at least, I haven't found it in Microsoft Office.

I found a similar implementation

Feature Description

This is my code. It extracts the workbook archive, parses the two index files and returns a mapping from each image ID to the path of the extracted image file.

import os
import zipfile
import xml.etree.ElementTree as ET


def wps_embed_images(file_path, save_path) -> dict:
    """Return a mapping from WPS cell-image ID to the extracted image path."""
    img_map = {}

    # An .xlsx workbook is a zip archive: extract it into save_path.
    with zipfile.ZipFile(file_path, "r") as zip_ref:
        zip_ref.extractall(save_path)

    # Map relationship Id (e.g. "rId1") -> image file path.
    # Targets such as "media/image1.png" are relative to the xl/ folder.
    id2target = {}
    rels = os.path.join(save_path, "xl", "_rels", "cellimages.xml.rels")
    tree = ET.parse(rels)
    root = tree.getroot()
    for child in root:
        id2target[child.attrib.get("Id")] = os.path.join(save_path, "xl", child.attrib.get("Target"))

    namespaces = {
        'etc': 'http://www.wps.cn/officeDocument/2017/etCustomData',
        'xdr': 'http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing',
        'a': 'http://schemas.openxmlformats.org/drawingml/2006/main',
        'r': 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
    }

    # Each etc:cellImage holds the image name (the ID used in DISPIMG)
    # and the relationship Id of the embedded picture.
    cellimages = os.path.join(save_path, "xl", "cellimages.xml")
    tree = ET.parse(cellimages)
    root = tree.getroot()
    for cell_image in root.findall('etc:cellImage', namespaces):
        c_nv_pr = cell_image.find('.//xdr:cNvPr', namespaces)
        image_name = c_nv_pr.get('name') if c_nv_pr is not None else None

        blip = cell_image.find('.//a:blip', namespaces)
        embed_id = blip.get(f'{{{namespaces["r"]}}}embed') if blip is not None else None

        if image_name and embed_id:
            img_map[image_name] = id2target[embed_id]

    return img_map
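As a variation on the function above, the workbook can also be read without extracting it to disk. The sketch below is an alternative under stated assumptions, not a pandas API. The name wps_embed_image_bytes is hypothetical, and it returns raw image bytes instead of file paths. Relationship targets are resolved relative to xl/, with a leading "/" treated as an absolute part name. Missing or dangling entries are skipped rather than raising.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "etc": "http://www.wps.cn/officeDocument/2017/etCustomData",
    "xdr": "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}
# Default namespace of *.rels parts in the Open Packaging Conventions.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def wps_embed_image_bytes(file_path) -> dict:
    """Map each WPS cell-image name (the DISPIMG ID) to its raw image bytes."""
    img_map = {}
    with zipfile.ZipFile(file_path) as zf:
        names = set(zf.namelist())
        if "xl/cellimages.xml" not in names:
            return img_map  # no WPS cell images in this workbook

        # Relationship Id -> archive path of the image part.
        id2target = {}
        rels = ET.fromstring(zf.read("xl/_rels/cellimages.xml.rels"))
        for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
            target = rel.get("Target", "")
            if target.startswith("/"):
                path = target.lstrip("/")  # absolute part name
            else:
                path = posixpath.normpath(posixpath.join("xl", target))
            id2target[rel.get("Id")] = path

        root = ET.fromstring(zf.read("xl/cellimages.xml"))
        for cell_image in root.findall("etc:cellImage", NS):
            c_nv_pr = cell_image.find(".//xdr:cNvPr", NS)
            blip = cell_image.find(".//a:blip", NS)
            if c_nv_pr is None or blip is None:
                continue
            name = c_nv_pr.get("name")
            path = id2target.get(blip.get(f"{{{NS['r']}}}embed"))
            if name and path in names:
                img_map[name] = zf.read(path)
    return img_map
```

Returning bytes avoids leaving extracted files behind. The trade-off is that every embedded image is held in memory at once.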

Alternative Solutions

Leave it as it is, and I will continue using the solution shown above.

Additional Context

No response

Metadata


Assignees

No one assigned

Labels

Enhancement; Needs Info (clarification about behavior needed to assess issue); Needs Triage (issue that has not been reviewed by a pandas team member)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
