kossembly-dot/hwp2hwpx


hwp2hwpx


Convert HWP files to HWPX format: the only pip install-able HWP→HWPX converter.

HWP is the legacy binary format used by Hangul (한글), the dominant word processor in South Korea. HWPX is the newer XML-based format: a ZIP archive of OWPML XML parts, similar in spirit to ODF. This package converts HWP files to HWPX programmatically, with no Hangul installation or GUI required.
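Because an HWPX file is a ZIP archive of XML parts, a converted file can be sanity-checked with nothing but the Python standard library. The sketch below is not part of hwp2hwpx. It assumes that an HWPX package carries a mimetype entry containing application/hwp+zip (much as ODF and EPUB containers carry their own MIME type), and it only checks that every .xml entry is well-formed. Passing the check does not guarantee that Hangul will open the file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: HWPX packages store this MIME type in a "mimetype" entry.
EXPECTED_MIMETYPE = "application/hwp+zip"


def check_hwpx(source) -> list[str]:
    """Return a list of problems found in an HWPX package (empty list = looks OK).

    `source` may be a file path or a binary file-like object.
    """
    problems: list[str] = []
    try:
        with zipfile.ZipFile(source) as zf:
            names = zf.namelist()
            # Check the container's declared MIME type (assumed convention).
            if "mimetype" not in names:
                problems.append("missing 'mimetype' entry")
            else:
                mime = zf.read("mimetype").decode("ascii", errors="replace").strip()
                if mime != EXPECTED_MIMETYPE:
                    problems.append(f"unexpected mimetype: {mime!r}")
            # Every .xml entry should at least be well-formed XML.
            for name in names:
                if name.endswith(".xml"):
                    try:
                        ET.fromstring(zf.read(name))
                    except ET.ParseError as exc:
                        problems.append(f"{name}: {exc}")
    except zipfile.BadZipFile:
        problems.append("not a ZIP archive")
    return problems
```

For example, check_hwpx("output.hwpx") returns an empty list when none of these checks found a problem.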

🇰🇷 한국어 (Korean)

Why?

Tool                | What it does                          | Limitation
--------------------|---------------------------------------|---------------------------------------------
Hangul GUI          | Open HWP → Save As HWPX               | Manual, not scriptable
HwpxConverter.exe   | Bundled with Hangul, GUI only         | No CLI, Windows only
kordoc              | Parses HWP → Markdown/JSON            | Extracts content; doesn't convert the format
hwp2hwpx (this)     | Converts HWP → HWPX (valid ZIP/XML)   | Needs a Java runtime

If you only need to read HWP content, use kordoc. If you need a real HWPX file that you can open and edit in Hangul, use hwp2hwpx.

Install

pip install hwp2hwpx

Requires a Java Runtime Environment (JRE), version 8 or later:

# Windows
winget install EclipseAdoptium.Temurin.21.JDK

# macOS
brew install temurin

# Linux (Debian/Ubuntu)
sudo apt install default-jre
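To confirm the runtime is actually usable before converting, a small pre-flight check can look for java on PATH and run java -version. This is an illustrative sketch, not how hwp2hwpx itself locates Java. Running the binary, rather than only finding it, matters on macOS, where /usr/bin/java can exist as a stub even when no JDK is installed.

```python
import shutil
import subprocess


def java_available() -> bool:
    """Return True if a `java` executable is on PATH and runs successfully."""
    java = shutil.which("java")
    if java is None:
        return False
    try:
        # `java -version` prints its banner (to stderr) and exits with 0 when a JRE works.
        result = subprocess.run([java, "-version"], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Java found" if java_available() else "Java not found; install a JRE 8+")
```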

Usage

CLI

# Single file
hwp2hwpx document.hwp

# Multiple files
hwp2hwpx *.hwp

# Output directory
hwp2hwpx document.hwp -o output/

# Recursive folder conversion
hwp2hwpx ./documents/ -r
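A similar recursive walk can be done from Python and fed to convert_batch. The collect_hwp_files helper below is hypothetical (not part of the package): it matches .hwp case-insensitively and skips .hwpx outputs. The exact rules the CLI's -r flag applies are not documented here, so treat this as an approximation.

```python
from pathlib import Path


def collect_hwp_files(root) -> list[Path]:
    """Recursively find .hwp files under `root` (a str or Path).

    The suffix match is case-insensitive, so "B.HWP" is included,
    while already-converted ".hwpx" files are not.
    """
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".hwp"
    )
```

For example, pass [str(p) for p in collect_hwp_files("./documents/")] to convert_batch together with output_dir="output/". Paths are converted to strings because this README does not say whether convert_batch accepts pathlib.Path objects. With a single flat output_dir, same-named files from different subfolders may overwrite each other.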

Python API

from hwp2hwpx import convert, convert_batch

# Single file (the output path argument is optional)
output_path = convert("document.hwp")
output_path = convert("document.hwp", "output.hwpx")

# Batch: returns one (input_path, output_path, error) tuple per input
results = convert_batch(["a.hwp", "b.hwp"], output_dir="output/")
for input_path, output_path, error in results:
    if error:
        print(f"FAIL: {input_path}: {error}")
    else:
        print(f"OK: {output_path}")
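For scripts and CI jobs it is often handy to turn the batch results into a summary and a process exit code. A minimal sketch, assuming (as the loop above implies) that each result is an (input_path, output_path, error) tuple whose error is falsy on success; report is a hypothetical helper, not part of hwp2hwpx.

```python
import sys


def report(results) -> int:
    """Print a summary of (input_path, output_path, error) results; return an exit code."""
    results = list(results)  # accept any iterable, e.g. a generator
    failures = [(inp, err) for inp, _out, err in results if err]
    ok = len(results) - len(failures)
    print(f"{ok} converted, {len(failures)} failed")
    for inp, err in failures:
        print(f"  FAIL: {inp}: {err}", file=sys.stderr)
    # Non-zero exit code signals failure to shells and CI runners.
    return 1 if failures else 0
```

Usage: sys.exit(report(convert_batch(["a.hwp", "b.hwp"], output_dir="output/"))) makes the script fail whenever any file could not be converted.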

How it works

Bundles the neolord0/hwp2hwpx Java library, together with the libraries it builds on, as a fat JAR:

  • hwplib — reads HWP binary (OLE2/CFB compound document)
  • hwpxlib — writes HWPX XML (ZIP archive with OWPML structure)
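Since hwplib reads OLE2/CFB compound documents, a cheap pre-filter is to check the 8-byte CFB signature (D0 CF 11 E0 A1 B1 1A E1) at the start of each input. This sketch is illustrative only: other CFB files such as legacy .doc or .xls pass it too, and older HWP 3.x files use a different, non-CFB layout, so a passing result does not guarantee a convertible HWP file.

```python
# Magic number at offset 0 of every OLE2 / Compound File Binary (CFB) file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfb(path) -> bool:
    """Return True if the file at `path` starts with the CFB signature."""
    with open(path, "rb") as f:
        return f.read(len(CFB_SIGNATURE)) == CFB_SIGNATURE
```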

Pure file-format conversion. No Hangul installation, no COM API, no DRM issues.

On Windows, file paths containing Korean characters are handled automatically through a temporary-file workaround that bypasses a JVM path-encoding issue.
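The README does not show how the workaround is implemented. One plausible shape, sketched here under that assumption, is to copy the input to a fixed ASCII-only name in a temporary directory, run the converter there, and move the result to the (possibly Korean) destination path. The run callable is a hypothetical stand-in for whatever launches the bundled JAR. Note that the temporary directory itself can contain non-ASCII characters, for example when the Windows user name is Korean, so a real implementation may need an ASCII-safe temp root.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def convert_with_ascii_paths(src: str, dst: str, run: Callable[[str, str], None]) -> Path:
    """Call run(in_path, out_path) on ASCII-named temp copies, then move the output to dst.

    `run` is hypothetical: it stands in for the code that invokes the Java converter.
    """
    dst_path = Path(dst)
    with tempfile.TemporaryDirectory() as tmp:
        # Fixed ASCII file names keep non-ASCII characters out of the converter's arguments.
        tmp_in = Path(tmp) / "input.hwp"
        tmp_out = Path(tmp) / "output.hwpx"
        shutil.copyfile(src, tmp_in)
        run(str(tmp_in), str(tmp_out))
        # Python handles the Unicode destination path itself.
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(tmp_out), str(dst_path))
    return dst_path
```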

Development

pip install -e ".[test]"
pytest

License

Apache License 2.0

Based on Java libraries by neolord0: hwp2hwpx, hwplib, and hwpxlib.


Korean (한국어)

A Python package that converts HWP (Hangul word processor) files to the HWPX (OWPML) format.

Install with a single pip install hwp2hwpx and use it right away. No Hangul installation required.

Install

pip install hwp2hwpx

Requires Java: winget install EclipseAdoptium.Temurin.21.JDK

Usage

hwp2hwpx 문서.hwp
hwp2hwpx *.hwp -o 출력폴더/

from hwp2hwpx import convert
convert("문서.hwp")

Difference from kordoc

  • kordoc: reads HWP and extracts its content as Markdown/JSON (text parsing)
  • hwp2hwpx: converts HWP into an HWPX file (a complete document you can open in Hangul)
