feat(htmlmark): add HTML ↔ Markdown conversion component #25

Open
samsameer2804-cloud wants to merge 8 commits into yoshuawuyts:main from samsameer2804-cloud:main

@samsameer2804-cloud

Closes #13

Summary

This PR adds a new component, htmlmark, that converts HTML into clean Markdown and can optionally render Markdown back to HTML. It follows the component structure used in this repository.

Features

  • Extracts main content from HTML using readability (with fallback to raw HTML)
  • Converts HTML → Markdown using html2md
  • Optional Markdown → HTML rendering using pulldown-cmark
  • Pure implementation (no network calls)
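
The extract-then-convert flow with its raw-HTML fallback could look roughly like the sketch below. This is not the code from this PR: `extract_with` is a hypothetical helper, the two closures stand in for the readability and html2md calls so the example runs with only the standard library, and the empty-input error is an assumption made for illustration.

```rust
/// Hypothetical sketch of the extract pipeline.
/// `extract_main` stands in for a readability-style extractor and
/// `to_markdown` stands in for an HTML-to-Markdown converter.
fn extract_with<E, C>(input: &str, extract_main: E, to_markdown: C) -> Result<String, String>
where
    E: Fn(&str) -> Option<String>,
    C: Fn(&str) -> String,
{
    // Assumption: empty input is reported through the error arm of the result.
    if input.trim().is_empty() {
        return Err("input is empty".to_string());
    }

    // Use the extracted main content when there is any;
    // otherwise fall back to the raw HTML.
    let html = extract_main(input)
        .filter(|s| !s.trim().is_empty())
        .unwrap_or_else(|| input.to_string());

    Ok(to_markdown(&html))
}
```

Passing the two steps in as closures keeps the fallback logic testable without network access or external crates, which matches the "pure implementation" goal above.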

Interface

  • extract(input: string) -> result<string, string>
  • render(md: string) -> result<string, string>
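
On the Rust side, the two functions above would plausibly surface with the shape sketched below, assuming the usual wit-bindgen mapping of WIT `string` to `String` and `result<string, string>` to `Result<String, String>`. The trait name `HtmlMark` and the `Stub` type are hypothetical, and the function bodies are placeholders that only show the call shape, not real conversion logic.

```rust
/// Hypothetical Rust-side view of the two exported functions.
trait HtmlMark {
    /// HTML in, Markdown out (or an error message).
    fn extract(input: String) -> Result<String, String>;
    /// Markdown in, HTML out (or an error message).
    fn render(md: String) -> Result<String, String>;
}

/// Placeholder type; a real component would call its libraries here.
struct Stub;

impl HtmlMark for Stub {
    fn extract(input: String) -> Result<String, String> {
        // Assumption: empty input is treated as an error.
        if input.is_empty() {
            Err("input is empty".to_string())
        } else {
            Ok(input)
        }
    }

    fn render(md: String) -> Result<String, String> {
        // Placeholder: wraps the text in a paragraph without parsing it.
        Ok(format!("<p>{}</p>", md))
    }
}
```

Because both error arms are plain strings, callers can surface failures directly without a dedicated error type.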

Structure

  • components/components/htmlmark/
    • wit/world.wit – interface definition
    • src/lib.rs – implementation using wit-bindgen
    • Cargo.toml – component configuration

This follows the same structure and conventions as existing components like wordmark and tablemark.

Notes

  • If content extraction fails, extract falls back to converting the raw HTML
  • Keeps implementation minimal and focused on the core requirement

@samsameer2804-cloud

Author

@yoshuawuyts Fixed the folder structure (removed duplicate files at components/src/ and components/wit/), added semicolons to world.wit, and corrected import ordering in lib.rs. CI should pass now!

Development

Successfully merging this pull request may close these issues.

[component] htmlmark — HTML ↔ Markdown conversion

2 participants