Are you looking for a lightweight, extensible way to convert from HTML to any document format?
Convert any HTML into production-ready documents: DOCX today, PDF today, XLSX tomorrow.
html-to-document parses HTML into an intermediate, format-agnostic tree and then feeds that tree to adapters (e.g. DOCX, PDF).
Write HTML → get Word, PDFs, spreadsheets, and more, all with one unified TypeScript API.
Below is a high-level overview of the conversion pipeline. The library processes the HTML input through optional plugin steps, parses it into a structured intermediate representation, and then delegates to an adapter to generate the desired output format.
The stages are:
- Input: Raw HTML input as a string.
- Plugins: `beforeParse` hooks can inspect or transform the HTML string, `onDocument` hooks can inspect or mutate the parsed `Document`, and `afterParse` hooks can replace the parsed `DocumentElement[]`. Deprecated middleware still works through internal plugin adaptation.
- Parser: Converts the (possibly modified) HTML string into an array of `DocumentElement` objects, representing a structured AST.
- Adapter: Takes the parsed `DocumentElement[]` and renders it into the target format (e.g., DOCX, PDF, Markdown) via a registered adapter.
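
To make the stage order concrete, here is a toy model of the pipeline. It is only a sketch: `ToyElement`, `ToyPlugin`, `ToyAdapter`, `toyParse`, and `runPipeline` are hypothetical names invented for this example, the `onDocument` stage is omitted, and the regex parser is a stand-in for the real one.

```typescript
// Toy model of the stages above. All names are illustrative,
// not the library's real types or internals.
type ToyElement = { type: string; text: string };

interface ToyPlugin {
  name: string;
  beforeParse?: (html: string) => string;
  afterParse?: (elements: ToyElement[]) => ToyElement[];
}

type ToyAdapter = (elements: ToyElement[]) => string;

// Stand-in parser: only understands flat <tag>text</tag> pairs.
function toyParse(html: string): ToyElement[] {
  const elements: ToyElement[] = [];
  const pattern = /<(\w+)>([^<]*)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    elements.push({ type: match[1], text: match[2] });
  }
  return elements;
}

function runPipeline(html: string, plugins: ToyPlugin[], adapter: ToyAdapter): string {
  // 1. Plugins may rewrite the raw HTML string.
  const prepared = plugins.reduce(
    (acc, plugin) => (plugin.beforeParse ? plugin.beforeParse(acc) : acc),
    html
  );
  // 2. The parser turns the string into a structured element array.
  const parsed = toyParse(prepared);
  // 3. Plugins may replace the parsed elements.
  const finalElements = plugins.reduce(
    (acc, plugin) => (plugin.afterParse ? plugin.afterParse(acc) : acc),
    parsed
  );
  // 4. The adapter renders the elements into the target format.
  return adapter(finalElements);
}

// A tiny Markdown-like adapter for the toy element shape.
const toyMarkdownAdapter: ToyAdapter = (elements) =>
  elements.map((el) => (el.type === 'h1' ? `# ${el.text}` : el.text)).join('\n\n');

const rendered = runPipeline(
  '<h1>Draft</h1><p>Hello</p>',
  [{ name: 'finalize', beforeParse: (html) => html.replace('Draft', 'Final') }],
  toyMarkdownAdapter
);
```

The point of the sketch is the ordering: string-level hooks run before parsing, element-level hooks run after, and the adapter only ever sees the final element array.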
| Feature | Description |
|---|---|
| Format-agnostic core | Converts HTML into a reusable `DocumentElement[]` structure |
| DOCX adapter (built-in) | Powered by docx with rich style support |
| Pluggable adapters | Create and add your own adapter for PDF, XLSX, Markdown, etc. |
| Style mapping engine | Define your own CSS mappings for the adapters and set per-format defaults |
| Custom tag handlers | Override or extend how any HTML tag is parsed |
| Page sections & headers | Use `<section class="page">`, `<section class="page-break">`, `<header>` and `<footer>` to control pages in DOCX |
| Plugin pipeline | Transform HTML before parsing, inspect the parsed `Document`, or replace `DocumentElement[]` after parsing |
npm install html-to-document

import { init, DocxAdapter } from 'html-to-document';
import fs from 'fs';

const converter = init({
  adapters: {
    register: [{ format: 'docx', adapter: DocxAdapter }],
  },
});

const html = '<h1>Hello World</h1>';
const buffer = await converter.convert(html, 'docx'); // Buffer in Node / Blob in browser
fs.writeFileSync('output.docx', buffer);

You can provide adapter-specific configuration to register custom element converters when initializing. For example, with `DocxAdapter`:
const converter = init({
  adapters: {
    register: [
      {
        format: 'docx',
        adapter: DocxAdapter,
        config: {
          blockConverters: [new MyBlockConverter()],
          inlineConverters: [new MyInlineConverter()],
          fallthroughConverters: [new MyFallthroughConverter()],
        },
      },
    ],
  },
});

For more on writing custom element converters, see the Custom Converters guide: https://html-to-document.vercel.app/docs/api/converters
Headers & Footers
When converting to DOCX, you can include `<header>` and `<footer>` elements in your HTML. These will become page headers and footers in the output document. See the html-to-document-adapter-docx package for complete usage details.
import { init } from 'html-to-document';
// DOCX adapter is included. For PDF support:
//   npm i html-to-document-adapter-pdf
//   Docs: https://www.npmjs.com/package/html-to-document-adapter-pdf
import { DocxAdapter } from 'html-to-document-adapter-docx';

const converter = init({
  adapters: {
    register: [
      {
        format: 'docx',
        adapter: DocxAdapter,
        // Optional adapter-specific config:
        // config: {
        //   blockConverters: [...],
        //   inlineConverters: [...],
        //   fallthroughConverters: [...],
        // },
      },
    ],
  },
});

Use `adapters.register[].createAdapter` to customize the dependencies passed to a specific adapter during `init()`:
const converter = init({
  adapters: {
    register: [
      {
        format: 'docx',
        adapter: DocxAdapter,
        createAdapter: ({ Adapter, dependencies, config, format }) => {
          if (format === 'docx') {
            return new Adapter(
              {
                ...dependencies,
                defaultStyles: {
                  ...dependencies.defaultStyles,
                  heading: { color: 'darkred' },
                },
              },
              config
            );
          }
          return new Adapter(dependencies, config);
        },
      },
    ],
  },
});

Each adapter receives a fresh dependency object, so mutations inside the factory do not leak across registrations.
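
The isolation guarantee can be illustrated with a small standalone model. This is a sketch under assumptions, not the library's code: `ToyDependencies`, `ToyFactory`, `freshDependencies`, and `createAll` are hypothetical, and how deeply the real library copies its dependency object is not stated here.

```typescript
// Hypothetical shapes, for illustration only.
type ToyDependencies = { defaultStyles: Record<string, Record<string, string>> };
type ToyFactory = (dependencies: ToyDependencies) => ToyDependencies;

const baseStyles: ToyDependencies['defaultStyles'] = { heading: { color: 'black' } };

// Build a new dependency object (with new nested style objects) per registration.
function freshDependencies(): ToyDependencies {
  const defaultStyles: ToyDependencies['defaultStyles'] = {};
  for (const [key, styles] of Object.entries(baseStyles)) {
    defaultStyles[key] = { ...styles };
  }
  return { defaultStyles };
}

// Each factory is handed its own copy, never a shared object.
function createAll(factories: ToyFactory[]): ToyDependencies[] {
  return factories.map((factory) => factory(freshDependencies()));
}

const [docxDeps, pdfDeps] = createAll([
  (deps) => {
    deps.defaultStyles.heading.color = 'darkred'; // mutation stays local to this registration
    return deps;
  },
  (deps) => deps,
]);
```

In this model the second registration still sees `black`, and `baseStyles` is untouched, which is the behavior the sentence above describes.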
Tip: you can bundle multiple adapters:
register: [
  { format: 'docx', adapter: DocxAdapter },
  { format: 'pdf', adapter: PdfAdapter },
];
// To install PDF support, run:
//   npm i html-to-document-adapter-pdf
// See docs: https://www.npmjs.com/package/html-to-document-adapter-pdf

The rest of the API stays the same: `convert(html, 'docx')`, `convert(html, 'pdf')`, etc.
Need just the parsed structure?
const elements = await converter.parse('<p>Some HTML</p>');
console.log(elements); // => DocumentElement[]

Plugins are the primary way to extend parsing.
const converter = init({
  plugins: [
    {
      name: 'strip-scripts',
      beforeParse: async (context) => {
        // Case-insensitive so <SCRIPT> blocks are removed too.
        context.setHtml(
          context.html.replace(/<script\b[\s\S]*?>[\s\S]*?<\/script>/gi, '')
        );
      },
    },
    {
      name: 'mark-generated',
      afterParse: async (context) => {
        context.replaceElements(
          context.elements.map((element) => ({
            ...element,
            metadata: { ...element.metadata, generated: true },
          }))
        );
      },
    },
  ],
});

The built-in minify plugin is enabled by default. Disable built-in plugins with `enableDefaultPlugins: false`.
Deprecated `middleware` and `clearMiddleware` still work:
- `middleware` entries are adapted into `beforeParse` plugins internally
- `clearMiddleware: true` implies `enableDefaultPlugins: false`
- explicit `enableDefaultPlugins` overrides that implication
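
The precedence between the two flags can be written down as a small decision function. This is a sketch of the rules listed above, not the library's actual implementation; `PluginFlags` and `resolveEnableDefaultPlugins` are hypothetical names.

```typescript
// Hypothetical option shape mirroring the two flags described above.
interface PluginFlags {
  clearMiddleware?: boolean;
  enableDefaultPlugins?: boolean;
}

function resolveEnableDefaultPlugins(flags: PluginFlags): boolean {
  // An explicit enableDefaultPlugins setting always wins.
  if (flags.enableDefaultPlugins !== undefined) {
    return flags.enableDefaultPlugins;
  }
  // Otherwise clearMiddleware: true implies the defaults are disabled;
  // with neither flag set, built-in plugins stay enabled.
  return flags.clearMiddleware !== true;
}
```

Under these rules, `{ clearMiddleware: true, enableDefaultPlugins: true }` keeps the built-in plugins, while `{ clearMiddleware: true }` alone turns them off.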
You can also register plugins after construction:
converter.usePlugin({
  beforeParse: async (context) => {
    context.setHtml(context.html.replace('Draft', 'Final'));
  },
});

To parse stylesheet rules from `<style>` tags during the `onDocument` phase:
import { init, cssParserPlugin } from 'html-to-document';

const converter = init({
  plugins: [cssParserPlugin()],
});

| Resource | Link |
|---|---|
| Full Docs | https://html-to-document.vercel.app/ |
| Live Demo (TinyMCE) | https://html-to-document-demo.vercel.app |
- Style mappings: fine-tune CSS → DOCX with `DocxStyleMapper` via `DocxAdapter` config
- Stylesheet API: seed selector rules through `init()` and inspect them from adapters
- Tag handlers: intercept `<custom-tag>` → your own `DocumentElement`
- Custom adapters: implement `IDocumentConverter` to target new formats
You can provide stylesheet rules directly when creating the converter.
import { init, DocxAdapter } from 'html-to-document';

const converter = init({
  stylesheetRules: [
    {
      kind: 'style',
      selectors: ['p.note'],
      declarations: {
        color: 'rebeccapurple',
        fontWeight: 'bold',
      },
    },
    {
      kind: 'at-rule',
      name: 'page',
      descriptors: {
        size: 'A4',
        margin: '1in',
      },
    },
  ],
  adapters: {
    register: [{ format: 'docx', adapter: DocxAdapter }],
  },
});

This is the simplest way to seed stylesheet statements without manually creating a stylesheet instance.
Style rules can also carry nested at-rules for forward-compatible rule trees. Those nested at-rules are preserved by the API, even though the current matcher does not evaluate them yet.
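
The two statement kinds used above can be modeled as a discriminated union. The types below are inferred from the object literals in this section and are only a sketch: the library's exported type names may differ, and `atRules` in particular is a hypothetical field name for the nested at-rules.

```typescript
// Illustrative types; not the library's exported definitions.
type Declarations = Record<string, string>;

interface AtRuleStatement {
  kind: 'at-rule';
  name: string;
  descriptors?: Declarations;
}

interface StyleRuleStatement {
  kind: 'style';
  selectors: string[];
  declarations: Declarations;
  atRules?: AtRuleStatement[]; // hypothetical field name for nested at-rules
}

type StylesheetStatement = StyleRuleStatement | AtRuleStatement;

// Collect top-level at-rules with a given name, e.g. every @page rule.
function findAtRules(statements: StylesheetStatement[], ruleName: string): AtRuleStatement[] {
  return statements.filter(
    (statement): statement is AtRuleStatement =>
      statement.kind === 'at-rule' && statement.name === ruleName
  );
}

const seedRules: StylesheetStatement[] = [
  { kind: 'style', selectors: ['p.note'], declarations: { color: 'rebeccapurple' } },
  { kind: 'at-rule', name: 'page', descriptors: { size: 'A4', margin: '1in' } },
];

const pageRules = findAtRules(seedRules, 'page');
```

Switching on `kind` lets TypeScript narrow each statement, so selector-based rules and at-rules can share one array without casts.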
If you want full control, you can provide a stylesheet instance too.
import { init, createStylesheet } from 'html-to-document';

const stylesheet = createStylesheet();

stylesheet.addStyleRule('p.note', { color: 'green' });
stylesheet.addAtRule({
  kind: 'at-rule',
  name: 'page',
  descriptors: { size: 'A4' },
});

const converter = init({
  stylesheet,
});

The library still appends built-in and configured rules onto that stylesheet during initialization.
The stylesheet seen by adapters is built from several sources:
- built-in base rules
- `tags.defaultStyles`
- `stylesheetRules`
- adapter-specific `adapters.defaultStyles`
tags.defaultStyles are now added to the stylesheet as tag-based rules, not inlined into parsed element.styles.
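
As a rough picture of what "tag-based rules" means, the sketch below turns per-tag defaults into style-rule statements whose selector is the tag name. Both shapes are assumptions made for this example: `TagDefaultStyle` (a `key` plus `styles`) and `TagStyleRule` are hypothetical, and `toTagRules` is not a library function.

```typescript
// Assumed shapes; both types are illustrative, not the library's exports.
type TagDefaultStyle = { key: string; styles: Record<string, string> };
type TagStyleRule = {
  kind: 'style';
  selectors: string[];
  declarations: Record<string, string>;
};

// Each tag default becomes a rule that selects by tag name,
// instead of being copied into every parsed element's styles.
function toTagRules(defaults: TagDefaultStyle[]): TagStyleRule[] {
  return defaults.map(({ key, styles }) => ({
    kind: 'style' as const,
    selectors: [key],
    declarations: { ...styles },
  }));
}

const tagRules = toTagRules([{ key: 'p', styles: { marginBottom: '10px' } }]);
```

One consequence of this design is that an element's own `styles` only hold what the HTML actually specified, and defaults are resolved later through the stylesheet.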
Adapters receive a stylesheet instance in their dependencies and can inspect:
- matched selector styles with `getMatchedStyles(element)`
- merged styles with `getComputedStyles(element, cascadedStyles)`
- raw rule statements with `getStatements()` / `getAtRules(name)`
To create a new adapter from scratch in your own project:

- Install the core types:

  npm install html-to-document-core

  This package contains the necessary interfaces and type definitions like `DocumentElement` and `IDocumentConverter`.

- Implement your adapter based on the documentation here: Custom Converters Guide
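
To give a feel for the shape of a custom adapter, here is a minimal plain-text sketch. It deliberately uses local stand-ins rather than the real interfaces: `SketchElement` and `SketchConverter` are hypothetical simplifications of `DocumentElement` and `IDocumentConverter`, and the real definitions in html-to-document-core may differ in fields, constructor arguments, and return type.

```typescript
// Local stand-ins for the interfaces named above; assumptions, not the real types.
interface SketchElement {
  type: string;
  text?: string;
  content?: SketchElement[];
}

interface SketchConverter {
  convert(elements: SketchElement[]): Promise<string>;
}

// Depth-first walk: concatenate text within an element, one line per top-level element.
function renderPlainText(elements: SketchElement[]): string {
  const renderOne = (el: SketchElement): string =>
    (el.text ?? '') + (el.content ?? []).map(renderOne).join('');
  return elements.map(renderOne).join('\n');
}

class PlainTextAdapter implements SketchConverter {
  async convert(elements: SketchElement[]): Promise<string> {
    return renderPlainText(elements);
  }
}

const sampleText = renderPlainText([
  { type: 'heading', text: 'Title' },
  {
    type: 'paragraph',
    content: [
      { type: 'text', text: 'Hello ' },
      { type: 'text', text: 'world' },
    ],
  },
]);
```

Keeping the rendering logic in a pure function such as `renderPlainText` makes it easy to unit-test separately from the adapter class that gets registered.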
See the Extensibility Guide.
Contributions are welcome!
Please read CONTRIBUTING.md and follow the Code of Conduct.
All notable changes are documented in CHANGELOG.md.
ISC: a permissive, MIT-style license that allows free use, modification, and distribution without requiring permission.