docs: expand README #2384

Merged: fb55 merged 1 commit into master from docs/improve-readme on Mar 19, 2026

fb55 (Owner) commented Mar 19, 2026

Document all parser events, options, and common workflows (searching, modifying, serializing the DOM)

Closes #1765

The wiki is now disabled

Copilot AI review requested due to automatic review settings March 19, 2026 11:18
fb55 merged commit f2bb3b0 into master on Mar 19, 2026
14 checks passed
fb55 deleted the docs/improve-readme branch on March 19, 2026 11:18

Copilot AI left a comment

Pull request overview

Expands the README documentation to make htmlparser2 easier to use without reading source code, focusing on parser callbacks/options and common DOM workflows (parse → query → modify → serialize), addressing #1765.

Changes:

  • Added a comprehensive table of parser events (callbacks) supported by Parser.
  • Added a parser options table and expanded parseDocument documentation, including combined parser + domhandler options.
  • Added practical DOM workflow examples (searching with DomUtils / css-select, modifying nodes, and serializing back to HTML) and expanded parseFeed documentation.

Comment on lines +109 to +116
| Option | Type | Default | Description |
| ------------------------ | --------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `xmlMode` | `boolean` | `false` | Treat the document as XML. This affects entity decoding, self-closing tags, CDATA handling, and more. Set this to `true` for XML, RSS, Atom and RDF feeds. |
| `decodeEntities` | `boolean` | `true` | Decode HTML entities (e.g. `&amp;` -> `&`). |
| `lowerCaseTags` | `boolean` | `!xmlMode` | Lowercase tag names. |
| `lowerCaseAttributeNames`| `boolean` | `!xmlMode` | Lowercase attribute names. |
| `recognizeSelfClosing` | `boolean` | `xmlMode` | Recognize self-closing tags (e.g. `<br/>`). Always enabled in `xmlMode`. |
| `recognizeCDATA` | `boolean` | `xmlMode` | Recognize CDATA sections as text. Always enabled in `xmlMode`. |
Comment on lines +151 to +161
`parseDocument` accepts an optional second argument with both parser and [DOM handler options](https://github.com/fb55/domhandler):

```js
const dom = htmlparser2.parseDocument(data, {
    // Parser options
    xmlMode: true,

    // domhandler options
    withStartIndices: true, // Add `startIndex` to each node
    withEndIndices: true, // Add `endIndex` to each node
});

// Separate example: parse an RSS, Atom, or RDF feed
const feed = htmlparser2.parseFeed(content);
```

`parseFeed` returns an object with `type`, `title`, `link`, `description`, `updated`, `author`, and `items` (an array of feed entries), or `null` if the document isn't a recognized feed format.

Development

Successfully merging this pull request may close these issues.

Lacking documentation

2 participants