fix(tokenizer): align edge-case parsing behavior with HTML spec#2382

Merged
fb55 merged 1 commit into master from esm on Mar 19, 2026
fb55 (owner) commented Mar 19, 2026

Handle the following tokenizer edge cases to match the HTML spec:

  • Allow comment to close on --!>,
  • Recognize closing tags that end with / before > (e.g. </div/>),
  • Reset special-tag state correctly in .reset(), and
  • Wait for ?> in XML processing instructions.
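
As an illustration of the first bullet, here is a minimal standalone sketch of a comment-end scan that accepts both --> and --!>. It is not the code in src/Tokenizer.ts: the function name findCommentEnd, the CommentEnd shape, and the string-scanning approach are assumptions made for this example, and corner cases such as <!--> are not handled.

```typescript
// Hypothetical helper for illustration only; not part of the library's API.
interface CommentEnd {
    /** Index just past the last character of the comment text. */
    contentEnd: number;
    /** Index just past the closing delimiter. */
    next: number;
}

/**
 * Scan `input` starting at `from` (the first character after "<!--")
 * and return where the comment ends, or null if it is still open.
 */
function findCommentEnd(input: string, from: number): CommentEnd | null {
    for (let i = from; i + 2 < input.length; i++) {
        // A close always starts with two dashes.
        if (input.charAt(i) !== "-" || input.charAt(i + 1) !== "-") continue;
        // Regular close: "-->"
        if (input.charAt(i + 2) === ">") {
            return { contentEnd: i, next: i + 3 };
        }
        // Close with a bang: "--!>"
        if (input.charAt(i + 2) === "!" && input.charAt(i + 3) === ">") {
            return { contentEnd: i, next: i + 4 };
        }
    }
    return null;
}
```

With the input `<!-- a --!>b` and `from` set to 4, the sketch reports the comment text as ` a ` and resumes scanning at `b`, the same result it gives for `<!-- a -->b`.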

Expand tokenizer and parser regression coverage for those cases, update affected snapshots, and exclude test-only fixtures and snapshots from the published package.

Copilot AI review requested due to automatic review settings, March 19, 2026 07:28
fb55 merged commit f2daa22 into master, Mar 19, 2026 (14 checks passed)
fb55 deleted the esm branch, March 19, 2026 07:29
Copilot AI left a comment

Pull request overview

This PR updates the tokenizer to better match HTML/XML edge-case behavior per spec, and adjusts/adds snapshot coverage to lock in the new parsing semantics.

Changes:

  • Allow HTML comments to close on --!> (while still accepting -->).
  • Treat / as a valid “tag section terminator” to support end tags like </div/> (including within special tags).
  • In XML mode, terminate processing instructions only on ?>, and reset internal special-tag state correctly on .reset().
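
The second and third changes can be sketched with two small standalone helpers. These are illustrative assumptions, not the tokenizer's actual code: the names isEndOfTagSection, readEndTagName, and findProcessingInstructionEnd are hypothetical, and the non-XML branch simply assumes a processing instruction ends at the first >.

```typescript
// Hypothetical helpers for illustration only; not part of the library's API.

/** True if `ch` ends a tag-name section: "/", ">", or whitespace. */
function isEndOfTagSection(ch: string): boolean {
    return (
        ch === "/" || ch === ">" ||
        ch === " " || ch === "\n" || ch === "\t" || ch === "\f" || ch === "\r"
    );
}

/** Read an end-tag name starting at `from` (the first character after "</"). */
function readEndTagName(input: string, from: number): string {
    let i = from;
    while (i < input.length && !isEndOfTagSection(input.charAt(i))) i++;
    return input.slice(from, i);
}

/**
 * Return the index of the ">" that ends a processing instruction whose
 * content starts at `from` (the first character after "<?"), or -1 if
 * the instruction is still open.
 * In XML mode only a ">" directly preceded by "?" terminates it.
 */
function findProcessingInstructionEnd(
    input: string,
    from: number,
    xmlMode: boolean,
): number {
    for (let i = from; i < input.length; i++) {
        if (input.charAt(i) !== ">") continue;
        if (!xmlMode) return i;
        // Require "?>" and make sure the "?" is part of the content,
        // not the "?" of the opening "<?".
        if (i > from && input.charAt(i - 1) === "?") return i;
    }
    return -1;
}
```

For `</div/>`, readEndTagName stops at the / and yields `div`. For `<?php echo 1 > 0 ?>x`, the XML-mode scan skips the bare > inside the instruction and stops at the final ?>, while the non-XML scan stops at the first >.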

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

File | Description
src/Tokenizer.ts | Implements the new edge-case tokenization rules (comment close, end-tag /, PI ?>, reset behavior).
src/Tokenizer.spec.ts | Adds test coverage for the new edge cases and reset behavior.
src/Parser.events.spec.ts | Adds event-level coverage for special end tags ending in />.
src/snapshots/Tokenizer.spec.ts.snap | Updates snapshots for new tokenizer behaviors.
src/snapshots/Parser.events.spec.ts.snap | Updates snapshots for new event sequences/indices.
src/snapshots/WritableStream.spec.ts.snap | Updates snapshots impacted by processing-instruction termination changes.

} else if (c === CharCodes.Gt && this.sequenceIndex === 1) {
    this.cbs.onprocessinginstruction(
        this.sectionStart,
        this.index - 1,