feat(codegen): strip HTML tags from documentation traits in generated JSDoc #1908

Open
TrevorBurnham wants to merge 1 commit into smithy-lang:main from
TrevorBurnham:fix/strip-html-from-documentation-traits

@TrevorBurnham
Contributor

Issue #, if available:

Resolves aws/aws-sdk-js-v3#6876

Description of changes:

Smithy model @documentation trait values often contain raw HTML markup (<p>, <a href="...">, <code>, <ul>/<li>, etc.). The TypeScript codegen currently passes this HTML through verbatim into JSDoc comments, making hover docs in editors like VS Code and Neovim very hard to read.

For example, generated code ends up with comments like:

<p>Registers a new task definition from the supplied <code>family</code> and
<code>containerDefinitions</code>. For more information, see
<a href="https://docs.aws.amazon.com/...">Amazon ECS Task Definitions</a>
in the <i>Amazon Elastic Container Service Developer Guide</i>.</p>

After this change, the same ECS example renders as:

Registers a new task definition from the supplied `family` and
`containerDefinitions`. For more information, see Amazon ECS Task Definitions
in the Amazon Elastic Container Service Developer Guide.

Implementation details

Added a DocumentationConverter utility class in smithy-typescript-codegen with an htmlToPlainText(String html) method that converts HTML documentation into clean plaintext suitable for JSDoc. The converter:

  • Extracts link text from <a> tags (drops URLs)
  • Wraps <code>/<pre> content in backticks
  • Converts <li> items to dash-prefixed lines
  • Converts <dt>/<dd> definition list elements to readable format
  • Replaces block-level elements (<p>, <br>, <h1> to <h6>, <div>, etc.) with paragraph breaks
  • Strips all remaining inline formatting tags (<b>, <i>, <strong>, <em>, <span>, etc.)
  • Decodes HTML entities (&amp;, &lt;, &gt;, &quot;, &#39;, &nbsp;, numeric/hex entities)
  • Normalizes whitespace (collapses runs of blank lines and spaces)
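
For illustration, here is a minimal sketch of how a converter along these lines could be structured as a sequence of regex passes. It is not the code from this PR: the class name DocumentationConverterSketch, the pattern constants, and the order of the passes are assumptions made for the example. <dt>/<dd> handling is omitted, and the sketch needs Java 9+ for Matcher.replaceAll with a function argument.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and pass order are assumptions, not the PR's implementation.
final class DocumentationConverterSketch {
    private static final Pattern ANCHOR = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");
    private static final Pattern CODE = Pattern.compile("(?is)<(code|pre)\\b[^>]*>(.*?)</\\1>");
    private static final Pattern LIST_ITEM = Pattern.compile("(?i)<li\\b[^>]*>");
    private static final Pattern BLOCK =
            Pattern.compile("(?i)</?(p|div|ul|ol|h[1-6])\\b[^>]*>|<br\\s*/?>");
    private static final Pattern ANY_TAG = Pattern.compile("</?[a-zA-Z][^>]*>");
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    private DocumentationConverterSketch() {}

    static String htmlToPlainText(String html) {
        if (html == null || html.isEmpty()) {
            return "";
        }
        String text = html.replaceAll("\\r\\n?", "\n");
        text = ANCHOR.matcher(text).replaceAll("$1");       // keep link text, drop the URL
        text = CODE.matcher(text).replaceAll("`$2`");       // wrap <code>/<pre> content in backticks
        text = LIST_ITEM.matcher(text).replaceAll("\n- ");  // each <li> starts a dash-prefixed line
        text = BLOCK.matcher(text).replaceAll("\n\n");      // block-level tags become paragraph breaks
        text = ANY_TAG.matcher(text).replaceAll("");        // strip whatever tags remain
        text = decodeEntities(text);
        // Collapse runs of spaces, trim spaces around newlines, cap blank lines at one.
        text = text.replaceAll("[ \\t]+", " ")
                .replaceAll(" *\\n *", "\n")
                .replaceAll("\\n{3,}", "\n\n");
        return text.trim();
    }

    private static String decodeEntities(String text) {
        // Numeric (decimal and hex) entities first; invalid code points are left untouched.
        String decoded = NUMERIC_ENTITY.matcher(text).replaceAll(m -> {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            return Matcher.quoteReplacement(replacement);
        });
        // &amp; is decoded last so that "&amp;lt;" becomes "&lt;" rather than "<".
        return decoded.replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }
}
```

With this sketch, htmlToPlainText("<ul><li>One</li><li>Two</li></ul>") returns "- One\n- Two". Unlike the output shown in the description, the sketch does not re-wrap lines, so line breaks inside a paragraph are kept where the source had them.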

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@TrevorBurnham TrevorBurnham requested a review from a team as a code owner March 4, 2026 14:35
… JSDoc

Adds a DocumentationConverter utility that converts HTML documentation
strings from Smithy @documentation traits into plain text before they
are emitted as JSDoc comments. This makes hover docs in editors (VS Code,
Neovim, etc.) significantly more readable by removing raw HTML tags like
<p>, <a>, <code>, <ul>/<li>, and others.

Fixes aws/aws-sdk-js-v3#6876
@TrevorBurnham TrevorBurnham force-pushed the fix/strip-html-from-documentation-traits branch from c449627 to c11e1a8 Compare March 4, 2026 14:47

Development

Successfully merging this pull request may close these issues.

Remove all HTML tags from code comments