Skip to content

Refactor jsx mode in parser #7751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

nojaf
Copy link
Member

@nojaf nojaf commented Aug 2, 2025

I was experimenting with the parser related to JSX and noticed that we have a somewhat convoluted mechanism for handling /> and - in identifier names.

First, I created a token dump tool in res_parser, which was previously missing.

Currently, the parser employs a sequence of Scanner.set_jsx_mode p.Parser.scanner; and Scanner.pop_mode p.scanner Jsx; while processing elements to distinguish between parser JSX and non-JSX. However, this is mainly used in the following cases:

  • Allowing a - inside an identifier. This logic belongs in the parser, but it currently resides in the scanner, which feels inappropriate.
  • Combining a < + / into a </ token. I would prefer using a lookahead for when a < is encountered. This would clarify that it's specific to JSX parsing. Even though LessThanSlash exists, there is still a separate LessThan + Slash check, which makes the code a bit messy.

This is an effort to streamline the JSX mode.

PS: to run the local analysis tests, I had to revert to legacy clean. This is for the better until we figure out #7707

| LessThan ->
(* Imagine: <div> <Navbar /> <
* is `<` the start of a jsx-child? <div …
* or is it the start of a closing tag? </div>
* reconsiderLessThan peeks at the next token and
* determines the correct token to disambiguate *)
let token = Scanner.reconsider_less_than p.scanner in
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what bother me a bit, there is LessThanSlash above, yet we still need to do the reconsider_less_than call.

let attr_expr = parse_primary_expr ~operand:(parse_atomic_expr p) p in
Some (Parsetree.JSXPropValue ({txt = name; loc}, optional, attr_expr))
| _ -> Some (Parsetree.JSXPropPunning (false, {txt = name; loc})))
(* {...props} *)
| Lbrace -> (
Scanner.pop_mode p.scanner Jsx;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rather confusing when you are in a nested jsx scenario:

<div>
  <p>
    {foo}
  </p>
</div>

Popping Jsx from p requires the pop of div also to happen to get out of Jsx mode.

posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->32:10]
JSX <di:[30:10->30:12] div[32:6->32:9]=...[32:6->32:9]> _children:None
posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->30:12]
JSX <di:[30:10->30:12] > _children:None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes is because there is slightly different AST for:

<div>
  <di
</div>

It used to be

[
  structure_item (A.res[1,0+0]..[3,12+6])
    Pstr_eval
    expression (A.res[1,0+0]..[3,12+6])
      Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
      jsx_props =
        []
      > [1,0+4]
      jsx_children =
        [
          expression (A.res[2,6+2]..[3,12+6])
            Pexp_jsx_container_element "di" (A.res[2,6+3]..[2,6+5])
            jsx_props =
              [
                div              ]
            > [3,12+5]
            jsx_children =
              []
        ]
]

and now is

[
  structure_item (A.res[1,0+0]..[3,12+6])
    Pstr_eval
    expression (A.res[1,0+0]..[3,12+6])
      Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
      jsx_props =
        []
      > [1,0+4]
      jsx_children =
        [
          expression (A.res[2,6+2]..[2,6+5])
            Pexp_jsx_unary_element "di" (A.res[2,6+3]..[2,6+5])
            jsx_props =
              []
        ]
]

I think this is more correct. The unary (new) versus container (old) doesn't matter that much, neither can be determined for <di.
However, the container had a weird prop div, which it no longer has.

Copy link

pkg-pr-new bot commented Aug 2, 2025

Open in StackBlitz

rescript

npm i https://pkg.pr.new/rescript-lang/rescript@7751

@rescript/darwin-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-arm64@7751

@rescript/darwin-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-x64@7751

@rescript/linux-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-arm64@7751

@rescript/linux-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-x64@7751

@rescript/win32-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/win32-x64@7751

commit: 9d8641e

@nojaf nojaf marked this pull request as ready for review August 2, 2025 12:00
@nojaf nojaf requested a review from cristianoc August 2, 2025 12:00
@nojaf nojaf changed the title Add token dump printer Refactor jsx mode in parser Aug 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant