Skip to content

Conversation

toinehartman
Copy link
Member

This PR adds many generic tools that can be used to process/format strings.

TODO

  • Extensively document (once we figured out which ones we will actually keep)

Copy link

codecov bot commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 47%. Comparing base (875fe06) to head (8d8de3e).

Additional details and impacted files
@@           Coverage Diff           @@
##              main   #2373   +/-   ##
=======================================
  Coverage       47%     47%           
- Complexity    6560    6571   +11     
=======================================
  Files          780     780           
  Lines        64398   64398           
  Branches      9628    9628           
=======================================
+ Hits         30509   30517    +8     
+ Misses       31569   31555   -14     
- Partials      2320    2326    +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@rodinaarssen rodinaarssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Just a few remarks and questions.

Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging @jurgenvinju since he has in the past done some more stuff around formatting & indentation.

I have the feeling we're missing something in this PR, a feature we already have in rascal. (aside from the fact I don't really like all these regexps and char loops)

}
@synopsis{Split a string to an indentation prefix and the remainder of the string.}
tuple[str indentation, str rest] splitIndentation(/^<indentation:\s*><rest:.*>/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the string contains multiple lines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should be an explicit exception (invalid argument or similar)

int count = size(findAll(input, nl));
linesepCounts[nl] = count;
// subtract all occurrences of substrings of newline characters that we counted before
for (str snl <- substrings(nl), linesepCounts[snl]?) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this almost looks like pattern matching on strings? (which we only have reasonable support over)

for example:

rascal>visit("abcd") { case str m : println(m); }
abcd
bcd
cd
d

come to think of it, this whole function smells like an parsing automata. Where we build a big state table of all the possible matches and then iterate through all the chars and count the matches based on their state.

In java this would be 20/30 lines, but in rascal we might be missing some primitives (as we don't have a character loop).

Copy link
Member Author

@toinehartman toinehartman Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we hard-code the set of newline characters (e.g. to all Unicode newline chars), we could write it as a grammar and use the parser generator. Downside (as we discussed) is that all (transitive) imports of this module will trigger generation of a parser. We could also move some of those to a specific Format module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants