Skip to content

8356439: Rename JavaLangAccess::*NoRepl methods #26413

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

vy
Copy link
Contributor

@vy vy commented Jul 21, 2025

NoRepl-suffixed String methods denote methods that do not replace invalid characters, but throw CharacterCodingException on encounter. This behavior cannot easily be derived from the method footprints, has been a source of confusion for maintainers, and is not uniformly adopted, e.g., newStringUTF8NoRepl() and getBytesUTF8NoRepl() does not throw CCE. This PR replaces the NoRepl suffix with NoReplacement in method names and consistently uses throws CCE in method footprints.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8356439: Rename JavaLangAccess::*NoRepl methods (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26413/head:pull/26413
$ git checkout pull/26413

Update a local copy of the PR:
$ git checkout pull/26413
$ git pull https://git.openjdk.org/jdk.git pull/26413/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26413

View PR using the GUI difftool:
$ git pr show -t 26413

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26413.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 21, 2025

👋 Welcome back vyazici! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jul 21, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Jul 21, 2025

@vy The following labels will be automatically applied to this pull request:

  • core-libs
  • nio
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

Copy link
Member

@liach liach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly suggest against using CCE as the standard exception. The only place that relies on CCE is Files; IAE is more suitable for everywhere else. I recommend adding the special CCE handling in Files alone so we can remove the redundant try-catch everywhere else.

@@ -325,7 +325,7 @@ public interface JavaLangAccess {
* @return the newly created string
* @throws CharacterCodingException for malformed or unmappable bytes
*/
String uncheckedNewStringNoRepl(byte[] bytes, Charset cs) throws CharacterCodingException;
String uncheckedNewString(byte[] bytes, Charset cs) throws CharacterCodingException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs should mention these two details:

  1. This method does not replace upon malformed data but fails
  2. This method does not copy the byte array for validation (can add to the warning)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs should mention these two details:

  1. This method does not replace upon malformed data but fails
  2. This method does not copy the byte array for validation (can add to the warning)

@liach, I presume you want more than the following two (existing) comment blocks addressing these concerns:

 * <b>WARNING: The caller of this method shall relinquish and transfer the
 * ownership of the byte array to the callee</b>, since the latter will not
 * make a copy.
 ...
 * @throws CharacterCodingException for malformed or unmappable bytes

If so, mind elaborating on your suggested enhancements, please?

@vy
Copy link
Contributor Author

vy commented Jul 21, 2025

@liach, thanks so much for the prompt feedback!

I strongly suggest against using CCE as the standard exception.

Would you mind elaborating on the rationale behind this preference, please?

@liach
Copy link
Member

liach commented Jul 21, 2025

Would you mind elaborating on the rationale behind this preference, please?

If you look at the history of newStringNoRepl, you will find it's originally added for JDK-8201276 in ca48716. Since then, newStringNoRepl has been used by various other sites mostly as a utility. None of those further usages ever used the CharacterCodingException. In fact, in String, the actual implementation throws an IAE that is later wrapped in a CCE. I don't think we should punish all other users of this API just because Files.readString requires this CCE; the CCE wrapping can be moved to Files.readString.

@AlanBateman
Copy link
Contributor

newStringNoRepl has confused several maintainers as it's not immediately obvious that it means "no replace", or more specifically CodingErrorAction.REPORT behavior. So it's good to see this getting attention.

Anyone working on String expects CodingErrorAction.REPLACE behavior so having String methods use methods that don't throw would feel right.

There are "far away" APIs, Files.readXXX was mentioned, that require CodingErrorAction.REPORT behavior methods so having JLA methods throw CharacterCodingException is helpful.

There are other far away APIs such as UUID and ZipCoder that specify a Charset and know that no exception will be throw. They want CodingErrorAction.REPORT behavior except that it's a bug if CharacterCodingException is thrown. These cases catch CCE and throw AssertionError, which is okay, and hopefully we have enough tests in all these area to ensure that it doesn't happen.

Anyway, from a first read of the changes then I think we need to make sure that the method descriptions are accurate. There are several places where "IAE" is mentioned but the methods are changed to throw CCE. IAE is awkward in this area because throwing it, instead of CCE, makes it difficult to know if the handling is correct. There is constant code churn in this area and too easy to introduce a regression and "unexpected exception" if IAE is used to in place of a CharacterCodingException.

Another high level comment from a first read is that it feels like there are two forms needed. One form is REPLACE action and doesn't throw. The other is REPORT action and throws CharacterCodingException. That is what we have already, it's just the naming is awkward. So it may be that it's more about finding good names so that it's clear from all the use sites.

@vy
Copy link
Contributor Author

vy commented Jul 24, 2025

There are several places where "IAE" is mentioned but the methods are changed to throw CCE.

@AlanBateman, thanks for the tip. Pushed 10cb72c, which improves Javadoc in such places. While doing so, it also

  • Ensures that String Javadoc matches the one shared in JavaLangAccess
  • Adds some missing(?) requireNonNull checks, and documents them as well.

@vy vy marked this pull request as ready for review July 28, 2025 09:42
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 28, 2025
@mlbridge
Copy link

mlbridge bot commented Jul 28, 2025

Webrevs

@RogerRiggs
Copy link
Contributor

If CCE should have a constructor with a message, it can be added if you have a clear idea how it would be used.

@vy
Copy link
Contributor Author

vy commented Aug 8, 2025

Motivated by @RogerRiggs inquiries, and @AlanBateman's below remark,

Another high level comment from a first read is that it feels like there are two forms needed. One form is REPLACE action and doesn't throw. The other is REPORT action and throws CharacterCodingException.

grouped String methods by doReplace argument in 1fb8582. There are several String methods of the following form:

private static byte[] encode8859_1(..., boolean doReplace) {
    // ...
    if (!doReplace) {
        throwUnmappable(sp);
    }
    // ...
 }

We want to make doReplace = false case visible in the function footprint, and this resulted in:

private static byte[] encode8859_1(..., boolean doReplace) throws CCE { ... }

Though this propagates the checked CCE even for doReplace = true. To avoid this, I've grouped methods by doReplace argument into two:

private static byte[] encode8859_1(...) { ... }
private static byte[] encode8859_1NoReplacement(...) throws CCE { ... }

As a result,

  1. doReplace = true call-sites invoke encode8859_1(), replacement will occur, and there is no checked CCE thrown
  2. doReplace = false call-sites invoke encode8859_1NoReplacement(), replacement will not occur, and CCE needs to be handled

Copy link
Member

@liach liach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review still in progress, feel free to ping me more more...

@RogerRiggs
Copy link
Contributor

It seems that the API is overloaded trying to satisfy too many requirements, replace/noreplace, throw/nothrow and supporting arbitrary Charsets. There are multiple callers that only need to create a string from byte array holding latin1.
They are burdened with catching and ignoring exceptions that do not occur.

I'm suggesting breaking out that use case in PR##26831.
That leaves Files.readString that needs the full CharSet/noReplace/throw behavior.

private static <E extends Exception> byte[] encodeWithEncoder(
Charset cs, byte coder, byte[] val, Class<E> exceptionClass)
// Parametrizing on exception type to enable callers (using null) to avoid having to declare the exception
throws E {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RogerRiggs, since there is nothing stopping the programmer from passing TotallyUnrelatedException.class to encodeWithEncoder() (or to 2 other sneakily throwing methods), I've tried adding the following as the first thing in this methods' body:

assert exceptionClass == null || CharacterCodingException.class.isAssignableFrom(exceptionClass);

This beautifully caused a JVM crash due to early initialization issues surrounding String. @liach suggested the following:

  1. Move assertions to StringLatin1 and StringUTF16 – This solves the issue, but the check is at a relatively later stage.
  2. Replace assert with a runtime check, e.g., if (...) { throw new AssertionError(...); } – This solves the issue, but incurs a runtime penalty

What would you advice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the programmer" is an OpenJDK developer and won't be making spurious changes.
The exception classes are only present to make the compiler satisfied, they are ignored at the point the exceptions are thrown.
A mismatch in the exception class would show up at compile time, not matching the signature of the calling method.
And they are buried in the implementation under the decodeUTF8_UTF16 and decodeUTF8_UTF16NoReplacement wrappers.

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates.

private static <E extends Exception> byte[] encodeWithEncoder(
Charset cs, byte coder, byte[] val, Class<E> exceptionClass)
// Parametrizing on exception type to enable callers (using null) to avoid having to declare the exception
throws E {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the programmer" is an OpenJDK developer and won't be making spurious changes.
The exception classes are only present to make the compiler satisfied, they are ignored at the point the exceptions are thrown.
A mismatch in the exception class would show up at compile time, not matching the signature of the calling method.
And they are buried in the implementation under the decodeUTF8_UTF16 and decodeUTF8_UTF16NoReplacement wrappers.

@@ -1309,29 +1313,29 @@ private static int malformed4(byte[] src, int sp) {
}

@SuppressWarnings("unchecked")
private static <E extends Exception> E malformedInputException(int off, int nb, Class<E> exceptionType) {
private static <E extends Exception> E malformedInputException(int off, int nb, Class<E> exceptionClass) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument name can be "_", since it is ignored and is unused in the body.
Comment the method as always throwing MalformedInputException.
Similarly in the other malformedInput methods*.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants