NeatGuyCoding commented Nov 3, 2025

Necessity of Adding DNS Lookup JFR Event

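Cloud-native environments rely heavily on DNS for service discovery, so lookups are frequent and latency-sensitive. Java's address caching policy has a significant effect on performance: in OpenJDK, successful lookups are cached for 30 seconds by default when the networkaddress.cache.ttl security property is unset.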

Problems:

  • Cannot distinguish cache hits from actual queries
  • DNS latency is difficult to track and diagnose
  • Multiple libraries concurrently resolving domains cause redundant queries

Value:

  • Complete traceability: DNS query → Socket connect → Data transfer
  • Troubleshooting: Identify DNS timeouts, resolution failures
  • Performance optimization: Evaluate cache policies, discover hotspot domains
  • Security audit: Track external domains accessed

This event complements SocketRead/Write events, enhancing network observability.
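
To illustrate the first problem listed above, here is a minimal sketch (not part of this PR) of what an application can observe today. It reads the two cache-related security properties and times two back-to-back lookups. The class name `LookupTiming` and the helper `timeLookupNanos` are made up for this example; the only signal available is elapsed time, which hints at a cache hit but cannot confirm one.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

public class LookupTiming {

    /** Times a single lookup in nanoseconds; failed lookups are timed too. */
    static long timeLookupNanos(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            // A failure may itself be served from the negative cache.
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";

        // A null value means the property is unset and the JDK default applies.
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
        System.out.println("networkaddress.cache.negative.ttl = "
                + Security.getProperty("networkaddress.cache.negative.ttl"));

        long first = timeLookupNanos(host);
        long second = timeLookupNanos(host);

        // The second call is often faster, but timing alone cannot prove
        // whether the result came from the cache or from a real query.
        System.out.println("first lookup:  " + first + " ns");
        System.out.println("second lookup: " + second + " ns");
    }
}
```

Because the public API exposes no cache-hit indicator, an event emitted from inside the lookup path is the only reliable way to tell the two cases apart.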


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28110/head:pull/28110
$ git checkout pull/28110

Update a local copy of the PR:
$ git checkout pull/28110
$ git pull https://git.openjdk.org/jdk.git pull/28110/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28110

View PR using the GUI difftool:
$ git pr show -t 28110

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28110.diff

Signed-off-by: NeatGuyCoding <[email protected]>

bridgekeeper bot commented Nov 3, 2025

👋 Welcome back NeatGuyCoding! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.


openjdk bot commented Nov 3, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.


openjdk bot commented Nov 3, 2025

@NeatGuyCoding The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-jfr
  • net
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mrserb (Member) commented Nov 3, 2025

It might be useful to differentiate between three use cases:

  • Successful/Error requests when caches are disabled or not used -> these represent actual network DNS lookups
  • Successful/Error requests returned from the cache -> the application may see many repeated errors not due to new DNS failures, but because a previous error response was cached
  • Successful/Semi-Error requests -> this occurs when a DNS error happens, but the application continues to function because a stale cached record is used

It might also be useful to include some information about the DNS cache itself — for example, whether the result came from cache, the TTL of the entry, or whether stale data was used.
BTW, there is also a mechanism to clean up the cache; perhaps we could expose metrics such as the current cache size, number of entries removed, and number of stale entries.

What do you think?
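
As a rough illustration of the suggestion above, the three cases could be carried as fields on a custom JFR event. This is a hypothetical sketch written against the public jdk.jfr API, not the event proposed in this PR: the event name `example.NameLookup`, the field names, and the `NETWORK` / `CACHE` / `STALE_CACHE` labels are all assumptions made for the example.

```java
import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Timespan;

// Hypothetical event: the name, fields, and labels are illustrative only.
@Name("example.NameLookup")
@Label("Name Lookup")
@Category({"Example", "Network"})
public class NameLookupEvent extends Event {

    // Assumed labels for the three cases described in the comment above.
    public static final String NETWORK = "NETWORK";         // real lookup, no cache involved
    public static final String CACHE = "CACHE";             // answer (or cached error) from the cache
    public static final String STALE_CACHE = "STALE_CACHE"; // lookup failed, stale entry reused

    @Label("Host")
    String host;

    @Label("Result Source")
    @Description("Where the answer came from: NETWORK, CACHE, or STALE_CACHE")
    String source;

    @Label("Success")
    boolean success;

    @Label("Remaining TTL")
    @Timespan(Timespan.SECONDS)
    long remainingTtl;

    /** Fills in and commits one event; commit() is a no-op if the event is not being recorded. */
    static NameLookupEvent record(String host, String source, boolean success, long remainingTtl) {
        NameLookupEvent event = new NameLookupEvent();
        event.host = host;
        event.source = source;
        event.success = success;
        event.remainingTtl = remainingTtl;
        event.commit();
        return event;
    }

    public static void main(String[] args) {
        // Values are supplied by hand here; real values would have to come
        // from inside the lookup path, which application code cannot see.
        NameLookupEvent e = record("service.example", STALE_CACHE, true, 0);
        System.out.println(e.host + " resolved via " + e.source);
    }
}
```

A single string field keeps the three cases distinguishable in one event type. Cache-wide metrics such as size or eviction counts would more naturally fit a separate periodic event.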

@NeatGuyCoding
Author

Thanks for the reply. I will investigate and make some updates.

@NeatGuyCoding
Author

@mrserb I have redesigned the event. Please take a look, thanks.

@AlanBateman
Contributor

Please start a discussion on net-dev. Also keep in mind that these methods use whatever name service is configured so the lookups may not be DNS. This will influence the naming, if events are introduced.
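
To make the point about the configured name service concrete, the following sketch reports which resolution mechanism a running JVM appears to use. It assumes JDK 18 or later (for java.net.spi.InetAddressResolverProvider) and the jdk.net.hosts.file system property; the class name `ResolverInfo` and its output wording are invented for this example.

```java
import java.net.spi.InetAddressResolverProvider;
import java.util.List;
import java.util.ServiceLoader;

public class ResolverInfo {

    /** Class names of resolver providers visible to the system class loader (not instantiated). */
    static List<String> customProviders() {
        return ServiceLoader
                .load(InetAddressResolverProvider.class, ClassLoader.getSystemClassLoader())
                .stream()
                .map(provider -> provider.type().getName())
                .toList();
    }

    /** Best-effort description of how InetAddress lookups will be resolved. */
    static String describe() {
        List<String> providers = customProviders();
        if (!providers.isEmpty()) {
            // A deployed provider replaces the built-in resolver entirely.
            return "custom resolver provider: " + providers.get(0);
        }
        String hostsFile = System.getProperty("jdk.net.hosts.file");
        if (hostsFile != null) {
            // The built-in resolver reads this file instead of asking the OS.
            return "hosts file: " + hostsFile;
        }
        // The OS resolver may itself consult files, mDNS, and so on, not only DNS.
        return "built-in resolver (delegates to the operating system)";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

In none of these cases is the lookup guaranteed to be a DNS query, which is why a neutral event name such as "name lookup" may describe the behavior better than "DNS lookup".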

@NeatGuyCoding
Author

email sent to [email protected]
Thanks @AlanBateman
