Skip to content

get rid of some false negatives in rustdoc::broken_intra_doc_links #132748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lolbinarycat
Copy link
Contributor

@lolbinarycat lolbinarycat commented Nov 7, 2024

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links ([text](url)) can actually contain a url, other types of links (reference links, shortcut links) contain a reference which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes #54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

is shortcut link not shortcut link
has backtick never ignore never ignore
no backtick ignore if url-like never ignore

@rustbot
Copy link
Collaborator

rustbot commented Nov 7, 2024

r? @notriddle

rustbot has assigned @notriddle.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Nov 7, 2024
@lolbinarycat lolbinarycat force-pushed the rustdoc-intra-doc-link-warn-more-54191 branch from df3d0f6 to 8ae6586 Compare November 7, 2024 21:41
@rust-log-analyzer

This comment has been minimized.

Copy link
Contributor

@notriddle notriddle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good idea, but I have found one problem. Consider a sample like this:

//@ check-pass
#![no_std]
#![deny(rustdoc::broken_intra_doc_links)]

// regression test for https://github.com/rust-lang/rust/issues/54191
// false positives

//! This is not an intra-doc link: [`foobar`]
//!
//! [`foobar`]: /index.html

That test case fails, because it thinks /index.html is an intra-doc link. You're right that certain kinds of links aren't allowed to be URLs—that's the Unknown type links.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 7, 2024
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@lolbinarycat
Copy link
Contributor Author

welp, turns out the standard library itself contains several of these false negatives!

@lolbinarycat
Copy link
Contributor Author

...or perhaps there's some deeper bug with intra-doc link generation?

there seems to be some false positives containing spaces, even though there is a matching reference definition...

I guess we can just re-enable unconditionally ignoring links with spaces?

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@lolbinarycat
Copy link
Contributor Author

Ok, this breaks a lot of ui tests. This might actually have a pretty big impact, maybe we should do a crater run or something..?

@lolbinarycat lolbinarycat added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 8, 2024
@notriddle
Copy link
Contributor

It's looks fine to me. Here's the PRs where each of these test cases were added:

test case PR
disambiguator-endswith-named-suffix.rs #127016
weird-syntax.rs #112014
redundant_explicit_links-utf8.rs #115070

The last two are both contrived ICE tests, and all of them are intended to be parsed as intra-doc links. Making it so that they warn seems fine to me.

r? @GuillaumeGomez do you also think this is a suitable bug fix?

@rustbot rustbot assigned GuillaumeGomez and unassigned notriddle Nov 8, 2024
@notriddle notriddle self-assigned this Nov 8, 2024
@GuillaumeGomez
Copy link
Member

I think so. I'm not completely satisfied with the current impact though. Intra-doc links should not be triggered on items that contain characters which can't be in an ident or a path, like Clone\(\). Maybe a different warning in these cases?

@lolbinarycat
Copy link
Contributor Author

@GuillaumeGomez what about just amending the current warning to account for other common mistakes?

honestly makes me wish we had --explain for lints, putting a full help message for each one could get a bit verbose.

regardless, i think that should be a separate issue, since the lint output isn't the most helpful in general (it just recommends escaping the brackets, and doesn't mention other common issues, such as misspelled link references)

@GuillaumeGomez
Copy link
Member

what about just amending the current warning to account for other common mistakes?

Do you have an example of before/after of what you have in mind by any chance?

@lolbinarycat
Copy link
Contributor Author

Do you have an example of before/after of what you have in mind by any chance?

well, the most obvious one is some sort of "consider adding a reference: [whatever]: //some-url.example/", but i also think trying to catch typos by comparing the edit distance of the missing reference to that of existing references could be handy.

and perhaps we should also mention surrounding code by backticks, since code snippets should generally be in code blocks instead of individually escaping chars.

@GuillaumeGomez
Copy link
Member

well, the most obvious one is some sort of "consider adding a reference: [whatever]: //some-url.example/", but i also think trying to catch typos by comparing the edit distance of the missing reference to that of existing references could be handy.

Sounds good to me. The distance between two items could be nice too (as a follow-up?).

@lolbinarycat lolbinarycat force-pushed the rustdoc-intra-doc-link-warn-more-54191 branch from 2d38996 to a73d7e3 Compare July 24, 2025 17:38
@rustbot
Copy link
Collaborator

rustbot commented Jul 24, 2025

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@GuillaumeGomez
Copy link
Member

@bors try jobs=dist-x86_64-linux-alt

@rust-bors
Copy link

rust-bors bot commented Jul 31, 2025

⌛ Trying commit a73d7e3 with merge 179e3ad

To cancel the try build, run the command @bors try cancel.

rust-bors bot added a commit that referenced this pull request Jul 31, 2025
…-54191, r=<try>

get rid of some false negatives in rustdoc::broken_intra_doc_links

try-job: dist-x86_64-linux-alt
@rust-bors
Copy link

rust-bors bot commented Aug 1, 2025

☀️ Try build successful (CI)
Build commit: 179e3ad (179e3ad9323f8c265907b8a58b43e6920b36106a, parent: adcb3d3b4cd3b7c4cde642f3ed537037f293738e)

@GuillaumeGomez
Copy link
Member

CI is happy so let's go!

@bors r+ rollup

@bors
Copy link
Collaborator

bors commented Aug 1, 2025

📌 Commit a73d7e3 has been approved by GuillaumeGomez

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 1, 2025
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Aug 1, 2025
…k-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes rust-lang#54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
bors added a commit that referenced this pull request Aug 1, 2025
Rollup of 10 pull requests

Successful merges:

 - #132748 (get rid of some false negatives in rustdoc::broken_intra_doc_links)
 - #135771 ([rustdoc] Add support for associated items in "jump to def" feature)
 - #143360 (loop match: error on `#[const_continue]` outside `#[loop_match]`)
 - #143662 ([rustdoc] Display unsafe attrs with edition 2024 `unsafe()` wrappers.)
 - #143900 ([rustdoc] Correctly handle `should_panic` doctest attribute and fix `--no-run` test flag on the 2024 edition)
 - #144614 (Fortify RemoveUnneededDrops test.)
 - #144703 ([test][AIX] ignore extern_weak linkage test)
 - #144738 (Remove the omit_gdb_pretty_printer_section attribute)
 - #144756 (detect infinite recursion with tail calls in ctfe)
 - #144766 (Add human readable name "Cygwin")

r? `@ghost`
`@rustbot` modify labels: rollup
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Aug 1, 2025
…k-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes rust-lang#54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
bors added a commit that referenced this pull request Aug 1, 2025
Rollup of 11 pull requests

Successful merges:

 - #132748 (get rid of some false negatives in rustdoc::broken_intra_doc_links)
 - #135771 ([rustdoc] Add support for associated items in "jump to def" feature)
 - #143360 (loop match: error on `#[const_continue]` outside `#[loop_match]`)
 - #143662 ([rustdoc] Display unsafe attrs with edition 2024 `unsafe()` wrappers.)
 - #143900 ([rustdoc] Correctly handle `should_panic` doctest attribute and fix `--no-run` test flag on the 2024 edition)
 - #144478 (Improve formatting of doc code blocks)
 - #144703 ([test][AIX] ignore extern_weak linkage test)
 - #144747 (compiletest: Improve diagnostics for line annotation mismatches 2)
 - #144756 (detect infinite recursion with tail calls in ctfe)
 - #144766 (Add human readable name "Cygwin")
 - #144782 (Properly pass path to staged `rustc` to `compiletest` self-tests)

r? `@ghost`
`@rustbot` modify labels: rollup
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Aug 1, 2025
…k-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes rust-lang#54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Aug 1, 2025
…k-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes rust-lang#54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
bors added a commit that referenced this pull request Aug 2, 2025
Rollup of 12 pull requests

Successful merges:

 - #132748 (get rid of some false negatives in rustdoc::broken_intra_doc_links)
 - #135771 ([rustdoc] Add support for associated items in "jump to def" feature)
 - #143360 (loop match: error on `#[const_continue]` outside `#[loop_match]`)
 - #143662 ([rustdoc] Display unsafe attrs with edition 2024 `unsafe()` wrappers.)
 - #144478 (Improve formatting of doc code blocks)
 - #144703 ([test][AIX] ignore extern_weak linkage test)
 - #144747 (compiletest: Improve diagnostics for line annotation mismatches 2)
 - #144766 (Add human readable name "Cygwin")
 - #144782 (Properly pass path to staged `rustc` to `compiletest` self-tests)
 - #144786 (Cleanup the definition of `group_type`)
 - #144796 (Add my previous commit name to .mailmap)
 - #144797 (Update safety comment for new_unchecked in niche_types)

r? `@ghost`
`@rustbot` modify labels: rollup
samueltardieu added a commit to samueltardieu/rust that referenced this pull request Aug 2, 2025
…k-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes rust-lang#54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
bors added a commit that referenced this pull request Aug 2, 2025
Rollup of 18 pull requests

Successful merges:

 - #132748 (get rid of some false negatives in rustdoc::broken_intra_doc_links)
 - #135771 ([rustdoc] Add support for associated items in "jump to def" feature)
 - #143360 (loop match: error on `#[const_continue]` outside `#[loop_match]`)
 - #143662 ([rustdoc] Display unsafe attrs with edition 2024 `unsafe()` wrappers.)
 - #143771 (Constify some more `Result` functions)
 - #143900 ([rustdoc] Correctly handle `should_panic` doctest attribute and fix `--no-run` test flag on the 2024 edition)
 - #144185 (Document guarantees of poisoning)
 - #144395 (update fortanix tests)
 - #144478 (Improve formatting of doc code blocks)
 - #144614 (Fortify RemoveUnneededDrops test.)
 - #144703 ([test][AIX] ignore extern_weak linkage test)
 - #144747 (compiletest: Improve diagnostics for line annotation mismatches 2)
 - #144756 (detect infinite recursion with tail calls in ctfe)
 - #144766 (Add human readable name "Cygwin")
 - #144782 (Properly pass path to staged `rustc` to `compiletest` self-tests)
 - #144786 (Cleanup the definition of `group_type`)
 - #144796 (Add my previous commit name to .mailmap)
 - #144797 (Update safety comment for new_unchecked in niche_types)

Failed merges:

 - #144805 (compiletest: Preliminary cleanup of `ProcRes` printing/unwinding)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit that referenced this pull request Aug 2, 2025
Rollup of 17 pull requests

Successful merges:

 - #132748 (get rid of some false negatives in rustdoc::broken_intra_doc_links)
 - #143360 (loop match: error on `#[const_continue]` outside `#[loop_match]`)
 - #143662 ([rustdoc] Display unsafe attrs with edition 2024 `unsafe()` wrappers.)
 - #143771 (Constify some more `Result` functions)
 - #144185 (Document guarantees of poisoning)
 - #144395 (update fortanix tests)
 - #144478 (Improve formatting of doc code blocks)
 - #144614 (Fortify RemoveUnneededDrops test.)
 - #144703 ([test][AIX] ignore extern_weak linkage test)
 - #144747 (compiletest: Improve diagnostics for line annotation mismatches 2)
 - #144756 (detect infinite recursion with tail calls in ctfe)
 - #144766 (Add human readable name "Cygwin")
 - #144782 (Properly pass path to staged `rustc` to `compiletest` self-tests)
 - #144786 (Cleanup the definition of `group_type`)
 - #144796 (Add my previous commit name to .mailmap)
 - #144797 (Update safety comment for new_unchecked in niche_types)
 - #144803 (rustc-dev-guide subtree update)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit fc4b3fa into rust-lang:master Aug 3, 2025
11 checks passed
@rustbot rustbot added this to the 1.90.0 milestone Aug 3, 2025
rust-timer added a commit that referenced this pull request Aug 3, 2025
Rollup merge of #132748 - lolbinarycat:rustdoc-intra-doc-link-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes #54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-json Area: Rustdoc JSON backend S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rustdoc does not warn about broken links if they contain . or []
8 participants