


Copilot AI commented Nov 28, 2025

Pull request

Proposed changes

Addresses #80, #92, and #93.

DOI Fix (#93): Updated the incorrect DOI 10.5281/zenodo.14851523 → 10.5281/zenodo.14171285 in submissions/445/index.qmd (front matter, href, and Zenodo link).

Broken Citation (#92): Fixed malformed citation syntax:

- [e.g. based on TrOCR models](@li_trocr_2022)
+ [e.g. based on TrOCR models @li_trocr_2022]

404 Check in CI (#80): Added a link validation step to quarto-publish.yml using lychee (installed via Homebrew for macOS ARM compatibility):

  • Uses a browser-like User-Agent header to avoid false positives from sites that block automated tools (e.g., doi.org)
  • Accepts 403/429 responses as valid (a 403 from bot protection or a 429 from rate limiting does not mean the link is broken)
  • Excludes known problematic sites that block scrapers (LinkedIn, X/Twitter)
  • Excludes the site_libs folder to speed up checks
  • Relies on lychee's built-in retry logic and redirect handling, which matters for DOIs (doi.org redirects to the publisher or repository)
  • Keeps the macOS runner for typesetting requirements
  • Outputs warnings for broken links instead of failing the build
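A minimal sketch of how the options listed above could map onto a single workflow step, assuming lychee is already installed, the rendered site lives in `_site`, and a recent lychee release (the `200..=299` range syntax for `--accept` needs one). The step name, User-Agent string, and exclusion regexes are illustrative assumptions, not the exact contents of the commit.

```yaml
# Hypothetical link-check step (sketch): step name, User-Agent string and
# exclude patterns are assumptions, not copied from the PR.
- name: Check links
  timeout-minutes: 10            # hard upper bound so the job cannot hang
  run: |
    # Browser-like User-Agent to reduce bot blocking (e.g. on doi.org).
    UA="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    # 403 and 429 are accepted alongside 2xx: blocked or rate-limited
    # requests are not evidence that a link is dead.
    # Exclude regexes are anchored so that e.g. dropbox.com is not
    # accidentally matched by an "x.com" pattern.
    if ! lychee \
        --no-progress \
        --user-agent "$UA" \
        --accept '200..=299,403,429' \
        --max-retries 3 \
        --timeout 30 \
        --exclude '^https?://([a-z]+\.)?linkedin\.com' \
        --exclude '^https?://(www\.)?(twitter|x)\.com(/|$)' \
        --exclude-path '_site/site_libs' \
        '_site/**/*.html'; then
      # Report instead of failing the build.
      echo "::warning title=Link check::lychee reported broken links; see the step log for details"
    fi
```

Wrapping lychee in `if !` is one way to get "warn, don't fail": lychee exits non-zero when links fail, and a negated command inside `if` does not trip the runner's `bash -e`.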

Types of changes

  • New feature (non-breaking change which adds functionality).
  • Enhancement (non-breaking change which enhances functionality).
  • Bug Fix (non-breaking change which fixes an issue).
  • Breaking change (fix or feature that would cause existing functionality to change).

Checklist

  • I have read the README document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have mentioned all co-authors in the PR description as Co-authored-by: Name <email>.

Original prompt

implement #80 (rethink proposed implementation) and fix #92 as well as #93



@coderabbitai
Contributor

coderabbitai bot commented Nov 28, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI changed the title from "[WIP] Rethink proposed implementation and fix related issues" to "Fix DOI, broken citation link, and add 404 check to CI" on Nov 28, 2025
Copilot AI requested a review from maehr November 28, 2025 14:30
Updated .gitignore to include additional environment files and cache directories.
Revised text for clarity and consistency in discussing the factors influencing ATR usage in historical projects, including the impact of handwriting heterogeneity, text volume, analysis methods, and research questions.
@maehr maehr marked this pull request as ready for review November 28, 2025 14:57
@maehr maehr requested a review from mtwente November 28, 2025 14:58
@maehr
Contributor

maehr commented Nov 28, 2025

@copilot the 404 checker should also check external URLs (like DOIs) and their redirects.

Author

Copilot AI commented Nov 28, 2025

Updated the 404 checker in commit ee2f5db to check external URLs including DOIs. The changes:

  • Replaced --domains=localhost with --span-hosts to allow checking external URLs
  • Added --timeout=30 and --tries=3 to handle slow external servers gracefully
  • wget follows redirects by default, so DOI redirects (like doi.org → zenodo.org) are properly validated
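A rough sketch of what a recursive wget spider with these options might look like. The PR only mentions `--domains=localhost`, so serving `_site` on port 8000 with Python's built-in `http.server` is an assumption, as are the step name and log file name.

```yaml
# Hypothetical sketch of the wget-based check at this stage; the local
# server command, port and log file name are assumptions.
- name: Check for broken links (wget spider)
  run: |
    # Serve the rendered site locally in the background.
    python3 -m http.server 8000 --directory _site &
    sleep 2
    # --spider:     request pages without saving them
    # --span-hosts: follow links to other hosts (replaces --domains=localhost)
    # --timeout/--tries: tolerate slow external servers
    # A non-zero exit (wget uses 8 for server error responses) fails the step.
    wget --spider --recursive --no-verbose \
      --span-hosts --timeout=30 --tries=3 \
      --output-file=wget.log \
      http://localhost:8000/
```

One plausible reason the later comments found this approach too greedy and slow to finish: with `--recursive`, `--span-hosts` lets wget keep crawling external sites it reaches, up to wget's default recursion depth of 5.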

@maehr
Contributor

maehr commented Nov 29, 2025

@copilot index all HTML files first, then check for dead links without recursion

Author

Copilot AI commented Nov 29, 2025

Refactored in commit f272edb. The 404 checker now:

  1. Indexes all HTML files in _site directory
  2. Extracts all href and src links from the HTML files
  3. Separates internal and external links
  4. Checks each link individually without recursion
  5. External links (including DOIs) are checked with proper timeout and retries
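The five steps above can be sketched as a single non-recursive shell step. This is an illustration under assumptions: the step and file names are hypothetical, link extraction uses a rough regex rather than an HTML parser, and internal-link checking is left out of the sketch.

```yaml
# Hypothetical sketch of the index-then-check approach; step name and
# file names are assumptions, not taken from commit f272edb.
- name: Check links without recursion
  run: |
    # 1. Index every rendered HTML file, then
    # 2. extract double-quoted href/src values (rough regex, not a real
    #    HTML parser) and de-duplicate them.
    find _site -name '*.html' -print0 \
      | xargs -0 grep -hoE '(href|src)="[^"]+"' \
      | sed -E 's/^(href|src)="//; s/"$//' \
      | sort -u > all-links.txt
    # 3. Separate external (http/https) from internal links.
    #    `|| true` keeps an empty result from failing the step.
    grep -E '^https?://' all-links.txt > external-links.txt || true
    grep -vE '^https?://' all-links.txt > internal-links.txt || true
    echo "$(wc -l < external-links.txt) external, $(wc -l < internal-links.txt) internal links"
    # 4./5. Check each external URL on its own; without --recursive,
    #       wget never follows links found on the target pages.
    while IFS= read -r url; do
      wget --spider --quiet --timeout=30 --tries=3 "$url" || echo "Broken: $url"
    done < external-links.txt
```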

@maehr
Contributor

maehr commented Nov 29, 2025

@copilot link checking is too greedy, and it does not seem to resolve internal links correctly. Also, only output a warning instead of failing.

Author

Copilot AI commented Nov 29, 2025

Fixed in commit d86d5a9. Changes:

  • Simplified link checker to only check external URLs (http/https) - removed problematic internal link checking
  • Changed from ::error:: to ::warning:: annotations
  • Removed exit 1 so the build continues even when broken links are found
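A minimal sketch of the warn-only pattern, assuming a hypothetical input file `external-links.txt` with one URL per line produced by an earlier extraction step.

```yaml
# Hypothetical sketch: external-links.txt (one URL per line) is an assumed
# input produced by an earlier extraction step.
- name: Check external links (warn only)
  run: |
    broken=0
    while IFS= read -r url; do
      case "$url" in
        http://*|https://*) ;;   # only external URLs are checked
        *) continue ;;
      esac
      if ! wget --spider --quiet --timeout=30 --tries=3 "$url"; then
        # ::warning:: creates an annotation but does not fail the job.
        echo "::warning::Broken link: $url"
        broken=$((broken + 1))
      fi
    done < external-links.txt
    echo "Found $broken broken external link(s)"
    # No `exit 1` here, so the build continues even if links are broken.
```

`broken=$((broken + 1))` is used rather than `((broken++))` because the latter returns a non-zero status when the counter is 0, which would abort a `bash -e` step.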

@maehr
Contributor

maehr commented Nov 29, 2025

@copilot the CI workflow does not stop; fix that

Author

Copilot AI commented Nov 29, 2025

Fixed in commit 8e53ab3. Added timeouts to prevent the workflow from hanging:

  • Added timeout-minutes: 10 to the step for an overall 10-minute limit
  • Wrapped wget with timeout 15 to prevent hanging on unresponsive servers
  • Reduced wget timeout to 10s with 2 retries for faster checks
  • Added progress indicator every 10 links
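A sketch of how the step-level and per-link timeouts might fit together. It assumes a hypothetical `external-links.txt` input, reads "2 retries" as `--tries=2`, and assumes a GNU-compatible `timeout` command is on `PATH` (on macOS, Homebrew's coreutils installs it as `gtimeout` unless its gnubin directory is added to `PATH`).

```yaml
# Hypothetical sketch; external-links.txt is an assumed input file and
# `timeout` is assumed to be available on the runner.
- name: Check external links
  timeout-minutes: 10              # whole step is cancelled after 10 minutes
  run: |
    total=$(wc -l < external-links.txt | tr -d ' ')
    n=0
    while IFS= read -r url; do
      n=$((n + 1))
      # Progress indicator every 10 links.
      if [ $((n % 10)) -eq 0 ]; then
        echo "Checked $n of $total links"
      fi
      # `timeout 15` kills wget if it is still running after 15 seconds,
      # regardless of wget's own --timeout/--tries settings.
      if ! timeout 15 wget --spider --quiet --timeout=10 --tries=2 "$url"; then
        echo "::warning::Broken or unreachable link: $url"
      fi
    done < external-links.txt
```

With two tries, a 10-second network timeout, and wget's wait between retries, a single unresponsive host could keep wget busy for more than 15 seconds, so in this sketch the outer `timeout 15` is what actually bounds each check.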

@maehr
Copy link
Contributor

maehr commented Dec 4, 2025

@copilot the broken link checker produces a lot of false positives. Before implementing, try different strategies (user agents, etc.) to mitigate them, especially for doi.org.

Copy link
Author

Copilot AI commented Dec 4, 2025

Replaced wget with lychee-action (lycheeverse/lychee-action@v2) in commit b2e0150. lychee is a purpose-built link checker, and this setup addresses the false positives:

  • Uses a browser-like User-Agent header to avoid blocks from doi.org and similar sites
  • Accepts 403/429 responses as valid (blocking or rate limiting does not mean the link is broken)
  • Excludes known problematic social media sites (LinkedIn, X/Twitter)
  • Has built-in retry logic and proper redirect handling
  • Reports results in the GitHub job summary for easy review
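A sketch of the action-based variant. The arguments are illustrative rather than copied from commit b2e0150, and the sketch assumes the action passes `args` through the shell, so the single and double quotes reach lychee intact.

```yaml
# Hypothetical sketch of the lychee-action variant; arguments are
# illustrative, not copied from commit b2e0150.
- name: Check links with lychee
  uses: lycheeverse/lychee-action@v2
  with:
    # `>-` folds the lines below into one space-separated string.
    args: >-
      --no-progress
      --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
      --accept '200..=299,403,429'
      --exclude '^https?://([a-z]+\.)?linkedin\.com'
      --exclude '^https?://(www\.)?(twitter|x)\.com(/|$)'
      '_site/**/*.html'
    jobSummary: true   # write the report to the GitHub job summary
    fail: false        # report broken links without failing the job
```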

Copilot AI requested a review from maehr December 4, 2025 13:33
@maehr
Contributor

maehr commented Dec 4, 2025

The job failed because the lycheeverse/lychee-action@v2 step tried to download lychee-arm64-unknown-linux-gnu.tar.gz but exited with code 56, indicating a download failure. This almost always means the specified binary for the runner platform (here, ARM64 for macOS) does not exist or is unsupported.

To fix this:

  • Change the runs-on property for the build-deploy job from macos-latest to ubuntu-latest or another supported x86_64 platform. Lychee releases provide compatible binaries for Linux x86_64 and Windows, but not for macOS ARM runners. For example:

        build-deploy:
          runs-on: ubuntu-latest

  • Alternatively, you can specify the LYCHEE_VERSION and LYCHEE_BIN environment variables to point at a supported binary if you must use macOS, but switching to ubuntu-latest is the simplest fix.

You can see the relevant section of your workflow here: .github/workflows/quarto-publish.yml (ref: b2e0150). Update the runs-on parameter as shown above to resolve the issue.

@copilot fix this

Author

Copilot AI commented Dec 4, 2025

Fixed in commit c4bdf07. Instead of changing the runner, I replaced lychee-action with a direct Homebrew installation of lychee, which supports macOS ARM. This keeps the macOS runner for typesetting requirements while still providing reliable link checking.
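A minimal sketch of the Homebrew-based setup on the existing macOS runner. The `build-deploy` job name comes from the discussion above; the step name and the elided earlier steps are assumptions.

```yaml
# Hypothetical sketch: the step name and the elided earlier steps are
# assumptions; only the build-deploy job name comes from the discussion.
build-deploy:
  runs-on: macos-latest          # kept for the typesetting requirements
  steps:
    # ... existing checkout, Quarto setup and render steps ...
    - name: Install lychee
      run: |
        # Homebrew installs a build for the runner's own architecture,
        # so no Linux release archive is downloaded.
        brew install lychee
        lychee --version
```

Installing the CLI directly also decouples the link check from the action's binary-download logic, which is what failed with exit code 56.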



Development

Successfully merging this pull request may close these issues.

404 on 445

3 participants