revdeprun blog #7665

@tdhock

Hi @nanxstats,
I read https://nanx.me/blog/post/revdeprun-2-1-0/

Could you please comment on whether my interpretation is correct, and answer the questions below?

Your code runs on a single machine with lots of cores (256 in that example).
The time, 2h 44min, is impressive. This is much faster than the 60 days of CPU time and ~10 hours of wall time from a recent report (https://rcdata.nau.edu/genomic-ml/data.table-revdeps/analyze/2026-03-10/).

Your cost was $24; ours is free. We use Monsoon, a supercomputer running SLURM that is free for NAU-affiliated researchers like me, but this is not scalable to the entire R community.

There are parallel downloads, max 256 downloads at a time.
Do the packages come from Bioconductor too, or just CRAN?
How do you resolve system dependencies (C libraries to install before the R package install)?
When you checked data.table revdeps, were there any packages that failed to install?

We are not allowed to do parallel downloads, so every day we rsync all CRAN source packages to the Monsoon file system (a sequential download, but fast), and then set options(repos="file:///path/to/local/CRAN") to install packages from our local copy of CRAN.

After building R-release and R-devel from source, we launch 1700+ parallel checking jobs (one for each CRAN revdep). Each job runs on 1 CPU, potentially on a separate machine, and does the following for both versions of R:

  • Installs the deps of the revdep to a job-specific library (lots of redundant installs across the 1700+ parallel checking jobs).
  • Checks the revdep with data.table from CRAN and from GitHub master. Comparing the two is important to avoid false positives (issues shown in the report that are not real issues we need to fix).
  • If there are any differences in check results, runs git bisect to determine which commit introduced them (very helpful for determining the fix).
  • Saves results to the network file system.
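To make the comparison step concrete, here is a minimal Python sketch of the idea, not the actual code used on Monsoon. The status strings, the `Diff` record, and the `compare_checks` function are all hypothetical names chosen for illustration. It reports a row only when the CRAN and GitHub master results differ (so a revdep that fails identically under both is not a false positive), and it marks install failures as "unknown" rather than silently treating them as "no difference" (a source of false negatives).

```python
from dataclasses import dataclass

# Hypothetical status labels; real R CMD check output is richer than this.
INSTALL_FAILED = "INSTALL_FAILED"


@dataclass(frozen=True)
class Diff:
    revdep: str
    r_version: str
    cran: str     # status when checked against data.table from CRAN
    master: str   # status when checked against data.table from GitHub master
    kind: str     # "difference" or "unknown"


def compare_checks(cran, master):
    """Compare two dicts mapping (revdep, r_version) -> status.

    A key missing from either dict is treated as an install failure
    (an assumption made for this sketch).
    """
    rows = []
    for key in sorted(cran.keys() | master.keys()):
        c = cran.get(key, INSTALL_FAILED)
        m = master.get(key, INSTALL_FAILED)
        if INSTALL_FAILED in (c, m):
            # Cannot tell whether master broke anything: possible false negative.
            rows.append(Diff(*key, c, m, "unknown"))
        elif c != m:
            # Result changed between CRAN and master: candidate for git bisect.
            rows.append(Diff(*key, c, m, "difference"))
        # Identical results (even identical errors) are not reported.
    return rows


if __name__ == "__main__":
    cran = {("pkgA", "release"): "OK", ("pkgB", "release"): "ERROR"}
    master = {("pkgA", "release"): "ERROR", ("pkgB", "release"): "ERROR"}
    for row in compare_checks(cran, master):
        print(row)
```

In this sketch only the "difference" rows would be handed to git bisect, while the "unknown" rows are the ones a second, independent checking system could help resolve.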

Currently we don’t install any Bioconductor packages (only CRAN).
Some system dependencies are installed from modules (easy), others from conda (medium), and a few from manual configure/make/etc. (hard).
There are many packages that fail to install, which may result in false negatives in the final report (no issue shown, but an issue does exist).

After checking each revdep in parallel, the results are read from the file system and converted to a web page report, for display on https://rcdata.nau.edu/genomic-ml/data.table-revdeps/analyze

https://nanx.me/blog/post/revdeprun-2-1-0/ does not mention the methodology used for computing the final results.
https://nanx.me/blog/post/revdeprun/ mentions several existing tools (revdepcheck from Gábor Csárdi, crandalf and xfun::rev_check() from Yihui Xie, revdepcheck.extras from Henrik Bengtsson). Does your code run one of these, or give output consistent with one of these? Do you have an example result report to share?
Does it look at diffs between GitHub and CRAN checks? Does it support multiple R versions? Does it run git bisect?

In our system we do all of that, and report the results on a web page with a table that has one row per revdep check difference (between data.table GitHub and CRAN). https://github.com/Rdatatable/data.table/wiki/Revdep-checks#significant-differences-table
Is there some way your system could provide similar output, so we could check that it is consistent with our results?

Overall, I think running your code would be a good double-check to run "once in a while" (before each release to CRAN?) in order to catch potential blind spots (false negatives) that the Monsoon system would miss due to install failures.
Could we please ask you to do that for us before the next release?
data.table is maintained by volunteers, and we could definitely use some help from a talented coder such as yourself.
