Replace renv_parallel_exec (mclapply, Unix-only) in renv_graph_install and renv_graph_download with cross-platform parallel downloads. Resolve download URLs upfront via renv_graph_urls(), then fetch them in a single parallel transfer using renv_download_parallel().
Two parallel backends, gated on the configured download method:
- external curl >= 7.66.0: multi-section config file + --parallel
- R >= 4.5.0 libcurl: vectorized download.file(method = "libcurl")
Falls back to sequential downloads when neither backend is available. Packages from unsupported sources (bitbucket, git, local) also fall back to the existing renv_retrieve_impl_one() path.
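The backend gating described above can be sketched as follows. This is a minimal illustration in Python rather than R; the function names `parse_version` and `choose_backend` and the returned labels are hypothetical and not part of renv. Only the version thresholds come from the commit message.

```python
def parse_version(text):
    """Turn '7.66.0' into (7, 66, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def choose_backend(method, curl_version=None, r_version=None):
    """Pick a parallel download backend, or 'sequential' when none applies.

    `method` stands in for the configured download method; the thresholds
    (curl 7.66.0, R 4.5.0) are the ones quoted in the commit message.
    """
    if method == "curl" and curl_version is not None:
        if parse_version(curl_version) >= parse_version("7.66.0"):
            return "curl-parallel"
    if method == "libcurl" and r_version is not None:
        if parse_version(r_version) >= parse_version("4.5.0"):
            return "libcurl-vectorized"
    # Neither backend is available: download one file at a time.
    return "sequential"
```

Comparing parsed tuples rather than raw strings matters here, since a plain string comparison would rank "7.9.0" above "7.66.0".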
Use case() helper in renv_download_parallel_method() and add tests covering method selection, curl config text generation, curl --parallel backend, libcurl vectorized backend, sequential fallback, and failure reporting. Each backend test skips when its method is unavailable.
Add renv_graph_url_url() to resolve URL source records, and renv_graph_url_local() to convert local file paths to file:// URIs so all three parallel backends can handle them uniformly. Bitbucket remains in the sequential fallback since it requires an API roundtrip to resolve the tarball URL.
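As a rough illustration of the local-path conversion, the Python sketch below turns a file path into a file:// URI. The helper name `local_path_to_uri` is hypothetical; renv's own renv_graph_url_local() is written in R and may handle details differently.

```python
from pathlib import Path


def local_path_to_uri(path):
    """Convert a local file path to a file:// URI.

    The path is made absolute first, because a file URI cannot be
    built from a relative path.
    """
    return Path(path).resolve().as_uri()
```

Once every source resolves to a URL, the download backends need no special case for local archives.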
Delegate binary vs source selection to renv_available_packages_latest, which handles P3M, NeedsCompilation, toolchain availability, and all pkgType heuristics. Fall back to version-filtered lookup when the latest doesn't match, and to the CRAN archive for old versions.
Test renv_curl_version, renv_graph_url_* helpers (github, gitlab, url, local, repository), renv_graph_urls dispatcher, and renv_graph_download end-to-end through the parallel path.
Replace the sequential renv_retrieve_impl + renv_install_impl pipeline with renv_graph_init + renv_graph_install for dependency-wave-ordered parallel downloads and installation.
Key changes:
- install() now builds a dependency graph via renv_graph_init, then downloads and installs in wave order via renv_graph_install
- renv_graph_init accepts pre-resolved records from install()'s remote resolution, with two-phase BFS: top-level remotes use extended dependency fields (e.g. Suggests), transitive deps use defaults
- Bioconductor repos are activated once after phase 1 if any resolved description has Source=Bioconductor or biocViews
- Cellar packages are resolved via renv_available_packages_latest fallback in renv_graph_description_repository
- Failed resolutions are tracked via resolution_failed attribute to properly error on packages that can't be resolved even when an older version is already installed
- Non-standard dependency fields (e.g. Config/Needs/protein) are supplemented from installed package DESCRIPTIONs since PACKAGES metadata only includes standard fields
- Repository and cellar records are properly tagged for renv_package_augment to write RemoteRepos/RemoteReposName
Replace warningf in renv_graph_resolve with silent tracking via resolution_error attribute. The warning was noise for packages that still install successfully from the archive (e.g. today@0.1.0) and leaked through testthat's expect_error. The error reason is now included in the install failure message instead. Also simplify renv_graph_description_local to delegate to renv_description_read, which already handles directories, archives, and DESCRIPTION files.
- extract renv_graph_adjacency() shared by sort and waves
- extract renv_graph_scope_retrieve() shared by download and install
- simplify source install loop (remove unused jobs batching)
- report download summary once at end instead of per-package elapsed
- use index pointer for BFS queue to avoid O(n) dequeue
- add debug logging for libcurl parallel download errors
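The index-pointer BFS mentioned in the list above can be sketched like this. It is a Python illustration, not the R code from the commit: `bfs_order` and the `deps` mapping are hypothetical names, and `roots` is assumed to contain no duplicates.

```python
def bfs_order(roots, deps):
    """Breadth-first traversal of a dependency mapping.

    Instead of removing the first element of the queue on every step
    (which shifts the remaining elements), keep the queue append-only
    and advance a `head` index.
    """
    queue = list(roots)
    seen = set(roots)
    head = 0
    while head < len(queue):
        package = queue[head]
        head += 1  # "dequeue" by moving the pointer, not by shrinking the list
        for dep in deps.get(package, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return queue
```

A side effect of this design is that the queue itself ends up holding the full visit order, so no separate result vector is needed.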
Move all downloading before the wave loop so every package is fetched in one parallel batch. The wave loop now only installs from local files. Add config$install.jobs (default 4) to control how many R CMD INSTALL processes run concurrently within each wave.
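To make "wave" concrete: a wave is a set of packages whose dependencies have all been installed by earlier waves, so everything inside one wave can be built concurrently. The Python sketch below computes waves from a dependency mapping. The function name and the package names are made up for illustration, and dependencies outside the mapping are treated as already satisfied, which is an assumption of this sketch.

```python
def dependency_waves(deps):
    """Group packages into waves; each wave depends only on earlier waves."""
    # Ignore dependencies that are not part of this install (assumed satisfied).
    remaining = {pkg: set(d) & set(deps) for pkg, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(pkg for pkg, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for pkg in ready:
            del remaining[pkg]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


# Hypothetical packages: pkgC needs pkgA and pkgB; pkgD needs pkgC and pkgA.
example = {"pkgA": [], "pkgB": [], "pkgC": ["pkgA", "pkgB"], "pkgD": ["pkgC", "pkgA"]}
```

With a setting like install.jobs, at most that many packages from the current wave would be built at once; the sketch only computes the grouping.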
Install all binaries up front (before wave loop) since they're just file copies with no build-time dependency ordering. This collapses waves when binary deps are already satisfied. Remove dead renv_graph_download function and its tests. Wire restore() into the graph-based installer via renv_graph_init + renv_graph_install, replacing the old renv_retrieve_impl + renv_install_impl sequential path. Fix install.jobs config description (default is 4, not 1).
use %{stderr} in --write-out to route output to stderr (unbuffered
by the C standard), avoiding the full buffering libc applies to
stdout when piped. shell redirection (2>&1 >/dev/null) swaps the
descriptors so the pipe carries the write-out data. --next separates
per-URL option groups; --parallel/--parallel-max are global and
survive the resets.
use a single config file with 'next' between URL sections instead of building --next on the command line per-URL, avoiding the 8191 char cmd.exe limit on Windows. per-URL options (ssl-revoke, user curl configs) are included in each section via config file directives. add --parallel-immediate when curl >= 7.68.0 to start all connections at once rather than waiting for the first to establish.
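A sketch of the config-text generation, written in Python for illustration. The function names are hypothetical; the directives used (`url`, `output`, `next`) are the config-file spellings of curl's `--url`, `--output`, and `--next`, and `ssl-no-revoke` in the usage line is only an example of a per-URL option.

```python
def curl_quote(value):
    """Quote a value for a curl config file, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def curl_config_text(downloads, per_url_options=()):
    """Build one config file with a section per URL, separated by 'next'.

    `downloads` is a list of (url, destfile) pairs. `per_url_options` are
    repeated in every section, because 'next' resets per-URL options.
    """
    sections = []
    for url, destfile in downloads:
        lines = [f"url = {curl_quote(url)}", f"output = {curl_quote(destfile)}"]
        lines.extend(per_url_options)
        sections.append("\n".join(lines))
    return "\nnext\n".join(sections) + "\n"


text = curl_config_text(
    [("https://example.org/a.tar.gz", "a.tar.gz"),
     ("https://example.org/b.tar.gz", "b.tar.gz")],
    per_url_options=["ssl-no-revoke"],
)
```

The resulting file would then be passed to curl with `--config`, with `--parallel` given once on the command line. Because the URL list lives in the file, the command line stays short no matter how many packages are fetched.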
Replace the sequential renv_retrieve_impl() + renv_install_impl() path in renv_hydrate_resolve_missing() with renv_graph_init() + renv_graph_install(), giving hydrate parallel downloads and wave-ordered installs. Error reporting is handled by renv_graph_install_errors().
Filter base packages (utils, tools, etc.) in renv_graph_resolve() so they don't cause "package not available" errors when install() is called without arguments. Scope bioconductor repo activation to the caller's frame via a new scope parameter on renv_graph_init(), so bioc repos persist into renv_graph_install() for download URL resolution.
The pre-flight report listed all transitive dependencies from graph resolution, including packages already installed at the correct version. Filter with renv_restore_find() before reporting so only packages that will actually be installed are shown.
On R >= 4.0, child R processes now report install results back over a socket instead of through pipes. The parent collects results via socketAccept() in completion order, so progress lines appear as each package finishes rather than blocking on the slowest in the batch. Falls back to the existing pipe-based sequential collection on R < 4.0.
socketAccept() blocks indefinitely on many platforms regardless of the timeout parameter. Poll with socketSelect() first, which reliably respects the timeout on server sockets, then accept only once a connection is known to be waiting.
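The poll-then-accept pattern looks roughly like this in Python, where `select.select` plays the role that socketSelect() plays in R. This is an analogue for illustration only: `accept_with_timeout` is a hypothetical helper, and Python sockets do not share the R timeout problem described above.

```python
import select
import socket


def accept_with_timeout(server, timeout):
    """Return an accepted connection, or None if nobody connects in time.

    The listening socket is polled first; accept() is called only once a
    connection is known to be waiting, so it cannot block indefinitely.
    """
    readable, _, _ = select.select([server], [], [], timeout)
    if not readable:
        return None
    conn, _addr = server.accept()
    return conn
```

Calling this in a loop lets the parent do other work, such as checking a deadline, between polls.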
Large packages like rstan or arrow can take well over 10 minutes to compile from source on slow hardware. A 1-hour timeout avoids misclassifying long compiles as crashes.
Replace the wave+batch two-level loop (Phase 2b) with a single
event loop that launches source packages as soon as their
dependencies complete — live Kahn's algorithm. This keeps all
worker slots busy whenever eligible work exists, rather than
waiting for the slowest package in each wave.
Key changes:
- Lift handle() and callbacks above the R version branch so both R >= 4.0 and R < 4.0 paths share the same closure
- R >= 4.0: single server socket for the entire install with a ready-queue event loop driven by indegree tracking
- R < 4.0: wave-based fallback preserved, moved into else branch
- Extract renv_graph_install_accept() and renv_graph_install_backup() helpers to reduce duplication
- Use stack("character") for failed to avoid repeated vector reallocation and eliminate <<- in handle()
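The ready-queue event loop with indegree tracking can be sketched as below. This is a Python illustration that uses a thread pool in place of child R processes and sockets; `install_in_dependency_order` and its `install` callback are hypothetical. The sketch assumes every dependency is a key of `deps`, that dependency lists hold no duplicates, and that a failed install simply raises instead of skipping dependents.

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def install_in_dependency_order(deps, install, jobs=4):
    """Run install(pkg) for every package, starting each one as soon as
    all of its dependencies have finished (Kahn's algorithm, run live)."""
    indegree = {pkg: len(d) for pkg, d in deps.items()}
    dependents = {pkg: [] for pkg in deps}
    for pkg, d in deps.items():
        for dep in d:
            dependents[dep].append(pkg)

    ready = deque(pkg for pkg, n in indegree.items() if n == 0)
    running = {}   # future -> package name
    finished = []  # completion order
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while ready or running:
            # Fill every free worker slot with eligible work.
            while ready and len(running) < jobs:
                pkg = ready.popleft()
                running[pool.submit(install, pkg)] = pkg
            # Block until at least one running install completes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                pkg = running.pop(future)
                future.result()  # re-raise if the install failed
                finished.append(pkg)
                for child in dependents[pkg]:
                    indegree[child] -= 1
                    if indegree[child] == 0:
                        ready.append(child)
    return finished
```

Compared with a wave loop, a slow package delays only its own dependents; unrelated packages keep flowing through the free slots.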
This reverts commit 2ac48aa.
- child workers now connect to the parent socket immediately on startup, before running R CMD INSTALL, and send a handshake
- the parent polls all accepted worker connections alongside the server socket in a single socketSelect call
- broken TCP connections give immediate crash detection instead of the previous 1-hour timeout
- temp install scripts are cleaned up after the handshake
- add deadline guard to the download socket loop
- fix endsWith backport parameter name (prefix -> suffix)
- defer cleanup of accepted worker connections in the event loop so they are closed on interrupt or error
- increase child socketConnection timeout in the test from 10s to 60s; the parent accepts one connection per poll iteration, so with 3 children and 3s poll timeouts the third child's connect could race against a short timeout under parallel test load
Summary
- … R CMD INSTALL processes.
- … install() and restore() into the new renv_graph_init() + renv_graph_install() pipeline.
- … config$install.jobs (default 4) to control the number of concurrent source package builds per wave.
- … --parallel (curl >= 7.66.0) or R's vectorized download.file(method = "libcurl") (R >= 4.5.0), with sequential fallback.
- … R/graphviz.R; R/graph.R now contains the dependency resolution, download, and install machinery.
Test plan
- NOT_CRAN=true Rscript -e 'devtools::test(filter = "install|graph|restore|snapshot")' passes (236 pass, 0 fail)
- … install() uses parallel downloads and wave-based installs
- … restore() uses the same graph-based pipeline
- … options(renv.config.install.jobs = 1L) for sequential fallback