Draft: Changes from all commits (40 commits)
a9936f0
Fix: custom latency buckets #1376 (#1396)
amishra-u Jul 3, 2023
f74f270
Supply a basic retrier to remote cas writes
werkt Jul 6, 2023
f7eb74a
Update buildfarm-indexer for upstream updates
werkt Jul 12, 2023
f893b5a
Commonize grpc createChannel with TLS selection
werkt Jul 12, 2023
70903df
Refactor findMissingBlobs method
amishra-u Jun 29, 2023
68318c5
incorporate feedback
amishra-u Jul 6, 2023
95719e5
Upgrade grpc repo/maven deps for java_common
werkt Jul 16, 2023
abcd8fc
Shut down prometheus collector thread on CFC stop
werkt Jul 26, 2023
9b4bd6f
Output status code name on shard read error
werkt Jul 31, 2023
8710642
Reduce log levels to make log files more meaningful (#1414)
80degreeswest Aug 1, 2023
8315723
Fix io_bytes_read metrics for buildfarm:server
amishra-u Aug 4, 2023
77fee77
chore(deps): bump Guava from 31.1-jre to 32.1.1-jre
jasonschroeder-sfdc Jul 28, 2023
b5754e4
Upgrade opentelemetry javaagent to 1.28
tokongs Aug 8, 2023
9e7e633
Update platforms
keith Aug 10, 2023
ae907b5
Merge pull request #1422 from keith/ks/update-platforms
comius Aug 11, 2023
18ae72e
Add download rate metrics for buildfarm:worker (#1418)
amishra-u Aug 11, 2023
1180a85
Add request metadata interceptor to Worker (#1425)
amishra-u Aug 31, 2023
8795c6e
Separate channel for write api (#1424)
amishra-u Aug 31, 2023
0abb176
fix docu CAS config (#1432)
Stunkymonkey Sep 7, 2023
882e86f
Fix deadlock when handling Write request (#1442)
shirchen Sep 20, 2023
773341c
Deliver RemoteCasWriter IOExceptions (#1438)
werkt Sep 20, 2023
20d4ae5
build: update maven mirrors
jasonschroeder-sfdc Sep 6, 2023
a7f1ac2
Add production-ready helm chart for deploying buildfarm on k8s
lshmouse Aug 28, 2023
dfe9d3d
Update quick_start.md
werkt Sep 21, 2023
ed15779
Permit --prometheus_port to override config
werkt Sep 21, 2023
fb7cbd8
[metrics] Emit operation_exit_code metric to track execution exit cod…
80degreeswest Sep 22, 2023
a6fe207
feat(redis): support rediss:// URIs for Redis-SSL
jasonschroeder-sfdc Sep 7, 2023
79eee49
chore: make code more readable
jasonschroeder-sfdc Sep 20, 2023
3305669
Update rules_docker dependencies via injection
werkt Sep 23, 2023
7f86c47
Treat '.' working_directory segment as current
werkt Jul 18, 2023
90439ca
Remove unused setExecuteResponseBuilder
werkt Sep 25, 2023
8cc247f
Log on write errors
werkt Jul 17, 2023
f651cdb
Use integer ids for Sqlite bidirectional index
werkt Jul 24, 2023
9f93972
Update graceful shutdown functionality to better handle worker termin…
80degreeswest Sep 29, 2023
fe5fc19
Revert "Use integer ids for Sqlite bidirectional index"
80degreeswest Oct 7, 2023
b585478
Merge commit 'fe5fc198c4b29f363f00d57149f63f5bbb962fd8' into bazel-io…
chenj-hub Sep 16, 2024
ee77a49
Add back Autowired for Spring
chenj-hub Sep 16, 2024
032f18e
Merge pull request #9 from bazel-ios/jackies/upgrade-bazel-buildfarm-…
chenj-hub Sep 20, 2024
337053c
Point at different logging config
chenj-hub Oct 3, 2024
e286f1c
Revert "Upgrade grpc repo/maven deps for java_common"
chenj-hub Oct 8, 2024
23 changes: 12 additions & 11 deletions _site/docs/architecture/content_addressable_storage.md
@@ -38,9 +38,9 @@ This is the example presentation of a CAS in the memory instance available [here

```
worker:
cas:
type: MEMORY
maxSizeBytes: 2147483648 # 2 * 1024 * 1024 * 1024
storages:
- type: MEMORY
maxSizeBytes: 2147483648 # 2 * 1024 * 1024 * 1024
```
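The byte arithmetic in the inline comment can be checked directly; this is a trivial sanity check, not part of the configuration:

```shell
# 2 GiB expressed in bytes, matching the maxSizeBytes comment above.
echo $((2 * 1024 * 1024 * 1024))   # prints 2147483648
```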

## GRPC
@@ -53,9 +53,11 @@ A grpc config example is available in the alternate instance specification in th
server:
name: shard
worker:
cas:
type: GRPC
target:
storages:
- type: FILESYSTEM
path: "cache"
- type: GRPC
target:
```

## HTTP/1
@@ -89,11 +91,10 @@ The CASFileCache is also available on MemoryInstance servers, where it can repre

```
worker:
cas:
type: FILESYSTEM
path: "cache"
maxSizeBytes: 2147483648 # 2 * 1024 * 1024 * 1024
maxEntrySizeBytes: 2147483648 # 2 * 1024 * 1024 * 1024
storages:
- type: FILESYSTEM
path: "cache"
maxSizeBytes: 2147483648 # 2 * 1024 * 1024 * 1024
```

CASTest is a standalone tool to load the cache and print status information about it.
62 changes: 32 additions & 30 deletions _site/docs/configuration/configuration.md

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions _site/docs/metrics/metrics.md
@@ -124,6 +124,10 @@ Gauge for the number of operations in each stage (using a stage_name for each in

Gauge for the completed operations status (using a status_code label for each individual GRPC code)

**operation_exit_code**

Gauge for the completed operations exit code (using an exit_code label for each individual execution exit code)
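As an illustration of how the label partitions this gauge, a scrape of the metrics endpoint would contain one time series per observed exit code. The payload below is invented for demonstration, not real buildfarm output:

```shell
# Hypothetical /metrics excerpt; values and label set are illustrative only.
sample='# TYPE operation_exit_code gauge
operation_exit_code{exit_code="0"} 42
operation_exit_code{exit_code="1"} 3'

# Filter to the exit-code gauge lines, as you might when inspecting a scrape.
printf '%s\n' "$sample" | grep '^operation_exit_code'
```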

**operation_worker**

Gauge for the number of operations executed on each worker (using a worker_name label for each individual worker)
48 changes: 32 additions & 16 deletions _site/docs/quick_start.md
@@ -10,7 +10,7 @@ Here we describe how to use bazel remote caching or remote execution with buildf

## Setup

You can run this quick start on a single computer running nearly any flavor of linux. This computer is the localhost for the rest of the description.
You can run this quick start on a single computer running any flavor of linux that bazel supports. A C++ compiler is used here to demonstrate action execution. This computer is the localhost for the rest of the description.

### Backplane

@@ -44,33 +44,43 @@ cc_binary(

And an empty WORKSPACE file.
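The workspace setup above can be sketched as follows. The file contents are illustrative; the guide only fixes the `:main` target name and the `Hello, World!` output:

```shell
# Create a minimal bazel workspace for the quick start (illustrative contents).
mkdir -p quickstart && cd quickstart

cat > main.cc <<'EOF'
#include <iostream>

int main() {
  std::cout << "Hello, World!" << std::endl;
  return 0;
}
EOF

cat > BUILD <<'EOF'
cc_binary(
    name = "main",
    srcs = ["main.cc"],
)
EOF

# An empty WORKSPACE file marks this directory as a bazel workspace root.
touch WORKSPACE
```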

As a test, verify that `bazel run :main` builds your main program and runs it, and prints `Hello, World!`. This will ensure that you have properly installed bazel and a C++ compiler, and have a working target before moving on to remote execution.
As a test, verify that `bazel run :main` builds your main program and runs it, and prints `Hello, World!`. This will ensure that you have properly installed `bazel` and a C++ compiler, and have a working target before moving on to remote caching or remote execution.

Download and extract the buildfarm repository. Each command sequence below indicates its intended working directory, either the client (the workspace running bazel) or the buildfarm repository.

This tutorial assumes that you have a bazel binary on your path and that you are in the root of your buildfarm clone/release; it has been tested with bash on linux.

## Remote Caching

A Buildfarm server with an instance can be used strictly as an ActionCache and ContentAddressableStorage to improve build performance. This is an example of running a bazel client that will retrieve results if available, and store them if the cache is missed and the execution needs to run locally.
A Buildfarm cluster can be used strictly as an ActionCache (AC) and ContentAddressableStorage (CAS) to improve build performance. This is an example of running a bazel client that will retrieve results if available, otherwise store them on a cache miss after executing locally.

Download the buildfarm repository and change into its directory, then:

run `bazelisk run src/main/java/build/buildfarm:buildfarm-server $PWD/examples/config.minimal.yml`
* run `bazel run src/main/java/build/buildfarm:buildfarm-server $PWD/examples/config.minimal.yml`

This will wait while the server runs, indicating that it is ready for requests.

From another prompt (i.e. a separate terminal) in your newly created workspace directory from above:
A server does not itself store the content of action results; it acts as an endpoint for any number of workers that provide storage, so we must also start a single worker.

run `bazel clean`
run `bazel run --remote_cache=grpc://localhost:8980 :main`
From another prompt (i.e. a separate terminal) in the buildfarm repository directory:

* run `bazel run src/main/java/build/buildfarm:buildfarm-shard-worker -- --prometheus_port=9091 $PWD/examples/config.minimal.yml`

The `--` separator is bazel convention: all subsequent arguments are passed as parameters to the running app, like our `--prometheus_port`, instead of being interpreted by `run`.
The `--prometheus_port=9091` option allows this worker to run alongside our server, which will have started and logged its own metrics service on port `9090`. You can also turn metrics off entirely by passing `--prometheus_port=0` (after the `--` separator) to either the server or the worker.
This will also wait while the worker runs, indicating that it is available to store cache content.

From another prompt in your newly created workspace directory from above:

* run `bazel clean`
* run `bazel run --remote_cache=grpc://localhost:8980 :main`

Why do we clean here? Since we're verifying re-execution and caching, this ensures that we will execute any actions in the `run` step and interact with the remote cache. We should attempt to retrieve cached results and, when we miss (since we just started this memory-resident server), bazel will upload the results of the execution for later use. If everything worked, there will be no change in the output of this bazel run, since bazel does not report each time it uploads results.

To prove that we have placed something in the action cache, we need to do the following:

run `bazel clean`
run `bazel run --remote_cache=localhost:8980 :main`
* run `bazel clean`
* run `bazel run --remote_cache=localhost:8980 :main`

This should now print statistics on the `processes` line that indicate that you've retrieved results from the cache for your actions:

@@ -80,20 +90,22 @@

## Remote Execution (and caching)

Now we will use buildfarm for remote execution with a minimal configuration - a single memory instance, with a worker on the localhost that can execute a single process at a time - via a bazel invocation on our workspace.
Now we will use buildfarm for remote execution with a minimal configuration with a worker on the localhost that can execute a single process at a time, via a bazel invocation on our workspace.

First, we should restart the buildfarm server to ensure that we get remote execution (this can also be forced from the client by using `--noremote_accept_cached`). From the buildfarm server prompt and directory:
First, to clean out the results from the previous cached actions, flush your local redis database:

interrupt a running `buildfarm-server`
run `bazelisk run src/main/java/build/buildfarm:buildfarm-server $PWD/examples/config.minimal.yml`
* run `redis-cli flushdb`

From another prompt in the buildfarm repository directory:
Next, we should restart the buildfarm server, and delete the worker's cas storage to ensure that we get remote execution (this can also be forced from the client by using `--noremote_accept_cached`). From the buildfarm server prompt and directory:

run `bazelisk run src/main/java/build/buildfarm:buildfarm-shard-worker $PWD/examples/config.minimal.yml`
* interrupt the running `buildfarm-server` (i.e. Ctrl-C)
* run `bazel run src/main/java/build/buildfarm:buildfarm-server $PWD/examples/config.minimal.yml`

You can leave the worker from the Remote Caching step running; it will not require a restart.

From another prompt, in your client workspace:

run `bazel run --remote_executor=grpc://localhost:8980 :main`
* run `bazel run --remote_executor=grpc://localhost:8980 :main`

Your build should now print out the following on its `processes` summary line:

@@ -117,6 +129,10 @@ To stop the containers, run:
./examples/bf-run stop
```

## Next Steps

We've started our worker on the same host as our server, and on the same host where we built with bazel, but these services can be spread across many machines. In practice, a large number of workers runs behind a relatively small number of servers (10:1 and 100:1 ratios have been used), with large disks and beefy multicore CPUs/GPUs consolidated on the workers, which can specialize in the work they perform for bazel builds (or other client work), while servers are provisioned with hefty network connections to funnel content traffic. A buildfarm deployment can service hundreds or thousands of developers or CI processes, enabling them to benefit from each other's shared context in the AC/CAS and from the pooled execution of a fleet of worker hosts eager to consume operations and deliver results.

## Buildfarm Manager

You can now easily launch a new Buildfarm cluster locally or in AWS using an open sourced [Buildfarm Manager](https://github.com/80degreeswest/bfmgr).
6 changes: 3 additions & 3 deletions defs.bzl
@@ -96,7 +96,7 @@ def buildfarm_init(name = "buildfarm"):
"com.google.errorprone:error_prone_annotations:2.9.0",
"com.google.errorprone:error_prone_core:0.92",
"com.google.guava:failureaccess:1.0.1",
"com.google.guava:guava:31.1-jre",
"com.google.guava:guava:32.1.1-jre",
"com.google.j2objc:j2objc-annotations:1.1",
"com.google.jimfs:jimfs:1.1",
"com.google.protobuf:protobuf-java-util:3.10.0",
@@ -139,8 +139,8 @@ def buildfarm_init(name = "buildfarm"):
],
generate_compat_repositories = True,
repositories = [
"https://repo.maven.apache.org/maven2",
"https://jcenter.bintray.com",
"https://repo1.maven.org/maven2",
"https://mirrors.ibiblio.org/pub/mirrors/maven2",
],
)

29 changes: 24 additions & 5 deletions deps.bzl
@@ -13,10 +13,10 @@ def archive_dependencies(third_party):
{
"name": "platforms",
"urls": [
"https://mirror.bazel.build/github.com/bazelbuild/platforms/releases/download/0.0.6/platforms-0.0.6.tar.gz",
"https://github.com/bazelbuild/platforms/releases/download/0.0.6/platforms-0.0.6.tar.gz",
"https://mirror.bazel.build/github.com/bazelbuild/platforms/releases/download/0.0.7/platforms-0.0.7.tar.gz",
"https://github.com/bazelbuild/platforms/releases/download/0.0.7/platforms-0.0.7.tar.gz",
],
"sha256": "5308fc1d8865406a49427ba24a9ab53087f17f5266a7aabbfc28823f3916e1ca",
"sha256": "3a561c99e7bdbe9173aa653fd579fe849f1d8d67395780ab4770b1f381431d51",
},
{
"name": "rules_jvm_external",
@@ -111,10 +111,29 @@
"patch_args": ["-p1"],
"patches": ["%s:clang_toolchain.patch" % third_party],
},

# Used to build release container images
{
"name": "io_bazel_rules_docker",
"sha256": "b1e80761a8a8243d03ebca8845e9cc1ba6c82ce7c5179ce2b295cd36f7e394bf",
"urls": ["https://github.com/bazelbuild/rules_docker/releases/download/v0.25.0/rules_docker-v0.25.0.tar.gz"],
"patch_args": ["-p0"],
"patches": ["%s:docker_go_toolchain.patch" % third_party],
},

# Updated versions of io_bazel_rules_docker dependencies for bazel compatibility
{
"name": "io_bazel_rules_go",
"sha256": "278b7ff5a826f3dc10f04feaf0b70d48b68748ccd512d7f98bf442077f043fe3",
"urls": [
"https://mirror.bazel.build/github.com/bazelbuild/rules_go/releases/download/v0.41.0/rules_go-v0.41.0.zip",
"https://github.com/bazelbuild/rules_go/releases/download/v0.41.0/rules_go-v0.41.0.zip",
],
},
{
"name": "bazel_gazelle",
"sha256": "d3fa66a39028e97d76f9e2db8f1b0c11c099e8e01bf363a923074784e451f809",
"urls": ["https://github.com/bazelbuild/bazel-gazelle/releases/download/v0.33.0/bazel-gazelle-v0.33.0.tar.gz"],
},

# Bazel is referenced as a dependency so that buildfarm can access the linux-sandbox as a potential execution wrapper.
@@ -188,9 +207,9 @@ def buildfarm_dependencies(repository_name = "build_buildfarm"):
maybe(
http_jar,
"opentelemetry",
sha256 = "0523287984978c091be0d22a5c61f0bce8267eeafbbae58c98abaf99c9396832",
sha256 = "eccd069da36031667e5698705a6838d173d527a5affce6cc514a14da9dbf57d7",
urls = [
"https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.11.0/opentelemetry-javaagent.jar",
"https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.28.0/opentelemetry-javaagent.jar",
],
)

2 changes: 1 addition & 1 deletion examples/config.yml
@@ -38,7 +38,6 @@ server:
admin:
deploymentEnvironment: AWS
clusterEndpoint: "grpc://localhost"
enableGracefulShutdown: false
metrics:
publisher: LOG
logLevel: FINEST
@@ -126,6 +125,7 @@ worker:
onlyMulticoreTests: false
allowBringYourOwnContainer: false
errorOperationRemainingResources: false
gracefulShutdownSeconds: 0
sandboxSettings:
alwaysUse: false
selectForBlockNetwork: false
2 changes: 1 addition & 1 deletion jvm_flags.bzl
@@ -46,7 +46,7 @@ RECOMMENDED_JVM_FLAGS = [
"-XX:+HeapDumpOnOutOfMemoryError",
]

DEFAULT_LOGGING_CONFIG = ["-Dlogging.config=file:/app/build_buildfarm/src/main/java/build/buildfarm/logging.properties"]
DEFAULT_LOGGING_CONFIG = ["-Dlogging.config=file:/etc/bazel-re/logging.properties"]

def ensure_accurate_metadata():
return select({
2 changes: 2 additions & 0 deletions kubernetes/helm-charts/buildfarm/.gitignore
@@ -0,0 +1,2 @@
charts
Chart.lock
23 changes: 23 additions & 0 deletions kubernetes/helm-charts/buildfarm/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
30 changes: 30 additions & 0 deletions kubernetes/helm-charts/buildfarm/Chart.yaml
@@ -0,0 +1,30 @@
apiVersion: v2
name: buildfarm
description: A Helm chart for bazel buildfarm

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "v2.5.0"

dependencies:
- condition: redis.enabled
name: redis
repository: https://charts.helm.sh/stable
version: 10.5.7
22 changes: 22 additions & 0 deletions kubernetes/helm-charts/buildfarm/templates/NOTES.txt
@@ -0,0 +1,22 @@
1. Get the application URL by running these commands:
{{- if .Values.server.ingress.enabled }}
{{- range $host := .Values.server.ingress.hosts }}
{{- range .paths }}
http{{ if $.Values.server.ingress.tls }}s{{ end }}://{{ $host.host }}{{ .path }}
{{- end }}
{{- end }}
{{- else if contains "NodePort" .Values.server.service.type }}
export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "buildfarm.fullname" . }})
export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
{{- else if contains "LoadBalancer" .Values.server.service.type }}
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get --namespace {{ .Release.Namespace }} svc -w {{ include "buildfarm.fullname" . }}'
export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "buildfarm.fullname" . }} --template "{{"{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}"}}")
echo http://$SERVICE_IP:{{ .Values.server.service.port }}
{{- else if contains "ClusterIP" .Values.server.service.type }}
export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "buildfarm.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8080:$CONTAINER_PORT
{{- end }}