# Open(FL)ower

This workspace demonstrates new functionality in OpenFL for interoperating with [Flower](https://flower.ai/). In particular, a user can now use the Flower API to run on OpenFL infrastructure. OpenFL acts as an intermediary between the Flower SuperLink and the Flower SuperNode, relaying messages across the network using OpenFL's transport mechanisms.
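
Roughly, the message path looks like this (a simplified sketch of the relay described above, not an exhaustive component diagram):

```
Flower ServerApp <-> SuperLink <-> ConnectorFlower (OpenFL aggregator)
                                            |
                                     OpenFL transport
                                            |
Flower ClientApp <-> SuperNode <-> FlowerTaskRunner (OpenFL collaborator)
```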

## Overview

In this repository, you'll notice a directory under `src` called `app-pytorch`. This is essentially a Flower PyTorch app created with Flower's `flwr new` command and modified to run a local federation. `client_app.py` and `server_app.py` dictate what will be run by the client and server, respectively, while `task.py` defines the logic executed by each app, such as the model definition and the train/test tasks. In `server_app.py`, a section titled "Save Model" saves the `best.pbuf` and `last.pbuf` models from the experiment in your local workspace under `./save`. This uses native OpenFL logic to store the model as a `.pbuf` so it can later be retrieved by `fx model save` in a native format (limited to `.npz` to stay deep learning framework agnostic), but it can be overridden to save the model directly, following Flower's recommended method for [saving model checkpoints](https://flower.ai/docs/framework/how-to-save-and-load-model-checkpoints.html).
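
For reference, Flower's linked how-to saves checkpoints by overriding the strategy's `aggregate_fit`. Below is a minimal sketch of that approach; the `Net` import path and the checkpoint file name are illustrative assumptions, not part of this workspace:

```python
from collections import OrderedDict

import torch
from flwr.common import parameters_to_ndarrays
from flwr.server.strategy import FedAvg

# Illustrative import: `Net` is the model defined in task.py.
from app_pytorch.task import Net


class SaveModelStrategy(FedAvg):
    """FedAvg variant that checkpoints the aggregated model every round."""

    def aggregate_fit(self, server_round, results, failures):
        parameters, metrics = super().aggregate_fit(server_round, results, failures)
        if parameters is not None:
            # Convert Flower Parameters into a PyTorch state dict and save it.
            ndarrays = parameters_to_ndarrays(parameters)
            net = Net()
            state_dict = OrderedDict(
                (key, torch.tensor(value))
                for key, value in zip(net.state_dict().keys(), ndarrays)
            )
            net.load_state_dict(state_dict, strict=True)
            torch.save(net.state_dict(), f"model_round_{server_round}.pt")
        return parameters, metrics
```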

## Getting Started

### Install OpenFL

Follow the [installation guide](https://openfl.readthedocs.io/en/latest/installation.html).

### Create a Workspace

Start by creating a workspace:

```sh
fx workspace create --template flower-app-pytorch --prefix my_workspace
cd my_workspace
```

This will create a workspace in your current working directory called `./my_workspace` and install the Flower app defined in `./app-pytorch`. This is where the experiment takes place.
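
An abridged view of the resulting workspace (paths as referenced throughout this README):

```
my_workspace/
├── plan/              # plan.yaml, cols.yaml, data.yaml
├── save/              # best.pbuf / last.pbuf are written here
└── src/
    └── app-pytorch/   # Flower app: client_app.py, server_app.py, task.py
```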

### Configure the Experiment
Under `./plan`, you will find the familiar OpenFL YAML files for configuring the experiment. `cols.yaml` and `data.yaml` will be populated with the collaborators that will run the Flower client app and the respective data shard or directory each will train and test on.
`plan.yaml` configures the experiment itself. The Open-Flower integration makes a few key changes to the `plan.yaml`:

1. Introduction of a new top-level key (`connector`) that configures a newly introduced component called `ConnectorFlower`. This component is run by the aggregator and is responsible for initializing the Flower `SuperLink` and connecting to the OpenFL server. The `SuperLink` parameters can be configured via `connector.settings.superlink_params`. If nothing is supplied, it simply runs `flower-superlink --insecure` with the command's default settings as dictated by Flower. It also includes the option to run the `flwr run` command via `connector.settings.flwr_run_params`. If `flwr_run_params` is not provided, the user is expected to run `flwr run <app>` from the aggregator machine to initiate the experiment.

```yaml
connector:
  defaults: plan/defaults/connector.yaml
  template: openfl.component.ConnectorFlower
  settings:
    superlink_params:
      insecure: True
      serverappio-api-address: 127.0.0.1:9091
      fleet-api-address: 127.0.0.1:9092
      exec-api-address: 127.0.0.1:9093
    flwr_run_params:
      flwr_app_name: "app-pytorch"
      federation_name: "local-poc"
```
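
For illustration, the settings above correspond roughly to the following commands on the aggregator machine (assuming each `superlink_params` key maps onto the `flower-superlink` flag of the same name):

```sh
# Approximate equivalent of superlink_params above
flower-superlink --insecure \
  --serverappio-api-address 127.0.0.1:9091 \
  --fleet-api-address 127.0.0.1:9092 \
  --exec-api-address 127.0.0.1:9093

# Because flwr_run_params is set, the connector also initiates the run:
flwr run ./src/app-pytorch local-poc
```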

2. The `FlowerTaskRunner` executes the `start_client_adapter` task, which starts the Flower SuperNode and connects to the OpenFL client. The `FlowerTaskRunner` also has an `auto_shutdown` setting, which defaults to `True`. When set to `True`, the task runner shuts down the SuperNode at the completion of an experiment; otherwise, the SuperNode runs continuously.

```yaml
task_runner:
  defaults: plan/defaults/task_runner.yaml
  template: openfl.federated.task.runner_flower.FlowerTaskRunner
  settings:
    auto_shutdown: True
```
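
On the collaborator side, the SuperNode is pointed at the collaborator's local gRPC relay rather than directly at the `SuperLink`. A rough sketch of the launch (flag values are illustrative; the relay port is allocated dynamically, as described in item 4 below):

```sh
# Approximate SuperNode launch against the local OpenFL relay
flower-supernode --insecure --superlink 127.0.0.1:<local_server_port>
```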

3. `FlowerDataLoader` with similar high-level functionality to other dataloaders.

**IMPORTANT NOTE**: `aggregator.settings.rounds_to_train` is set to 1. __Do not edit this__. The actual number of rounds for the experiment is controlled by Flower logic inside `./app-pytorch/pyproject.toml` (see the excerpt after this list); the entire Flower experiment runs within a single OpenFL round. Increasing this value will cause OpenFL to attempt to run the experiment again. The single aggregator round exists to stop the OpenFL components at the completion of the experiment.

4. `Task` - we introduce a `tasks_connector.yaml` that allows the collaborator to connect to the Flower framework via a local gRPC server. It also handles the task runner's `start_client_adapter` method, which actually starts the Flower component and the local gRPC server. Setting `local_server_port` to 0 allocates the port dynamically; this is mainly for local experiments, to avoid port collisions.

```yaml
tasks:
  settings:
    connect_to: Flower
  start_client_adapter:
    function: start_client_adapter
    kwargs:
      local_server_port: 0
```
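
For reference, in apps generated by `flwr new`, the round count is typically exposed in the app config section of `./app-pytorch/pyproject.toml`. An illustrative excerpt (your generated file may differ):

```toml
[tool.flwr.app.config]
num-server-rounds = 3
```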

## Execution Methods
There are two ways to execute this:

1. Automatic shutdown, which spawns a `server-app` in isolation and triggers experiment termination once it shuts down. (Default/Recommended)
2. Running `SuperLink` and `SuperNode` as [long-lived components](#long-lived-superlink-and-supernode) that will indefinitely wait for new runs. (Limited Functionality)

## Running the Workspace
We proceed with the automatic shutdown method of execution.
Run the workspace as normal (certify the workspace, initialize the plan, register the collaborators, etc.):

```SH
# Generate a Certificate Signing Request (CSR) for the Aggregator
fx aggregator generate-cert-request

# The CA signs the aggregator's request, which is now available in the workspace
fx aggregator certify --silent

# Initialize FL Plan and Model Weights for the Federation
fx plan initialize

################################
# Setup Collaborator 1
################################

# Create a collaborator named "collaborator1" that will use the data in "data/1"
fx collaborator create -n collaborator1 -d data/1

# Generate a CSR for collaborator1
fx collaborator generate-cert-request -n collaborator1

# The CA signs collaborator1's certificate
fx collaborator certify -n collaborator1 --silent

################################
# Setup Collaborator 2
################################

# Create a collaborator named "collaborator2" that will use the data in "data/2"
fx collaborator create -n collaborator2 -d data/2

# Generate a CSR for collaborator2
fx collaborator generate-cert-request -n collaborator2

# The CA signs collaborator2's certificate
fx collaborator certify -n collaborator2 --silent

##############################
# Start the Federation
##############################

# Run the Aggregator
fx aggregator start
```

This will prepare the workspace and start the OpenFL aggregator, the Flower `SuperLink`, and the Flower `ServerApp`. You should see something like:

```SH
INFO 🧿 Starting the Aggregator Service.
.
.
.
INFO : Starting Flower SuperLink
WARNING : Option `--insecure` was set. Starting insecure HTTP server.
INFO : Flower Deployment Engine: Starting Exec API on 127.0.0.1:9093
INFO : Flower ECE: Starting ServerAppIo API (gRPC-rere) on 127.0.0.1:9091
INFO : Flower ECE: Starting Fleet API (GrpcAdapter) on 127.0.0.1:9092
.
.
.
INFO : [INIT]
INFO : Using initial global parameters provided by strategy
INFO : Starting evaluation of initial global parameters
INFO : Evaluation returned no results (`None`)
INFO :
INFO : [ROUND 1]
```

### Start Collaborators
Open two additional terminals, one per collaborator.
For collaborator 1's terminal, run:
```SH
fx collaborator start -n collaborator1
```
For collaborator 2's terminal, run:
```SH
fx collaborator start -n collaborator2
```
This will start the collaborator nodes, the Flower `SuperNode`, and the Flower `ClientApp`, and begin running the Flower experiment. You should see something like:

```SH
INFO 🧿 Starting a Collaborator Service.
.
.
.
INFO : Starting Flower SuperNode
WARNING : Option `--insecure` was set. Starting insecure HTTP channel to 127.0.0.1:...
INFO : Starting Flower ClientAppIo gRPC server on 127.0.0.1:...
INFO :
INFO : [RUN 297994661073077505, ROUND 1]
```
### Completion of the Experiment
Upon completion of the experiment, on the `aggregator` terminal, the Flower components should print an experiment summary while the `SuperLink` continues to receive requests from the `SuperNode`:
```SH
INFO : [SUMMARY]
INFO : Run finished 3 round(s) in 93.29s
INFO : History (loss, distributed):
INFO : round 1: 2.0937052175497555
INFO : round 2: 1.8027011854633406
INFO : round 3: 1.6812996898487116
```
If `automatic_shutdown` is enabled, this will shortly be followed by the OpenFL `aggregator` receiving "results" from the collaborators and subsequently shutting down:

```SH
INFO Round 0: Collaborators that have completed all tasks: ['collaborator1', 'collaborator2']
INFO Experiment Completed. Cleaning up...
INFO Sending signal to collaborator collaborator2 to shutdown...
INFO Sending signal to collaborator collaborator1 to shutdown...
INFO [OpenFL Connector] Stopping server process with PID: ...
INFO : SuperLink terminated gracefully.
INFO [OpenFL Connector] Server process stopped.
```
Upon completion of the experiment, on the `collaborator` terminals, the Flower components should output information about the run:

```SH
INFO : [RUN ..., ROUND 3]
INFO : Received: evaluate message
INFO : Start `flwr-clientapp` process
INFO : [flwr-clientapp] Pull `ClientAppInputs` for token ...
INFO : [flwr-clientapp] Push `ClientAppOutputs` for token ...
```

If `automatic_shutdown` is enabled, this will shortly be followed by the OpenFL `collaborator` shutting down:

```SH
INFO : SuperNode terminated gracefully.
INFO SuperNode process terminated.
INFO Shutting down local gRPC server...
INFO local gRPC server stopped.
INFO Waiting for tasks...
INFO Received shutdown signal. Exiting...
```
Congratulations, you have run a Flower experiment through OpenFL's task runner!

## Advanced Usage
### Long-lived SuperLink and SuperNode
A user can set `automatic_shutdown: False` in the `Connector` settings of the `plan.yaml`.

```yaml
connector:
  defaults: plan/defaults/connector.yaml
  template: openfl.component.ConnectorFlower
  settings:
    automatic_shutdown: False
```

By doing so, Flower's `ServerApp` and `ClientApp` will still shut down at the completion of the Flower experiment, but the `SuperLink` and `SuperNode` will continue to run. As a result, on the `aggregator` terminal, you will see constant requests coming from the `SuperNode`:

```SH
INFO : GrpcAdapter.PullTaskIns
INFO : GrpcAdapter.PullTaskIns
INFO : GrpcAdapter.PullTaskIns
```
You can run another experiment by opening another terminal, navigating to this workspace, and running:
```SH
flwr run ./src/app-pytorch
```
Once you are done, you can manually shut down OpenFL's `collaborator` and Flower's `SuperNode` with `CTRL+C`. This triggers a task completion by the task runner, which then begins the graceful shutdown of the OpenFL and Flower components.

### Running in SGX Enclave
Gramine does not support all Linux system calls. The Flower FAB is built and installed at runtime; during this, `utime()` is called, which is an [unsupported call](https://gramine.readthedocs.io/en/latest/devel/features.html#list-of-system-calls), resulting in errors or unexpected behavior. To work around this when running in an SGX enclave, we instead build and install the FAB during initialization and package it alongside the OpenFL workspace. To make this work, we introduce patches to Flower's build command that circumvent the unsupported system call and minimize read/write access.

To apply these patches, simply add `patch: True` to the `Connector` and `Task Runner` settings. For the `Task Runner`, also include the name of the Flower app to build and install.

```yaml
connector:
  defaults: plan/defaults/connector.yaml
  template: openfl.component.ConnectorFlower
  settings:
    superlink_params:
      insecure: True
      serverappio-api-address: 127.0.0.1:9091
      fleet-api-address: 127.0.0.1:9092
      exec-api-address: 127.0.0.1:9093
    flwr_run_params:
      flwr_app_name: "app-pytorch"
      federation_name: "local-poc"
    patch: True

task_runner:
  defaults: plan/defaults/task_runner.yaml
  template: openfl.federated.task.runner_flower.FlowerTaskRunner
  settings:
    patch: True
    flwr_app_name: "app-pytorch"
```