This repository was archived by the owner on Aug 2, 2024. It is now read-only.

Commit c378e97

garg-amit, jfomhover, thomasp-ms, and majercakdavid authored
Release 06: Accelerator has features like native FL SDK contract, benchmark results... (#282)
* refactor components to use dpv2 + remove unnecessary environments
* working dpv2 pipeline
* refactor scripts with right inputs and outputs
* fix code path
* implement fake outputs
* fix paths
* fix imports
* fix args of aggregation script
* add note, fix component args
* add checkpoint arg
* linting
* linting
* remove sdkv2 folder
* add argparse to submit script
* add docstring
* add docstring
* linting
* linting
* add staging branch to build
* rollback changes to build, leave it for another PR
* remove logging line
* remove custom uuid
* linting
* add docstring to custom path function
* polish docstring
* rename model_silo_X to input_silo_X
* rename output
* rename agg output
* Improve auto-provisioning resources (#35) (#36)
* docker file stub
* move docker file, implement feedback
* login before setting subscription
* login before setting subscription
* use default k8s version
* pin latest version since default won't work
* remove executionpolicy part, other small updates
* clarify to change job file _in docker filesystem_
* login before setting subscription
* formatting
* \ -> /
* install azureml-core in docker file
* propagate changes to section 7
* fix dataset creation command
Co-authored-by: thomasp-ms <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* Refactor folder structure (#37)
* `plan` -> `docs`
* 'plan' -> 'docs'
* 'automated_provisioning' -> 'mlops'
* 'fl_arc_k8s' -> 'examples'
Co-authored-by: thomasp-ms <[email protected]>
* auto provisioning - vanilla internal silos (#41)
* split internal and external provisioning
* adjust directories after internal/external split
* introduce overall mlops readme
* first stab
* remove useless comment and my alias
Co-authored-by: thomasp-ms <[email protected]>
* Perform real FL training on the MNIST dataset
  Added component files customized for MNIST dataset. Set the setup for 3 silos having their own compute and datastore.
  git config --global user.email "[email protected]"
* refine components and add logs
* maintain consistency b/w config files
* add requirement and env files
* add requirement and env files
* rmv redundant dependencies, rename conda envs
* Correct epoch default value
* point data asset instead of underlying URI
* beef up orchestrator cluster (#46)
Co-authored-by: thomasp-ms <[email protected]>
* Provision CPUs for silos (instead of GPUs) (#47)
* beef up orchestrator cluster
* gpu -> cpu
Co-authored-by: thomasp-ms <[email protected]>
* add preprocessing comp description, fix typo and correct default datastore name
* add integration validation test - build
* update readme file
* Move logger to the main if block, add pytorch channel in the conda env yaml and move readme to the docs folder
* code reformatting using black
* add documentation to run an FL experiment
* add more intuitive path for aggr output dir
* Merge changes
* Update release branch (#271)
* maintain consistency b/w config files
* add requirement and env files
* add requirement and env files
* rmv redundant dependencies, rename conda envs
* Correct epoch default value
* point data asset instead of underlying URI
* beef up orchestrator cluster (#46)
Co-authored-by: thomasp-ms <[email protected]>
* Provision CPUs for silos (instead of GPUs) (#47)
* beef up orchestrator cluster
* gpu -> cpu
Co-authored-by: thomasp-ms <[email protected]>
* add preprocessing comp description, fix typo and correct default datastore name
* add integration validation test - build
* update readme file
* Move logger to the main if block, add pytorch channel in the conda env yaml and move readme to the docs folder
* code reformatting using black
* add documentation to run an FL experiment
* add more intuitive path for aggr output dir
* Merge changes
* init branch
* wip data exploration
* data exploration region/silo
* basic model
* regions
* basic network and finished data processing
* training
* Implement generic FedAvg without model object (#167)
* generic fedavg pytorch
* support model classes
* add docstrings
Co-authored-by: Jeff Omhover <[email protected]>
* add README
* update normalization
* update exploration
* Thomas/small improvements (#171)
* remove unused local MNIST data
* add link to provisioning cookbook in docs readme
* recommend creating a conda env in the quickstart
Co-authored-by: thomasp-ms <[email protected]>
* update example for finance with multiple models
* successful training through lstm
* revert unneeded changes
* remove local exploration ipynb
* fix test metric
* different param value for AKS (#179)
Co-authored-by: thomasp-ms <[email protected]>
* Pneumonia xray example (#164)
* refactor components to use dpv2 + remove unnecessary environments
* working dpv2 pipeline
* refactor scripts with right inputs and outputs
* fix code path
* implement fake outputs
* fix paths
* fix imports
* fix args of aggregation script
* add note, fix component args
* add checkpoint arg
* linting
* linting
* remove sdkv2 folder
* add argparse to submit script
* add docstring
* add docstring
* linting
* linting
* add staging branch to build
* rollback changes to build, leave it for another PR
* remove logging line
* remove custom uuid
* linting
* add docstring to custom path function
* polish docstring
* rename model_silo_X to input_silo_X
* rename output
* rename agg output
* Improve auto-provisioning resources (#35) (#36)
* docker file stub
* move docker file, implement feedback
* login before setting subscription
* login before setting subscription
* use default k8s version
* pin latest version since default won't work
* remove executionpolicy part, other small updates
* clarify to change job file _in docker filesystem_
* login before setting subscription
* formatting
* \ -> /
* install azureml-core in docker file
* propagate changes to section 7
* fix dataset creation command
Co-authored-by: thomasp-ms <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* Refactor folder structure (#37)
* `plan` -> `docs`
* 'plan' -> 'docs'
* 'automated_provisioning' -> 'mlops'
* 'fl_arc_k8s' -> 'examples'
Co-authored-by: thomasp-ms <[email protected]>
* auto provisioning - vanilla internal silos (#41)
* split internal and external provisioning
* adjust directories after internal/external split
* introduce overall mlops readme
* first stab
* remove useless comment and my alias
Co-authored-by: thomasp-ms <[email protected]>
* Perform real FL training on the MNIST dataset
  Added component files customized for MNIST dataset. Set the setup for 3 silos having their own compute and datastore.
  git config --global user.email "[email protected]"
* refine components and add logs
* maintain consistency b/w config files
* add requirement and env files
* add requirement and env files
* rmv redundant dependencies, rename conda envs
* Correct epoch default value
* point data asset instead of underlying URI
* beef up orchestrator cluster (#46)
Co-authored-by: thomasp-ms <[email protected]>
* Provision CPUs for silos (instead of GPUs) (#47)
* beef up orchestrator cluster
* gpu -> cpu
Co-authored-by: thomasp-ms <[email protected]>
* add preprocessing comp description, fix typo and correct default datastore name
* add integration validation test - build
* update readme file
* Move logger to the main if block, add pytorch channel in the conda env yaml and move readme to the docs folder
* code reformatting using black
* add documentation to run an FL experiment
* add more intuitive path for aggr output dir
* Merge changes
* add more intuitive agg output dir path
* reformat using black
* add iteration2 branch for PR build testing
* reformat date and pass kwargs instead in the getUniqueIdentifier fn
* working submit
* working factory submit
* linting
* move component path
* add soft validation
* add soft validation
* Add basic tests on config
* linting
* working bicep deployment for vanilla demo
* proper orchestrator script, double containers
* fix name
* docstring
* docstring
* rollback to using only 1 container
* align naming convention
* instructions
* working submit
* set up permission model
* working orch perms
* wonky perms assignment
* working role assignments
* remove old perm model
* working except silo2orch
* fix typo
* working submit with config
* add sku as param
* use R/W for now
* fix submit to align with bicep provisioning demo
* linting
* remove dataset files
* fix docstring on permission model
* write draft docs with homepage, align structure, remove requirements, ensure demo documented
* rollback change to req
* change factory to use custom model type during validation
* linting
* Display metrics at the pipeline level (#68)
* Fix optional input yaml and mlflow log bugs (#59)
* refactor components to use dpv2 + remove unnecessary environments
* working dpv2 pipeline
* refactor scripts with right inputs and outputs
* fix code path
* implement fake outputs
* fix paths
* fix imports
* fix args of aggregation script
* add note, fix component args
* add checkpoint arg
* linting
* linting
* remove sdkv2 folder
* add argparse to submit script
* add docstring
* add docstring
* linting
* linting
* add staging branch to build
* rollback changes to build, leave it for another PR
* remove logging line
* remove custom uuid
* linting
* add docstring to custom path function
* polish docstring
* rename model_silo_X to input_silo_X
* rename output
* rename agg output
* Improve auto-provisioning resources (#35) (#36)
* docker file stub
* move docker file, implement feedback
* login before setting subscription
* login before setting subscription
* use default k8s version
* pin latest version since default won't work
* remove executionpolicy part, other small updates
* clarify to change job file _in docker filesystem_
* login before setting subscription
* formatting
* \ -> /
* install azureml-core in docker file
* propagate changes to section 7
* fix dataset creation command
Co-authored-by: thomasp-ms <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* Refactor folder structure (#37)
* `plan` -> `docs`
* 'plan' -> 'docs'
* 'automated_provisioning' -> 'mlops'
* 'fl_arc_k8s' -> 'examples'
Co-authored-by: thomasp-ms <[email protected]>
* auto provisioning - vanilla internal silos (#41)
* split internal and external provisioning
* adjust directories after internal/external split
* introduce overall mlops readme
* first stab
* remove useless comment and my alias
Co-authored-by: thomasp-ms <[email protected]>
* Perform real FL training on the MNIST dataset
  Added component files customized for MNIST dataset. Set the setup for 3 silos having their own compute and datastore.
  git config --global user.email "[email protected]"
* refine components and add logs
* maintain consistency b/w config files
* add requirement and env files
* add requirement and env files
* rmv redundant dependencies, rename conda envs
* Correct epoch default value
* point data asset instead of underlying URI
* beef up orchestrator cluster (#46)
Co-authored-by: thomasp-ms <[email protected]>
* Provision CPUs for silos (instead of GPUs) (#47)
* beef up orchestrator cluster
* gpu -> cpu
Co-authored-by: thomasp-ms <[email protected]>
* add preprocessing comp description, fix typo and correct default datastore name
* add integration validation test - build
* update readme file
* Move logger to the main if block, add pytorch channel in the conda env yaml and move readme to the docs folder
* code reformatting using black
* add documentation to run an FL experiment
* add more intuitive path for aggr output dir
* Merge changes
* Accommodate optional input changes and switch from mlflow autologging to manual logging
* code style
* change optional inputs syntax
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Thomas <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* Make changes to display all metrics at the pipeline level
* Log preprocessing metadata in mlflow
* linting
* Pass client as an arg
* Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Thomas <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* Remove redundant files from the mlops directory (#69)
* Remove internal & external dir as provisioning is taken care by bicep
* keep mnist data files
* rename demo script (#71)
Co-authored-by: Jeff Omhover <[email protected]>
* Unified documentation (#72)
* WIP: unifying docs
* Remove redundant doc file. We can always revisit if needed
* FL concepts will be covered in the glossary doc
* Remove internal and external silos docs as the code will be re-written in bicep
* provide comprehensive documentation
* rename file
* refine docs
* refine docs and rename fl_cross_silo_basic to fl_cross_silo_native
* simplify sandbox script
* simplify script, ensure it works
* align config of native submit
* align naming conventions between scripts, reinject rbac role
* create test job for quickly debugging provisioning issues
* fix tests
* linting
* move permissions to storage
* align config with bicep scripts
* Document the metrics panel of the pipeline overview in the quickstart (#76)
* WIP: unifying docs
* Remove redundant doc file. We can always revisit if needed
* FL concepts will be covered in the glossary doc
* Remove internal and external silos docs as the code will be re-written in bicep
* provide comprehensive documentation
* rename file
* refine docs
* refine docs and rename fl_cross_silo_basic to fl_cross_silo_native
* document the metrics/pipeline panel in the quickstart
* linting
* add docstrings and disclaimers
* Add instructions on how to create a custom graph (#78)
* WIP: unifying docs
* Remove redundant doc file. We can always revisit if needed
* FL concepts will be covered in the glossary doc
* Remove internal and external silos docs as the code will be re-written in bicep
* provide comprehensive documentation
* rename file
* refine docs
* refine docs and rename fl_cross_silo_basic to fl_cross_silo_native
* document the metrics/pipeline panel in the quickstart
* add instructions on how to create a custom graph
* do better comments
* Refine native code (#82)
* fix silo name
* log only one datapoint per iteration for an aggregated metric
* Align terminology for iteration/round/num_rounds
* linting
* use storage blob data contributor
* add demoBaseName to guid name of role deployment (#85)
Co-authored-by: thomasp-ms <[email protected]>
* use id list, add listkeys builtin
* rename and dissociate orchestrator in resource + orchestrator
* separate orchestrator script
* draft sandbox setup
* make silo script distinct
* Update orchestrator_open.bicep
* Update internal_blob_open.bicep
* remove comments
* align hello world example with new naming conventions
* ensure uai assignments are created AFTER storage is created
* linting
* enforce precedence
* merge from secure branch
* use different regions, limit size of account
* reduce to 3 regions, add keys to guid
* substring
* align config
* do not use model
* Add msi version of scripts
* sandbox main can switch between uai and msi
* fix name
* linting
* linting
* implement ignore param, hotfix model with startswith
* Address my own comments on Jeff's PR (#96)
* remove magic number
* little improvements on some comments
* remove unused files
* put dash replacement next to length check
* don't necessarily assume USER AI
* UAI -> XAI
* revert previous UAI -> XAI changes
* move length check next to dash replacement
* typo
* try moving the dependsOn's
* RAGRS -> LRS
* revert dependsON changes
* revert another small change in a comment
Co-authored-by: thomasp-ms <[email protected]>
* align config of both submit scripts
* Make distinction between one-off and repeatable provisioning scripts (#99)
* clarify the role needed
* remove "custom role" line
* adjust locations
* use existing rg if not Owner of the sub
* clarify "Secure" setup
* add usage instructions in docstring
* explain what scripts are one-off (vs repeatable)
Co-authored-by: thomasp-ms <[email protected]>
* Align round/iteration terminology with the native code (#103)
* rename parameter in config file
* keep iterations instead of rounds
* round -> iteration
Co-authored-by: thomasp-ms <[email protected]>
* get all goodies from secureprovisioning branch wip
* get all goodies from secureprovisioning branch wip
* get all goodies from secureprovisioning branch wip
* align both submits to work
* add optional test
* rename native to literal
* add getting started in readme, introduce emojis
* change person
* remove emojis
* Propose rewriting of readme to highlight motivation first (#110)
* propose rewriting of readme to highlight motivation first
* minor edit
Co-authored-by: Jeff Omhover <[email protected]>
* Update README.md
* Update quickstart to mention rg clean-up
* Update quickstart.md
* Update quickstart.md
* Update quickstart.md
* Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120)
* Update quickstart to lower header (hotfix) (#117)
* add arm templates, add button in quickstart
* switch to releasebranchlink
Co-authored-by: Jeff Omhover <[email protected]>
* Add subscription id, resource group and workspace name as CLI args (#122)
* add more cli args
* code style
* code style
* update quickstart doc
* update readme
* Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123)
Co-authored-by: Jeff Omhover <[email protected]>
* Continuous Integration Tests (#119)
* take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline
* change path
* Test azure creds in the github workflow
* reformatting
* add pipeline validation and testing workflow
* add permissions
* add permissions
* check only certain dir to trigger workflows
* add soft validation for any iteration branch PR
* add provisioning script test
* testing
* create rg
* create rg
* change compute for testing
* change demoname
* delete old rg
* change demoname
* add demobasename and aml ws name as github secrets
* random demo base name
* auto generate random base name
* random demo base name
* adjust random num length
* add vnet sandbox test
* rmv dependency b/w jobs
* submit various pipelines
* change execution graph path
* add cli args in the factory code
* change compute for testing
* ignore validation - factory
* create custom action
* correct path
* correct path
* add shell in the github action
* create github actions and take required values as input params
* add shell
* add wait condition
* add logs
* linting
* correct rg name
* add azure ml extension
* handle ml extension installation error.
* add release branch test cases
* add script to delete run history
* cronjob test
* cronjob test
* checkout branch
* test run history deletion script
* test run history deletion script
* test run history deletion script
* azure login
* date format change
* remove double quotes
* date format change
* archive run history script tested
* Add vnet-based provisioning options to cookbook (#128)
Co-authored-by: Jeff Omhover <[email protected]>
* Make deployment name unique in our github actions (#135)
* set unique name for deployments
* add attempt to deployment name
Co-authored-by: Jeff Omhover <[email protected]>
* Refactor compute/storage scripts to be independent (#132)
Co-authored-by: Jeff Omhover <[email protected]>
* Provide motivation in provisioning docs for using service endpoints (#136)
* add motivation for service endpoints
* add link
Co-authored-by: Jeff Omhover <[email protected]>
* Refresh provisioning arm buttons with latest from bicep (#139)
* align names of directories
* rebuild all arm
Co-authored-by: Jeff Omhover <[email protected]>
* Update silo_vnet_newstorage.md (#141)
* Add Bicep build vs ARM template diff test (#140)
* Add diff test for bicep vs arm
* Debug
* Debug
* fix syntax error
* redirect build output to stdout
* correct path
* trigger arm template test when pushing changes to main branch from release* branch
* remove redundant logs
* Add "open aks with cc" provision tutorial and bicep scripts (#138)
* implement bicep scripts to provision open aks with cc
* add aks cc tutorial
* build arm and add in branch
* add button
Co-authored-by: Jeff Omhover <[email protected]>
* Provide script + tutorial to attach pair with an existing storage (#142)
* provision datastore with existing storage
* add arm for existing storage, add docs
* add link in readme
Co-authored-by: Jeff Omhover <[email protected]>
* add latest arm templates to diff build (#145)
Co-authored-by: Jeff Omhover <[email protected]>
* Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146)
* add jumpbox script with tutorial
* add template to diff build
Co-authored-by: Jeff Omhover <[email protected]>
* Update jumpbox_cc.md (#147)
* update tutorials for silos to integrate feedback (#149)
Co-authored-by: Jeff Omhover <[email protected]>
* Implement option to turn orchestrator storage fully private (behind PLE) (#150)
Co-authored-by: Jeff Omhover <[email protected]>
* Tutorial on how to adapt native and factory code to write FL experiments. (#100)
* WIP: add general information about the factory code
* moving factory-tutorial to another file
* add scenarios
* add instructions on how to adapt literal code
* rename file
* add general info and fix typos
* Jeff's feedback
* Apply code clean-up to provision scripts before bug bash (#148)
Co-authored-by: Jeff Omhover <[email protected]>
* Instructions for provisioning external silos (#101)
* very first stab, far from done
* non-secure native job using the on-prem k8s
* use on-prem silos in example factory job
* Revert "very first stab, far from done"
  This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca.
* Revert "use on-prem silos in example factory job"
  This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe.
* Revert "non-secure native job using the on-prem k8s"
  This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284.
* restore doc stub
* Make Git ignore resources for test jobs
* fix gitignore
* typo in comment
* steps A through D
* 2 typos
* move to subdir
* fix workspace creation
* add orchestrator part, role, and timeline
* last commit before PR
* adjust to new open_azureml_workspace.bicep
* first wave after Jeff's comments
* address jeff's comments
* typo
* light trims
Co-authored-by: thomasp-ms <[email protected]>
* bump up every title
* skeleton
* first attempt at data prep like Harmke
* change secret name
* wrong secret name
* remove separate unzip
* change clients, create silo data assets
* different names for silo data assets, duh
* cleanup
* adjust secret name in doc
* .
* use latest literal code
* align environment with literal
* base on latest component
* one dataset, comment out 2 unused args (for now)
* introduce new arguments
* reflect modified args in component spec
* remove unused arg from config
* start hooking up to Harmke's trainer
* initialize PTLearner and include in run.py
* use same values as Harmke for epochs and lr
* attributes with _, start implementing local_train
* add logging, add test(), fix device_
* train_loader_
* align _'s
* fix transform bug
* remove unused constants
* use proper model in aggregation code
* removed unused file
* remove unused code and arguments, logging to DEBUG
* restore `metrics_prefix` parameter
* finish restoring `metrics_prefix`
* do not duplicate model code
* revert dedup attempt
* improve docstrings and descriptions
* change experiment name
* change pipeline name and docstring
* cite sources, remove wrongly added licenses
* italics
* black
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
Co-authored-by: unknown <[email protected]>
* update formatting
* add readme section
* rename training to traininsilo for consistency
* add more comments and update docs
* include urgency in PR template (#184)
Co-authored-by: thomasp-ms <[email protected]>
* Share resources and standardize component names (#182)
* use shared agg component across all examples
* only keep a single {reqs/env}
* use more recent pip version
* standardize component spec name
* support dummy HELLOWORLD example in agg
* black
Co-authored-by: thomasp-ms <[email protected]>
* Thopo/share component and environment (#185)
* use shared agg component across all examples
* only keep a single {reqs/env}
* use more recent pip version
* standardize component spec name
* support dummy HELLOWORLD example in agg
* black
* SHARED -> utils, rename agg env
Co-authored-by: thomasp-ms <[email protected]>
* rename config to spec and add upload data step
* upload data script
* use util aggregateweights
* add data splitting pipeline
* docs update
* log pipeline level only once per silo training
* do categorical encoding ahead of splitting
* nit updates
* update comment
* update formatting
* Hotfix: grant `az login` permissions to the 'clear run history' script (#166)
* refactor components to use dpv2 + remove unnecessary environments
* working dpv2 pipeline
* refactor scripts with right inputs and outputs
* fix code path
* implement fake outputs
* fix paths
* fix imports
* fix args of aggregation script
* add note, fix component args
* add checkpoint arg
* linting
* linting
* remove sdkv2 folder
* add argparse to submit script
* add docstring
* add docstring
* linting
* linting
* add staging branch to build
* rollback changes to build, leave it for another PR
* remove logging line
* remove custom uuid
* linting
* add docstring to custom path function
* polish docstring
* rename model_silo_X to input_silo_X
* rename output
* rename agg output
* Improve auto-provisioning resources (#35) (#36)
* docker file stub
* move docker file, implement feedback
* login before setting subscription
* login before setting subscription
* use default k8s version
* pin latest version since default won't work
* remove executionpolicy part, other small updates
* clarify to change job file _in docker filesystem_
* login before setting subscription
* formatting
* \ -> /
* install azureml-core in docker file
* propagate changes to section 7
* fix dataset creation command
Co-authored-by: thomasp-ms <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* Refactor folder structure (#37)
* `plan` -> `docs`
* 'plan' -> 'docs'
* 'automated_provisioning' -> 'mlops'
* 'fl_arc_k8s' -> 'examples'
Co-authored-by: thomasp-ms <[email protected]>
* auto provisioning - vanilla internal silos (#41)
* split internal and external provisioning
* adjust directories after internal/external split
* introduce overall mlops readme
* first stab
* remove useless comment and my alias
Co-authored-by: thomasp-ms <[email protected]>
* Perform real FL training on the MNIST dataset
  Added component files customized for MNIST dataset. Set the setup for 3 silos having their own compute and datastore.
  git config --global user.email "[email protected]"
* refine components and add logs
* maintain consistency b/w config files
* add requirement and env files
* add requirement and env files
* rmv redundant dependencies, rename conda envs
* Correct epoch default value
* point data asset instead of underlying URI
* beef up orchestrator cluster (#46)
Co-authored-by: thomasp-ms <[email protected]>
* Provision CPUs for silos (instead of GPUs) (#47)
* beef up orchestrator cluster
* gpu -> cpu
Co-authored-by: thomasp-ms <[email protected]>
* add preprocessing comp description, fix typo and correct default datastore name
* add integration validation test - build
* update readme file
* Move logger to the main if block, add pytorch channel in the conda env yaml and move readme to the docs folder
* code reformatting using black
* add documentation to run an FL experiment
* add more intuitive path for aggr output dir
* Merge changes
* simplify the job wait condition's code
* add comments
* trigger mnist pipeline check
* test token validity
* grant `az login` permissions to the clear-history script
* revert to sleep wait code
* test access token validity
* nit
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Thomas <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* fix readme
* aggregate weights on whichever device is available
* update docstrings
* update formatting
* reduce upload pipeline file
* fix datastore
* add info about data upload step
* fix typo
* steps for changing access policies
* update docs
* Named Entity Recognition example (#177)
* refactor components to use dpv2 + remove unnecessary environments
* working dpv2 pipeline
* refactor scripts with right inputs and outputs
* fix code path
* implement fake outputs
* fix paths
* fix imports
* fix args of aggregation script
* add note, fix component args
* add checkpoint arg
* linting
* linting
* remove sdkv2 folder
* add argparse to submit script
* add docstring
* add docstring
* linting
* linting
* add staging branch to build
* rollback changes to build, leave it for another PR
* remove logging line
* remove custom uuid
* linting
* add docstring to custom path function
* polish docstring
* rename model_silo_X to input_silo_X
* rename output
* rename agg output
* Improve auto-provisioning resources (#35) (#36)
* docker file stub
* move docker file, implement feedback
* login before setting subscription
* login before setting subscription
* use default k8s version
* pin latest version since default won't work
* remove executionpolicy part, other small updates
* clarify to change job file _in docker filesystem_
* login before setting subscription
* formatting
* \ -> /
* install azureml-core in docker file
* propagate changes to section 7
* fix dataset creation command
Co-authored-by: thomasp-ms <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* Refactor folder structure (#37)
* `plan` -> `docs`
* 'plan' -> 'docs'
* 'automated_provisioning' -> 'mlops'
* 'fl_arc_k8s' -> 'examples'
Co-authored-by: thomasp-ms <[email protected]>
* auto provisioning - vanilla internal silos (#41)
* split internal and external provisioning
* adjust directories after internal/external split
* introduce overall mlops readme
* first stab
* remove useless comment and my alias
Co-authored-by: thomasp-ms <[email protected]>
* Perform real FL training on the MNIST dataset
  Added component files customized for MNIST dataset. Set the setup for 3 silos having their own compute and datastore.
  git config --global user.email "[email protected]"
* refine components and add logs
* maintain consistency b/w config files
* add requirement and env files
* add requirement and env files
* rmv redundant dependencies, rename conda envs
* Correct epoch default value
* point data asset instead of underlying URI
* beef up orchestrator cluster (#46)
Co-authored-by: thomasp-ms <[email protected]>
* Provision CPUs for silos (instead of GPUs) (#47)
* beef up orchestrator cluster
* gpu -> cpu
Co-authored-by: thomasp-ms <[email protected]>
* add preprocessing comp description, fix typo and correct default datastore name
* add integration validation test - build
* update readme file
* Move logger to the main if block, add pytorch channel in the conda env yaml and move readme to the docs folder
* code reformatting using black
* add documentation to run an FL experiment
* add more intuitive path for aggr output dir
* Merge changes
* add multinerd template files
* NER components
* re-structure
* partition data + log metrics
* add readme
* add readme
* restructuring
* restructuring
* add doc strings
* train on gpus
* create a separate component to upload data on silos
* docs
* rename
* add assert statement
* change upload-data job compute to orchestrator compute
* remove ner from literal example choices
* fix doc
* add model-name, tokenizer configurable
* pip version upgrade
* reformatting
* use shared aggregated component
* rename script file
* add note
* create a compute that has access to silos' storage accs
* change data uploading approach
* update doc
* incorporate Thomas's feedback
* fix typo
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Jeff Omhover <[email protected]>
Co-authored-by: Thomas <[email protected]>
Co-authored-by: thomasp-ms <[email protected]>
* Create nice-looking homepage for the examples in readme+docs (#190)
* add homepage for industry examples
Co-authored-by: Jeff Omhover <[email protected]>
* Align the medical imaging data provisioning process with other examples (#191)
* adjust paths in config file
* support component with 1 output for Pneumonia
* formatting
* adjust doc to new provisioning
* remove GH action for dataprep
* custom component for provisioning pneumonia data
* black
Co-authored-by: thomasp-ms <[email protected]>
* hot fix (#192)
Co-authored-by: thomasp-ms <[email protected]>
* Lots of micro-fixes after bug bashing all 3 industry examples (#194)
  related to components:
  * create distinct names for all components of each scenario
  * polish component descriptions
  * remove unused mnist datatransfer and postprocessing components
  * upgrade all MCR images to a more recent OS
  * cut some unnecessary dependencies
  * use curated environments whenever possible (to speed up job build time)
  related to pipelines:
  * fix issues with ccfraud submit script (path to shared folder)
  * remove unnecessary json+azure imports in submit scripts
  * align all 3 submission scripts
  * in upload data pipeline, make --example required without default value to force intentional decision
  * in upload data pipeline, use scenario name in the output path to avoid collision
  * give each submit pipeline a distinct experiment and run name for readability
Co-authored-by: Jeff Omhover <[email protected]>
* Standardize all 3 real world example tutorials (#193)
* standardize documentation on all 3 examples
* change titles
* fix spaces
* add pip instructions
* upgrade azure-ai-ml version
Co-authored-by: Jeff Omhover <[email protected]>
* poc for ddp training
* remove debug code + allow logging from multiple nodes
* update formatting
* provide correct link to Kaggle dataset (#196)
* provide correct link
* .
* .
Co-authored-by: thomasp-ms <[email protected]> * add DDP docs * Add CI tests for industry-relevant examples (#186) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <[email protected]> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <[email protected]> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "[email protected]" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <[email protected]> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <[email protected]> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Thomas <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * CLI commands to add credentials in the workspace keyvault (#199) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component 
args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <[email protected]> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <[email protected]> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "[email protected]" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <[email protected]> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <[email protected]> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Thomas <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <[email protected]> * fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <[email protected]> * Hotfix: DataAccessError (orchestrator access) (#205) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit 
script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <[email protected]> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <[email protected]> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "[email protected]" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <[email protected]> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <[email protected]> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Thomas <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename 
agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <[email protected]> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <[email protected]> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "[email protected]" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <[email protected]> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <[email protected]> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Thomas <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <[email protected]> * Fix order of precedence for AML workspace references in submit.py (#209) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback 
changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <[email protected]> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <[email protected]> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "[email protected]" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <[email protected]> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <[email protected]> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix order of precedence * fix build Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Thomas <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <[email protected]> * November notes (#211) Co-authored-by: thomasp-ms <[email protected]> * create instance type and select it for run for cc * upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <[email protected]> * format * use azureml built in distribution fw * Test industry-relevant examples if any changes in the `utils` dir are observed (#221) * add test to validate changes in the utils dir * test1 trigger workflow * fix typo * only destroy ddp group if it was created * remove unnecessary imports * allow for mixture of ddp and non-ddp processes model aggregation * use documentation instead of ps1 script for creating instancetype for CC * add instance type assignment for all examples * formatting * 
formatting * update batch size * update model name * use older pytorch * Generalize aggregate component to Babel (#220) * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <[email protected]> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <[email protected]> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <[email protected]> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to 
change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <[email protected]> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <[email protected]> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "[email protected]" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <[email protected]> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <[email protected]> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs 
instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove 
executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <[email protected]> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <[email protected]> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "[email protected]" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <[email protected]> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <[email protected]> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * 
code style * change optional inputs syntax Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Jeff Omhover <[email protected]> Co-authored-by: Thomas <[email protected]> Co-authored-by: thomasp-ms <[email protected]> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-aut…
1 parent e423f1b commit c378e97
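This commit renames the "factory" pipeline to "scatter-gather", and its log mentions a generic FedAvg aggregation that works without a model object (#167). The following is a minimal Python sketch of how one such round fits together. It is not the accelerator's code. Plain lists of floats stand in for tensors, `fedavg`, `scatter_gather_round` and `make_fake_silo` are hypothetical names, and a thread pool stands in for the per-silo AML jobs. The sketch weights each silo by its local sample count, as in the standard FedAvg formulation. The weighting the repository actually uses is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]
# A silo takes the global weights and returns (updated weights, local sample count).
Silo = Callable[[Weights], Tuple[Weights, int]]


def fedavg(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average each named parameter across silos, weighted by local sample count."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("need at least one update with a positive sample count")
    averaged: Weights = {}
    for name, values in updates[0][0].items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


def scatter_gather_round(global_weights: Weights, silos: Sequence[Silo]) -> Weights:
    """One round: scatter the global weights to every silo, gather, then aggregate."""
    with ThreadPoolExecutor(max_workers=max(1, len(silos))) as pool:
        # Scatter: every silo starts local training from the same global weights.
        updates = list(pool.map(lambda train: train(global_weights), silos))
    # Gather: merge the per-silo results into the next global model.
    return fedavg(updates)


def make_fake_silo(delta: float, num_samples: int) -> Silo:
    """Hypothetical stand-in for local training: shifts every weight by delta."""
    def train(weights: Weights) -> Tuple[Weights, int]:
        return {k: [x + delta for x in v] for k, v in weights.items()}, num_samples
    return train
```

Suppose two fake silos hold 100 and 300 samples and shift the weights by 1.0 and 3.0. One round then moves `{"layer": [0.0, 1.0]}` to `{"layer": [2.5, 3.5]}`, because (1.0 * 100 + 3.0 * 300) / 400 = 2.5.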


49 files changed, +1575 -1481 lines changed

.github/actions/submit-aml-factory-pipeline/action.yaml renamed to .github/actions/submit-aml-scatter-gather-pipeline/action.yaml

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
-name: Submit example factory pipeline
-description: Submit example factory pipeline in AML
+name: Submit example scatter-gather pipeline
+description: Submit example scatter-gather pipeline in AML
 inputs:
   client-id:
     description: Client ID of the service principal
@@ -46,6 +46,6 @@ runs:
       shell: bash
       run: pip install -r examples/pipelines/requirements.txt
 
-    - name: Submit fl_cross_silo_factory pipeline
+    - name: Submit fl_cross_silo_scatter_gather pipeline
       shell: bash
-      run: python examples/pipelines/fl_cross_silo_factory/submit.py --subscription_id ${{ inputs.subscription-id }} --resource_group ${{ inputs.resource-group }} --workspace_name ${{ inputs.workspace-name }} --example ${{ inputs.example }} --ignore_validation --wait || [ $? == 5 ]
+      run: python examples/pipelines/fl_cross_silo_scatter_gather/submit.py --subscription_id ${{ inputs.subscription-id }} --resource_group ${{ inputs.resource-group }} --workspace_name ${{ inputs.workspace-name }} --example ${{ inputs.example }} --ignore_validation --wait || [ $? == 5 ]
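The `run:` line ends with `|| [ $? == 5 ]`. If `submit.py` exits 0, the step passes. If it exits non-zero, bash evaluates `[ $? == 5 ]`, so the step still passes when that exit code was exactly 5 and fails for any other code. This diff does not say what exit code 5 signals. The Python sketch below reproduces the same acceptance rule for illustration. `run_tolerating` is a hypothetical helper, not part of the repository.

```python
import subprocess
import sys
from typing import Collection, Sequence


def run_tolerating(cmd: Sequence[str], ok_codes: Collection[int] = (0, 5)) -> int:
    """Run cmd and return its exit code, treating any code in ok_codes as success.

    Mirrors the shell idiom `cmd || [ $? == 5 ]`: exit 0 passes, exit 5 passes,
    and every other exit code fails (here: raises CalledProcessError).
    """
    code = subprocess.run(list(cmd)).returncode
    if code not in ok_codes:
        raise subprocess.CalledProcessError(code, list(cmd))
    return code


if __name__ == "__main__":
    # A child process that exits with code 5 is tolerated, so this prints 5.
    print(run_tolerating([sys.executable, "-c", "import sys; sys.exit(5)"]))
```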

.github/workflows/pipeline-e2e-test.yaml

Lines changed: 14 additions & 14 deletions
@@ -118,7 +118,7 @@ jobs:
           workspace-name: aml-fl1${{ github.run_attempt }}${{ github.run_id }}
           example: MNIST
 
-  open-sandbox-factory-helloworld-test:
+  open-sandbox-scatter-gather-helloworld-test:
     needs: open-sandbox-test
     runs-on: ubuntu-latest
     permissions:
@@ -127,8 +127,8 @@ jobs:
 
       - uses: actions/checkout@v2
 
-      - name: Submit Helloworld example using the factory code
-        uses: ./.github/actions/submit-aml-factory-pipeline
+      - name: Submit Helloworld example using the scatter-gather code
+        uses: ./.github/actions/submit-aml-scatter-gather-pipeline
         with:
           client-id: ${{ secrets.AZURE_CLIENT_ID }}
           tenant-id: ${{ secrets.AZURE_TENANT_ID }}
@@ -137,16 +137,16 @@ jobs:
           workspace-name: aml-fl1${{ github.run_attempt }}${{ github.run_id }}
           example: HELLOWORLD
 
-  open-sandbox-factory-mnist-test:
-    needs: open-sandbox-factory-helloworld-test
+  open-sandbox-scatter-gather-mnist-test:
+    needs: open-sandbox-scatter-gather-helloworld-test
     runs-on: ubuntu-latest
     permissions:
       id-token: write
     steps:
       - uses: actions/checkout@v2
 
-      - name: Submit MNIST example using the factory code
-        uses: ./.github/actions/submit-aml-factory-pipeline
+      - name: Submit MNIST example using the scatter-gather code
+        uses: ./.github/actions/submit-aml-scatter-gather-pipeline
         with:
           client-id: ${{ secrets.AZURE_CLIENT_ID }}
           tenant-id: ${{ secrets.AZURE_TENANT_ID }}
@@ -297,16 +297,16 @@ jobs:
           workspace-name: aml-fl2${{ github.run_attempt }}${{ github.run_id }}
           example: MNIST
 
-  vnet-sandbox-factory-helloworld-test:
+  vnet-sandbox-scatter-gather-helloworld-test:
     needs: vnet-sandbox-test
     runs-on: ubuntu-latest
     permissions:
       id-token: write
     steps:
       - uses: actions/checkout@v2
 
-      - name: Submit Helloworld example using the factory code
-        uses: ./.github/actions/submit-aml-factory-pipeline
+      - name: Submit Helloworld example using the scatter-gather code
+        uses: ./.github/actions/submit-aml-scatter-gather-pipeline
         with:
           client-id: ${{ secrets.AZURE_CLIENT_ID }}
           tenant-id: ${{ secrets.AZURE_TENANT_ID }}
@@ -315,16 +315,16 @@ jobs:
  workspace-name: aml-fl2${{ github.run_attempt }}${{ github.run_id }}
  example: HELLOWORLD

- vnet-sandbox-factory-mnist-test:
- needs: vnet-sandbox-factory-helloworld-test
+ vnet-sandbox-scatter-gather-mnist-test:
+ needs: vnet-sandbox-scatter-gather-helloworld-test
  runs-on: ubuntu-latest
  permissions:
  id-token: write
  steps:
  - uses: actions/checkout@v2

- - name: Submit MNIST example using the factory code
- uses: ./.github/actions/submit-aml-factory-pipeline
+ - name: Submit MNIST example using the scatter-gather code
+ uses: ./.github/actions/submit-aml-scatter-gather-pipeline
  with:
  client-id: ${{ secrets.AZURE_CLIENT_ID }}
  tenant-id: ${{ secrets.AZURE_TENANT_ID }}
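Each renamed job's `needs:` entry is updated in the same hunk as the job it points to, so the HelloWorld-then-MNIST ordering is preserved. GitHub Actions rejects a workflow whose `needs:` names a job that does not exist, which is what a partial rename would produce. The Python sketch below models only the dependencies visible in this diff. The upstream `needs:` of `open-sandbox-scatter-gather-helloworld-test` and `vnet-sandbox-test` are not shown, so they are assumed empty here. The sketch flags dangling references and derives one valid run order with the standard-library `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Job -> jobs it `needs`, as visible in this diff. Dependencies of
# open-sandbox-scatter-gather-helloworld-test and vnet-sandbox-test are
# outside the hunks and are ASSUMED empty for this sketch.
JOB_NEEDS: Dict[str, Set[str]] = {
    "open-sandbox-scatter-gather-helloworld-test": set(),
    "open-sandbox-scatter-gather-mnist-test": {"open-sandbox-scatter-gather-helloworld-test"},
    "vnet-sandbox-test": set(),
    "vnet-sandbox-scatter-gather-helloworld-test": {"vnet-sandbox-test"},
    "vnet-sandbox-scatter-gather-mnist-test": {"vnet-sandbox-scatter-gather-helloworld-test"},
}


def dangling_needs(job_needs: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (job, dependency) pairs whose dependency is not a defined job."""
    return [
        (job, dep)
        for job, deps in job_needs.items()
        for dep in sorted(deps)
        if dep not in job_needs
    ]


def execution_order(job_needs: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which the jobs could run while honoring `needs:`."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(job_needs).static_order())


if __name__ == "__main__":
    print(dangling_needs(JOB_NEEDS))
    for job in execution_order(JOB_NEEDS):
        print(job)
```

If only the job name had been renamed and its dependent still said `needs: vnet-sandbox-factory-helloworld-test`, `dangling_needs` would report that pair.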

.github/workflows/release-branch-test.yaml

Lines changed: 10 additions & 10 deletions
@@ -23,7 +23,7 @@ jobs:
  components-mnist: ${{ steps.filter.outputs.components-mnist }}
  components-utils: ${{ steps.filter.outputs.components-utils }}
  literal: ${{ steps.filter.outputs.literal }}
- factory: ${{ steps.filter.outputs.factory }}
+ scatter-gather: ${{ steps.filter.outputs.scatter-gather }}
  components-pneumonia: ${{ steps.filter.outputs.components-pneumonia }}
  components-ner: ${{ steps.filter.outputs.components-ner }}
  components-ccfraud: ${{ steps.filter.outputs.components-ccfraud }}
@@ -56,8 +56,8 @@ jobs:
  - 'examples/components/utils/**'
  literal:
  - 'examples/pipelines/fl_cross_silo_literal/**'
- factory:
- - 'examples/pipelines/fl_cross_silo_factory/**'
+ scatter-gather:
+ - 'examples/pipelines/fl_cross_silo_scatter_gather/**'
  components-pneumonia:
  - 'examples/components/PNEUMONIA/**'
  components-ner:
@@ -149,18 +149,18 @@ jobs:
  workspace-name: ${{ secrets.AML_WORKSPACE_NAME }}
  example: HELLOWORLD

- factory-helloworld-test:
+ scatter-gather-helloworld-test:
  runs-on: ubuntu-latest
  needs: paths-filter
- if: ${{ (needs.paths-filter.outputs.components-helloworld == 'true') || (needs.paths-filter.outputs.factory == 'true') }}
+ if: ${{ (needs.paths-filter.outputs.components-helloworld == 'true') || (needs.paths-filter.outputs.scatter-gather == 'true') }}
  permissions:
  id-token: write
  steps:

  - uses: actions/checkout@v2

- - name: Submit Helloworld example using the factory code
- uses: ./.github/actions/submit-aml-factory-pipeline
+ - name: Submit Helloworld example using the scatter-gather code
+ uses: ./.github/actions/submit-aml-scatter-gather-pipeline
  with:
  client-id: ${{ secrets.AZURE_CLIENT_ID }}
  tenant-id: ${{ secrets.AZURE_TENANT_ID }}
@@ -188,7 +188,7 @@ jobs:
  workspace-name: ${{ secrets.AML_WORKSPACE_NAME }}
  example: MNIST

- factory-mnist-test:
+ scatter-gather-mnist-test:
  runs-on: ubuntu-latest
  needs: paths-filter
  if: ${{ needs.paths-filter.outputs.components-mnist == 'true' }}
@@ -198,8 +198,8 @@ jobs:

  - uses: actions/checkout@v2

- - name: Submit MNIST example using the factory code
- uses: ./.github/actions/submit-aml-factory-pipeline
+ - name: Submit MNIST example using the scatter-gather code
+ uses: ./.github/actions/submit-aml-scatter-gather-pipeline
  with:
  client-id: ${{ secrets.AZURE_CLIENT_ID }}
  tenant-id: ${{ secrets.AZURE_TENANT_ID }}
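The job-level `if:` expressions compare string outputs of the `paths-filter` job against `'true'`. Each output reports whether any changed file matches that filter's glob list. As a rough approximation only, the sketch below re-creates this gating with Python's `fnmatch.fnmatchcase`, whose `*` also matches `/`, so a pattern ending in `/**` behaves like "anything under this directory". This is not the matcher used by the real filter step, which is not shown in this diff. The filter subset is copied from the hunks above. The `components-helloworld` glob is not shown, so it is omitted, and the changed-file path in the usage example is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Subset of the filters defined in the paths-filter step of this diff.
# The glob behind components-helloworld is not shown, so it is left out.
FILTERS: Dict[str, List[str]] = {
    "components-utils": ["examples/components/utils/**"],
    "literal": ["examples/pipelines/fl_cross_silo_literal/**"],
    "scatter-gather": ["examples/pipelines/fl_cross_silo_scatter_gather/**"],
    "components-pneumonia": ["examples/components/PNEUMONIA/**"],
}


def filter_outputs(
    changed_files: Iterable[str], filters: Dict[str, List[str]] = FILTERS
) -> Dict[str, str]:
    """Approximate each filter output as the string 'true' or 'false'."""
    files = list(changed_files)
    return {
        name: "true"
        if any(fnmatchcase(f, p) for f in files for p in patterns)
        else "false"
        for name, patterns in filters.items()
    }


def run_scatter_gather_helloworld(outputs: Dict[str, str]) -> bool:
    """Mirror the `if:` condition of scatter-gather-helloworld-test."""
    return (
        outputs.get("components-helloworld") == "true"
        or outputs.get("scatter-gather") == "true"
    )


if __name__ == "__main__":
    # Hypothetical changed file under the renamed pipeline folder.
    outputs = filter_outputs(["examples/pipelines/fl_cross_silo_scatter_gather/config.yaml"])
    print(outputs["scatter-gather"], run_scatter_gather_helloworld(outputs))
```

After the rename, a change under the old `fl_cross_silo_factory/` folder no longer sets the `scatter-gather` output. That job therefore runs only for changes under the new folder or for `components-helloworld` changes.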

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -133,3 +133,6 @@ config.json

  # for ignoring test jobs
  /examples/pipelines/test*
+
+ # for ignoring local sandbox files for debugging/testing
+ /sandbox/*

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -1,5 +1,29 @@
  # FL Accelerator Changelog

+ ## February 2023 release
+
+ We are excited to announce the release of the February iteration of our [FL Accelerator repository](https://github.com/Azure-Samples/azure-ml-federated-learning).
+
+ Some of the major updates include a vertical federated learning feature, an FL pipeline that offers a native AML FL experience integrated with the factory engine, and benchmark results from a comprehensive comparison between FL and non-FL experiments.
+
+ ### FL Experience
+ - Implemented _Vertical Federated Learning_ and offered a [tutorial](./docs/tutorials/vertical-fl.md) to run MNIST or CCFRAUD examples.
+ - Introduced a [scatter-gather](./docs/tutorials/literal-scatter-gather-tutorial.md) pipeline that delivers a native AML FL experience.
+ - Conducted a comprehensive comparison between FL and non-FL experiments; the benchmark report is available [here](./docs/concepts/benchmarking.md).
+
+ ### Provisioning
+ - Provided [instructions](./docs/tutorials/update-local-data-to-silo-storage-account.md) and a script to facilitate uploading local data to a silo storage account.
+ - Incremental improvements:
+   - Enhanced the network security rules and minimized the workspace dependencies for provisioning resources.
+ <!-- ### Documentation -->
+
+ <!-- ### Repository structure
+ -->
+
+ To get started, go [here](./docs/quickstart.md)!
+
+ If you find a bug or have a feature request, please open an issue on the [GitHub repository](https://github.com/Azure-Samples/azure-ml-federated-learning/issues).
+
  ## January 2023 release

  We are excited to announce the release of the January iteration of our [FL Accelerator repository](https://github.com/Azure-Samples/azure-ml-federated-learning).
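In horizontal FL, "scatter-gather" roughly describes the shape of each training round. The current model is scattered to every silo, each silo trains on its own data, and the resulting weights are gathered and aggregated into a new global model. The framework-free sketch below shows one such round under two assumptions: sample-count-weighted averaging (FedAvg-style) as the aggregation rule, and a toy "training" step. The names `scatter_gather_round` and `make_silo` are hypothetical. This is not the repository's aggregation component and not an Azure ML API.

```python
from typing import Callable, List, Sequence, Tuple

Weights = List[float]
# (local_train, num_samples): local_train is a hypothetical stand-in for a silo job.
Silo = Tuple[Callable[[Weights], Weights], int]


def scatter_gather_round(global_weights: Weights, silos: Sequence[Silo]) -> Weights:
    """Run one FL round: scatter weights, train locally, gather and average."""
    # Scatter: every silo starts from its own copy of the same global weights.
    updates = [(train(list(global_weights)), n) for train, n in silos]
    # Gather + aggregate: average each weight, weighted by the silo's sample count.
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("at least one silo must report a positive sample count")
    return [
        sum(w[i] * n for w, n in updates) / total
        for i in range(len(global_weights))
    ]


def make_silo(shift: float) -> Callable[[Weights], Weights]:
    """Toy local 'training': shift every weight by a fixed amount."""
    return lambda weights: [w + shift for w in weights]


if __name__ == "__main__":
    silos = [(make_silo(1.0), 100), (make_silo(-1.0), 300)]
    weights = [0.0, 0.0]
    for _ in range(3):
        weights = scatter_gather_round(weights, silos)
    print(weights)
```

Weighting by sample count lets larger silos pull the average further. With equal counts the rule reduces to a plain mean of the silo weights.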

README.md

Lines changed: 10 additions & 0 deletions
@@ -12,6 +12,7 @@ This repo provides some code samples for running a federated learning pipeline i
  - [Real world examples](#real-world-examples)
  - [FL Frameworks](#fl-frameworks)
  - [Documentation](#documentation)
+ - [Real-world example benchmarks](#real-world-example-benchmarks)
  - [Need Support?](#need-support)
  - [Contributing](#contributing)

@@ -51,6 +52,15 @@ If you are already using a specific FL framework, you can port your code to work

  Please find the full documentation of this project [**here**](docs/README.md).

+ ### Real-world example benchmarks
+
+ A benchmarking analysis is performed for each real-world example to assess the validity, efficiency, and scalability of our FL implementation:
+
+ | Training overhead | Model performance | Scalability |
+ |:-:|:-:|:-:|
+ | [![overhead icon](./docs/pics/pneumonia_time.jpg)](./docs/concepts/benchmarking.md/#21-training-overhead)| [![performance icon](./docs/pics/pneumonia_acc.jpg)](./docs/concepts/benchmarking.md/#22-model-performance)| [![scala icon](./docs/pics/pneumonia_ddp.jpg)](./docs/concepts/benchmarking.md/#23-scalability-with-training)
+
+
  ### Need Support?

  Please check the [**troubleshooting guide**](./docs/troubleshoot.md) for possible solutions. If you are unable to find a solution, please open an issue in this repository.

docs/README.md

Lines changed: 8 additions & 4 deletions
@@ -22,8 +22,9 @@
  - [Tutorials](#tutorials)
  - [What this repo has to offer?](#what-this-repo-has-to-offer)
  - [Provisioning guide](#provisioning-guide)
- - [How to adapt the "literal" and the "factory" code for your own scenario](#how-to-adapt-the-literal-and-the-factory-code-for-your-own-scenario)
+ - [How to adapt the "literal" and the "scatter-gather" code for your own scenario](#how-to-adapt-the-literal-and-the-scatter-gather-code-for-your-own-scenario)
  - [Read local data in an on-premises Kubernetes silo](#read-local-data-in-an-on-premises-kubernetes-silo)
+ - [Upload local data to silo storage account](#upload-local-data-to-silo-storage-account)
  - [Troubleshooting guide](#troubleshooting-guide)

  ## Motivation
@@ -44,7 +45,7 @@ To know more about the resource provisioning alternatives, please go to the prov

  ## Real-world examples

- In addition to the [literal](../examples/pipelines/fl_cross_silo_literal/) and [factory](../examples/pipelines/fl_cross_silo_factory/) sample experiments, we also provide examples based on real-world applications.
+ In addition to the [literal](../examples/pipelines/fl_cross_silo_literal/) and [scatter-gather](../examples/pipelines/fl_cross_silo_scatter_gather/) sample experiments, we also provide examples based on real-world applications.

  > Note: The `upload-data` scripts are only included in the examples for the convenience of executing the FL examples. Please ignore this section if you are performing an actual FL experiment for your scenario.

@@ -123,9 +124,9 @@ This repo provides some code samples for running a federated learning pipeline i

  This guide will help you adapt your own setup depending on your provisioning strategy and your constraints. See [here](./provisioning/README.md) for detailed instructions.

- ## How to adapt the "literal" and the "factory" code for your own scenario
+ ## How to adapt the "literal" and the "scatter-gather" code for your own scenario

- The complete tutorial can be found [**here**](./tutorials/literal-factory-tutorial.md)
+ The complete tutorial can be found [**here**](./tutorials/literal-scatter-gather-tutorial.md)

  ## Read local data in an on-premises Kubernetes silo

@@ -134,6 +135,9 @@ This tutorial will show you how to access, within an Azure ML job running on an
  ## Differential privacy for cross-silo horizontal federated learning
  The complete tutorial can be found [**here**](./tutorials/dp-for-cross-silo-horizontal-fl.md).

+ ## Upload local data to silo storage account
+ This tutorial shows how to upload local data to a silo storage account using a CLI job. The job runs on the silo compute, which has access to the silo storage account. See detailed instructions [here](./tutorials/update-local-data-to-silo-storage-account.md).
+
  # Troubleshooting guide

  If you experience an issue using this repository, please check the [**troubleshooting guide**](./troubleshoot.md) for possible solutions. If you are unable to find a solution, please open an issue in this repository.
