Data Accelerator with Databricks
The Data Accelerator environment can now be set up to run jobs on either Databricks or HDInsight. When setting up the Data Accelerator environment, you choose the platform on which to run the Spark jobs: Databricks or HDInsight.
In this tutorial we will go over:
- How to set up a Data Accelerator environment that uses Databricks
- How to run Data Accelerator flows on Databricks
- Install the Azure CLI from here
- Install the Databricks CLI from here
- Download the scripts and templates locally via this link: template
- Open common.parameters.txt under DeploymentCloud/Deployment.DataX and provide your TenantId and SubscriptionId. Also set useDatabricks = y.
- On Windows, open a command prompt as an administrator in the downloaded folder DeploymentCloud/Deployment.DataX and run:
deploy.bat
- If you are not an admin of the tenant (typically the case when using an AAD account), copy the DeploymentCloud folder to your admin's machine and ask your admin to run the following command:
runAdminSteps.bat
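If you prefer to script the parameter edit and confirm that both CLIs are installed before running deploy.bat, the following is a minimal Python sketch. It is a hypothetical helper, not part of the Data Accelerator scripts, and it assumes common.parameters.txt holds one `key=value` entry per line. Keys are matched case-insensitively because the exact spelling of entries such as TenantId in your copy of the file may differ from this page. The executable names `az` and `databricks` are assumed to be what the two CLIs put on PATH.

```python
import re
import shutil
from pathlib import Path


def missing_tools(tools=("az", "databricks")):
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def set_parameters(path, updates):
    """Set `key=value` entries in a parameters file, leaving all other lines as-is.

    Assumes one `key=value` (or `key = value`) entry per line. Matching is
    case-insensitive; keys not already in the file are appended at the end.
    """
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines()
    pending = {key.lower(): (key, value) for key, value in updates.items()}
    out = []
    for line in lines:
        # Comment lines start with '#' and never match, so they are kept untouched.
        match = re.match(r"\s*([^=#\s]+)\s*=", line)
        key = match.group(1) if match else None
        if key and key.lower() in pending:
            _, value = pending.pop(key.lower())
            out.append(f"{key}={value}")  # keep the file's own spelling of the key
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in pending.values())
    p.write_text("\n".join(out) + "\n", encoding="utf-8")
```

For example, `set_parameters("DeploymentCloud/Deployment.DataX/common.parameters.txt", {"TenantId": "<tenant-id>", "SubscriptionId": "<subscription-id>", "useDatabricks": "y"})` fills in the three values, and `missing_tools()` should return an empty list before you start the deployment.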
The above steps set up the Azure resources required by Data Accelerator, including an Azure Databricks resource. To finish setting up the Databricks resource, you will also need to generate a Databricks token, create a secret scope, upload the JARs required to run Spark jobs to DBFS, and finally create a Databricks cluster for live query.
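The secret scope and the DBFS uploads are both handled with the Databricks CLI once it is configured with your token. The sketch below is a hypothetical Python helper, not part of Data Accelerator, that assumes the legacy Python-based databricks-cli, where `databricks secrets create-scope --scope NAME` and `databricks fs cp SRC DST` are available. The scope name, local JAR folder, and DBFS target folder are placeholders; use the values the Data Accelerator scripts actually expect. Cluster creation is left out because its settings are not described here. By default the helper only prints the commands.

```python
import subprocess
from pathlib import Path


def databricks_setup_commands(scope, jar_dir, dbfs_dir):
    """Build (but do not run) the CLI commands for a secret scope and the JAR uploads.

    Assumes legacy databricks-cli syntax. `scope`, `jar_dir` and `dbfs_dir` are
    placeholders chosen by the caller, not values defined by Data Accelerator.
    """
    commands = [["databricks", "secrets", "create-scope", "--scope", scope]]
    # One `databricks fs cp` per local JAR, copied into the target DBFS folder.
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        target = f"{dbfs_dir.rstrip('/')}/{jar.name}"
        commands.append(["databricks", "fs", "cp", str(jar), target])
    return commands


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command (e.g. the scope already exists).
            subprocess.run(cmd, check=True)
```

For example, `run_commands(databricks_setup_commands("datax-scope", "./jars", "dbfs:/datax/jars"))` prints the commands for review; passing `dry_run=False` runs them.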
The following steps walk you through creating a Databricks token. This token is required to run the Databricks CLI commands that we will go over later in the setup process, and to run flows on Databricks.
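Once generated, the token has to be made available to the Databricks CLI, which reads a `host` and `token` pair from `~/.databrickscfg` (the file that `databricks configure --token` writes). The sketch below is a hypothetical Python helper that writes that file directly, which can be handy when scripting the setup. The host is your workspace URL and the token is the value you generate; both are placeholders here, and the profile name is an assumption (the CLI uses `DEFAULT` unless told otherwise).

```python
import configparser
from pathlib import Path


def write_databricks_profile(host, token, profile="DEFAULT", cfg_path=None):
    """Add or update `host` and `token` for one profile in a Databricks CLI config file.

    Defaults to ~/.databrickscfg. Other profiles and keys already in the file
    are preserved.
    """
    path = Path(cfg_path) if cfg_path else Path.home() / ".databrickscfg"
    cfg = configparser.ConfigParser(interpolation=None)
    if path.exists():
        cfg.read(path)
    # [DEFAULT] always exists in configparser and cannot be added explicitly.
    if profile != configparser.DEFAULTSECT and not cfg.has_section(profile):
        cfg.add_section(profile)
    cfg[profile]["host"] = host
    cfg[profile]["token"] = token
    with path.open("w") as f:
        cfg.write(f)
    # The token is a credential: restrict the file to the current user
    # (effective on POSIX; on Windows chmod only toggles the read-only flag).
    path.chmod(0o600)
    return path
```

For example, `write_databricks_profile("https://<your-workspace-url>", "<your-token>")` sets the default profile that the CLI commands in the rest of this setup would pick up.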
- On https://portal.azure.com, go to the ‘Azure Databricks Service’ resource created by the ARM deployment step and click on ‘Launch Workspace’.