111 changes: 107 additions & 4 deletions docs/lakebridge/docs/assessment/profiler/index.mdx
@@ -2,11 +2,114 @@
sidebar_position: 1
title: Profiler Guide
---
import Admonition from '@theme/Admonition';

# Profiler Guide

## Overview

<p>
The **Lakebridge Profiler** is designed to extract and analyze metadata from database systems, providing insights into your source environment.
The profiler helps you understand system configurations, resource utilization, query patterns, and performance metrics to aid in migration planning.
</p>

Key capabilities:
- **Database Metadata Extraction**: Captures schema information, table structures, and object definitions
- **Performance Analytics**: Collects query execution metrics and resource utilization data
- **Workload Analysis**: Profiles active queries and identifies optimization opportunities

<Admonition type="info" title="Prerequisites">
For detailed prerequisites specific to each source system, refer to the <a href="./Prerequisites" style={{ fontWeight: 'bold', color: '#1976d2', textDecoration: 'underline' }}>Prerequisites</a> section.
</Admonition>

## Configure Profiler

Before running the profiler, you need to configure the connection details for your source system.

Execute the following command to configure the profiler:

```bash
databricks labs lakebridge configure-database-profiler
```

This will prompt you to select the source system and provide connection details:

```console
Please select the source system you want to configure
[0] mssql
[1] synapse
Enter a number between 0 and 1: 1
(local | env)
local means values are read as plain text
env means values are read from environment variables fall back to plain text if not variable is not found

Enter secret vault type (local | env)
[0] env
[1] local
Enter a number between 0 and 1: 1
Please provide Synapse Workspace settings:
Enter Synapse workspace name: synapse
Enter SQL user: user
Enter SQL password:
Enter timezone (e.g. America/New_York) (default: UTC):
Enter the ODBC driver installed locally (default: ODBC Driver 18 for SQL Server):
Please provide Azure access settings:
Enter development endpoint: synapse.endpoint
Please select JDBC authentication type:
Select authentication type
[0] ad_passwd_authentication
[1] spn_authentication
[2] sql_authentication
Enter a number between 0 and 2: 2
Enter fetch size (default: 1000):
Enter login timeout (seconds) (default: 30):
Exclude serverless SQL pool from profiling? (default: no):
Exclude dedicated SQL pools from profiling? (default: no):
Exclude Spark pools from profiling? (default: no):
Exclude monitoring metrics from profiling? (default: no):
Redact SQL pools SQL text? (default: no):
```
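The `local` and `env` vault types described in the prompt can be illustrated with a short Python sketch. This is not Lakebridge's actual implementation; the function name `resolve_setting` and the variable name `SYNAPSE_SQL_PASSWORD` are hypothetical and only show the lookup-with-fallback behavior the prompt text describes.

```python
import os


def resolve_setting(raw_value: str, vault_type: str) -> str:
    """Hypothetical helper: resolve a configured value per vault type.

    local -> the configured value is used as plain text.
    env   -> the configured value is treated as an environment variable
             name; if that variable is not set, the value itself is used
             as plain text (the fallback described in the prompt).
    """
    if vault_type == "local":
        return raw_value
    if vault_type == "env":
        return os.environ.get(raw_value, raw_value)
    raise ValueError(f"unsupported vault type: {vault_type!r}")


if __name__ == "__main__":
    # Hypothetical variable name, set here only for the demonstration.
    os.environ["SYNAPSE_SQL_PASSWORD"] = "example-password"
    print(resolve_setting("SYNAPSE_SQL_PASSWORD", "env"))
    print(resolve_setting("SYNAPSE_SQL_PASSWORD", "local"))
```

With `env`, credentials can stay out of the stored configuration, at the cost of having to export the variables in every shell that runs the profiler.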

## Execute Profiler

Once configured, run the profiler with `execute-database-profiler` to extract metadata and performance metrics from your source system. The `--help` flag lists the available options:

```bash
databricks labs lakebridge execute-database-profiler --help
```

Output:

```console
Profile the source system database

Usage:
databricks labs lakebridge execute-database-profiler [flags]

Flags:
-h, --help help for execute-database-profiler
--source-tech string (Optional) The technology/platform of the sources to Profile

Global Flags:
--debug enable debug logging
-o, --output type output type: text or json (default text)
-p, --profile string ~/.databrickscfg profile
-t, --target string bundle target to use (if applicable)
```

The profiler will:
1. Connect to your source system using the configured credentials
2. Execute the profiling pipeline to extract metadata and metrics
3. Store the results in the configured output location
4. Generate a summary report of the profiling execution

:::tip
The profiler can be run multiple times to capture different time periods or updated configurations.
Each execution will create a timestamped snapshot of your source environment.
:::

## Supported Source Systems

| Source Platform | Configuration Status |
|:---------------:|:-------------------:|
| Azure Synapse | &#x2705; |
99 changes: 99 additions & 0 deletions docs/lakebridge/docs/assessment/profiler/prerequisites.mdx
@@ -0,0 +1,99 @@
---
sidebar_position: 1
title: Profiler Prerequisites Guide
---
import Admonition from '@theme/Admonition';

# Profiler Prerequisites Guide

- [Azure Synapse](#azure-synapse)
  - [Required Access to Synapse Workspace](#required-access-to-synapse-workspace)
  - [Setup User id/password for ODBC connectivity](#setup-user-idpassword-for-odbc-connectivity)
- [Microsoft SQL Server \(MSSQL\)](#microsoft-sql-server-mssql)
- [Oracle](#oracle)

## Azure Synapse

- Python 3.10+
- Databricks CLI [Download](https://docs.databricks.com/aws/en/dev-tools/cli/install)
- Azure CLI [Download](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)
- ODBC driver for SQL Server [Download](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver17)
- (Windows only) Visual C++ [Download](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170)

Authenticate to Azure using Azure CLI:
```bash
az login
```
### Required Access to Synapse Workspace

:::warning Attention:
Skip this prerequisite if you are using a standalone Dedicated SQL Pool (formerly Azure SQL DW) and not a Synapse Workspace.
:::

- The profiler uses the Azure SDK for Python to extract information about the target Synapse Workspace.
- To make Azure API calls through the Azure SDK, you need an Azure service principal with the following role assignments.
  Assigning only the Synapse Administrator role is not enough; the roles below must be assigned explicitly.
  - Synapse Artifact User
    - Assign from Synapse Workspace → Manage Access → Access Control. [Refer to the Azure documentation](https://learn.microsoft.com/en-us/azure/synapse-analytics/security/how-to-manage-synapse-rbac-role-assignments)
  - Monitoring Reader
    - Assign from Synapse Workspace → IAM

### Setup User id/password for ODBC connectivity
Create a user that authenticates with SQL authentication and can query the Synapse tables listed below.

:::warning Attention:
The user must not have multi-factor authentication (MFA) enabled, because ODBC does not support MFA.
:::
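As a rough illustration of what SQL authentication over ODBC involves, the sketch below assembles a connection string for a Synapse Workspace SQL endpoint. It is an assumption-laden example, not the profiler's own connection logic: the helper names are hypothetical, and the host names assume the standard workspace endpoints (`<workspace>.sql.azuresynapse.net` for dedicated pools, `<workspace>-ondemand.sql.azuresynapse.net` for the serverless pool). A standalone dedicated SQL pool uses a different host.

```python
def odbc_value(value: str) -> str:
    """Wrap a value in braces so ';' and other special characters are safe.

    A closing brace inside the value is escaped by doubling it.
    """
    return "{" + value.replace("}", "}}") + "}"


def synapse_sql_server(workspace: str, serverless: bool = False) -> str:
    """Return the assumed SQL endpoint host for a Synapse Workspace."""
    suffix = "-ondemand" if serverless else ""
    return f"{workspace}{suffix}.sql.azuresynapse.net"


def build_connection_string(
    workspace: str,
    database: str,
    user: str,
    password: str,
    driver: str = "ODBC Driver 18 for SQL Server",
    serverless: bool = False,
) -> str:
    """Hypothetical helper: build a SQL-authentication ODBC connection string."""
    parts = [
        f"DRIVER={odbc_value(driver)}",
        f"SERVER={synapse_sql_server(workspace, serverless)}",
        f"DATABASE={odbc_value(database)}",
        f"UID={odbc_value(user)}",
        f"PWD={odbc_value(password)}",
        "Encrypt=yes",
    ]
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Placeholder values; do not hard-code real credentials.
    print(build_connection_string("myworkspace", "mydb", "profiler_user", "example"))
```

The resulting string can be passed to any ODBC client to confirm that the user can log in without an MFA prompt before running the profiler.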

This user needs read access (SELECT grants) on the tables listed below. Querying the Dynamic Management Views (DMVs) additionally requires the VIEW DATABASE STATE and VIEW DEFINITION grants:

```sql
GRANT VIEW DATABASE STATE TO <user_id>
GRANT VIEW DEFINITION TO <user_id>
```


- Dedicated SQL pool - Tables
  - sys.databases
  - information_schema.tables
  - information_schema.columns
  - information_schema.views
  - information_schema.routines

- Dedicated SQL pool - DMVs
  - sys.dm_pdw_exec_sessions
  - sys.dm_pdw_exec_requests
  - sys.dm_pdw_nodes_db_partition_stats
  - sys.dm_pdw_nodes_exec_query_stats

- Serverless SQL pool - Tables
  - sys.databases
  - information_schema.tables
  - information_schema.columns
  - information_schema.views
  - information_schema.routines

- Serverless SQL pool - DMVs
  - sys.dm_exec_sessions
  - sys.dm_exec_requests
  - sys.dm_exec_query_stats
  - sys.dm_exec_sql_text
  - sys.dm_exec_requests_history
  - sys.dm_external_data_processed
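
One way to confirm the grants before profiling is to run a trivial query against each object above. The sketch below only generates those queries; the names `build_access_checks` and the two list constants are hypothetical, and this is not part of Lakebridge. `sys.dm_exec_sql_text` is skipped because it is a table-valued function that takes a handle argument and cannot be selected from directly.

```python
# Objects copied from the lists above, grouped by pool type.
DEDICATED_POOL_OBJECTS = [
    "sys.databases",
    "information_schema.tables",
    "information_schema.columns",
    "information_schema.views",
    "information_schema.routines",
    "sys.dm_pdw_exec_sessions",
    "sys.dm_pdw_exec_requests",
    "sys.dm_pdw_nodes_db_partition_stats",
    "sys.dm_pdw_nodes_exec_query_stats",
]

SERVERLESS_POOL_OBJECTS = [
    "sys.databases",
    "information_schema.tables",
    "information_schema.columns",
    "information_schema.views",
    "information_schema.routines",
    "sys.dm_exec_sessions",
    "sys.dm_exec_requests",
    "sys.dm_exec_query_stats",
    "sys.dm_exec_sql_text",
    "sys.dm_exec_requests_history",
    "sys.dm_external_data_processed",
]

# Table-valued functions need arguments, so a plain SELECT cannot test them.
TABLE_VALUED_FUNCTIONS = {"sys.dm_exec_sql_text"}


def build_access_checks(objects: list[str]) -> list[str]:
    """Return one single-row SELECT per queryable object."""
    return [
        f"SELECT TOP 1 * FROM {name};"
        for name in objects
        if name not in TABLE_VALUED_FUNCTIONS
    ]


if __name__ == "__main__":
    for query in build_access_checks(DEDICATED_POOL_OBJECTS):
        print(query)
```

Running each generated statement as the profiler user, through any SQL client connected to the matching pool, shows which grant is missing if a statement fails with a permission error.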

[Back to Configure Profiler](../#configure-profiler)

## Microsoft SQL Server (MSSQL)
<Admonition type="info" title="Coming Soon">
SQL Server support is coming soon! Stay tuned for updates.
</Admonition>

[Back to Configure Profiler](../#configure-profiler)

## Oracle
<Admonition type="info" title="Coming Soon">
Oracle support is coming soon! Stay tuned for updates.
</Admonition>

[Back to Configure Profiler](../#configure-profiler)