Conversation

@hiroyukinakazato-db hiroyukinakazato-db commented Sep 30, 2025

Changes

This PR adds Switch, an LLM-powered transpiler, as an optional component in Lakebridge's install-transpile workflow. Switch installation is controlled by the new --include-llm-transpiler flag.

What does this PR do?

Implements the complete Switch transpiler integration: idempotent installation, workspace deployment, job management, and resource configuration, including a Unity Catalog volume for Switch.

Relevant implementation details

CLI Integration:

  • Add --include-llm-transpiler flag to install-transpile command (default: false)
  • Switch installation is opt-in, allowing users to choose between LSP-only or LLM-powered transpilation
  • Bladebridge and Morpheus continue to install by default
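The opt-in semantics above can be sketched as follows. This is purely illustrative (argparse stands in for the Databricks Labs CLI wiring, and `transpilers_to_install` is a hypothetical helper, not a real Lakebridge function):

```python
# Sketch of the opt-in flag: Bladebridge and Morpheus always install,
# Switch only when --include-llm-transpiler is passed (default: false).
import argparse

parser = argparse.ArgumentParser(prog="install-transpile")
parser.add_argument("--include-llm-transpiler", action="store_true", default=False)


def transpilers_to_install(argv: list[str]) -> list[str]:
    args = parser.parse_args(argv)
    selected = ["Bladebridge", "Morpheus"]  # installed by default
    if args.include_llm_transpiler:
        selected.append("Switch")  # LLM-powered transpiler, opt-in only
    return selected
```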

SwitchInstaller Implementation:

  • Extends TranspilerInstaller with unified constructor signature
  • WheelInstaller pattern for PyPI package management (databricks-switch-plugin)
  • Workspace deployment: uploads Switch package from site-packages to /Users/{user}/.lakebridge/switch/
  • Job creation: creates LAKEBRIDGE_Switch job with NotebookTask for parallel LLM processing
  • Resource configuration: prompts for catalog, schema, and volume for Switch
  • Idempotent behavior: supports reinstallation after uninstall with full recovery
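The idempotent install/uninstall cycle described above can be illustrated with a minimal sketch. All names here (`FakeInstallState`, `SwitchInstallerSketch`, the job-id handling) are simplified stand-ins, not the actual Lakebridge classes:

```python
# Sketch: install reuses a tracked job if one exists (idempotent),
# and reinstall after uninstall recovers by creating a fresh job.
from dataclasses import dataclass, field


@dataclass
class FakeInstallState:
    """Stand-in for InstallState: tracks deployed jobs by name."""
    jobs: dict[str, int] = field(default_factory=dict)


class SwitchInstallerSketch:
    JOB_NAME = "LAKEBRIDGE_Switch"

    def __init__(self, state: FakeInstallState):
        self._state = state
        self._next_job_id = 100  # fake workspace job-id counter

    def install(self) -> int:
        # Reuse the existing job if one is tracked; otherwise create one.
        job_id = self._state.jobs.get(self.JOB_NAME)
        if job_id is None:
            job_id = self._next_job_id
            self._next_job_id += 1
            self._state.jobs[self.JOB_NAME] = job_id
        return job_id

    def uninstall(self) -> None:
        # Remove the tracked job; resources need manual cleanup.
        self._state.jobs.pop(self.JOB_NAME, None)
```

Running `install()` twice yields the same job; uninstalling and reinstalling creates a new one, matching the "full recovery" behavior.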

Uninstall Integration:

  • Removes Switch job from workspace via InstallState
  • Logs manual cleanup instructions for validation schema and Switch resources (catalog, schema, volume)
  • Integrated into databricks labs uninstall lakebridge workflow
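The uninstall behavior above (automatic job removal, manual resource cleanup) can be sketched like this; the function and message formats are illustrative, not the real API:

```python
# Sketch: the Switch job is removed from the tracked state, while
# catalog/schema/volume cleanup is only logged as a user instruction.
def uninstall_switch(jobs: dict[str, int], resources: dict[str, str]) -> list[str]:
    messages = []
    job_id = jobs.pop("LAKEBRIDGE_Switch", None)
    if job_id is not None:
        messages.append(f"Removed Switch job {job_id} from workspace.")
    for kind, name in resources.items():
        # Resources are intentionally left in place; only instruct the user.
        messages.append(f"Please manually remove {kind} `{name}` if no longer needed.")
    return messages
```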

Caveats/things to watch out for when reviewing:

  • Opt-in only: Switch is NOT installed by default; users must explicitly pass the --include-llm-transpiler flag to install Switch
  • User agent: added the include-llm-transpiler key with value "true" to the user agent string
  • Pylint configuration: Updated max-args from 12 to 13 in pyproject.toml to accommodate new CLI parameter
  • Resource lifecycle: Catalog, schema, and volume for Switch are prompted during install but require manual cleanup after uninstall
  • Job management: Switch job is tracked in InstallState and managed independently from Reconciliation jobs

Linked issues

Resolves #2048

Console Output

Select the source dialect:
[0] Set it later
......
Select the transpiler:
[0] Set it later
[1] Bladebridge
[2] Morpheus
[3] Switch
Enter a number between 0 and 3: 3
Enter input SQL path (directory/file) (default: Set it later):
Enter output directory (default: transpiled):
Enter error file path (default: errors.log):
17:01:57     INFO [d.l.lakebridge.install] Note: Switch transpiler is LLM Transpiler has a different execution process
17:01:57     INFO [d.l.lakebridge.install] Starting the additional configuration required for Switch...
17:01:57     INFO [d.l.lakebridge.install] Please provide the **Mandatory** following resources to set up Switch:
Enter catalog name (default: lakebridge): lakebridge_ss
17:02:01     INFO [d.l.l.deployment.configurator] Found existing catalog `lakebridge_ss`
Enter schema name (default: switch): switch_ss
Schema `switch_ss` doesn't exist in catalog `lakebridge_ss`. Create it? (default: no): yes
17:02:07     INFO [d.l.l.helpers.metastore] Created schema `switch_ss` in catalog `lakebridge_ss`.
Enter volume name (default: switch_volume):
Volume `switch_volume` doesn't exist in catalog `lakebridge_ss` and schema `switch_ss`. Create it? (default: no): yes
17:02:24     INFO [d.l.l.helpers.metastore] Created volume `switch_volume` in catalog `lakebridge_ss` and schema `switch_ss`
17:02:24     INFO [d.l.lakebridge.install] Now select the foundational model to use with Switch for LLM Transpile
Select a Foundation Model serving endpoint:
[0] [Default] databricks-claude-sonnet-4-5
[1] databricks-----------
[2] databricks-----------
[3] databricks-claude----
[4] databricks-claude----
[5] databricks-----------
[6] databricks-----------
[7] databricks-----------
[8] databricks-----------
[9] databricks-----------
[10] databricks----------
[11] databricks----------
[12] databricks----------
[13] databricks----------
[14] databricks-----------
Enter a number between 0 and 14: 0
17:02:38     INFO [d.l.lakebridge.install] Saving configuration file config.yml
Open config file https://<>/#workspace/Users/<>/.lakebridge/config.yml in the browser? (default: no): yes
17:02:45     INFO [d.l.lakebridge.install] Finished configuring lakebridge `transpile`.
17:02:45  WARNING [d.l.lakebridge.install] feature/switch-installer-integration is not a valid version.
17:02:56     INFO [d.l.lakebridge.install] Installing Switch transpiler to workspace.
17:02:56     INFO [d.l.l.deployment.switch] Copying resources to /Users/<>/.lakebridge/switch in workspace.......
17:03:48     INFO [d.l.l.deployment.switch] Completed Copying resources to /Users/<>/.lakebridge/switch in workspace...
17:03:49     INFO [d.l.l.deployment.switch] Setting up Switch job in workspace...
17:03:49     INFO [d.l.l.deployment.switch] Creating new Switch job
17:03:53     INFO [d.l.l.deployment.switch] Switch job created/updated: https://<>/jobs/{job_id}
17:03:53     INFO [d.l.lakebridge.install] Installation completed successfully! Please refer to the documentation for the next steps.

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge install-transpile (adds --include-llm-transpiler flag and Switch installation support)
  • modified existing command: databricks labs uninstall lakebridge (adds Switch job and resource cleanup)

Tests

  • manually tested
  • added unit tests
  • added integration tests

- Add Switch installer with resource configuration and job creation
- Implement uninstall functionality with proper cleanup
- Add comprehensive test coverage for SwitchInstaller
- Improve path handling and type-safe configuration
- Add include-llm-transpiler option for flexible installation
@hiroyukinakazato-db hiroyukinakazato-db added enhancement New feature or request feat/cli actions that are visible to the user labels Sep 30, 2025

github-actions bot commented Sep 30, 2025

✅ 46/46 passed, 7 flaky, 3m8s total

Flaky tests:

  • 🤪 test_validate_non_empty_tables (50ms)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (14.265s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (15.715s)
  • 🤪 test_transpile_teradata_sql (19.69s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (5.225s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (8.892s)
  • 🤪 test_transpiles_informatica_to_sparksql (9.581s)

Running from acceptance #2730

Implement SwitchInstaller to integrate Switch transpiler with Lakebridge:
- Install Switch package to local virtual environment and deploy to workspace
- Create and manage Databricks job for Switch transpilation
- Configure Switch resources (catalog, schema, volume) interactively
- Support job-level parameters with JobParameterDefinition for flexibility
- Handle installation state and job lifecycle management
- Add comprehensive test suite covering installation, job management, and configuration
The SwitchInstaller was failing to find the config when the config.yml
used "Switch" (capitalized) as the name, while the code only checked
for "switch" (lowercase). This caused job creation to fail with a
"config.yml not found" error.

Updated _get_switch_job_parameters() to check both the display name
(capitalized) and transpiler ID (lowercase) to handle both cases.
@hiroyukinakazato-db hiroyukinakazato-db marked this pull request as ready for review October 8, 2025 02:14
@hiroyukinakazato-db hiroyukinakazato-db requested a review from a team as a code owner October 8, 2025 02:14
wheel_name = self._PYPI_PACKAGE_NAME.replace("-", "_")
return wheel_name in artifact.name and artifact.suffix == ".whl"

def install(self, artifact: Path | None = None) -> bool:
Collaborator commented:
This is how we have separated things out: the installer installs things locally, deployer deploys to the workspace. You need something similar to recon deployer that way you don't need to have workspace client inside TranspilerInstaller

Author replied:

Thanks for the feedback. Updated in 8439314 - created SwitchDeployment for workspace operations (following ReconDeployment pattern) and removed workspace dependencies from TranspilerInstaller.

for transpiler_installer in self._transpiler_installers:
transpiler_installer.install()
if not config:
config = self.configure(module)
Collaborator commented:
Suggested change
config = self.configure(module, include-llm)

and use configure to implement switch-related prompts and have a deployer similar to recon deployment for doing workspace-related interaction.

Author replied:

Thanks for the suggestion. Updated in 8439314 to use the configure(module, include_llm) pattern. Switch resources are now prompted during configuration, stored in TranspileConfig, and passed to SwitchDeployment for workspace setup.

Separates Switch transpiler's local installation logic from workspace
deployment, following established patterns (BladebridgeInstaller for
local installation, ReconDeployment for workspace deployment).

Key changes:
- Add SwitchDeployment class (~260 lines) for workspace operations
- Simplify SwitchInstaller to match BladebridgeInstaller pattern (~20 lines)
- Add include_llm and switch_resources fields to TranspileConfig
- Update WorkspaceInstallation to use SwitchDeployment
- Refactor tests to avoid protected member access using fixture separation
- Group Switch-related tests in TestSwitchInstallation class
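The installer/deployment split this commit describes can be illustrated with a minimal sketch. The class names mirror the commit message but the bodies are simplified stand-ins, not the real ~20-line and ~260-line implementations:

```python
# Sketch: the installer handles local package installation only; the
# deployment owns the workspace client, so TranspilerInstaller stays
# free of workspace dependencies (mirroring the ReconDeployment pattern).
class SwitchInstaller:
    """Local-only: installs the PyPI package into the virtual environment."""

    def install(self) -> str:
        return "databricks-switch-plugin installed locally"


class SwitchDeployment:
    """Workspace-only: uploads resources and creates the Switch job."""

    def __init__(self, workspace_client):
        self._ws = workspace_client  # workspace client lives here, not in the installer

    def deploy(self) -> str:
        return "Switch resources deployed to workspace"
```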

codecov bot commented Oct 22, 2025

Codecov Report

❌ Patch coverage is 44.87805% with 113 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.29%. Comparing base (3d54bc0) to head (42c9c4e).

Files with missing lines                               | Patch % | Lines
...rc/databricks/labs/lakebridge/deployment/switch.py | 40.00%  | 71 Missing and 1 partial ⚠️
src/databricks/labs/lakebridge/install.py             | 47.05%  | 20 Missing and 7 partials ⚠️
...abricks/labs/lakebridge/deployment/configurator.py | 31.25%  | 11 Missing ⚠️
src/databricks/labs/lakebridge/cli.py                 | 0.00%   | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2066      +/-   ##
==========================================
- Coverage   64.78%   64.29%   -0.50%     
==========================================
  Files          96       97       +1     
  Lines        7891     8068     +177     
  Branches      820      838      +18     
==========================================
+ Hits         5112     5187      +75     
- Misses       2599     2699     +100     
- Partials      180      182       +2     


return {
"name": job_name,
"tags": {"created_by": self._ws.current_user.me().id, "switch_version": f"v{switch_version}"},
Author (@hiroyukinakazato-db) commented Oct 24, 2025:
@sundarshankar89 By using .id, the created_by tag value will be the user's ID (e.g. 5099015744649857) rather than the user's email address. While this is clearer from a system perspective, it may be less intuitive for users themselves.
