
@esadek (Contributor) commented Nov 7, 2025

Add a guide for installing and using dbc in Python notebooks.

The included code has been tested in both a local notebook and Google Colab.

Closes #116

@amoeba (Member) left a comment

Thanks @esadek! This looks pretty good. I left some comments.

Print the table:

```python
print(table)
```
Member

How about adding the output? It may be helpful if users aren't that familiar with PyArrow or the dataset.


# Python Notebooks

dbc can be installed and used directly in Python notebooks (such as Jupyter or Google Colab).
@eitsupi (Member) commented Nov 10, 2025

Considering something like marimo, I wonder if it's appropriate to use the term "Python Notebook" to refer only to Jupyter / ipynb.
(IIUC, Colab is also Jupyter.)

So, I think this page should actually be called Jupyter instead of Python Notebook.

Member

Or even better: @esadek, could you test dbc in a marimo notebook and include instructions for how to use it there?

Member

In marimo, we should use something like `import subprocess; subprocess.run(["dbc", "install", "duckdb"])`.
https://docs.marimo.io/guides/coming_from/jupyter/#magic-commands
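A minimal sketch of the `subprocess.run` approach suggested above, wrapped in a small helper. The names `dbc_command` and `run_dbc` are hypothetical and not part of dbc; the only dbc invocation assumed is `dbc install duckdb`, taken from this thread.

```python
import subprocess
from typing import List


def dbc_command(*args: str) -> List[str]:
    """Build the argument list for a dbc invocation, e.g. ["dbc", "install", "duckdb"]."""
    return ["dbc", *args]


def run_dbc(*args: str) -> subprocess.CompletedProcess:
    """Hypothetical helper: run dbc from Python code in notebooks without `!` shell escapes."""
    # capture_output/text collect stdout and stderr as strings;
    # check=True raises CalledProcessError if dbc exits with a non-zero status.
    result = subprocess.run(dbc_command(*args), capture_output=True, text=True, check=True)
    print(result.stdout)
    return result
```

In a marimo cell, `run_dbc("install", "duckdb")` would then run the same command; `subprocess.run` raises `FileNotFoundError` if `dbc` is not on `PATH` and, with `check=True`, `CalledProcessError` if it exits with an error.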

Member

Yeah, when reviewing this PR I looked at and tested marimo, and I don't know if it makes sense to do more than add a note about it. I doubt we want to tell users to run dbc like the above, since it's not very convenient.

Member

Yep, I think a note like this would be sufficient:

> If you're using a Python notebook that doesn't support magics (`%`) or shell escapes (`!`), use `subprocess.run` to run dbc commands from your Python code.

Contributor Author

After testing marimo, I've concluded that a dedicated guide or section would be best.

In marimo, packages are typically managed via the terminal (external or internal), inline script metadata, or the integrated "Manage packages" panel. This differs from Jupyter and Google Colab, where `!pip install` is common.
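For reference, inline script metadata uses the PEP 723 format: a TOML block inside comments at the top of the notebook file. A minimal sketch, assuming the notebook needs the `adbc-driver-manager` and `pyarrow` packages from PyPI, and assuming dbc itself is installable from PyPI under the name `dbc` (check the dbc installation docs before relying on this):

```python
# Assumption: dbc is published on PyPI as "dbc"; the other two are the ADBC driver manager and PyArrow.
# /// script
# dependencies = [
#     "adbc-driver-manager",
#     "dbc",
#     "pyarrow",
# ]
# ///
```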

Drivers can be installed with dbc via the terminal (external or internal) or in a cell using `subprocess.run`.

Locally, `dbapi.connect(driver="duckdb")` works as expected. However, on molab (marimo’s cloud platform), `adbc_driver_manager` throws an error. Instead, the full driver path must be used: `dbapi.connect(driver="/tmp/uv-venv/etc/adbc/drivers/duckdb")`.
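One possible stopgap until the molab problem is understood, sketched under assumptions: try the short driver name first and fall back to the full path reported above. `connect_with_fallback`, `connect_duckdb`, and `MOLAB_DUCKDB_PATH` are hypothetical names; the exact exception `adbc_driver_manager` raises is not identified in this thread, so the sketch catches `Exception` broadly, and the molab path may differ between environments.

```python
from typing import Any, Callable, Iterable

# Path reported to work on molab in this thread; treat it as environment-specific.
MOLAB_DUCKDB_PATH = "/tmp/uv-venv/etc/adbc/drivers/duckdb"


def connect_with_fallback(candidates: Iterable[str], connect: Callable[..., Any]) -> Any:
    """Return the first successful connect(driver=...) over the candidate names/paths."""
    errors = []
    for driver in candidates:
        try:
            return connect(driver=driver)
        except Exception as exc:  # exact ADBC exception type is an assumption, so catch broadly
            errors.append(f"{driver}: {exc}")
    raise RuntimeError("Could not load any driver:\n" + "\n".join(errors))


def connect_duckdb() -> Any:
    """Hypothetical helper: try the short driver name, then the molab path."""
    # Imported lazily so connect_with_fallback can be used without ADBC installed.
    import adbc_driver_manager.dbapi as dbapi

    return connect_with_fallback(["duckdb", MOLAB_DUCKDB_PATH], dbapi.connect)
```

Passing `connect` in as a parameter keeps the fallback logic independent of ADBC, so it can be exercised without a driver installed; in a notebook cell, `conn = connect_duckdb()` would return a DB-API connection from whichever candidate loads first.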

Member

Thanks for looking into this, @esadek. Could you open a separate issue so we can look into the molab problem?


Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close the issue "Document how to use dbc in Python notebooks".
Participants: 4