[Dataset] Integrating GitTables and Tabular Methods#58

Open
luizfacury wants to merge 7 commits into galilai-group:main from HumzahM:gittables

@luizfacury
Contributor

This pull request adds base support for tabular datasets, centered on the new GitTables dataset. It adds a documentation page for GitTables, implements base classes for tabular data handling, and registers the new tabular module in the package's top-level exports. The main changes are outlined below.

Tabular Dataset Infrastructure

  • Added TabularDataset, TabularTaskInfo, and TabularBaseDatasetBuilder classes to stable_datasets/tabular/base.py, providing a standardized way to represent, access, and split tabular datasets using Arrow tables and pandas DataFrames. These classes also support cross-validation splits and metadata handling.
  • Created the stable_datasets/tabular module and exposed TabularDataset, TabularTaskInfo, TabularBaseDatasetBuilder, and GitTables in its __init__.py.
  • Registered the new tabular module in the top-level stable_datasets/__init__.py and updated the __all__ list to include it.
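
To make the cross-validation idea above concrete, here is a minimal sketch of a tabular container with task metadata and k-fold splits. It is not the code from this pull request: the names SketchTaskInfo and SketchTabularDataset, their fields, and the kfold method are hypothetical, and plain Python lists stand in for the Arrow tables and pandas DataFrames that the PR's classes are described as using.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTaskInfo:
    """Hypothetical task metadata (not the PR's TabularTaskInfo)."""
    target_column: str
    task_type: str = "classification"
    extra: dict = field(default_factory=dict)


@dataclass
class SketchTabularDataset:
    """Hypothetical column-oriented table: column name -> list of values."""
    columns: dict
    task: SketchTaskInfo

    def __len__(self) -> int:
        # All columns are assumed to have the same length.
        return len(next(iter(self.columns.values()), []))

    def take(self, indices):
        """Return a new dataset containing only the given row indices."""
        picked = {
            name: [values[i] for i in indices]
            for name, values in self.columns.items()
        }
        return SketchTabularDataset(picked, self.task)

    def kfold(self, n_splits: int):
        """Yield (train, test) datasets for contiguous k-fold splits."""
        n = len(self)
        if not 2 <= n_splits <= n:
            raise ValueError("n_splits must be between 2 and the number of rows")
        # Spread the remainder over the first folds so sizes differ by at most 1.
        base, extra = divmod(n, n_splits)
        start = 0
        for fold in range(n_splits):
            stop = start + base + (1 if fold < extra else 0)
            test_idx = list(range(start, stop))
            train_idx = list(range(0, start)) + list(range(stop, n))
            yield self.take(train_idx), self.take(test_idx)
            start = stop


data = SketchTabularDataset(
    columns={"x": [1, 2, 3, 4, 5], "label": ["a", "b", "a", "b", "a"]},
    task=SketchTaskInfo(target_column="label"),
)
folds = list(data.kfold(2))
```

Each element of folds is a (train, test) pair whose rows are disjoint and together cover the whole table. The sketch does not shuffle rows or stratify by target, which a real implementation might do.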

GitTables Dataset Integration

  • Added a detailed documentation page for GitTables at docs/source/datasets/gittables.rst, describing its structure, metadata, usage examples, cache layout, and references.
  • Linked the new GitTables documentation in the datasets index under a new "Tabular Datasets" section.

@HumzahM

HumzahM commented Apr 9, 2026

Don't approve this yet if you see this - still working on things
