
Add ZINC molecular graph dataset #250

Open
Uneeb808 wants to merge 1 commit into JuliaML:master from Uneeb808:add-zinc-dataset

Conversation

@Uneeb808
Contributor

This PR adds the ZINC molecular graph dataset.

~250,000 molecular graphs with a penalized logP regression target (y = logP - SAS - cycles). Includes the standard 12k benchmark subset from "Benchmarking Graph Neural Networks" (Dwivedi et al. 2020).

Features:

  • Full (~250k graphs) and subset (12k) variants
  • train/val/test splits
  • Node features: atom type (28 classes)
  • Edge features: bond type (4 classes)
  • Graph-level regression target (Float32)

Note: The raw data is distributed as PyTorch pickle files, which Pickle.jl cannot deserialize directly. I preprocessed these into flat NPZ files and hosted them on my fork's releases. Happy to move them to JuliaML releases on merge.
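For reviewers unfamiliar with this kind of conversion, here is a minimal sketch of what a "flat" layout for variable-size graphs can look like. It is an illustration only, not the schema used in this PR: the function names (`flatten_graphs`, `get_graph`) and all dictionary keys are hypothetical, and plain Python lists stand in for the arrays that would be stored in an NPZ archive. The idea is to concatenate every graph's nodes and edges into long one-dimensional sequences and keep offset tables so that graph `i` can be sliced back out.

```python
from itertools import accumulate


def flatten_graphs(graphs):
    """Pack variable-size graphs into flat lists plus offset tables.

    Each input graph is a dict with these (hypothetical) keys:
      "x"          atom-type id per node
      "edge_index" (src, dst) pairs, using node ids local to the graph
      "edge_attr"  bond-type id per edge
      "y"          graph-level regression target
    """
    node_counts = [len(g["x"]) for g in graphs]
    edge_counts = [len(g["edge_index"]) for g in graphs]
    return {
        # offsets[i]:offsets[i + 1] is the slice that belongs to graph i
        "node_offsets": [0, *accumulate(node_counts)],
        "edge_offsets": [0, *accumulate(edge_counts)],
        "x": [a for g in graphs for a in g["x"]],
        "edge_src": [s for g in graphs for s, _ in g["edge_index"]],
        "edge_dst": [d for g in graphs for _, d in g["edge_index"]],
        "edge_attr": [b for g in graphs for b in g["edge_attr"]],
        "y": [g["y"] for g in graphs],
    }


def get_graph(flat, i):
    """Recover graph i from the flat layout by slicing with the offsets."""
    n0, n1 = flat["node_offsets"][i], flat["node_offsets"][i + 1]
    e0, e1 = flat["edge_offsets"][i], flat["edge_offsets"][i + 1]
    return {
        "x": flat["x"][n0:n1],
        "edge_index": list(zip(flat["edge_src"][e0:e1], flat["edge_dst"][e0:e1])),
        "edge_attr": flat["edge_attr"][e0:e1],
        "y": flat["y"][i],
    }
```

Because each flat sequence holds a single element type, a layout like this needs no pickled Python objects, which is what makes it readable from other languages. Edge endpoints stay local to each graph here, so no index shifting is needed when slicing a graph back out.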

@codecov-commenter

codecov-commenter commented Mar 21, 2026

Codecov Report

❌ Patch coverage is 80.95238% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.75%. Comparing base (3d40a68) to head (ae14f0e).
⚠️ Report is 14 commits behind head on master.

Files with missing lines       Patch %   Lines
src/datasets/graphs/zinc.jl    80.64%    12 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #250      +/-   ##
==========================================
+ Coverage   45.94%   47.75%   +1.80%     
==========================================
  Files          50       54       +4     
  Lines        2501     2689     +188     
==========================================
+ Hits         1149     1284     +135     
- Misses       1352     1405      +53     


@Uneeb808
Contributor Author

This PR addresses #239
