post: Parquet Content-Defined Chunking #2987

Merged Jul 25, 2025 (4 commits)

kszucs
Member

@kszucs kszucs commented Jul 22, 2025

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • Add an entry to _blog.yml.
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content in https://huggingface.co/new-blog. Do not click publish, this is just a way to do an early check.

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.

@kszucs kszucs changed the title Post: Parquet Content-Defined Chunking post: Parquet Content-Defined Chunking Jul 22, 2025
@kszucs
Member Author

kszucs commented Jul 22, 2025

@pcuenca could you (or redirect me to someone who could) help me generate a thumbnail?

cc @lhoestq

@merveenoyan
Contributor

hey @kszucs you can use Canva to create one!

Contributor

@merveenoyan merveenoyan left a comment

thanks a lot! tbh looks good to me but I'd like one more approval

Member

@lhoestq lhoestq left a comment

Awesome!

The thumbnail looks great too :) It's a bit heavy though, can you use a tool to reduce its size? (Google "compress png".)

@kszucs kszucs force-pushed the parquet-cdc branch 2 times, most recently from 0ade264 to c7cb268 Compare July 23, 2025 17:20
@kszucs
Member Author

kszucs commented Jul 23, 2025

I reduced the thumbnail's size and squashed the commits to exclude it from the history.

Contributor

@jsulz jsulz left a comment

This is amazing! Some minor comments, but this is a great writeup ❤️

Co-authored-by: Quentin Lhoest <[email protected]>
@lhoestq lhoestq merged commit 316719d into huggingface:main Jul 25, 2025
1 check passed

4 participants