@LangFeng0912

Overall: this PR adds scripts for processing large datasets.
The additions and updates are:
in "main.py": add new CLI commands "learn_split", "gen_cluster", and "infer_projects"
in "vectorize.py": update the function that generates datapoints so that it works in batches
add "learn_split.py" for training the model as a separate step
add "gen_cluster.py" for generating clusters from the trained model as a separate step
add new dataset-loading functions in "data_loadeds.py"
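
To illustrate the batching idea behind the "vectorize.py" change, here is a minimal sketch, not the code from this PR. The names `iter_batches`, `vectorize_in_batches`, `vectorize_one`, and the default batch size are hypothetical and chosen only for illustration; the point is that only one batch of datapoints is held in memory at a time.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
V = TypeVar("V")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def vectorize_in_batches(
    samples: Iterable[T],
    vectorize_one: Callable[[T], V],  # hypothetical per-sample vectorizer
    batch_size: int = 1024,           # assumed default, not from the PR
) -> Iterator[List[V]]:
    """Vectorize samples lazily, one batch at a time."""
    for batch in iter_batches(samples, batch_size):
        yield [vectorize_one(sample) for sample in batch]
```

Each yielded batch could be written to disk before the next one is produced, which is what keeps memory use bounded on large datasets.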

@mir-am added the enhancement (New feature or request) label on Mar 29, 2023
Member

@mir-am left a comment

Thanks for the PR.
Also, please create a new module called predict_split that loads the type clusters generated by gen_cluster. Its output should be a JSON file that eval can process.
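
A minimal sketch of what such a module might look like, under stated assumptions: the cluster format (a list of `(type_label, vector)` pairs), the nearest-neighbour lookup, and the JSON schema with `id` and `prediction` keys are all hypothetical, since the thread specifies neither the format gen_cluster writes nor the format eval reads.

```python
import json
import math
from pathlib import Path
from typing import List, Sequence, Tuple

Cluster = Tuple[str, Sequence[float]]  # assumed: (type label, embedding)
Sample = Tuple[str, Sequence[float]]   # assumed: (sample id, embedding)


def nearest_type(embedding: Sequence[float], clusters: List[Cluster]) -> str:
    """Return the label of the cluster entry closest to `embedding`."""
    if not clusters:
        raise ValueError("clusters must not be empty")
    # Euclidean distance; ties resolve to the earliest cluster entry.
    label, _ = min(clusters, key=lambda c: math.dist(embedding, c[1]))
    return label


def predict_split(clusters: List[Cluster], samples: List[Sample], out_path: str):
    """Predict a type for each sample and write the results as JSON."""
    predictions = [
        {"id": sample_id, "prediction": nearest_type(embedding, clusters)}
        for sample_id, embedding in samples
    ]
    # Hypothetical output schema; adapt to whatever eval expects.
    Path(out_path).write_text(json.dumps(predictions, indent=2), encoding="utf-8")
    return predictions
```

Keeping the prediction step in its own function makes it straightforward to replace the linear scan with an approximate nearest-neighbour index if the clusters are large.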

Member

@mir-am left a comment

Thanks for the new changes. Please address my new comments.

Member

@mir-am left a comment

Please address the minor comments in the review. Thanks!

@mir-am changed the base branch from main to dev on June 6, 2023, 14:30
Member

@mir-am left a comment

Thanks for addressing the comments.
