flamingofugang
Issue #, if available:

Description of changes:
Adds the start guide requested in #10.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

export TOKENIZED_DATA_PATH=/fsx/tokenized_data
export DATASET_NAME=wikicorpus
export DATASET_CONFIG_NAME=raw_en
export HF_MODEL_NAME=meta-llama/Meta-Llama-3-8B # Hugging Face model ID for the Llama 3 8B model

Can we change this to NousResearch/Meta-Llama-3-8B-Instruct for the workshop, so users do not need to provide a Hugging Face token or get approval from Meta?
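
A minimal sketch of the suggested change, assuming the other exports in the diff stay as they are. Whether the NousResearch repository can be downloaded without a token is taken from this comment, not verified here.

```shell
# Model ID proposed in review; assumed (per the comment) to need
# no Hugging Face token and no approval from Meta.
export HF_MODEL_NAME=NousResearch/Meta-Llama-3-8B-Instruct

# Print the value so a workshop user can confirm what the templates will receive.
echo "HF_MODEL_NAME=${HF_MODEL_NAME}"
```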


cat tokenize_data.yaml-template | envsubst > tokenize_data.yaml

cat llama3_train.yaml-template | envsubst > llama3_train.yaml

I recall that we need to create the compilation script too. Can you double-check this?
