You can use the 70b parameter model now as well; here is how I accomplished it:
- Downloaded the 70b parameter model I wanted from https://huggingface.co/TheBloke/Llama-2-70B-Chat-GGML/tree/main. In my case, I chose 'llama-2-70b-chat.ggmlv3.q5_K_M.bin'. None of my runs so far have used much more than 6-8GB of RAM. You need to modify 'config/config.yml' to point to your newly downloaded model.
- Updated the CTransformers package to the latest version, which adds support for 70b (ctransformers-0.2.15 or higher):
  ```shell
  poetry run pip install ctransformers --upgrade
  ```
- Also updated langchain (I had done this first, but I'm not sure it's required):
  ```shell
  poetry run pip install langchain --upgrade
  ```
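
For the first step, the config change might look something like this. The key names here are placeholders guessed from typical setups, not taken from this repo, so check your own 'config/config.yml' for the actual schema:

```yaml
# Sketch only: point the model path at the downloaded 70b file.
# Key names are assumptions; match them to your repo's config schema.
model_path: models/llama-2-70b-chat.ggmlv3.q5_K_M.bin
model_type: llama
```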
Now it runs! Much slower, though (what took under a minute now takes almost 10 minutes).
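
Since 70b support only landed in ctransformers 0.2.15, it can be worth checking the installed version before kicking off a long model load. This is just an illustrative stdlib helper, not part of the repo, and it assumes plain numeric version strings:

```python
# Sanity-check that the installed ctransformers is new enough for 70b models.
# 0.2.15 is the minimum mentioned above; the helper itself is hypothetical.
from importlib.metadata import version, PackageNotFoundError

MIN_CTRANSFORMERS = (0, 2, 15)

def parse_version(v: str) -> tuple:
    """Turn '0.2.15' into (0, 2, 15) for tuple comparison.

    Assumes plain numeric components (no 'rc1'/'post1' suffixes).
    """
    return tuple(int(part) for part in v.split(".")[:3])

def supports_70b(installed: str) -> bool:
    """True if this ctransformers version should handle 70b GGML models."""
    return parse_version(installed) >= MIN_CTRANSFORMERS

try:
    print(supports_70b(version("ctransformers")))
except PackageNotFoundError:
    print("ctransformers is not installed")
```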