Full documentation for setting up in different environments is available here: https://github.com/abetlen/llama-cpp-python
You'll need Xcode installed. Full details are here: https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md
python3 -m venv venv
. venv/bin/activate
pip install openai huggingface-hub
pip uninstall llama-cpp-python -y
CMAKE_ARGS="-DLLAMA_METAL=on" pip install -U llama-cpp-python --no-cache-dir
pip install 'llama-cpp-python[server]'

Download a model from https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF:
mkdir models/
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GGUF mistral-7b-instruct-v0.1.Q5_K_M.gguf --local-dir models/ --local-dir-use-symlinks False

In this sample we will use a client/server approach, but it's possible to skip the server entirely.
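If you'd rather skip the server, here is a minimal sketch of in-process use (not part of the original walkthrough; it assumes the model path from the download step and uses llama_cpp's Llama class to load the GGUF directly):

from llama_cpp import Llama

# Load the GGUF downloaded above; n_gpu_layers=1 offloads work to Metal,
# matching the server invocation used later in this guide.
llm = Llama(model_path="models/mistral-7b-instruct-v0.1.Q5_K_M.gguf", n_gpu_layers=1)

# Plain completion call; returns an OpenAI-style dict.
output = llm("Q: Name three planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])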
export MODEL=models/mistral-7b-instruct-v0.1.Q5_K_M.gguf
python3 -m llama_cpp.server --model $MODEL --n_gpu_layers 1

This server implements the OpenAI API. You can see the standard API docs for completions here: https://platform.openai.com/docs/api-reference/completions
The running server also exposes its own interactive API reference at http://localhost:8000/docs
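A minimal sketch of what main.py might contain (not shown in the original; the model name is a hypothetical label, since the local server simply serves the model it loaded, and the API key is a dummy value the OpenAI client requires):

# main.py
from openai import OpenAI

# Point the OpenAI client at the local server; the key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-local")

response = client.completions.create(
    model="mistral-7b-instruct",  # hypothetical label; the local server serves its loaded model
    prompt="Q: Name three planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(response.choices[0].text)

Save this as main.py and, with the server still running, run: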
python main.py