Highlights
- Llama 3.1 is now supported -- Test it out by using the "hf://meta-llama/Llama-3.1-8B" model handle :)
- MaxText model inference with KV cache -- MaxTextModel.generate("hi") is now much faster!
- Serve with vLLM on TPUs or GPUs -- Check our docs to see how to serve Kithara-tuned models with vLLM
What's Changed
- Support llama3.1 models by @chandrasekhard2
- Add vllm guide by @richardsliu in #3
- Add disk storage documentation by @richardsliu in #4
- Patch llama3.1 model support by @wenxindongwork in #5
- Fix Llama3.1 MaxText saving by @wenxindongwork in #6
- Support uploading models to HuggingFace Hub by @wenxindongwork in #7
- Adding multiple benchmarks for LoRA & full parameter fine-tuning by @manavgarg in #9
- Support running inference with KV cache on Kithara's MaxText models by @wenxindongwork in #10