12 changes: 12 additions & 0 deletions examples/nlp_and_llms/nvidia-lora/README.md
@@ -0,0 +1,12 @@
# LoRA Fine-Tuning (PEFT + Transformers)

![LoRA Fine-Tuning Header](https://cdn-icons-png.flaticon.com/512/8101/8101225.png)

This template illustrates how **LoRA fine-tuning** can significantly reduce resource requirements while maintaining strong model performance.
By running it on **Saturn Cloud**, you benefit from a GPU-optimized, scalable environment that simplifies the entire fine-tuning workflow — from experimentation to production deployment.
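
For a feel of the approach, here is a minimal PEFT sketch (the base model and hyperparameters are illustrative, not the template's exact settings):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; the notebook may use a different checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Attach low-rank adapters to the attention projection; only these are trained
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```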

Learn more:

* 🔗 [Saturn Cloud Documentation](https://saturncloud.io/docs/)
* 🔗 [Saturn Cloud Templates Gallery](https://saturncloud.io/resources/templates/)
* 🔗 [PEFT Library (Hugging Face)](https://huggingface.co/docs/peft/index)
1 change: 1 addition & 0 deletions examples/nlp_and_llms/nvidia-lora/nvidia_lora.ipynb

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions examples/nlp_and_llms/nvidia-vllm-7b/README.md
@@ -0,0 +1,69 @@
# 🧠 LLM Inference with vLLM 7B

**Saturn Cloud | GPU-Optimised Template**

Run and serve large language models (LLMs) efficiently using **vLLM**, a high-performance inference and serving engine designed for speed and scalability.
This Saturn Cloud template demonstrates how to deploy **7B-class models** such as *Mistral*, *Llama*, or *Gemma* for text generation and interactive inference.

---

## 🚀 Overview

**vLLM** delivers lightning-fast text generation through techniques such as **PagedAttention**, **continuous batching**, and **quantisation**.
On **Saturn Cloud**, this notebook enables you to:

* Deploy and test 7B-class LLMs for inference and serving.
* Scale seamlessly from a single GPU to **multi-GPU clusters**.
* Experiment interactively or integrate models into larger data-science pipelines.

> ⚙️ Fully compatible with Saturn Cloud’s managed GPU environments and ready for immediate use.
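
As a minimal sketch of the offline API (the model ID is illustrative; substitute any 7B checkpoint you can access):

```python
from vllm import LLM, SamplingParams

# Example 7B checkpoint; weights are downloaded on first run
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```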

---

## 🧩 Features

* **Pre-configured vLLM environment** for fast setup.
* **Support for NVIDIA GPUs** (A10G, A100) and multi-GPU scaling.
* **Quick-start workflow**: load, run, and test model prompts.
* **Local API-style inference** via vLLM’s serving engine.
* **Interactive prompt input** for experimentation.

---

## 📋 Requirements

* **Saturn Cloud account** with GPU instance access.
* Python ≥ 3.12
* Compatible with **CUDA 12.0+** and **Transformers ≥ 4.40**

All dependencies are pre-installed when running the notebook on Saturn Cloud.

---

## 💡 Usage

1. **Open the template** in Saturn Cloud.
2. **Select a GPU instance** (A10G or A100 recommended).
3. **Run the notebook cells sequentially** to:

* Install dependencies
* Configure vLLM settings
* Load and test your model
* Input prompts interactively to generate text

> For production, vLLM can also serve models as an **OpenAI-compatible API** using the `vllm serve` command.
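
A minimal sketch of that serving path (the port, model ID, and `openai` client usage are assumptions, not part of this template):

```python
# First, in a terminal: vllm serve mistralai/Mistral-7B-Instruct-v0.2 --port 8000
# Then query the OpenAI-compatible endpoint (requires `pip install openai`):
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```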

---

## 🧭 Learn More

* [Saturn Cloud Documentation](https://saturncloud.io/docs/?utm_source=github&utm_medium=template)
* [Saturn Cloud Templates](https://saturncloud.io/templates/?utm_source=github&utm_medium=template)
* [vLLM Official Docs](https://docs.vllm.ai/en/latest/?utm_source=saturn&utm_medium=template)

---

## 🏁 Conclusion

This template provides a ready-to-run setup for **LLM inference with vLLM 7B on Saturn Cloud**, combining high performance, scalability, and ease of use.
Adapt it for experimentation, prototyping, or production-grade LLM deployments in your Saturn Cloud workspace.
255 changes: 255 additions & 0 deletions examples/nlp_and_llms/nvidia-vllm-7b/nvidia_vllm_7b.ipynb
@@ -0,0 +1,255 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Es_w2TvemoO3"
},
"source": [
"# LLM Inference vLLM 7B\n",
"\n",
"![chat Bubbles](https://cdn-icons-png.flaticon.com/512/2076/2076246.png) ![GPU Illustration](https://cdn-icons-png.flaticon.com/512/4854/4854226.png)\n",
"\n",
"**vLLM** is a high-performance inference and serving engine for large language models, optimised for speed and scalability. It delivers efficient text generation through innovations such as **PagedAttention**,** continuous batching**, and support for **quantisation**.\n",
"\n",
"This is a template demonstrates on how to run **7B-class models** (e.g. Mistral, Llama, Gemma) on Saturn Cloud.\n",
"\n",
"On [Saturn Cloud](https://saturncloud.io), you can scale from a single NVIDIA GPU to multi-GPU clusters, enabling distributed inference for larger models or higher throughput workloads — all within a managed, GPU-ready environment."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1hhl8dEPmoO5"
},
"source": [
"## 1. Install dependencies\n",
"\n",
"\n",
"We install **vLLM** and **Transformers**. A recent NVIDIA CUDA runtime is recommended for best performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xDTiLAdfmoO6"
},
"outputs": [],
"source": [
"!pip install -q jedi\n",
"!pip install -q vllm transformers\n",
"!pip install uv\n",
"!uv venv vllm-env -p 3.12\n",
"!source vllm-env/bin/activate && uv pip install vllm\n",
"!source vllm-env/bin/activate && pip install ipykernel\n",
"!python -m ipykernel install --user --name=vllm-env --display-name \"vLLM Env\"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ehqOzc4hmoO8"
},
"source": [
"## 2. Environment check\n",
"\n",
"Verify the GPU is visible and print library versions. Confirm the environment is GPU-enabled."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_A7AYnJmmoO9"
},
"outputs": [],
"source": [
"import torch, platform\n",
"import vllm, transformers\n",
"\n",
"cuda_ok = torch.cuda.is_available()\n",
"print(f\"✅ CUDA available: {cuda_ok}\")\n",
"if cuda_ok:\n",
" print(\"🧠 GPU:\", torch.cuda.get_device_name(0))\n",
"print(\"🧩 torch:\", torch.__version__)\n",
"print(\"🧩 vllm:\", vllm.__version__)\n",
"print(\"🧩 transformers:\", transformers.__version__)\n",
"print(\"🐍 python:\", platform.python_version())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Qpk7TkAhmoO-"
},
"source": [
"## 3. Select model and vLLM settings\n",
"\n",
"Choose a **7B** model from Hugging Face. The defaults below work with common, openly available options. If a model is gated, select a different one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Vujk0jtwmoO-"
},
"outputs": [],
"source": [
"# 🔧 Model & runtime config (edit these as needed)\n",
"MODEL_ID = \"mistralai/Mistral-7B-Instruct-v0.2\" # e.g., \"meta-llama/Llama-2-7b-chat-hf\", \"google/gemma-7b\"\n",
"DTYPE = \"auto\" # \"auto\", \"float16\", \"bfloat16\", \"float32\"\n",
"TENSOR_PARALLEL = 1 # single GPU = 1\n",
"GPU_MEMORY_UTIL = 0.90 # 0.6–0.95 depending on VRAM\n",
"MAX_MODEL_LEN = 8192 # context length (depends on model)"
]
},
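{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Optional: gated checkpoints (e.g. some Llama variants) require Hugging Face authentication. A minimal sketch, assuming you have an access token:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: authenticate for gated models (the token value is a placeholder)\n",
"# import os\n",
"# os.environ[\"HF_TOKEN\"] = \"hf_...\"  # your Hugging Face access token"
]
},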
{
"cell_type": "markdown",
"metadata": {
"id": "gMjiJkoTmoPA"
},
"source": [
"## 4. Basic model inference\n",
"\n",
"Load the model with **vLLM** and generate text for one or more prompts using **SamplingParams** (temperature, top_p, max_tokens, etc.)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "D7IXT5FWmoPB"
},
"outputs": [],
"source": [
"from vllm import LLM, SamplingParams\n",
"\n",
"print(\"⏳ Loading model (this may download weights on first run)...\")\n",
"llm = LLM(\n",
" model=MODEL_ID,\n",
" dtype=DTYPE,\n",
" tensor_parallel_size=TENSOR_PARALLEL,\n",
" gpu_memory_utilization=GPU_MEMORY_UTIL,\n",
" max_model_len=MAX_MODEL_LEN,\n",
")\n",
"print(\"✅ Model loaded!\")\n"
]
},
{
"cell_type": "markdown",
"source": [
"## 5. Sample prompts\n",
"\n",
"Use the customise Let's test the model using sample prompts."
],
"metadata": {
"id": "yaaCIaOfDILx"
}
},
{
"cell_type": "code",
"source": [
"# Example prompts\n",
"prompts = [\n",
" \"You are a helpful assistant. Summarise why efficient attention helps LLM inference.\",\n",
" \"List three creative uses of a 7B model for education.\",\n",
"]\n",
"\n",
"# Sampling parameters\n",
"sampling = SamplingParams(\n",
" temperature=0.7,\n",
" top_p=0.9,\n",
" max_tokens=256,\n",
")\n",
"\n",
"# Generate\n",
"outputs = llm.generate(prompts, sampling)\n",
"for out in outputs:\n",
" print(\"\\n---\")\n",
" print(\"Prompt:\", out.prompt)\n",
" print(\"Completion:\", out.outputs[0].text.strip())\n"
],
"metadata": {
"id": "1s_ALheCCwfP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 6. User Custom Prompt Testing\n",
"\n",
"You can enter your prompt to test the model's chat capabilities here."
],
"metadata": {
"id": "kaSLGm0_GL62"
}
},
{
"cell_type": "code",
"source": [
"# Helper function for quick generation\n",
"def generate_text(prompt, temperature=0.7, top_p=0.9, max_tokens=256):\n",
" params = SamplingParams(temperature=temperature, top_p=top_p, max_tokens=max_tokens)\n",
" result = llm.generate([prompt], params)[0].outputs[0].text\n",
" return result.strip()\n",
"\n",
"print(\"\\nQuick test:\")\n",
"new_Prompt = input(\"Enter a prompt: \")\n",
"print(generate_text(new_Prompt))\n",
"\n",
"\n",
"# print(generate_text(\"Explain what continuous batching means in vLLM.\"))"
],
"metadata": {
"id": "AI9CELj5Ej5g"
},
"execution_count": null,
"outputs": []
},
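{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small follow-up sketch (settings are illustrative): `temperature=0.0` gives greedy, repeatable completions, which helps when comparing prompt variations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Greedy decoding for reproducible output (illustrative settings)\n",
"greedy = SamplingParams(temperature=0.0, max_tokens=128)\n",
"print(llm.generate([\"Define PagedAttention in one sentence.\"], greedy)[0].outputs[0].text.strip())"
]
},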
{
"cell_type": "markdown",
"metadata": {
"id": "yJSF_-4FmoPD"
},
"source": [
"## 7. Conclusion\n",
"\n",
"You have successfully deployed and run a 7B-class Large Language Model using vLLM on Saturn Cloud. This template demonstrates how to perform high-speed inference, interact with your model via prompts, and scale seamlessly across single or multiple GPUs.\n",
"\n",
"\n",
"By using [Saturn Cloud’s GPU infrastructure](https://saturncloud.io/docs/user-guide/how-to/resources/), you can easily extend this workflow for larger models, API serving, or integrated data science pipelines — all within a managed, scalable environment designed for production-grade AI workloads. Visit [saturn cloud](https://saturncloud.io/) to easily deploy this model."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.13.7",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"colab": {
"provenance": [],
"gpuType": "A100"
},
"accelerator": "GPU"
},
"nbformat": 4,
"nbformat_minor": 0
}