---
title: AFM-4.5B deployment on Google Cloud Axion with Llama.cpp
weight: 2
### FIXED, DO NOT MODIFY
layout: learningpathall
---
## AFM-4.5B model and deployment workflow
[AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) is a 4.5-billion-parameter foundation model designed to balance accuracy, efficiency, and broad language coverage. Trained on nearly 8 trillion tokens of carefully filtered data, it performs well across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.
In this Learning Path, you’ll deploy [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) using [Llama.cpp](https://github.com/ggerganov/llama.cpp) on a Google Cloud Axion Arm64 instance. You’ll walk through the full workflow, from setting up your environment and compiling the runtime, to downloading, quantizing, and running inference on the model. You’ll also evaluate model quality using perplexity, a standard metric for how well a language model predicts text.
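For reference, perplexity is the exponentiated average negative log-likelihood a model assigns to each token of a held-out text, so lower values mean better predictions:

```latex
\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(x_i \mid x_{<i}\right)\right)
```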
This hands-on guide helps developers build cost-efficient, high-performance LLM applications on modern Arm server infrastructure using open-source tools and real-world deployment practices.
### Deployment workflow for AFM-4.5B on Google Cloud Axion
- **Provision compute**: launch a Google Cloud instance using an Axion-based instance type (for example, `c4a-standard-16`)
- **Set up your environment**: install build tools and dependencies (CMake, Python, Git)
- **Build the inference engine**: clone the [Llama.cpp](https://github.com/ggerganov/llama.cpp) repository and compile the project for your Arm-based environment
- **Prepare the model**: download the AFM-4.5B model files from Hugging Face and use Llama.cpp's quantization tools to reduce model size and optimize performance
- **Run inference**: load the quantized model and run sample prompts using Llama.cpp
- **Evaluate model quality**: calculate perplexity or use other metrics to assess performance
{{< notice Note >}}
You can reuse this deployment flow with other models supported by Llama.cpp by swapping out the model file and adjusting quantization settings.
{{< /notice >}}
---
title: Provision a Google Cloud Axion Arm64 environment
weight: 3
### FIXED, DO NOT MODIFY
layout: learningpathall
---
## Requirements
Before you begin, make sure you meet the following requirements:
- A Google Cloud account
- Permission to launch a Google Cloud Axion instance of type `c4a-standard-16` (or larger)
- At least 128 GB of available storage
If you're new to Google Cloud, see the Learning Path [Getting started with Google Cloud](/learning-paths/servers-and-cloud-computing/csp/google/).
## Requirements for Google Cloud Axion
Confirm that your account has sufficient quota for Axion instances and enough storage capacity to host the AFM-4.5B model and dependencies.
## Launch and configure a Google Cloud Axion VM
In the left sidebar of the [Compute Engine dashboard](https://console.cloud.google.com/compute), select **VM instances**, and then **Create instance**.
Use the following settings:
- **Name**: `arcee-axion-instance`
- **Region** and **Zone**: the region and zone where you have access to `c4a` instances
- **Machine family**: select **General purpose**, then **C4A**
- **Machine type**: `c4a-standard-16` or larger
## Configure operating system and storage
In the left sidebar, select **OS and storage**.
- Under **Operating system and storage**, click **Change**
- Select **Ubuntu 24.04 LTS Minimal** as the OS
- Set the disk size to **128 GB**
- Click **Select**
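If you prefer the command line, a roughly equivalent VM can be created with the gcloud CLI. Treat this as a sketch: the zone is a placeholder, and the Ubuntu 24.04 LTS Minimal Arm64 image family name is an assumption you should verify with `gcloud compute images list`:

```bash
# Create an Axion (c4a) VM with a 128 GB boot disk; adjust zone and image family as needed
gcloud compute instances create arcee-axion-instance \
  --zone=us-central1-a \
  --machine-type=c4a-standard-16 \
  --image-family=ubuntu-minimal-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=128GB
```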
## Review and create your Axion instance
Leave the other settings as they are.
When you’re ready, click **Create** to launch your Compute Engine instance.
## Verify instance launch
After a few seconds, you should see your instance listed as **Running**.
If the launch fails, double-check your settings and permissions, and try again.
## Connect to your Google Cloud Axion VM
Open the **SSH** dropdown list, and select **Open in browser window**.
Your browser may ask you to authenticate. Once you’ve done that, a terminal window will open.
You are now connected to your Ubuntu instance running on Google Cloud Axion.
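As an alternative to the browser-based terminal, you can connect from your local machine with the gcloud CLI (assuming it is installed and authenticated; replace the zone with the one you chose):

```bash
# Open an SSH session to the instance
gcloud compute ssh arcee-axion-instance --zone=us-central1-a
```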
{{% notice Note %}}
- **Region**: make sure you're launching in your preferred Google Cloud region.
- **Storage**: 128 GB is sufficient for the AFM-4.5B model and dependencies.
{{% /notice %}}

---
title: Configure your Google Cloud Axion Arm64 environment
weight: 4
### FIXED, DO NOT MODIFY
layout: learningpathall
---
In this step, you’ll configure your Google Cloud Axion Arm64 instance with the system packages and Python environment required to build and run the Arcee Foundation Model using Llama.cpp.
## Update package lists
Run the following command to update your local APT package index:
```bash
sudo apt-get update
```
This ensures you have the most recent metadata about available packages, versions, and dependencies, helping to prevent conflicts when installing new software.
## Install build tools and Python dependencies
Install the required build tools and Python environment:
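On Ubuntu 24.04 Minimal, a package set along these lines covers the CMake, Python, and Git dependencies used in the following steps (the exact package list is an assumption):

```bash
# Build toolchain, CMake, Git, and Python tooling (package selection is an assumption)
sudo apt-get install -y build-essential cmake git python3 python3-pip python3-venv
```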

---
title: Build Llama.cpp on Google Cloud Axion Arm64
weight: 5
### FIXED, DO NOT MODIFY
layout: learningpathall
---
## Build the Llama.cpp inference engine on Google Cloud Axion
In this step, you’ll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on multiple hardware platforms, including Arm64 processors such as Google Cloud Axion.
Although AFM-4.5B uses a custom architecture, you can use the standard Llama.cpp repository. Arcee AI has contributed the required modeling code upstream.
## Clone the Llama.cpp repository
```bash
git clone https://github.com/ggerganov/llama.cpp
```
This command clones the Llama.cpp repository from GitHub. The repository includes source code, build scripts, and documentation.
## Navigate to the Llama.cpp directory
```bash
cd llama.cpp
```
Move into the `llama.cpp` directory to run the build process. This directory contains the `CMakeLists.txt` file and all source code.
## Configure the build with CMake for Arm64
```bash
cmake -B .
```
This configures the build system using CMake:
- `-B .` generates build files in the current directory
- CMake detects the system compiler, libraries, and hardware capabilities
- It produces Makefiles (Linux) or platform-specific scripts for compilation
On Google Cloud Axion, the output should show hardware-specific optimizations for the Neoverse V2 architecture:
```output
-- ARM feature DOTPROD enabled
-- Adding CPU backend variant ggml-cpu: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+dotprod+i8mm+sve
```
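To confirm these features are available on your VM, you can inspect the CPU flags reported by the kernel (a quick check; `lscpu` ships with Ubuntu's util-linux):

```bash
# Look for sve, sve2, i8mm, and asimddp (the DOTPROD feature) in the Flags line
lscpu | grep -i flags
```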
These optimizations enable advanced Arm64 CPU instructions:

- **SVE (Scalable Vector Extension)**: variable-length vector processing, with vectors up to 2048 bits, that accelerates matrix operations
- **MATMUL_INT8**: integer matrix multiplication units optimized for transformers
- **FMA**: fused multiply-add operations that speed up floating-point math
- **FP16 vector arithmetic**: 16-bit floating-point vector operations that reduce memory use without compromising precision

## Build the project
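Run the build. Based on the flags described in the list that follows, the command looks like this:

```bash
# Compile Llama.cpp with release optimizations, using 16 parallel jobs
cmake --build . --config Release -j16
```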
- `--build .` builds the project in the current directory
- `--config Release` enables optimizations and strips debug symbols
- `-j16` runs 16 parallel jobs for faster compilation on multi-core Axion systems
The build produces Arm64-optimized binaries in under a minute.
## Key Llama.cpp binaries after compilation
After compilation, you’ll find key tools in the `bin` directory:
- `llama-cli`: main inference executable
- `llama-server`: HTTP server for model inference
- `llama-quantize`: tool for quantization to reduce memory usage
- Additional utilities for model conversion and optimization
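Once you have a GGUF model file, a short generation is a quick way to confirm the build works. The model path below is a placeholder; downloading and quantizing AFM-4.5B is covered in the next steps:

```bash
# Generate 32 tokens from a test prompt; replace the path with your GGUF model
bin/llama-cli -m /path/to/model.gguf -p "Hello from Axion" -n 32
```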
See the [Llama.cpp GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools) for details.
These binaries are optimized for Arm64 and provide excellent performance on Google Cloud Axion.