
Commit f4cf235

Merge pull request #2248 from madeline-underwood/arcee_axion
Arcee axion_PV to sign off
2 parents a866b5e + 4324ba0 commit f4cf235

10 files changed: +215 -232 lines changed
Lines changed: 11 additions & 20 deletions
@@ -1,37 +1,28 @@
 ---
-title: Overview
+title: AFM-4.5B deployment on Google Cloud Axion with Llama.cpp
 weight: 2
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## The AFM-4.5B model
+## AFM-4.5B model and deployment workflow
 
 [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) is a 4.5-billion-parameter foundation model designed to balance accuracy, efficiency, and broad language coverage. Trained on nearly 8 trillion tokens of carefully filtered data, it performs well across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.
 
-In this Learning Path, you'll deploy [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) using [Llama.cpp](https://github.com/ggerganov/llama.cpp) on an Arm-based Google Cloud Axion instance. You'll walk through the full workflow, from setting up your environment and compiling the runtime, to downloading, quantizing, and running inference on the model. You'll also evaluate model quality using perplexity, a common metric for measuring how well a language model predicts text.
+In this Learning Path, you'll deploy [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) using [Llama.cpp](https://github.com/ggerganov/llama.cpp) on a Google Cloud Axion Arm64 instance. You'll walk through the full workflow, from setting up your environment and compiling the runtime, to downloading, quantizing, and running inference on the model. You'll also evaluate model quality using perplexity, a standard metric for how well a language model predicts text.
 
 This hands-on guide helps developers build cost-efficient, high-performance LLM applications on modern Arm server infrastructure using open-source tools and real-world deployment practices.
 
-### LLM deployment workflow on Google Axion
+### Deployment workflow for AFM-4.5B on Google Cloud Axion
 
-- **Provision compute**: launch a Google Cloud instance using an Axion-based instance type (for example, `c4a-standard-16`)
+- **Provision compute**: launch a Google Cloud instance using an Axion-based instance type (for example, `c4a-standard-16`)
+- **Set up your environment**: install build tools and dependencies (CMake, Python, Git)
+- **Build the inference engine**: clone the [Llama.cpp](https://github.com/ggerganov/llama.cpp) repository and compile the project for your Arm-based environment
+- **Prepare the model**: download the AFM-4.5B model files from Hugging Face and use Llama.cpp's quantization tools to reduce model size and optimize performance
+- **Run inference**: load the quantized model and run sample prompts using Llama.cpp
+- **Evaluate model quality**: calculate perplexity or use other metrics to assess performance
 
-- **Set up your environment**: install the required build tools and dependencies (such as CMake, Python, and Git)
-
-- **Build the inference engine**: clone the [Llama.cpp](https://github.com/ggerganov/llama.cpp) repository and compile the project for your Arm-based environment
-
-- **Prepare the model**: download the **AFM-4.5B** model files from Hugging Face and use Llama.cpp's quantization tools to reduce model size and optimize performance
-
-- **Run inference**: load the quantized model and run sample prompts using Llama.cpp.
-
-- **Evaluate model quality**: calculate **perplexity** or use other metrics to assess model performance
-
-{{< notice Note>}}
+{{< notice Note >}}
 You can reuse this deployment flow with other models supported by Llama.cpp by swapping out the model file and adjusting quantization settings.
 {{< /notice >}}
-
-
-
-
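Taken together, the workflow in this overview condenses into a handful of commands. The sketch below is illustrative only: the GGUF file names, the local model directory, and the `Q4_0` quantization level are assumptions for this summary, not values taken from the commit.

```bash
# Illustrative end-to-end sketch of the workflow above (run on the Axion VM).
# File names, the local directory, and Q4_0 are assumptions, and the
# download/convert steps assume `pip install huggingface_hub` plus
# llama.cpp's Python requirements are already in place.
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
cmake -B . && cmake --build . --config Release -j$(nproc)

# Fetch AFM-4.5B, convert it to GGUF, quantize, run, and evaluate
huggingface-cli download arcee-ai/AFM-4.5B --local-dir afm-4.5b
python3 convert_hf_to_gguf.py afm-4.5b --outfile afm-4.5b-f16.gguf
./bin/llama-quantize afm-4.5b-f16.gguf afm-4.5b-q4_0.gguf Q4_0
./bin/llama-cli -m afm-4.5b-q4_0.gguf -p "Hello from Axion" -n 64
./bin/llama-perplexity -m afm-4.5b-q4_0.gguf -f wikitext-2-raw/wiki.test.raw
```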

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/01_launching_an_axion_instance.md

Lines changed: 31 additions & 29 deletions
@@ -1,5 +1,5 @@
 ---
-title: Provision your Axion environment
+title: Provision a Google Cloud Axion Arm64 environment
 weight: 3
 
 ### FIXED, DO NOT MODIFY
@@ -8,57 +8,59 @@ layout: learningpathall
 
 ## Requirements
 
-Before you begin, make sure you have the following:
+Before you begin, make sure you meet the following requirements:
 
 - A Google Cloud account
-- Permission to launch a Google Axion instance of type `c4a-standard-16` (or larger)
-- At least 128 GB of available storage
+- Permission to launch a Google Cloud Axion instance of type `c4a-standard-16` (or larger)
+- At least 128 GB of available storage
 
-If you're new to Google Cloud, check out the Learning Path [Getting Started with Google Cloud](/learning-paths/servers-and-cloud-computing/csp/google/).
+If you're new to Google Cloud, see the Learning Path [Getting started with Google Cloud](/learning-paths/servers-and-cloud-computing/csp/google/).
 
-## Launch and configure the Compute Engine instance
+## Requirements for Google Cloud Axion
 
-In the left sidebar of the [Compute Engine dashboard](https://console.cloud.google.com/compute), select **VM instances**, and then **Create instance**.
+Confirm that your account has sufficient quota for Axion instances and enough storage capacity to host the AFM-4.5B model and dependencies.
 
-Use the following settings to configure your instance:
+## Launch and configure a Google Cloud Axion VM
 
-- **Name**: `arcee-axion-instance`
-- **Region** and **Zone**: the region and zone where you have access to c4a instances
-- Select **General purpose**, then click **C4A**
-- **Machine type**: c4a-standard-16 or larger
+In the left sidebar of the [Compute Engine dashboard](https://console.cloud.google.com/compute), select **VM instances**, and then **Create instance**.
 
-## Configure OS and Storage
+Use the following settings:
 
-In the left sidebar, select **OS and storage**.
+- **Name**: `arcee-axion-instance`
+- **Region** and **Zone**: the region and zone where you have access to `c4a` instances
+- **Machine family**: select **General purpose**, then **C4A**
+- **Machine type**: `c4a-standard-16` or larger
 
-Under **Operating system and storage**, click on **Change**
+## Configure operating system and storage
 
-Select Ubuntu as the Operating system. For version select Ubuntu 24.04 LTS Minimal.
+In the left sidebar, select **OS and storage**.
 
-Set the size of the disk to 128 GB, then click on **Select**.
+- Under **Operating system and storage**, click **Change**
+- Select **Ubuntu 24.04 LTS Minimal** as the OS
+- Set the disk size to **128 GB**
+- Click **Select**
 
-## Review and launch the instance
+## Review and create your Axion instance
 
-Leave the other settings as they are.
+Leave the other settings as they are.
 
-When you're ready, click on **Create** to create your Compute Engine instance.
+When you're ready, click **Create** to launch your Compute Engine instance.
 
-## Monitor the instance launch
+## Verify instance launch
 
-After a few seconds, you should see that your instance is ready.
+After a few seconds, you should see your instance listed as **Running**.
 
 If the launch fails, double-check your settings and permissions, and try again.
 
-## Connect to your instance
+## Connect to your Google Cloud Axion VM
 
-Open the **SSH** dropdown list, and select **Open in browser window**.
+Open the **SSH** dropdown list, and select **Open in browser window**.
 
-Your browser may ask you to authenticate. Once you've done that, a terminal window will open.
+Your browser may ask you to authenticate. Once you've done that, a terminal window will open.
 
-You are now connected to your Ubuntu instance running on Axion.
+You are now connected to your Ubuntu instance running on Google Cloud Axion.
 
 {{% notice Note %}}
-**Region**: make sure you're launching in your preferred Google Cloud region.
-**Storage**: 128 GB is sufficient for the AFM-4.5B model and dependencies.
+- **Region**: make sure you're launching in your preferred Google Cloud region.
+- **Storage**: 128 GB is sufficient for the AFM-4.5B model and dependencies.
 {{% /notice %}}
-
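If you prefer scripting provisioning over the console, a roughly equivalent `gcloud` invocation is sketched below. Only the instance name, machine type, and disk size come from the steps above; the zone and image family are assumptions, so adjust them to a region where you have C4A quota.

```bash
# Hypothetical CLI equivalent of the console steps above.
# Zone and image family are assumptions, not part of this commit.
gcloud compute instances create arcee-axion-instance \
  --zone=us-central1-a \
  --machine-type=c4a-standard-16 \
  --image-family=ubuntu-minimal-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=128GB
```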
Lines changed: 22 additions & 29 deletions
@@ -1,58 +1,51 @@
 ---
-title: Configure your Axion environment
+title: Configure your Google Cloud Axion Arm64 environment
 weight: 4
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-In this step, you'll set up the Axion instance with the tools and dependencies required to build and run the Arcee Foundation Model. This includes installing system packages and a Python environment.
+In this step, you'll configure your Google Cloud Axion Arm64 instance with the system packages and Python environment required to build and run the Arcee Foundation Model using Llama.cpp.
 
-## Update the package list
+## Update package lists
 
 Run the following command to update your local APT package index:
 
 ```bash
 sudo apt-get update
 ```
 
-This step ensures you have the most recent metadata about available packages, including versions and dependencies. It helps prevent conflicts when installing new packages.
+This ensures you have the most recent metadata about available packages, versions, and dependencies, helping to prevent conflicts when installing new software.
 
-## Install system dependencies
+## Install build tools and Python dependencies
 
-Install the build tools and Python environment:
+Install the required build tools and Python environment:
 
 ```bash
 sudo apt-get install cmake gcc g++ git python3 python3-pip python3-virtualenv libcurl4-openssl-dev unzip -y
 ```
 
-This command installs the following tools and dependencies:
+This command installs the following:
 
-- **CMake**: cross-platform build system generator used to compile and build Llama.cpp
-
-- **GCC and G++**: GNU C and C++ compilers for compiling native code
-
-- **Git**: version control system for cloning repositories
-
-- **Python 3**: Python interpreter for running Python-based tools and scripts
-
-- **Pip**: Python package manager
-
-- **Virtualenv**: tool for creating isolated Python environments
-
-- **libcurl4-openssl-dev**: development files for the curl HTTP library
-
-- **Unzip**: tool to extract `.zip` files (used in some model downloads)
+- **CMake**: build system generator used to compile Llama.cpp
+- **GCC and G++**: GNU C and C++ compilers for native code
+- **Git**: version control system for cloning repositories
+- **Python 3**: Python interpreter for running tools and scripts
+- **Pip**: Python package manager
+- **Virtualenv**: tool for creating isolated Python environments
+- **libcurl4-openssl-dev**: development files for the curl HTTP library
+- **Unzip**: utility to extract `.zip` files
 
 The `-y` flag automatically approves the installation of all packages without prompting.
 
-## Ready for build and deployment
+## Verify environment setup
 
-After completing the setup, your instance includes the following tools and environments:
+After completing these steps, your instance includes:
 
-- A complete C/C++ development environment for building Llama.cpp
-- Python 3, pip, and virtualenv for managing Python tools and environments
-- Git for cloning repositories
-- All required dependencies for compiling optimized Arm64 binaries
+- A complete C/C++ development environment for building Llama.cpp
+- Python 3, pip, and virtualenv for managing Python tools
+- Git for cloning repositories
+- All required dependencies for compiling optimized Arm64 binaries
 
-You're now ready to build Llama.cpp and download the Arcee Foundation Model.
+You're now ready to build Llama.cpp and download the Arcee Foundation Model.
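The package list above installs `virtualenv`, although the steps shown here don't use it yet. A minimal sketch of isolating the Python-side tooling follows; the environment name and the `huggingface_hub` package are assumptions, anticipating the model-download step.

```bash
# Optional: create an isolated Python environment for model tooling.
# The directory name and package choice are illustrative assumptions.
virtualenv ~/afm-venv
source ~/afm-venv/bin/activate
pip install huggingface_hub   # for fetching model files from Hugging Face later
```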
Lines changed: 35 additions & 38 deletions
@@ -1,46 +1,46 @@
 ---
-title: Build Llama.cpp
+title: Build Llama.cpp on Google Cloud Axion Arm64
 weight: 5
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
-## Build the Llama.cpp inference engine
 
-In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms, including Arm-based processors like Google Axion.
+## Build the Llama.cpp inference engine on Google Cloud Axion
 
-Even though AFM-4.5B uses a custom model architecture, you can still use the standard Llama.cpp repository - Arcee AI has contributed the necessary modeling code upstream.
+In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on multiple hardware platforms, including Arm64 processors such as Google Cloud Axion.
 
-## Clone the repository
+Although AFM-4.5B uses a custom architecture, you can use the standard Llama.cpp repository. Arcee AI has contributed the required modeling code upstream.
+
+## Clone the Llama.cpp repository
 
 ```bash
 git clone https://github.com/ggerganov/llama.cpp
 ```
 
-This command clones the Llama.cpp repository from GitHub to your local machine. The repository contains the source code, build scripts, and documentation needed to compile the inference engine.
+This command clones the Llama.cpp repository from GitHub. The repository includes source code, build scripts, and documentation.
 
-## Navigate to the project directory
+## Navigate to the Llama.cpp directory
 
 ```bash
 cd llama.cpp
 ```
 
-Change into the llama.cpp directory to run the build process. This directory contains the `CMakeLists.txt` file and all source code.
+Move into the `llama.cpp` directory to run the build process. This directory contains the `CMakeLists.txt` file and all source code.
 
-## Configure the build with CMake
+## Configure the build with CMake for Arm64
 
 ```bash
 cmake -B .
 ```
 
-This command configures the build system using CMake:
-
-- `-B .` tells CMake to generate build files in the current directory
-- CMake detects your system's compiler, libraries, and hardware capabilities
-- It produces Makefiles (on Linux) or platform-specific build scripts for compiling the project
+This configures the build system using CMake:
 
+- `-B .` generates build files in the current directory
+- CMake detects the system compiler, libraries, and hardware capabilities
+- It produces Makefiles (Linux) or platform-specific scripts for compilation
 
-If you're running on Axion, the CMake output should include hardware-specific optimizations targeting the Neoverse V2 architecture. These optimizations are crucial for achieving high performance on Axion:
+On Google Cloud Axion, the output should show hardware-specific optimizations for the Neoverse V2 architecture:
 
 ```output
 -- ARM feature DOTPROD enabled
@@ -51,40 +51,37 @@ If you're running on Axion, the CMake output should include hardware-specific op
 -- Adding CPU backend variant ggml-cpu: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+dotprod+i8mm+sve
 ```
 
-These features enable advanced CPU instructions that accelerate inference performance on Arm64:
-
-- **DOTPROD: Dot Product**: hardware-accelerated dot product operations for neural network workloads
-
-- **SVE (Scalable Vector Extension)**: advanced vector processing capabilities that can handle variable-length vectors up to 2048 bits, providing significant performance improvements for matrix operations
+These optimizations enable advanced Arm64 CPU instructions:
 
-- **MATMUL_INT8**: integer matrix multiplication units optimized for transformers
-
-- **FMA**: fused multiply-add operations to speed up floating-point math
-
-- **FP16 vector arithmetic**: 16-bit floating-point vector operations to reduce memory use without compromising precision
+- **DOTPROD**: hardware-accelerated dot product operations
+- **SVE (Scalable Vector Extension)**: advanced vector processing for large-scale matrix operations
+- **MATMUL_INT8**: optimized integer matrix multiplication for transformers
+- **FMA**: fused multiply-add for faster floating-point math
+- **FP16 vector arithmetic**: reduced memory use with half-precision floats
 
 ## Compile the project
 
 ```bash
 cmake --build . --config Release -j16
 ```
 
-This command compiles the Llama.cpp source code:
+This compiles Llama.cpp with the following options:
+
+- `--build .` builds in the current directory
+- `--config Release` enables compiler optimizations
+- `-j16` runs 16 parallel jobs for faster compilation on multi-core Axion systems
 
-- `--build .` tells CMake to build the project in the current directory
-- `--config Release` enables optimizations and strips debug symbols
-- `-j16` runs the build with 16 parallel jobs, which speeds up compilation on multi-core systems like Axion.
+The build produces Arm64-optimized binaries in under a minute.
 
-The build process compiles the C++ source code into executable binaries optimized for the Arm64 architecture. Compilation typically takes under a minute.
+## Key Llama.cpp binaries after compilation
 
-## Key binaries after compilation
+After compilation, you'll find key tools in the `bin` directory:
 
-After compilation, you'll find several key command-line tools in the `bin` directory:
-- `llama-cli`: the main inference executable for running LLaMA models
-- `llama-server`: a web server for serving model inference over HTTP
-- `llama-quantize`: a tool for model quantization to reduce memory usage
-- Additional utilities for model conversion and optimization
+- `llama-cli`: main inference executable
+- `llama-server`: HTTP server for model inference
+- `llama-quantize`: tool for quantization to reduce memory usage
+- Additional utilities for model conversion and optimization
 
-You can find more tools and usage details in the llama.cpp [GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools).
+See the [Llama.cpp GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools) for details.
 
-These binaries are specifically optimized for the Arm architecture and will provide excellent performance on your Axion instance.
+These binaries are optimized for Arm64 and provide excellent performance on Google Cloud Axion.
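As a quick sanity check after the build, you can confirm the binaries run and that the kernel exposes the CPU features CMake detected. This is a sketch: the exact flag names are assumptions about how Linux reports them (`asimddp` corresponds to DOTPROD, `i8mm` to MATMUL_INT8).

```bash
# Confirm the compiled binary runs and prints its build info
./bin/llama-cli --version

# List which of the detected Arm64 features the kernel reports
lscpu | grep -oE 'sve2?|i8mm|asimddp' | sort -u
```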
