LPs on Halide #1814
Open
dawidborycki wants to merge 9 commits into ArmDeveloperEcosystem:main from dawidborycki:LP-Halide
9 commits
ae6be9d Halide: Intro (dawidborycki)
030985b Lesson 2 (dawidborycki)
6808521 Create fusion.md (dawidborycki)
7148683 AOT (dawidborycki)
06d4a2d Android (dawidborycki)
dd0a3ba Addressing comments (dawidborycki)
0a9355e Addressing comments (dawidborycki)
043ced9 Addressing comments (dawidborycki)
92924b5 2nd round (dawidborycki)
Binary files added:
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/01.png (+1.17 MB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/02.png (+1.17 MB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/03.png (+294 KB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/04.png (+690 KB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/05.png (+535 KB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/06.png (+150 KB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/07.png (+651 KB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/08.png (+229 KB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/09.png (+617 KB)
content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/10.png (+456 KB)
52 changes: 52 additions & 0 deletions
content/learning-paths/mobile-graphics-and-gaming/android_halide/_index.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
--- | ||
title: Halide Essentials. From Basics to Android Integration | ||
minutes_to_complete: 180 | ||
|
||
who_is_this_for: This is an introductory topic for software developers interested in learning how to use Halide for image processing. | ||
|
||
learning_objectives: | ||
- Understand foundational concepts of Halide and set up your development environment. | ||
- Create a basic real-time image processing pipeline using Halide. | ||
- Optimize image processing workflows by applying operation fusion in Halide. | ||
- Integrate Halide pipelines into Android applications developed with Kotlin. | ||
|
||
prerequisites: | ||
|
||
- Basic C++ knowledge | ||
- Basic programming knowledge | ||
- Android Studio with Android Emulator | ||
|
||
author: Dawid Borycki | ||
|
||
### Tags | ||
skilllevels: Introductory | ||
subjects: Performance and Architecture | ||
|
||
armips: | ||
- Cortex-A | ||
- Cortex-X | ||
operatingsystems: | ||
- Android | ||
tools_software_languages: | ||
- Android Studio | ||
- Coding | ||
|
||
further_reading: | ||
|
||
- resource: | ||
title: Halide 19.0.0 | ||
link: https://halide-lang.org/docs/index.html | ||
type: website | ||
- resource: | ||
title: Halide GitHub | ||
link: https://github.com/halide/Halide | ||
type: repository | ||
- resource: | ||
title: Halide Tutorials | ||
link: https://halide-lang.org/tutorials/ | ||
type: website | ||
|
||
|
||
### FIXED, DO NOT MODIFY | ||
# ================================================================================ | ||
weight: 1 # _index.md always has weight of 1 to order correctly | ||
layout: "learningpathall" # All files under learning paths have this same wrapper | ||
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. | ||
--- |
8 changes: 8 additions & 0 deletions
content/learning-paths/mobile-graphics-and-gaming/android_halide/_next-steps.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
# ================================================================================ | ||
# FIXED, DO NOT MODIFY THIS FILE | ||
# ================================================================================ | ||
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. | ||
title: "Next Steps" # Always the same, html page title. | ||
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. | ||
--- |
454 changes: 454 additions & 0 deletions
content/learning-paths/mobile-graphics-and-gaming/android_halide/android.md
Large diffs are not rendered by default.
162 changes: 162 additions & 0 deletions
...ng-paths/mobile-graphics-and-gaming/android_halide/aot-and-cross-compilation.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,162 @@ | ||
--- | ||
# User change | ||
title: "Ahead-of-time and cross-compilation" | ||
|
||
weight: 5 | ||
|
||
layout: "learningpathall" | ||
--- | ||
|
||
## Ahead-of-time and cross-compilation | ||
One of Halide’s standout features is the ability to compile image processing pipelines ahead-of-time (AOT), enabling developers to generate optimized binary code on their host machines rather than compiling directly on target devices. This AOT compilation process allows developers to create highly efficient libraries that run effectively across diverse hardware without incurring the runtime overhead associated with just-in-time (JIT) compilation. | ||
|
||
Halide also supports robust cross-compilation capabilities. Cross-compilation means using the host version of Halide, typically running on a desktop Linux or macOS system, to target a different architecture, such as ARM for Android devices. Developers can thus optimize Halide pipelines on their host machine, produce libraries specifically optimized for Android, and integrate them seamlessly into Android applications. The generated pipeline code includes essential optimizations and can embed minimal runtime support, further reducing workload on the target device and ensuring responsiveness and efficiency. | ||
|
||
## Objective | ||
In this section, we leverage the host version of Halide to perform AOT compilation of an image processing pipeline via cross-compilation. The resulting pipeline library is specifically tailored to Android devices (targeting, for instance, arm64-v8a ABI), while the compilation itself occurs entirely on the host system. This approach significantly accelerates development by eliminating the need to build Halide or perform JIT compilation on Android devices. It also guarantees that the resulting binaries are optimized for the intended hardware, streamlining the deployment of high-performance image processing applications on mobile platforms. | ||
|
||
## Prepare Pipeline for Android | ||
The procedure implemented in the following code demonstrates how Halide’s AOT compilation and cross-compilation features can be utilized to create an optimized image processing pipeline for Android. We will run Halide on our host machine (in this example, macOS) to generate a static library containing the pipeline function, which will later be invoked from an Android device. Below is a step-by-step explanation of this process. | ||
|
||
Create a new file named blur-android.cpp with the following contents: | ||
|
||
```cpp | ||
#include "Halide.h" | ||
#include <iostream> | ||
|
||
#include <string> // for std::string | ||
#include <cstdint> // for fixed-width integer types (e.g., uint8_t) | ||
using namespace Halide; | ||
|
||
int main(int argc, char** argv) { | ||
if (argc < 2) { | ||
std::cerr << "Usage: " << argv[0] << " <output_basename> \n"; | ||
return 1; | ||
} | ||
|
||
std::string output_basename = argv[1]; | ||
|
||
// Configure Halide Target for Android | ||
Halide::Target target; | ||
target.os = Halide::Target::OS::Android; | ||
target.arch = Halide::Target::Arch::ARM; | ||
target.bits = 64; | ||
target.set_feature(Target::NoRuntime, false); | ||
|
||
|
||
// --- Define the pipeline --- | ||
// Define variables | ||
Var x("x"), y("y"); | ||
|
||
// Define input parameter | ||
ImageParam input(UInt(8), 2, "input"); | ||
|
||
// Create a clamped function that limits the access to within the image bounds | ||
Func clamped = Halide::BoundaryConditions::repeat_edge(input); | ||
|
||
// Now use the clamped function in processing | ||
RDom r(0, 3, 0, 3); | ||
Func blur("blur"); | ||
|
||
// Initialize blur accumulation | ||
blur(x, y) = cast<uint16_t>(0); | ||
blur(x, y) += cast<uint16_t>(clamped(x + r.x - 1, y + r.y - 1)); | ||
|
||
// Then continue with pipeline | ||
Func blur_div("blur_div"); | ||
blur_div(x, y) = cast<uint8_t>(blur(x, y) / 9); | ||
|
||
// Thresholding | ||
Func thresholded("thresholded"); | ||
Expr t = cast<uint8_t>(128); | ||
thresholded(x, y) = select(blur_div(x, y) > t, cast<uint8_t>(255), cast<uint8_t>(0)); | ||
|
||
|
||
// Simple scheduling | ||
blur_div.compute_root(); | ||
thresholded.compute_root(); | ||
|
||
// --- AOT compile to a file --- | ||
thresholded.compile_to_static_library( | ||
output_basename, // base filename | ||
{ input }, // list of inputs | ||
"blur_threshold", // name of the generated function | ||
target | ||
); | ||
|
||
return 0; | ||
} | ||
``` | ||
|
||
In the original implementation constants 128, 255, and 0 were implicitly treated as integers. Here, the threshold value (128) and output values (255, 0) are explicitly cast to uint8_t. This approach removes ambiguity and clearly specifies the types used, ensuring compatibility and clarity. Both approaches result in identical functionality, but explicitly casting helps emphasize the type correctness and may avoid subtle issues during cross-compilation or in certain environments. | ||
|
||
The program takes at least one command-line argument, the output base name used to generate the files (e.g., “blur_threshold_android”). Here, the target architecture is explicitly set within the code to Android ARM64: | ||
|
||
```cpp | ||
// Configure Halide Target for Android | ||
Halide::Target target; | ||
target.os = Halide::Target::OS::Android; | ||
target.arch = Halide::Target::Arch::ARM; | ||
target.bits = 64; | ||
|
||
// Enable Halide runtime inclusion in the generated library (needed if not linking Halide runtime separately). | ||
target.set_feature(Target::NoRuntime, false); | ||
|
||
// Optionally, enable hardware-specific optimizations to improve performance on ARM devices: | ||
|
||
// - DotProd: Optimizes matrix multiplication and convolution-like operations on ARM. | ||
// - ARMFp16 (half-precision floating-point operations). | ||
``` | ||
|
||
Notes: | ||
* NoRuntime feature. When set to true, Halide excludes its runtime from the generated code, requiring you to link the runtime manually during the linking step. Setting it to false includes the Halide runtime within the generated library, simplifying deployment. | ||
* ARMFp16. Leverages ARM’s hardware support for half-precision (16-bit) floating-point computations, significantly accelerating workloads where reduced precision is acceptable, such as neural networks and image processing. | ||
Review comment: I think mentioning the example of NN and IP here sounds redundant. |
||
|
||
We declare spatial variables (x, y) and an ImageParam named “input” representing the input image data. We use a repeat-edge boundary condition (BoundaryConditions::repeat_edge), which clamps out-of-bounds coordinates to the nearest edge pixel, to safely handle edge pixels. Then, we apply a 3x3 blur with a reduction domain (RDom), accumulating in 16 bits so that the sum of nine 8-bit samples cannot overflow. The accumulated sum is divided by 9 (the number of pixels in the neighborhood), producing an average blurred image. Lastly, thresholding is applied, producing a binary output: pixels whose blurred value exceeds the threshold (128) become white (255), while others become black (0). | ||
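To make the arithmetic concrete, the following plain C++ sketch (not Halide code) mirrors the same three steps on a row-major 8-bit grayscale buffer. The function name `blur_threshold_reference` and the row-major layout are assumptions made for illustration only; a reference like this can be handy on the host for checking the output of the generated pipeline.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ reference for the blur + threshold steps (illustrative, not Halide).
// Assumed layout: row-major, pixel (x, y) lives at index y * width + x.
std::vector<uint8_t> blur_threshold_reference(const std::vector<uint8_t>& input,
                                              int width, int height,
                                              uint8_t threshold = 128) {
    assert(width > 0 && height > 0);
    assert(input.size() == static_cast<std::size_t>(width) * height);

    std::vector<uint8_t> output(input.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // 16-bit accumulator: nine 8-bit samples sum to at most 9 * 255 = 2295.
            uint16_t sum = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    // Repeat-edge boundary: clamp coordinates into the image.
                    const int sx = std::clamp(x + dx, 0, width - 1);
                    const int sy = std::clamp(y + dy, 0, height - 1);
                    sum = static_cast<uint16_t>(
                        sum + input[static_cast<std::size_t>(sy) * width + sx]);
                }
            }
            const uint8_t mean = static_cast<uint8_t>(sum / 9);
            // Strict comparison, matching blur_div(x, y) > t in the pipeline.
            output[static_cast<std::size_t>(y) * width + x] =
                mean > threshold ? uint8_t{255} : uint8_t{0};
        }
    }
    return output;
}
```

Because the 3x3 mean of a uniform image equals the pixel value, a uniform image of value 200 maps to all 255, while a uniform image of value 128 maps to all 0 (the comparison is strict).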
|
||
|
||
This section intentionally reinforces previous concepts, focusing now primarily on explicitly clarifying integration details, such as type correctness and the handling of runtime features within Halide. | ||
|
||
Simple scheduling directives (compute_root) instruct Halide to compute the scheduled functions (here blur_div and thresholded) at the pipeline’s root: each one is evaluated over its whole required region and stored in a buffer before its consumer runs. This strategy can simplify debugging by clearly isolating computational steps, and it may enhance runtime efficiency by explicitly controlling intermediate storage locations. | ||
|
||
By clearly separating algorithm logic from scheduling, developers can easily test and compare different scheduling strategies, such as compute_inline, compute_root, compute_at, and more, without modifying their fundamental algorithmic code. This separation accelerates iterative optimization and debugging, because a schedule change alters how the result is computed but not what is computed. | ||
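As a rough mental model of what a schedule changes, the plain C++ sketch below contrasts a compute_root-style loop nest with an inlined one. It is an analogy only, not the code Halide generates, and the stage functions `stage_a` and `stage_b` are hypothetical stand-ins for an intermediate function and its consumer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical intermediate stage: halve each sample.
inline uint8_t stage_a(uint8_t v) { return static_cast<uint8_t>(v / 2); }

// Hypothetical consumer stage: binary threshold at 64.
inline uint8_t stage_b(uint8_t v) { return v > 64 ? uint8_t{255} : uint8_t{0}; }

// Analogue of stage_a.compute_root(): stage_a is evaluated for the whole
// image into its own buffer, and only then does the consumer loop run.
std::vector<uint8_t> run_root(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> tmp(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = stage_a(in[i]);

    std::vector<uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = stage_b(tmp[i]);
    return out;
}

// Analogue of inlining: no intermediate buffer, stage_a is recomputed
// at the point where the consumer needs it.
std::vector<uint8_t> run_inline(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = stage_b(stage_a(in[i]));
    return out;
}

// A schedule change must never change the result, only how it is computed.
void check_same_result(const std::vector<uint8_t>& in) {
    assert(run_root(in) == run_inline(in));
}
```

Both variants return identical data; they differ only in memory traffic and in how much work is recomputed, which is the trade-off a Halide schedule lets you explore.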
|
||
We invoke Halide’s AOT compilation function compile_to_static_library, which generates a static library (.a) containing the optimized pipeline and a corresponding header file (.h). | ||
|
||
```cpp | ||
thresholded.compile_to_static_library( | ||
output_basename, // base filename for output files (e.g., "blur_threshold_android") | ||
{ input }, // list of input parameters to the pipeline | ||
"blur_threshold", // the generated function name | ||
target // our target configuration for Android | ||
); | ||
``` | ||
|
||
This will produce: | ||
* A static library (blur_threshold_android.a) containing the compiled pipeline. This static library also includes Halide’s runtime functions tailored specifically for the targeted architecture (arm-64-android). Thus, no separate Halide runtime needs to be provided on the Android device when linking against this library. | ||
* A header file (blur_threshold_android.h) declaring the pipeline function for use in other C++/JNI code. | ||
|
||
|
||
These generated files are then ready to integrate directly into an Android project via JNI, allowing efficient execution of the optimized pipeline on Android devices. The integration process is covered in the next section. | ||
|
||
Note: JNI (Java Native Interface) is a framework that allows Java (or Kotlin) code running in a Java Virtual Machine (JVM), such as on Android, to interact with native applications and libraries written in languages like C or C++. JNI bridges the managed Java/Kotlin environment and the native, platform-specific implementations. | ||
|
||
## Compilation instructions | ||
To compile the pipeline-generation program on your host system, use the following commands (replace /path/to/halide with your Halide installation directory). DYLD_LIBRARY_PATH must point to the directory that contains the Halide shared library, not to the library file itself; on Linux, set LD_LIBRARY_PATH instead: | ||
```console | ||
export DYLD_LIBRARY_PATH=/path/to/halide/lib | ||
g++ -std=c++17 blur-android.cpp -o blur-android \ | ||
-I/path/to/halide/include -L/path/to/halide/lib -lHalide \ | ||
$(pkg-config --cflags --libs opencv4) -lpthread -ldl \ | ||
-Wl,-rpath,/path/to/halide/lib | ||
``` | ||
|
||
Then execute the binary: | ||
```console | ||
./blur-android blur_threshold_android | ||
``` | ||
|
||
This will produce two files: | ||
* blur_threshold_android.a: The static library containing your Halide pipeline. | ||
* blur_threshold_android.h: The header file needed to invoke the generated pipeline. | ||
|
||
We will integrate these files into our Android project in the following section. | ||
|
||
## Summary | ||
In this section, we’ve explored Halide’s powerful ahead-of-time (AOT) and cross-compilation capabilities, preparing an optimized image processing pipeline tailored specifically for Android devices. By using the host-based Halide compiler, we’ve generated a static library optimized for ARM64 Android architecture, incorporating safe boundary conditions, neighborhood-based blurring, and thresholding operations. This streamlined process allows seamless integration of highly optimized native code into Android applications, ensuring both development efficiency and runtime performance on mobile platforms. |