LPs on Halide #1814
I have added my first round of feedback; so far it covers only the intro.
content/learning-paths/mobile-graphics-and-gaming/android_halide/android.md
## Objective
In this lesson, we’ll learn how to integrate a high-performance Halide image-processing pipeline into an Android application using Kotlin.

## Overview of Mobile Integration with Halide
Can we merge the two sections "Overview of Mobile Integration with Halide" and "Benefits of Using Halide on Mobile" into one, focusing on how Halide addresses some of the challenges of image processing on mobile devices?
I am leaving this up to the copy editors.
content/learning-paths/mobile-graphics-and-gaming/android_halide/_index.md
- Introduction, Background, and Installation.
- Building a Simple Camera/Image Processing Workflow.
- Demonstrating Operation Fusion.
- Integrating Halide into an Android (Kotlin) Project
Please make the punctuation consistent between list items.
?
Please note that some of the additional comments are replies to resolved comments.
Please note that some comments are inside resolved threads.
Note: I will raise a separate review comment for fusion.md
Halide::Func blur("blur");
Halide::Expr val = Halide::cast<int32_t>(
input(clampCoord(x + r.x - 1, width),
clampCoord(y + r.y - 1, height))
I suspect this indentation is not intentional?
```cpp
{1, 2, 1}
};
Halide::Buffer<int> kernelBuf(&kernel_vals[0][0], 3, 3);
```

Reason for choosing this kernel:
* It provides effective smoothing by considering the immediate neighbors of each pixel, making it computationally lightweight yet visually effective.
* The weights approximate a Gaussian distribution, helping to maintain image details while reducing noise and small variations.
Strictly speaking, this filter is a binomial filter, which offers a computationally efficient approximation of Gaussian filtering.
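To make the binomial point concrete, here is a minimal plain C++ sketch (no Halide required) that builds the 3x3 kernel as the outer product of the 1D binomial weights [1 2 1] with themselves. The helper names `makeBinomial3x3` and `kernelSum` are hypothetical and are not part of the PR.

```cpp
#include <array>
#include <cassert>

using Kernel3x3 = std::array<std::array<int, 3>, 3>;

// 1D binomial weights [1 2 1] (row 2 of Pascal's triangle).
static const std::array<int, 3> kBinomial1D = {1, 2, 1};

// Hypothetical helper: the 3x3 kernel is the outer product of the
// 1D binomial weights with themselves.
inline Kernel3x3 makeBinomial3x3() {
    Kernel3x3 k{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            k[i][j] = kBinomial1D[i] * kBinomial1D[j];
        }
    }
    return k;
}

// Hypothetical helper: sum of all weights, used as the normalization divisor.
inline int kernelSum(const Kernel3x3& k) {
    int sum = 0;
    for (const auto& row : k) {
        for (int w : row) {
            sum += w;
        }
    }
    return sum;
}
```

The result matches the layout [1 2 1; 2 4 2; 1 2 1] with a weight sum of 16 quoted in the snippet under review, which is why normalization reduces to a division by 16.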
break;
}

// Convert to grayscale.
Again, this should be done as part of the Halide pipeline: OpenCV generates intermediates, which cause extra memory reads and writes, whereas in Halide this would be a good example to show the benefit of tiling.
Halide::RDom r(0, 3, 0, 3);
Halide::Func blur("blur");
Halide::Expr val = Halide::cast<int32_t>(
int16_t would be a better type for the accumulator in this reduction: it cannot overflow in this example, and it is more memory efficient.
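The overflow claim can be checked with a little arithmetic. The sketch below is plain C++ rather than Halide, and it assumes 8-bit input pixels and the 3x3 kernel whose weights sum to 16; the function names are hypothetical. The worst-case accumulator value is 255 * 16 = 4080, well below the int16_t maximum of 32767.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// 3x3 kernel weights from the snippet under review; they sum to 16.
static const int16_t kWeights[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};

// Worst case for the accumulator: every 8-bit input pixel is 255.
inline int maxAccumulator() {
    int acc = 0;
    for (const auto& row : kWeights) {
        for (int16_t w : row) {
            acc += 255 * w;
        }
    }
    return acc;  // 255 * 16 = 4080
}

// True if v is representable as int16_t.
inline bool fitsInInt16(int v) {
    return v >= std::numeric_limits<int16_t>::min() &&
           v <= std::numeric_limits<int16_t>::max();
}

// Blur one 3x3 patch with a 16-bit accumulator, then normalize by 16.
inline uint8_t blurPatchInt16(const uint8_t patch[3][3]) {
    int16_t acc = 0;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            acc = static_cast<int16_t>(acc + patch[i][j] * kWeights[i][j]);
        }
    }
    return static_cast<uint8_t>(acc / 16);
}
```

A larger kernel or wider input type would change the bound, so the same check is worth repeating if either changes.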
Halide::Var x("x"), y("y");

// Kernel layout: [1 2 1; 2 4 2; 1 2 1], sum = 16.
int kernel_vals[3][3] = {
int16_t would be better here, for the reason given in the comment below.
// Gaussian blur kernel weights: center pixel has weight 4,
// edge neighbors (up, down, left, right) have weight 2,
// and diagonal neighbors have weight 1.
Halide::Expr weight = Halide::select(
I don't understand why this code block is here, since the alternative approach is explained later.
```cpp
blur.compute_at(thresholded, x_outer);
```

In this scheduling:
I need to repeat my previous comment.
This is not a good example to show tiling, because the consumer of the tile is a simple pixel-wise operation, for which fusing makes more sense.
Tiling would be beneficial if we stored the intermediates of the RGB-to-gray stage in a tile and then applied the blur to each tile.
This is because, if we fuse RGB-to-gray with the blur, each of the 9 neighbors used by the blur recomputes RGB-to-gray, which adds up to a lot of redundant computation over an entire frame.
Notes:
* NoRuntime feature. When set to true, Halide excludes its runtime from the generated code, requiring you to link the runtime manually during the linking step. Setting it to false includes the Halide runtime within the generated library, simplifying deployment.
* ARMFp16. Leverages ARM’s hardware support for half-precision (16-bit) floating-point computations, significantly accelerating workloads where reduced precision is acceptable, such as neural networks and image processing.
I think mentioning neural networks and image processing as examples here is redundant.
0.114f * inputBuffer(x, y, 2)
);

// Continue pipeline: Gaussian blur (example)
Please write consistent code for the same thing throughout this LP, as long as it makes sense.
Specifically, the Halide code below is slightly different from what we presented in the previous chapter.
// Convert RGB to grayscale directly in Halide pipeline
Halide::Func grayscale("grayscale");
grayscale(x, y) = Halide::cast<uint8_t>(
I think we need some comments to explain what these constants are.
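One possible way to document the constants is sketched below in plain C++ (no Halide). The blue weight 0.114 appears in the snippet under review; the red and green weights 0.299 and 0.587 are assumed here to be the standard ITU-R BT.601 luma coefficients, which the elided lines presumably use. The names `kLumaR`, `kLumaG`, `kLumaB`, and `rgbToGray` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ITU-R BT.601 luma weights: how much each channel contributes to
// perceived brightness. Green dominates because the eye is most
// sensitive to it; blue contributes least. The weights sum to 1.
static const float kLumaR = 0.299f;
static const float kLumaG = 0.587f;
static const float kLumaB = 0.114f;

// Weighted sum of the channels, truncated to 8 bits like a plain cast.
inline uint8_t rgbToGray(uint8_t r, uint8_t g, uint8_t b) {
    const float luma = kLumaR * r + kLumaG * g + kLumaB * b;
    return static_cast<uint8_t>(luma);
}
```

Naming the constants once and reusing them would also help with the consistency concern raised in the comment above about the same code differing between chapters.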
To profile a pipeline you can use built-in profiler. For details on how to enable and interpret Halide’s profiler, please refer to the official [Halide profiling tutorial](https://halide-lang.org/tutorials/tutorial_lesson_21_auto_scheduler_generate.html#profiling).

## Summary
In this lesson, we learned about operation fusion in Halide, a powerful technique to reduce memory bandwidth and improve computational efficiency. We explored why fusion matters, identified scenarios where fusion is most effective, and demonstrated how Halide’s scheduling constructs (compute_at, store_at, fuse) enable you to apply fusion easily and effectively. By fusing the Gaussian blur and thresholding stages, we improved the performance of our real-time image processing pipeline.
I don't think we can conclude that we improved performance without comparing any performance numbers.
```cpp
expensive_intermediate.compute_root();
```

This prevents redundant recomputation, resulting in higher efficiency compared to aggressively fusing these stages. In short, fusion is particularly effective in pipelines where intermediate results are not heavily reused or where recomputation costs are minimal compared to memory overhead. Being aware of these considerations helps achieve optimal scheduling decisions tailored to your specific pipeline.
Unfortunately for this LP (but fortunately for users), this pipeline is too simple to showcase that. The Halide compiler simplifies it into something like:
final_stage(x, y) = expensive_intermediate(x, y) * 3 + 1
You can check the output after Halide's IR transformations with:
final_stage.compile_to_lowered_stmt(...)
Therefore, we need to show a different example. I think it will work if the consumer of expensive_intermediate is a spatial filter (e.g. a blur).
@@ -0,0 +1,246 @@
---
I would suggest that this chapter needs an overhaul, as follows.
- Remove everything about loop fusion.
  Reasons:
  - It is not in the initial agenda for what this LP is going to cover.
  - I suppose loop fusion is less commonly used in Halide scheduling. As the target readers of this LP are beginners, it would be better not to touch on it.
  - Some of the arguments in this draft do not look right. Loop fusion is beneficial where the number of dimensions is limited, such as on a GPU target, where we are restricted by the GPU programming model, or when parallelizing across CPU cores. In other cases, I suppose the benefit is tiny or limited, and is subject to loop optimization by the backend compiler (e.g. LLVM).
- Focus on operator fusion.
  - First, understanding the concept of operator fusion is essential when scheduling Halide. This can be explained by comparing it to scheduling with `compute_root()`. Pseudocode of the loop structure would help readers understand (please refer to the official tutorial lesson_08_scheduling_2). Note that operator fusion is the default Halide scheduling. (You could explicitly set `compute_inline()`, although we don't usually do that.)
  - Second, we can explain the benefit of operator fusion by mentioning the memory reads and writes of intermediates that would otherwise happen.
  - Third, we can explain the case where `compute_root()` makes sense. For example, the producer is expensive and the consumer has a large spatial filter that accesses the producer many times to compute a single output.
  - Finally, we can provide some guidance on how to schedule a long pipeline with many operators. I recommend starting with a naive schedule where every `Func` is scheduled with `compute_root`, and then fusing the intermediates one by one where fusion simplifies things (e.g. pixel-wise processing), or applying tiling for small filters.

It is worth mentioning again that the default scheduling of Halide is fusion/inlining, i.e. if you don't set any schedule on a `Func`, it is fused/inlined.
The pitfall of scheduling a long pipeline is that, if everything is inlined, it can end up as very complex generated code with huge compute redundancy, for which both compilation time and runtime are extremely long.
Before submitting a pull request for a new Learning Path, please review Create a Learning Path
Please do not include any confidential information in your contribution. This includes confidential microarchitecture details and unannounced product information.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the Creative Commons Attribution 4.0 International License.