`docs/en/02-how-to-run/quantize_model.md` (+12 -7 lines changed)
@@ -8,8 +8,8 @@ The fixed-point model has many advantages over the fp32 model:
- Benefiting from the smaller model, the cache hit rate improves and inference is faster
- Chips tend to have dedicated fixed-point acceleration instructions, which are faster and consume less energy (an int8 operation on a common CPU requires only about 10% of the energy of its fp32 counterpart)
-The size of the installation package and the heat generation are the key indicators of the mobile terminal evaluation APP;
-On the server side, quantization means that you can maintain the same QPS and improve model precision in exchange for improved accuracy.
+APK file size and heat generation are key indicators when evaluating a mobile APP;
+On the server side, quantization means that you can increase model size in exchange for precision while keeping the same QPS.
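To make the fp32-to-int8 trade-off above concrete, here is a minimal sketch of per-tensor symmetric int8 quantization (illustrative NumPy code, not mmdeploy internals); the 4x size reduction is what drives the cache-hit and package-size benefits listed above.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Per-tensor symmetric quantization: q = round(x / scale), clipped to int8."""
    scale = float(np.abs(x).max()) / 127.0        # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(weights)

print(weights.nbytes // q.nbytes)    # 4: int8 weights take a quarter of the memory
print(float(np.abs(weights - dequantize_int8(q, scale)).max()))  # small rounding error
```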
## Post-training quantization scheme
@@ -21,7 +21,7 @@ Taking ncnn backend as an example, the complete workflow is as follows:
mmdeploy generates a quantization table based on the static graph (onnx) and uses backend tools to convert the fp32 model to fixed point.
-Currently mmdeploy support ncnn with PTQ.
+mmdeploy currently supports ncnn with PTQ.
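To illustrate what a quantization table boils down to, here is a sketch under assumed simplifications (real PTQ tools such as ncnn's use more robust calibration than a plain max): run calibration inputs through the fp32 graph, record per-layer activation ranges, and derive one scale per tensor for the backend to use at int8.

```python
import numpy as np

def build_quant_table(layer_outputs: dict[str, list[np.ndarray]]) -> dict[str, float]:
    """Derive one int8 scale per layer from calibration activations (max calibration)."""
    table = {}
    for name, batches in layer_outputs.items():
        max_abs = max(float(np.abs(b).max()) for b in batches)
        table[name] = max_abs / 127.0          # fp32 value = int8 value * scale
    return table

# Fake calibration data for two layers, standing in for activations captured
# while running calibration images through the static (onnx) graph.
calib = {
    "conv1": [np.random.randn(1, 16, 32, 32) for _ in range(8)],
    "conv2": [np.random.randn(1, 32, 16, 16) * 3 for _ in range(8)],
}
print(build_quant_table(calib))                # e.g. {'conv1': 0.03..., 'conv2': 0.09...}
```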
## How to convert model
@@ -38,10 +38,15 @@ Back in mmdeploy, enable quantization with the option 'tools/deploy.py --quant'.
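For concreteness, a sketch of driving the documented entry point from Python: the `--quant` switch comes from this page, while all paths, config names, and the extra options below are placeholders/assumptions that may differ by mmdeploy version.

```python
import subprocess

# Equivalent to running tools/deploy.py from the shell with quantization enabled.
# All four positional arguments are placeholders (assumed, not from this page).
subprocess.run(
    [
        "python3", "tools/deploy.py",
        "configs/mmcls/classification_ncnn-int8_static.py",  # deploy config (placeholder)
        "/path/to/model_config.py",                          # model config (placeholder)
        "/path/to/checkpoint.pth",                           # fp32 checkpoint (placeholder)
        "/path/to/test.jpg",                                 # sample input image (placeholder)
        "--work-dir", "work_dir",                            # assumed option
        "--quant",    # documented switch that turns on post-training quantization
    ],
    check=True,
)
```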