Commit 878c937

Vineet Garg authored

[Docs] Updates to docs-guides (#2239)

* Consistent usage of `coremltools.optimize` as `cto` across docs-guides
* Code formatting

Co-authored-by: Vineet Garg <vineetgarg@apple.com>

Parent commit: 5313fc7

File tree

4 files changed (+29, -27 lines)


docs-guides/source/mlmodel-utilities.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -120,7 +120,7 @@ optimization of the model via the `ct.optimize.coreml` API.
 
 ### Using the Metadata
 
-The [`get_weights_metadata()`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.get_weights_metadata) utility returns the weights metadata as an ordered dictionary that maps to strings in [CoreMLWeightMetaData](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.CoreMLWeightMetaData) and preserves the sequential order of the weights. The results are useful when constructing [`cto.OptimizationConfig`](https://apple.github.io/coremltools/docs-guides/source/optimizecoreml-api-overview.html#customizing-ops-to-compress).
+The [`get_weights_metadata()`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.get_weights_metadata) utility returns the weights metadata as an ordered dictionary that maps to strings in [CoreMLWeightMetaData](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.CoreMLWeightMetaData) and preserves the sequential order of the weights. The results are useful when constructing [`cto.coreml.OptimizationConfig`](https://apple.github.io/coremltools/docs-guides/source/optimizecoreml-api-overview.html#customizing-ops-to-compress).
 
 For example, with the [OptimizationConfig](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.config.html#coremltools.optimize.coreml.OptimizationConfig) class you have fine-grain control over applying different optimization configurations to different weights by directly setting `op_type_configs` and `op_name_configs` or using [`set_op_name`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.config.html#coremltools.optimize.coreml.OptimizationConfig.set_op_name) and [`set_op_type`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.config.html#coremltools.optimize.coreml.OptimizationConfig.set_op_type). When using [`set_op_name`](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.config.html#coremltools.optimize.coreml.OptimizationConfig.set_op_name), you need to know the name for the `const` op that produces the weight. The `get_weights_metadata()` utility provides the weight name and the corresponding weight numpy data, along with metadata information.
 
@@ -132,7 +132,7 @@ The following code loads the `SegmentationModel_with_metadata.mlpackage` saved i
 The example also shows how to get the name of the last weight in the model. The code palettizes all ops except the last weight, which is a common practical scenario when the last layer is more sensitive and should be skipped from quantization:
 
 ```python
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
 from coremltools.models import MLModel
 from coremltools.optimize.coreml import get_weights_metadata
@@ -164,11 +164,11 @@ for weight_name, weight_metadata in weight_metadata_dict.items():
 
 # Palettize all weights except for the last weight
 last_weight_name = list(weight_metadata_dict.keys())[-1]
-global_config = cto.OpPalettizerConfig(nbits=6, mode="kmeans")
-config = cto.OptimizationConfig(
+global_config = cto.coreml.OpPalettizerConfig(nbits=6, mode="kmeans")
+config = cto.coreml.OptimizationConfig(
     global_config=global_config,
     op_name_configs={last_weight_name: None},
 )
-compressed_mlmodel = cto.palettize_weights(mlmodel, config)
+compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
 
 ```
````
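The pattern throughout this commit swaps a subpackage alias (`import coremltools.optimize.coreml as cto`) for a parent-package alias (`import coremltools.optimize as cto`), which is why every call site gains a `.coreml.` segment. A minimal sketch of the same aliasing rule, using the stdlib `os.path` as a stand-in so it runs anywhere (it relies on the parent package importing its subpackage at import time, as `coremltools.optimize` does for `coreml`):

```python
# Aliasing the subpackage directly: its names need no extra prefix.
import os.path as sub

# Aliasing the parent package: the subpackage becomes an attribute,
# so every reference needs one more segment -- the same reason
# "cto.OpPalettizerConfig" becomes "cto.coreml.OpPalettizerConfig"
# after switching to "import coremltools.optimize as cto".
import os as parent

direct = sub.join("models", "weights")
via_parent = parent.path.join("models", "weights")

assert direct == via_parent
```

The `.coreml.` prefix is therefore not a new API, just the attribute path from the parent package down to the same objects.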

docs-guides/source/opt-palettization-api.md

Lines changed: 9 additions & 9 deletions
````diff
@@ -22,19 +22,19 @@ The following example shows `6-bit` palettization applied to all the ops which h
 This is controlled by setting the `weight_threshold` parameter to 512.
 ```python
 import coremltools as ct
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
 # load model
 mlmodel = ct.models.MLModel(uncompressed_model_path)
 
 # define op config
-op_config = cto.OpPalettizerConfig(nbits=6, weight_threshold=512)
+op_config = cto.coreml.OpPalettizerConfig(nbits=6, weight_threshold=512)
 
 # define optimization config by applying the op config globally to all ops
-config = cto.OptimizationConfig(global_config=op_config)
+config = cto.coreml.OptimizationConfig(global_config=op_config)
 
 # palettize weights
-compressed_mlmodel = cto.palettize_weights(mlmodel, config)
+compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
 ```
 Some key parameters that the config accepts are:
 - `n_bits` : This controls the number of clusters, which are `2^n_bits` .
@@ -54,18 +54,18 @@ to `8-bits`, and two of the conv ops (named `conv1` and `conv3`) are omitted fro
 
 ```python
 import coremltools as ct
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
 mlmodel = ct.models.MLModel(uncompressed_model_path)
 
-global_config = cto.OpPalettizerConfig(nbits=6)
-linear_config = cto.OpPalettizerConfig(nbits=8)
-config = cto.OptimizationConfig(
+global_config = cto.coreml.OpPalettizerConfig(nbits=6)
+linear_config = cto.coreml.OpPalettizerConfig(nbits=8)
+config = cto.coreml.OptimizationConfig(
     global_config=global_config,
     op_type_configs={"linear": linear_config},
     op_name_configs={"conv1": None, "conv3": None},
 )
-compressed_mlmodel = cto.palettize_weights(mlmodel, config)
+compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
 ```
 
 For more details, please follow the detailed API page for [coremltools.optimize.coreml.palettize_weights](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.palettize_weights)
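For reference, the second example above, assembled from the `+` lines after this commit (a sketch rather than runnable code, since `uncompressed_model_path` is the doc's own placeholder):

```python
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel(uncompressed_model_path)

global_config = cto.coreml.OpPalettizerConfig(nbits=6)
linear_config = cto.coreml.OpPalettizerConfig(nbits=8)
config = cto.coreml.OptimizationConfig(
    global_config=global_config,
    op_type_configs={"linear": linear_config},
    op_name_configs={"conv1": None, "conv3": None},
)
compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
```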

docs-guides/source/opt-quantization-api.md

Lines changed: 6 additions & 4 deletions
````diff
@@ -9,12 +9,14 @@ You can linearly quantize the weights of your Core ML model by using the
 [``linear_quantize_weights``](https://apple.github.io/coremltools/source/coremltools.optimize.coreml.post_training_quantization.html#coremltools.optimize.coreml.linear_quantize_weights) method as follows:
 
 ```python
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
-op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", weight_threshold=512)
-config = cto.OptimizationConfig(global_config=op_config)
+op_config = cto.coreml.OpLinearQuantizerConfig(
+    mode="linear_symmetric", weight_threshold=512
+)
+config = cto.coreml.OptimizationConfig(global_config=op_config)
 
-compressed_8_bit_model = cto.linear_quantize_weights(model, config=config)
+compressed_8_bit_model = cto.coreml.linear_quantize_weights(model, config=config)
 ```
 
 The method defaults to ``linear_symmetric``, which uses only per-channel scales and no zero-points.
````
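Assembled from the `+` lines, the updated example reads as follows (a sketch: `model` is assumed to be an already-loaded `MLModel`):

```python
import coremltools.optimize as cto

op_config = cto.coreml.OpLinearQuantizerConfig(
    mode="linear_symmetric", weight_threshold=512
)
config = cto.coreml.OptimizationConfig(global_config=op_config)

compressed_8_bit_model = cto.coreml.linear_quantize_weights(model, config=config)
```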

docs-guides/source/opt-workflow.md

Lines changed: 9 additions & 9 deletions
````diff
@@ -134,15 +134,15 @@ followed by data free palettization etc.
 Sample pseudocode of applying palettization to an `mlpackage` model:
 ```python
 import coremltools as ct
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 
 mlmodel = ct.models.MLModel(uncompressed_model_path)
-op_config = cto.OpPalettizerConfig(mode="kmeans",
+op_config = cto.coreml.OpPalettizerConfig(mode="kmeans",
                                    nbits=4,
                                    granularity="per_grouped_channel",
                                    group_size=16)
-model_config = cto.OptimizationConfig(global_config=op_config)
-compressed_mlmodel = cto.palettize_weights(mlmodel, model_config)
+model_config = cto.coreml.OptimizationConfig(global_config=op_config)
+compressed_mlmodel = cto.coreml.palettize_weights(mlmodel, model_config)
 ```
 
 Sample pseudocode of applying palettization to a torch model:
@@ -191,7 +191,7 @@ Quantizing activations can be applied either to the torch model, or
 directly to an `mlpackage` model as well. Sample pseudocode snippet to do so:
 ```python
 import coremltools as ct
-import coremltools.optimize.coreml as cto
+import coremltools.optimize as cto
 # The following API is for coremltools==8.0b1
 # It will be moved out of "experimental" in later versions of coremltools
 from coremltools.optimize.coreml.experimental import OpActivationLinearQuantizerConfig, \
@@ -201,16 +201,16 @@ mlmodel = ct.models.MLModel(uncompressed_model_path)
 
 # quantize activations to 8 bits (this will give an A8W16 model)
 act_quant_op_config = OpActivationLinearQuantizerConfig(mode="linear_symmetric")
-act_quant_model_config = cto.OptimizationConfig(global_config=act_quant_op_config)
+act_quant_model_config = cto.coreml.OptimizationConfig(global_config=act_quant_op_config)
 mlmodel_compressed_activations = linear_quantize_activations(mlmodel,
                                                              act_quant_model_config,
                                                              sample_data=...)
 
 # quantize weights to 8 bits (this will give an A8W8 model)
-weight_quant_op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric",
+weight_quant_op_config = cto.coreml.OpLinearQuantizerConfig(mode="linear_symmetric",
                                                      dtype="int8")
-weight_quant_model_config = cto.OptimizationConfig(weight_quant_op_config)
-mlmodel_compressed = cto.linear_quantize_weights(mlmodel_compressed_activations,
+weight_quant_model_config = cto.coreml.OptimizationConfig(weight_quant_op_config)
+mlmodel_compressed = cto.coreml.linear_quantize_weights(mlmodel_compressed_activations,
                                                  weight_quant_model_config)
 ```
````
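Assembled from the `+` lines, the activation-plus-weight quantization pseudocode reads as follows after this commit (continuation lines reflowed for readability; `uncompressed_model_path` and `sample_data=...` are the doc's own placeholders, so this remains a sketch):

```python
import coremltools as ct
import coremltools.optimize as cto
# The following API is for coremltools==8.0b1
# It will be moved out of "experimental" in later versions of coremltools
from coremltools.optimize.coreml.experimental import (
    OpActivationLinearQuantizerConfig,
    linear_quantize_activations,
)

mlmodel = ct.models.MLModel(uncompressed_model_path)

# quantize activations to 8 bits (this will give an A8W16 model)
act_quant_op_config = OpActivationLinearQuantizerConfig(mode="linear_symmetric")
act_quant_model_config = cto.coreml.OptimizationConfig(global_config=act_quant_op_config)
mlmodel_compressed_activations = linear_quantize_activations(
    mlmodel, act_quant_model_config, sample_data=...
)

# quantize weights to 8 bits (this will give an A8W8 model)
weight_quant_op_config = cto.coreml.OpLinearQuantizerConfig(
    mode="linear_symmetric", dtype="int8"
)
weight_quant_model_config = cto.coreml.OptimizationConfig(weight_quant_op_config)
mlmodel_compressed = cto.coreml.linear_quantize_weights(
    mlmodel_compressed_activations, weight_quant_model_config
)
```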

0 commit comments
