
Commit c7fa838

Jingxu10/111 tutorials (#763)

* update arch diagram
* add color to div.version a
* amended tutorials for 1.11.100
* fine tune docs
* add docker container to docs/tutorial/installation.md

1 parent 3756c84 commit c7fa838

File tree

10 files changed: 293 additions & 29 deletions


README.md

Lines changed: 3 additions & 3 deletions
@@ -20,13 +20,13 @@ python -m pip install intel_extension_for_pytorch -f https://software.intel.com/

**Note:** Intel® Extension for PyTorch\* has a PyTorch version requirement. Please check more detailed information via the URL below.

-More installation methods can be found at [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/tutorials/installation.html)
+More installation methods can be found at the [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/1.11.0/tutorials/installation.html)

## Getting Started

Minor code changes are required for users to get started with Intel® Extension for PyTorch\*. Both PyTorch imperative mode and TorchScript mode are supported. You just need to import the Intel® Extension for PyTorch\* package and apply its optimize function against the model object. If it is a training workload, the optimize function also needs to be applied against the optimizer object.

-The following code snippet shows an inference code with FP32 data type. More examples, including training and C++ examples, are available at [Example page](https://intel.github.io/intel-extension-for-pytorch/tutorials/examples.html).
+The following code snippet shows inference code with the FP32 data type. More examples, including training and C++ examples, are available at the [Example page](https://intel.github.io/intel-extension-for-pytorch/1.11.0/tutorials/examples.html).

```python
import torch
@@ -47,7 +47,7 @@ with torch.no_grad():

## Model Zoo

-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.10-models). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r1.10-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.11-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.11-models/benchmarks#pytorch-use-cases). You can get performance benefits out of the box by simply running scripts in the Model Zoo.

## License
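The README's inference snippet is elided by the diff hunks. For orientation, here is a minimal FP32 inference sketch consistent with the Getting Started paragraph above; the ResNet50 model and input shape are illustrative placeholders, not necessarily the README's exact example.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Placeholder model; the README's actual snippet may differ.
model = models.resnet50(pretrained=True)
model.eval()

# Apply the extension's optimizations to the model (FP32 path).
model = ipex.optimize(model)

# Dummy input matching ResNet50's expected shape.
data = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    output = model(data)
```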

docs/_static/custom.css

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
/* make the page 1000px */
+div.version a {
+    color: hsla(0,0%,100%,.3);
+}
+
.wy-nav-content {
    max-width: 1200px;
}

docs/index.rst

Lines changed: 5 additions & 1 deletion
@@ -8,7 +8,11 @@ Welcome to Intel® Extension for PyTorch* documentation!

Intel® Extension for PyTorch* extends PyTorch with optimizations for an extra performance boost on Intel hardware. Most of the optimizations will be included in stock PyTorch releases eventually, and the intention of the extension is to deliver up-to-date features and optimizations for PyTorch on Intel hardware; examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).

-Intel® Extension for PyTorch* is structured as the following figure. It is loaded as a Python module for Python programs or linked as a C++ library for C++ programs. Users can enable it dynamically in script by importing `intel_extension_for_pytorch`. It covers optimizations for both imperative mode and graph mode. Optimized operators and kernels are registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel hardware. During execution, Intel® Extension for PyTorch* intercepts invocation of ATen operators, and replace the original ones with these optimized ones. In graph mode, further operator fusions are applied manually by Intel engineers or through a tool named *oneDNN Graph* to reduce operator/kernel invocation overheads, and thus increase performance.
+Intel® Extension for PyTorch* is loaded as a Python module for Python programs or linked as a C++ library for C++ programs. Users can enable it dynamically in script by importing `intel_extension_for_pytorch`.
+
+Compared to eager mode, graph mode in PyTorch normally yields better performance from optimization methodologies like operator fusion. Intel® Extension for PyTorch* provides further optimizations in graph mode. It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with `TorchScript <https://pytorch.org/docs/stable/jit.html>`_. Users may wish to run with the `torch.jit.trace()` function first, since it generally works better with Intel® Extension for PyTorch* than the `torch.jit.script()` function. More detailed information can be found at the `pytorch.org website <https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html#tracing-modules>`_.
+
+It is structured as the following figure. PyTorch components are depicted with white boxes, while Intel® Extension for PyTorch* components are depicted with blue boxes. Intel® Extension for PyTorch* covers optimizations for both eager mode and graph mode. The extra performance of the extension is delivered via both custom addons and overriding of existing PyTorch components. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimized optimizers, and an INT8 quantization API. A further performance boost is available by converting the eager-mode model into graph mode via the extended graph fusion passes. Intel® Extension for PyTorch* dispatches operators to their underlying kernels automatically based on the ISA that it detects, and leverages the vectorization and matrix acceleration units available in Intel hardware as much as possible. The oneDNN library is used for computation-intensive operations. The Intel® Extension for PyTorch* runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing.

.. image:: ../images/intel_extension_for_pytorch_structure.png
   :width: 800
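To illustrate the tracing recommendation added above, a minimal sketch using stock TorchScript APIs; the toy model and input shape are assumptions for demonstration only:

```python
import torch
import intel_extension_for_pytorch as ipex

# Toy model; any traceable nn.Module works here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
).eval()
model = ipex.optimize(model)

data = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    # torch.jit.trace() generally cooperates better with the extension
    # than torch.jit.script(), as the paragraph above notes.
    traced_model = torch.jit.trace(model, data)
    traced_model = torch.jit.freeze(traced_model)
    output = traced_model(data)
```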

docs/tutorials/examples.md

Lines changed: 4 additions & 0 deletions
@@ -244,6 +244,8 @@ with torch.no_grad():

#### TorchScript Mode

+It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
+
##### Resnet50

```
@@ -350,6 +352,8 @@ with torch.no_grad():

#### TorchScript Mode

+It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
+
##### Resnet50

```

docs/tutorials/features.rst

Lines changed: 2 additions & 2 deletions
@@ -57,6 +57,8 @@ Graph Optimization

To optimize performance further with TorchScript, Intel® Extension for PyTorch\* supports fusion of frequently used operator patterns, like Conv2D+ReLU, Linear+ReLU, etc. The benefit of the fusions is delivered to users in a transparent fashion.

+Compared to eager mode, graph mode in PyTorch normally yields better performance from optimization methodologies like operator fusion. Intel® Extension for PyTorch* provides further optimizations in graph mode. It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with `TorchScript <https://pytorch.org/docs/stable/jit.html>`_. Users may wish to run with the `torch.jit.trace()` function first, since it generally works better with Intel® Extension for PyTorch* than the `torch.jit.script()` function. More detailed information can be found at the `pytorch.org website <https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html#tracing-modules>`_.
+
Check more detailed information for `Graph Optimization <features/graph_optimization.html>`_.

.. toctree::
@@ -119,8 +121,6 @@ Intel® Extension for PyTorch* has built-in quantization recipes to deliver good

Check more detailed information for `INT8 <features/int8.html>`_.

-oneDNN provides an evaluation feature called `oneDNN Graph Compiler <https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview4/doc#onednn-graph-compiler>`_. Please refer to `oneDNN build instruction <https://github.com/oneapi-src/oneDNN/blob/dev-graph-preview4/doc/build/build_options.md#build-graph-compiler>`_ to try this feature.
-
.. toctree::
   :hidden:
   :maxdepth: 1
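As a concrete (hypothetical) instance of the Conv2D+ReLU pattern named in the fusion paragraph, the sketch below traces and freezes a toy module; the fusion itself is applied transparently inside the optimized graph during the first forward passes:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

class ConvRelu(nn.Module):
    """Toy module exhibiting the Conv2D+ReLU pattern."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return torch.relu(self.conv(x))

model = ConvRelu().eval()
model = ipex.optimize(model)
data = torch.rand(1, 3, 32, 32)

with torch.no_grad():
    traced = torch.jit.trace(model, data)
    traced = torch.jit.freeze(traced)
    # Warm-up runs: the JIT profiling executor applies the fusion
    # passes after the first few invocations.
    for _ in range(3):
        traced(data)
```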

docs/tutorials/features/amp.md

Lines changed: 0 additions & 17 deletions
@@ -100,20 +100,3 @@ These ops don't require a particular dtype for stability, but take multiple inputs

`cat`, `stack`, `index_copy`

Some ops not listed here (e.g., binary ops like `add`) natively promote inputs without autocasting's intervention. If inputs are a mixture of `bfloat16` and `float32`, these ops run in `float32` and produce `float32` output, regardless of whether autocast is enabled.
-
-## Design Details
-
-### Frontend API Design
-
-`torch.cpu.amp` is designed to be a context manager that allows scopes of your script to run in mixed precision. It takes the input parameter `dtype`, which is `torch.bfloat16` by default.
-
-### Dedicated Dispatch Key
-
-`torch.cpu.amp` extends the design of the original PyTorch `Auto Mixed Precision` using the dedicated dispatch key `AutocastCPU`. Each tensor during creation will have an `Autocast` dispatch key corresponding to its device (`CUDA` or `CPU`). Thus, for every tensor on CPU, `AutocastCPU` exists along with the tensor. During the dispatch phase, operators with input tensors of `AutocastCPU` are dispatched to the `Autocast` layers. The `Autocast` layer decides what precision to choose for each operator. `AutocastCPU` has a higher dispatch priority than `Autograd`, which makes sure the `Autocast` layer runs before `Autograd`.
-
-### Operations category
-
-The operations are generally divided into 3 major categories and registered under the dispatch key `AutocastCPU`:
-* `lower_precision_fp` category: Computation-bound operators that could get a performance boost with the BFloat16 data type through acceleration by the Intel CPU BFloat16 instruction set. Their inputs are cast into `torch.bfloat16` before execution. `convolutions` and `linear` are examples of this category.
-* `fallthrough` category: Operators that support running with both the Float32 and BFloat16 data types, but could not get a performance boost with the BFloat16 data type. `relu` and `max_pool2d` are examples of this category.
-* `fp32` category: Operators that are not enabled with BFloat16 support yet. Their inputs are cast into `float32` before execution. `max_pool3d` and `group_norm` are examples of this category.
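Although this commit removes the design-details section, the behavior it described is easy to demonstrate with stock PyTorch CPU autocast; a minimal sketch, assuming a toy `Linear` model (`linear` belongs to the `lower_precision_fp` category):

```python
import torch

model = torch.nn.Linear(4, 4).eval()  # toy model for illustration
data = torch.rand(1, 4)

with torch.no_grad():
    # torch.cpu.amp.autocast defaults to dtype=torch.bfloat16; inputs of
    # lower_precision_fp ops such as linear are cast to bfloat16.
    with torch.cpu.amp.autocast():
        output = model(data)

print(output.dtype)  # torch.bfloat16
```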

docs/tutorials/features/graph_optimization.md

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
Graph Optimization
==================

-Most Deep Learning models could be described as DAG(directed acyclic graph). Therefore, how to optimize a deep learning model from graph perspective is a nature thinking. Compared to the operator optimization and algorithm optimization, the graph optimization is at more high level. It convers not only the graph self but also the runtime. From the operator perspective, the graph optimization contains the operator fusing, the constant folding. From the runtime perspective, the graph optimization contains the operator scheduling, the computation resources management, the memory mangement.
+Most deep learning models could be described as a directed acyclic graph (DAG). How to optimize a deep learning model from the graph perspective is therefore a natural consideration. Compared to operator optimization and algorithm optimization, graph optimization is at a higher level. It covers not only the graph model itself but also involves runtime operations. From the operator perspective, graph optimization involves operator fusion and constant folding. From the runtime perspective, it involves operator scheduling, computation resource management, and memory management.

-Currently, the Intel Extension for PyTorch focuses on the operator related graph optimizations. Regarding the runtime related optimization, the extension also provides some experiment features. Please refer to the runtime extension for more details about runtime optimization.
+Currently, Intel® Extension for PyTorch* focuses on operator-related graph optimizations. Regarding the runtime-related optimizations, the extension provides some experimental features. Please refer to the runtime extension for more details about runtime optimization.


## Fusion
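To make the constant-folding side of the paragraph concrete, a small sketch using stock TorchScript APIs (the Conv+BatchNorm module is an illustrative assumption): `torch.jit.freeze` inlines weights as constants so that folding passes, such as Conv+BatchNorm folding, can run.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)).eval()
data = torch.rand(1, 3, 32, 32)

with torch.no_grad():
    traced = torch.jit.trace(model, data)
    # Freezing inlines parameters and attributes as constants,
    # enabling constant folding such as Conv+BatchNorm folding.
    frozen = torch.jit.freeze(traced)

print(frozen.graph)  # inspect the folded graph
```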

docs/tutorials/installation.md

Lines changed: 30 additions & 0 deletions
@@ -85,6 +85,36 @@ git submodule update --init --recursive
python setup.py install
```

+## Install via Docker container
+
+### Build Docker container from Dockerfile
+
+Run the following commands to build the `pip`-based deployment container:
+
+```console
+$ cd docker
+$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.pip -t intel-extension-for-pytorch:pip .
+$ docker run --rm intel-extension-for-pytorch:pip python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex:',ipex.__version__)"
+```
+
+Run the following commands to build the `conda`-based development container:
+
+```console
+$ cd docker
+$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.conda -t intel-extension-for-pytorch:conda .
+$ docker run --rm intel-extension-for-pytorch:conda python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex:',ipex.__version__)"
+```
+
+### Get Docker container from DockerHub
+
+Pre-built Docker images are available at [DockerHub](https://hub.docker.com/r/intel/intel-optimized-pytorch/tags).
+
+Run the following command to pull the image to your local machine:
+
+```console
+docker pull intel/intel-optimized-pytorch:latest
+```
+
## Install C++ SDK

|Version|Pre-cxx11 ABI|cxx11 ABI|
