
Commit c7fa838

Jingxu10/111 tutorials (#763)

* update arch diagram
* add color to div.version a
* amended tutorials for 1.11.100
* fine tune docs
* add docker container to docs/tutorial/installation.md

1 parent 3756c84 commit c7fa838

File tree

10 files changed: 293 additions & 29 deletions


README.md

Lines changed: 3 additions & 3 deletions
@@ -20,13 +20,13 @@ python -m pip install intel_extension_for_pytorch -f https://software.intel.com/

**Note:** Intel® Extension for PyTorch\* has a PyTorch version requirement. Please check more detailed information via the URL below.

-More installation methods can be found at [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/tutorials/installation.html)
+More installation methods can be found at the [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/1.11.0/tutorials/installation.html)

## Getting Started

Minor code changes are required for users to get started with Intel® Extension for PyTorch\*. Both PyTorch imperative mode and TorchScript mode are supported. You just need to import the Intel® Extension for PyTorch\* package and apply its optimize function against the model object. If it is a training workload, the optimize function also needs to be applied against the optimizer object.

-The following code snippet shows an inference code with FP32 data type. More examples, including training and C++ examples, are available at [Example page](https://intel.github.io/intel-extension-for-pytorch/tutorials/examples.html).
+The following code snippet shows inference code with the FP32 data type. More examples, including training and C++ examples, are available at the [Example page](https://intel.github.io/intel-extension-for-pytorch/1.11.0/tutorials/examples.html).

```python
import torch
@@ -47,7 +47,7 @@ with torch.no_grad():

## Model Zoo

-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.10-models). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r1.10-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.11-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.11-models/benchmarks#pytorch-use-cases). You can get performance benefits out of the box by simply running scripts in the Model Zoo.

## License
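The README's inference snippet is elided by the diff hunks. For orientation, here is a minimal FP32 inference sketch consistent with the Getting Started paragraph above; the ResNet50 model and input shape are illustrative placeholders, not necessarily the README's exact example.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Placeholder model; the README's actual snippet may differ.
model = models.resnet50(pretrained=True)
model.eval()

# Apply the extension's optimizations to the model (FP32 path).
model = ipex.optimize(model)

# Dummy input matching ResNet50's expected shape.
data = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    output = model(data)
```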

docs/_static/custom.css

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
/* make the page 1000px */
+div.version a {
+    color: hsla(0,0%,100%,.3);
+}
+
.wy-nav-content {
    max-width: 1200px;
}

docs/index.rst

Lines changed: 5 additions & 1 deletion
@@ -8,7 +8,11 @@ Welcome to Intel® Extension for PyTorch* documentation!

Intel® Extension for PyTorch* extends PyTorch with optimizations for an extra performance boost on Intel hardware. Most of the optimizations will be included in stock PyTorch releases eventually, and the intention of the extension is to deliver up-to-date features and optimizations for PyTorch on Intel hardware; examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).

-Intel® Extension for PyTorch* is structured as the following figure. It is loaded as a Python module for Python programs or linked as a C++ library for C++ programs. Users can enable it dynamically in script by importing `intel_extension_for_pytorch`. It covers optimizations for both imperative mode and graph mode. Optimized operators and kernels are registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel hardware. During execution, Intel® Extension for PyTorch* intercepts invocation of ATen operators, and replace the original ones with these optimized ones. In graph mode, further operator fusions are applied manually by Intel engineers or through a tool named *oneDNN Graph* to reduce operator/kernel invocation overheads, and thus increase performance.
+Intel® Extension for PyTorch* is loaded as a Python module for Python programs or linked as a C++ library for C++ programs. Users can enable it dynamically in script by importing `intel_extension_for_pytorch`.
+
+Compared to eager mode, graph mode in PyTorch normally yields better performance from optimization methodologies like operator fusion. Intel® Extension for PyTorch* provides further optimizations in graph mode. It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with `TorchScript <https://pytorch.org/docs/stable/jit.html>`_. Users may wish to run with the `torch.jit.trace()` function first, since it generally works better with Intel® Extension for PyTorch* than the `torch.jit.script()` function. More detailed information can be found at the `pytorch.org website <https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html#tracing-modules>`_.
+
+It is structured as the following figure. PyTorch components are depicted with white boxes, while Intel® Extension for PyTorch* components are depicted with blue boxes. Intel® Extension for PyTorch* covers optimizations for both eager mode and graph mode. The extra performance of the extension is delivered via both custom addons and overriding of existing PyTorch components. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimized optimizers, and an INT8 quantization API. A further performance boost is available by converting the eager-mode model into graph mode via the extended graph fusion passes. Intel® Extension for PyTorch* dispatches operators to their underlying kernels automatically based on the ISA that it detects, and leverages the vectorization and matrix acceleration units available in Intel hardware as much as possible. The oneDNN library is used for computation-intensive operations. The Intel® Extension for PyTorch* runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing.

.. image:: ../images/intel_extension_for_pytorch_structure.png
   :width: 800
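To illustrate the tracing recommendation added above, a minimal sketch using stock TorchScript APIs; the toy model and input shape are assumptions for demonstration only:

```python
import torch
import intel_extension_for_pytorch as ipex

# Toy model; any traceable nn.Module works here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
).eval()
model = ipex.optimize(model)

data = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    # torch.jit.trace() generally cooperates better with the extension
    # than torch.jit.script(), as the paragraph above notes.
    traced_model = torch.jit.trace(model, data)
    traced_model = torch.jit.freeze(traced_model)
    output = traced_model(data)
```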

docs/tutorials/examples.md

Lines changed: 4 additions & 0 deletions
@@ -244,6 +244,8 @@ with torch.no_grad():

#### TorchScript Mode

+It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
+
##### Resnet50

```
@@ -350,6 +352,8 @@ with torch.no_grad():

#### TorchScript Mode

+It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
+
##### Resnet50

```

docs/tutorials/features.rst

Lines changed: 2 additions & 2 deletions
@@ -57,6 +57,8 @@ Graph Optimization

To optimize performance further with TorchScript, Intel® Extension for PyTorch\* supports fusion of frequently used operator patterns, like Conv2D+ReLU, Linear+ReLU, etc. The benefit of the fusions is delivered to users in a transparent fashion.

+Compared to eager mode, graph mode in PyTorch normally yields better performance from optimization methodologies like operator fusion. Intel® Extension for PyTorch* provides further optimizations in graph mode. It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with `TorchScript <https://pytorch.org/docs/stable/jit.html>`_. Users may wish to run with the `torch.jit.trace()` function first, since it generally works better with Intel® Extension for PyTorch* than the `torch.jit.script()` function. More detailed information can be found at the `pytorch.org website <https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html#tracing-modules>`_.
+
Check more detailed information for `Graph Optimization <features/graph_optimization.html>`_.

.. toctree::
@@ -119,8 +121,6 @@ Intel® Extension for PyTorch* has built-in quantization recipes to deliver good

Check more detailed information for `INT8 <features/int8.html>`_.

-oneDNN provides an evaluation feature called `oneDNN Graph Compiler <https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview4/doc#onednn-graph-compiler>`_. Please refer to `oneDNN build instruction <https://github.com/oneapi-src/oneDNN/blob/dev-graph-preview4/doc/build/build_options.md#build-graph-compiler>`_ to try this feature.
-
.. toctree::
   :hidden:
   :maxdepth: 1
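As a concrete (hypothetical) instance of the Conv2D+ReLU pattern named in the fusion paragraph, the sketch below traces and freezes a toy module; the fusion itself is applied transparently inside the optimized graph during the first forward passes:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

class ConvRelu(nn.Module):
    """Toy module exhibiting the Conv2D+ReLU pattern."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return torch.relu(self.conv(x))

model = ConvRelu().eval()
model = ipex.optimize(model)
data = torch.rand(1, 3, 32, 32)

with torch.no_grad():
    traced = torch.jit.trace(model, data)
    traced = torch.jit.freeze(traced)
    # Warm-up runs: the JIT profiling executor applies the fusion
    # passes after the first few invocations.
    for _ in range(3):
        traced(data)
```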

docs/tutorials/features/amp.md

Lines changed: 0 additions & 17 deletions
@@ -100,20 +100,3 @@ These ops don't require a particular dtype for stability, but take multiple inputs

`cat`, `stack`, `index_copy`

Some ops not listed here (e.g., binary ops like `add`) natively promote inputs without autocasting's intervention. If inputs are a mixture of `bfloat16` and `float32`, these ops run in `float32` and produce `float32` output, regardless of whether autocast is enabled.
-
-## Design Details
-
-### Frontend API Design
-
-`torch.cpu.amp` is designed to be a context manager that allows scopes of your script to run in mixed precision. It takes the input parameter `dtype`, which is `torch.bfloat16` by default.
-
-### Dedicated Dispatch Key
-
-`torch.cpu.amp` extends the design of the original PyTorch `Auto Mixed Precision` using the dedicated dispatch key `AutocastCPU`. Each tensor during creation will have an `Autocast` dispatch key corresponding to its device (`CUDA` or `CPU`). Thus, for every tensor on CPU, `AutocastCPU` exists along with the tensor. During the dispatch phase, operators with input tensors of `AutocastCPU` are dispatched to the `Autocast` layers. The `Autocast` layer decides what precision to choose for each operator. `AutocastCPU` has a higher dispatch priority than `Autograd`, which makes sure the `Autocast` layer runs before `Autograd`.
-
-### Operations category
-
-The operations are generally divided into 3 major categories and registered under the dispatch key `AutocastCPU`:
-* `lower_precision_fp` category: Computation-bound operators that could get a performance boost with the BFloat16 data type through acceleration by the Intel CPU BFloat16 instruction set. Their inputs are cast into `torch.bfloat16` before execution. `convolutions` and `linear` are examples of this category.
-* `fallthrough` category: Operators that support running with both the Float32 and BFloat16 data types, but could not get a performance boost with the BFloat16 data type. `relu` and `max_pool2d` are examples of this category.
-* `fp32` category: Operators that are not enabled with BFloat16 support yet. Their inputs are cast into `float32` before execution. `max_pool3d` and `group_norm` are examples of this category.
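Although this commit removes the design-details section, the behavior it described is easy to demonstrate with stock PyTorch CPU autocast; a minimal sketch, assuming a toy `Linear` model (`linear` belongs to the `lower_precision_fp` category):

```python
import torch

model = torch.nn.Linear(4, 4).eval()  # toy model for illustration
data = torch.rand(1, 4)

with torch.no_grad():
    # torch.cpu.amp.autocast defaults to dtype=torch.bfloat16; inputs of
    # lower_precision_fp ops such as linear are cast to bfloat16.
    with torch.cpu.amp.autocast():
        output = model(data)

print(output.dtype)  # torch.bfloat16
```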

docs/tutorials/features/graph_optimization.md

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
Graph Optimization
==================

-Most Deep Learning models could be described as DAG(directed acyclic graph). Therefore, how to optimize a deep learning model from graph perspective is a nature thinking. Compared to the operator optimization and algorithm optimization, the graph optimization is at more high level. It convers not only the graph self but also the runtime. From the operator perspective, the graph optimization contains the operator fusing, the constant folding. From the runtime perspective, the graph optimization contains the operator scheduling, the computation resources management, the memory mangement.
+Most deep learning models could be described as a directed acyclic graph (DAG). How to optimize a deep learning model from the graph perspective is therefore a natural consideration. Compared to operator optimization and algorithm optimization, graph optimization is at a higher level. It covers not only the graph model itself but also involves runtime operations. From the operator perspective, graph optimization involves operator fusion and constant folding. From the runtime perspective, it involves operator scheduling, computation resource management, and memory management.

-Currently, the Intel Extension for PyTorch focuses on the operator related graph optimizations. Regarding the runtime related optimization, the extension also provides some experiment features. Please refer to the runtime extension for more details about runtime optimization.
+Currently, Intel® Extension for PyTorch* focuses on operator-related graph optimizations. Regarding the runtime-related optimizations, the extension provides some experimental features. Please refer to the runtime extension for more details about runtime optimization.


## Fusion
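To make the constant-folding side of the paragraph concrete, a small sketch using stock TorchScript APIs (the Conv+BatchNorm module is an illustrative assumption): `torch.jit.freeze` inlines weights as constants so that folding passes, such as Conv+BatchNorm folding, can run.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)).eval()
data = torch.rand(1, 3, 32, 32)

with torch.no_grad():
    traced = torch.jit.trace(model, data)
    # Freezing inlines parameters and attributes as constants,
    # enabling constant folding such as Conv+BatchNorm folding.
    frozen = torch.jit.freeze(traced)

print(frozen.graph)  # inspect the folded graph
```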

docs/tutorials/installation.md

Lines changed: 30 additions & 0 deletions
@@ -85,6 +85,36 @@ git submodule update --init --recursive
python setup.py install
```

+## Install via Docker container
+
+### Build Docker container from Dockerfile
+
+Run the following commands to build the `pip`-based deployment container:
+
+```console
+$ cd docker
+$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.pip -t intel-extension-for-pytorch:pip .
+$ docker run --rm intel-extension-for-pytorch:pip python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex:',ipex.__version__)"
+```
+
+Run the following commands to build the `conda`-based development container:
+
+```console
+$ cd docker
+$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.conda -t intel-extension-for-pytorch:conda .
+$ docker run --rm intel-extension-for-pytorch:conda python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex:',ipex.__version__)"
+```
+
+### Get Docker container from DockerHub
+
+Pre-built Docker images are available at [DockerHub](https://hub.docker.com/r/intel/intel-optimized-pytorch/tags).
+
+Run the following command to pull the image to your local machine:
+
+```console
+docker pull intel/intel-optimized-pytorch:latest
+```
+
## Install C++ SDK

|Version|Pre-cxx11 ABI|cxx11 ABI|
