Commit 9993358

trsharm25, FThompsonAWS, aws-bowencc, aws-rishyraj, aws-gouthamr authored
Release 2.21 - re:invent (#1)
---------

Co-authored-by: Finn Thompson <[email protected]>
Co-authored-by: Finn Thompson <[email protected]>
Co-authored-by: Bowen Chen <[email protected]>
Co-authored-by: Rishabh Rajesh <[email protected]>
Co-authored-by: Goutham Ramakrishnan <[email protected]>
Co-authored-by: Kailash Budhathoki <[email protected]>
Co-authored-by: Yu Liu <[email protected]>
Co-authored-by: zhuangw-at-533267172582 <[email protected]>
Co-authored-by: Xiufeng Zhao <[email protected]>
Co-authored-by: Vinesh Ravuri <[email protected]>
Co-authored-by: Faqin Zhong <[email protected]>
Co-authored-by: Amer <[email protected]>
Co-authored-by: Pranav Ladkat <[email protected]>
Co-authored-by: Akhil Raj Azhikodan <[email protected]>
Co-authored-by: Joey Zheng <[email protected]>
Co-authored-by: Amulya Ballakur <[email protected]>
Co-authored-by: Truong Pham <[email protected]>
Co-authored-by: Piyush Dugar <[email protected]>
Co-authored-by: Seung Hun Chung <[email protected]>
Co-authored-by: Andrew Uderian <[email protected]>
Co-authored-by: Lanny Lian <[email protected]>
Co-authored-by: Yishan McNabb <[email protected]>
Co-authored-by: Shashwat Srijan <[email protected]>
Co-authored-by: Hooman Hashemi <[email protected]>
Co-authored-by: karthick gopalswamy <[email protected]>
Co-authored-by: Andrew Canis <[email protected]>
Co-authored-by: Beni Hegedus <[email protected]>
Co-authored-by: Dylan Geva <[email protected]>
Co-authored-by: Ryan Torrence <[email protected]>
Co-authored-by: Yi-Hsiang (Sean) Lai <[email protected]>
1 parent d6650d0 commit 9993358

83 files changed: +13580 -8 lines changed

.gitignore

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
+# Python .gitignore template
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# NxD
+
+build
+.vscode/
+*.iml
+.attach_pid*
+src/neuronx_distributed.egg-info/
+*.whl
+**/.DS_Store
+__pycache__

.pre-commit-config.yaml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+default_language_version:
+  # force all unspecified python hooks to run python3
+  python: python3
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.3.0
+    hooks:
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+      - id: detect-aws-credentials
+  - repo: https://github.com/pocc/pre-commit-hooks
+    rev: v1.1.1
+    hooks:
+      - id: clang-format
+        args: [--style=file, -i]
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.5.0
+    hooks:
+      - id: ruff
+        name: ruff
+        entry: ruff
+        args: [check, --fix, "--line-length=120", "--ignore=F401,E203"]
+        types: [python]
+        language: system
+        exclude: cases_update

README.md

Lines changed: 128 additions & 8 deletions
@@ -1,17 +1,137 @@
 ## My Project
 
-TODO: Fill this README out!
+This package provides a model hub for running inference on Neuronx Distributed (NxD).
 
-Be sure to:
+## Examples
+This package includes examples that you can reference when you implement code that uses NxD Inference.
+* `generation_demo.py` - A basic generation example for Llama.
 
-* Change the title in this README
-* Edit your repository description on GitHub
+## Run inference with the inference demo
+This package includes an inference demo console script that you can use to run inference. This script includes benchmarking and accuracy checking features that are useful for developers to verify that their models and modules work correctly.
 
-## Security
+After you install this package, you can run the inference demo with `inference-demo`. See examples below for how to run the inference demo. You can also run `inference_demo --help` to view all available arguments.
 
-See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
+### Example 1: Llama inference with token matching accuracy check
+```
+inference_demo \
+    --model-type llama \
+    --task-type causal-lm \
+    run \
+    --model-path /home/ubuntu/model_hf/Llama-3.1-8B-Instruct/ \
+    --compiled-model-path /home/ubuntu/traced_model/Llama-3.1-8B-Instruct/ \
+    --torch-dtype bfloat16 \
+    --tp-degree 32 \
+    --batch-size 2 \
+    --max-context-length 32 \
+    --seq-len 64 \
+    --on-device-sampling \
+    --enable-bucketing \
+    --top-k 1 \
+    --do-sample \
+    --pad-token-id 2 \
+    --prompt "I believe the meaning of life is" \
+    --prompt "The color of the sky is" \
+    --check-accuracy-mode token-matching \
+    --benchmark
+```
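When the demo is driven from a script rather than typed by hand, the flags in Example 1 can be assembled into an argument list. The following is a minimal sketch and not part of the package: the helper `build_inference_demo_command` is hypothetical, and it assumes only what the example above shows (global flags before the `run` subcommand, run options after it, and `--prompt` repeated once per prompt).

```python
import shlex
from typing import Dict, List, Union

OptionValue = Union[str, int, float, bool, None]


def build_inference_demo_command(
    model_type: str,
    task_type: str,
    run_options: Dict[str, OptionValue],
    prompts: List[str],
) -> List[str]:
    """Hypothetical helper: build an argv list shaped like the README examples."""
    argv = ["inference_demo", "--model-type", model_type, "--task-type", task_type, "run"]
    for flag, value in run_options.items():
        if value is True:
            # Boolean switches such as --benchmark take no value.
            argv.append(f"--{flag}")
        elif value is not False and value is not None:
            argv.extend([f"--{flag}", str(value)])
    for prompt in prompts:
        # The examples repeat --prompt once per prompt.
        argv.extend(["--prompt", prompt])
    return argv


command = build_inference_demo_command(
    "llama",
    "causal-lm",
    {
        "torch-dtype": "bfloat16",
        "tp-degree": 32,
        "batch-size": 2,
        "on-device-sampling": True,
        "check-accuracy-mode": "token-matching",
        "benchmark": True,
    },
    ["I believe the meaning of life is", "The color of the sky is"],
)
# Print a shell-quoted form for inspection; nothing is executed here.
print(shlex.join(command))
```

On a machine where the package is installed, the resulting list could be passed to `subprocess.run(command, check=True)`; building a list rather than a single string avoids quoting problems with prompts that contain spaces.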
 
-## License
+### Example 2. DBRX inference with logit matching accuracy check
 
-This project is licensed under the Apache-2.0 License.
+```
+inference_demo \
+    --model-type dbrx \
+    --task-type causal-lm \
+    run \
+    --model-path /home/ubuntu/model_hf/dbrx-1layer/ \
+    --compiled-model-path /home/ubuntu/traced_model/dbrx-1layer-demo/ \
+    --torch-dtype bfloat16 \
+    --tp-degree 32 \
+    --batch-size 2 \
+    --max-context-length 1024 \
+    --seq-len 1152 \
+    --enable-bucketing \
+    --top-k 1 \
+    --do-sample \
+    --pad-token-id 0 \
+    --prompt "I believe the meaning of life is" \
+    --prompt "The color of the sky is" \
+    --check-accuracy-mode logit-matching
+```
 
+### Example 3. Llama with speculation
+
+```
+inference_demo \
+    --model-type llama \
+    --task-type causal-lm \
+    run \
+    --model-path /home/ubuntu/model_hf/open_llama_7b/ \
+    --compiled-model-path /home/ubuntu/traced_model/open_llama_7b/ \
+    --draft-model-path /home/ubuntu/model_hf/open_llama_3b/ \
+    --compiled-draft-model-path /home/ubuntu/traced_model/open_llama_3b/ \
+    --torch-dtype bfloat16 \
+    --tp-degree 32 \
+    --batch-size 1 \
+    --max-context-length 32 \
+    --seq-len 64 \
+    --enable-bucketing \
+    --speculation-length 5 \
+    --no-trace-tokengen-model \
+    --top-k 1 \
+    --do-sample \
+    --pad-token-id 2 \
+    --prompt "I believe the meaning of life is" \
+    --check-accuracy-mode token-matching \
+    --benchmark
+```
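Example 3 pairs a target model with a smaller draft model and sets `--speculation-length 5`. As a conceptual illustration only, the sketch below shows one round of generic greedy speculative decoding with toy next-token functions. It is not NxD Inference code: `speculative_round`, `target_next`, and `draft_next` are hypothetical names, and it assumes the speculation length counts the tokens the draft proposes per round, which the README does not state.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_round(
    target: NextToken, draft: NextToken, tokens: List[int], speculation_length: int
) -> List[int]:
    """Return the tokens accepted in one greedy speculation round."""
    # 1. The draft model proposes `speculation_length` tokens autoregressively.
    proposed: List[int] = []
    context = list(tokens)
    for _ in range(speculation_length):
        token = draft(context)
        proposed.append(token)
        context.append(token)

    # 2. The target model checks each proposal. A real system scores all
    #    positions in one forward pass; a loop keeps the sketch readable.
    accepted: List[int] = []
    context = list(tokens)
    for token in proposed:
        expected = target(context)
        if expected != token:
            # First mismatch: keep the target's token and stop the round.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        context.append(token)

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target(context))
    return accepted


def target_next(context: List[int]) -> int:
    """Toy target model: count upward modulo 10."""
    return (context[-1] + 1) % 10


def draft_next(context: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 3."""
    return 0 if context[-1] == 3 else (context[-1] + 1) % 10


print(speculative_round(target_next, draft_next, [1], speculation_length=5))  # [2, 3, 4]
```

The output of a round always matches what the target model alone would have produced greedily, which is why a token-matching accuracy check remains meaningful when speculation is enabled.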
+
+### Example 4. Llama with quantization
+
+```
+inference_demo \
+    --model-type llama \
+    --task-type causal-lm \
+    run \
+    --model-path /home/ubuntu/model_hf/Llama-2-7b/ \
+    --compiled-model-path /home/ubuntu/traced_model/Llama-2-7b-demo/ \
+    --torch-dtype bfloat16 \
+    --tp-degree 32 \
+    --batch-size 2 \
+    --max-context-length 32 \
+    --seq-len 64 \
+    --on-device-sampling \
+    --enable-bucketing \
+    --quantized \
+    --quantized-checkpoints-path /home/ubuntu/model_hf/Llama-2-7b/model_quant.pt \
+    --quantization-type per_channel_symmetric \
+    --top-k 1 \
+    --do-sample \
+    --pad-token-id 2 \
+    --prompt "I believe the meaning of life is" \
+    --prompt "The color of the sky is"
+```
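Example 4 selects `--quantization-type per_channel_symmetric`. The sketch below illustrates what per-channel symmetric quantization generally means: each output channel gets its own scale, and the zero point is fixed at 0. It is a plain-Python illustration under stated assumptions, not the library's implementation: the function names are hypothetical, each row of the weight matrix is treated as one channel, and an int8 range of [-127, 127] is assumed.

```python
from typing import List, Tuple


def quantize_per_channel_symmetric(
    weight: List[List[float]], qmax: int = 127
) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row (assumed to be one channel) with its own scale."""
    quantized: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(value) for value in row)
        # Symmetric: the range is centered on zero, so only a scale is stored.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        quantized.append(
            [max(-qmax, min(qmax, round(value / scale))) for value in row]
        )
    return quantized, scales


def dequantize(quantized: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover approximate float weights from integers and per-row scales."""
    return [[q * scale for q in row] for row, scale in zip(quantized, scales)]


weights = [[0.5, -1.0, 0.2], [8.0, 0.0, -3.0]]
q_weights, row_scales = quantize_per_channel_symmetric(weights)
restored = dequantize(q_weights, row_scales)
for original_row, restored_row in zip(weights, restored):
    print(original_row, "->", [round(value, 4) for value in restored_row])
```

Using one scale per channel keeps a row with small values from being crushed by a row with large ones, at the cost of storing one extra float per channel.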
+
+### Example 5. Llama inference with logit matching accuracy check using custom error tolerances
+
+```
+inference_demo \
+    --model-type llama \
+    --task-type causal-lm \
+    run \
+    --model-path /home/ubuntu/model_hf/Llama-2-7b/ \
+    --compiled-model-path /home/ubuntu/traced_model/Llama-2-7b-demo/ \
+    --torch-dtype bfloat16 \
+    --tp-degree 32 \
+    --batch-size 2 \
+    --max-context-length 32 \
+    --seq-len 64 \
+    --check-accuracy-mode logit-matching \
+    --divergence-difference-tol 0.005 \
+    --tol-map "{5: (1e-5, 0.02)}" \
+    --enable-bucketing \
+    --top-k 1 \
+    --do-sample \
+    --pad-token-id 2 \
+    --prompt "I believe the meaning of life is" \
+    --prompt "The color of the sky is"
+```
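The `--tol-map` value in Example 5 is a Python-literal dictionary passed as a string. The sketch below shows one safe way such a string could be parsed and applied. It rests on assumptions the README does not confirm: that each key is a top-k value, that each tuple is ordered `(atol, rtol)`, and that the comparison follows the usual `|actual - expected| <= atol + rtol * |expected|` rule. `parse_tol_map` and `within_tolerance` are hypothetical helpers, not the demo's own code.

```python
import ast
from typing import Dict, Tuple


def parse_tol_map(text: str) -> Dict[int, Tuple[float, float]]:
    """Parse a string such as "{5: (1e-5, 0.02)}" without using eval()."""
    parsed = ast.literal_eval(text)  # accepts literals only, never arbitrary code
    if not isinstance(parsed, dict):
        raise ValueError("tolerance map must be a dict literal")
    result: Dict[int, Tuple[float, float]] = {}
    for key, tolerances in parsed.items():
        atol, rtol = tolerances  # assumed order: (absolute, relative)
        result[int(key)] = (float(atol), float(rtol))
    return result


def within_tolerance(actual: float, expected: float, atol: float, rtol: float) -> bool:
    """Assumed comparison rule, the same shape as numpy.isclose."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


tol_map = parse_tol_map("{5: (1e-5, 0.02)}")
atol, rtol = tol_map[5]
print(tol_map)
print(within_tolerance(1.01, 1.0, atol, rtol))  # True: 0.01 is within 2% of 1.0
print(within_tolerance(1.05, 1.0, atol, rtol))  # False: 0.05 exceeds the tolerance
```

`ast.literal_eval` is preferred over `eval` here because the string comes from the command line and should never be executed as code.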

SECURITY.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+## Reporting Security Issues
+
+We take all security reports seriously.
+When we receive such reports,
+we will investigate and subsequently address
+any potential vulnerabilities as quickly as possible.
+If you discover a potential security issue in this project,
+please notify AWS/Amazon Security via our
+[vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/)
+or directly via email to [AWS Security](mailto:[email protected]).
+Please do *not* create a public GitHub issue in this project.

build.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+#! /bin/bash
+set -e
+
+: ${BUILD_PATH:=build}
+
+python3.8 -m pip install ruff
+# remove --exit-zero once all errors are fixed/explicitly ignore
+python3.8 -m ruff check --line-length=120 --ignore=F401,E203
+# exit when asked to run `ruff` only
+if [[ "$1" == "ruff" ]]
+then
+    exit 0
+fi
+
+# Run static code analysis
+python3.8 -m pip install mypy
+python3.8 -m mypy --no-incremental || true
+# exit when asked to run `mypy` only
+if [[ "$1" == "mypy" ]]
+then
+    exit 0
+fi
+
+python3.8 setup.py bdist_wheel --dist-dir ${BUILD_PATH}/pip/public/neuronx-distributed-inference
