
Commit 1043931

Merge pull request #1248 from aws-neuron/pull_request_2.26.1
2 parents 1db4d74 + 716f08d commit 1043931

File tree

35 files changed

+654
-713
lines changed


about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-6.rst

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ For more details on checkpointing, refer the `documentation <https://pytorch.org
179179

180180

181181
Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
182-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
182+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
183183
While using HuggingFace Transformers Trainer API to train (i.e. :ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error "Attempted to access the data pointer on an invalid python storage". This is a known `issue <https://github.com/huggingface/transformers/issues/27578>`_ and has been fixed in the version ``4.37.3`` of HuggingFace Transformers.
184184

185185
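The note in this hunk ties the fix to HuggingFace Transformers ``4.37.3``. A minimal sketch of checking the installed version before starting a Trainer run might look like the following. The helper names (`parse_version`, `has_fix`, `installed_transformers_has_fix`) are hypothetical and not part of the Neuron or Transformers APIs, and the parser only compares leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
from importlib import metadata

# Version named in the note above as containing the fix.
FIXED_IN = (4, 37, 3)


def parse_version(text):
    """Turn '4.37.3' or '4.38.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component (for example 'dev0').
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(version_text, fixed_in=FIXED_IN):
    """Return True when version_text is at or above the fixed release."""
    return parse_version(version_text) >= fixed_in


def installed_transformers_has_fix():
    """Return True/False for the installed package, or None if it is absent."""
    try:
        return has_fix(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_transformers_has_fix()` at the top of a training script gives an early signal to upgrade instead of failing partway through a run.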

about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-7.rst

Lines changed: 1 addition & 1 deletion
@@ -163,7 +163,7 @@ For more details on checkpointing, refer the `documentation <https://pytorch.org
163163

164164

165165
Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
166-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
166+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
167167
While using HuggingFace Transformers Trainer API to train (i.e. :ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error "Attempted to access the data pointer on an invalid python storage". This is a known `issue <https://github.com/huggingface/transformers/issues/27578>`_ and has been fixed in the version ``4.37.3`` of HuggingFace Transformers.
168168

169169

about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-8.rst

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ For more details on checkpointing, refer the `documentation <https://pytorch.org
169169

170170

171171
Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
172-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
172+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
173173
While using HuggingFace Transformers Trainer API to train (i.e. :ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error "Attempted to access the data pointer on an invalid python storage". This is a known `issue <https://github.com/huggingface/transformers/issues/27578>`_ and has been fixed in the version ``4.37.3`` of HuggingFace Transformers.
174174

175175
``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile

about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-x.rst

Lines changed: 2 additions & 1 deletion
@@ -183,7 +183,8 @@ For more details on checkpointing, refer the `documentation <https://pytorch.org
183183

184184

185185
Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
186-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
186+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
187+
187188
While using HuggingFace Transformers Trainer API to train (i.e. :ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error "Attempted to access the data pointer on an invalid python storage". This is a known `issue <https://github.com/huggingface/transformers/issues/27578>`_ and has been fixed in the version ``4.37.3`` of HuggingFace Transformers.
188189

189190
``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` on Amazon Linux 2023

about-neuron/index.rst

Lines changed: 1 addition & 1 deletion
@@ -141,9 +141,9 @@ If you want to request a feature or report a critical issue, you can contact us
141141
:hidden:
142142

143143
What is AWS Neuron? <what-is-neuron>
144-
Architecture <arch/index>
145144
Benchmarks </about-neuron/benchmarks/index>
146145
App notes <appnotes/index>
147146
Troubleshooting <troubleshooting>
147+
SDK Maintenance Policy <sdk-policy>
148148
Security <security>
149149
Neuron FAQ <faq>

about-neuron/quick-start/torch-neuron-tab-training.rst

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@
3333
3434
3535
# Install OS headers
36-
sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
36+
sudo dnf install -y "kernel-devel-uname-r = $(uname -r)"
3737
3838
# Remove preinstalled packages and Install Neuron Driver and Runtime
3939
sudo dnf remove aws-neuron-dkms -y
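The hunk above replaces two version-suffixed package names (`kernel-devel-$(uname -r)` and `kernel-headers-$(uname -r)`) with a single quoted argument, `"kernel-devel-uname-r = $(uname -r)"`. The sketch below shows how that argument is assembled for the running kernel. It assumes that `kernel-devel-uname-r` is a capability provided by the distribution's kernel-devel packages, which this diff does not state; the function names are hypothetical, and nothing here runs `dnf`.

```python
import platform
import shlex


def kernel_devel_spec(release):
    """Build the single dnf argument used by the new command in this diff."""
    return f"kernel-devel-uname-r = {release}"


def build_install_command(release=None):
    """Return the dnf command as an argument list (it is not executed here)."""
    if release is None:
        # platform.release() reports the same string as `uname -r` on Linux.
        release = platform.release()
    return ["sudo", "dnf", "install", "-y", kernel_devel_spec(release)]


def as_shell_line(release=None):
    """Render the command with shell quoting, for display or logging."""
    return shlex.join(build_install_command(release))
```

`as_shell_line()` keeps the spec as one quoted argument, which is why the documented command wraps it in double quotes: the spaces around `=` must not split it into separate words.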

conf.py

Lines changed: 1 addition & 1 deletion
@@ -214,7 +214,7 @@ def get_env_vars():
214214

215215
# top_banner_message="<span>&#9888;</span><a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/setup-troubleshooting.html#gpg-key-update'> Neuron repository GPG key for Ubuntu installation has expired, see instructions how to update! </a>"
216216

217-
top_banner_message = "Neuron 2.26.0 is released! Check <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html'>What's New </a> and <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/index.html'>Announcements</a> for more details."
217+
top_banner_message = "Neuron 2.26.1 is released! Check <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html'>What's New </a> and <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/index.html'>Announcements</a> for more details."
218218

219219
html_theme = "sphinx_book_theme"
220220
html_theme_options = {

containers/getting-started.txt

Lines changed: 10 additions & 10 deletions
@@ -33,7 +33,7 @@
3333
sudo dnf update -y
3434

3535
# Install OS headers
36-
sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
36+
sudo dnf install -y "kernel-devel-uname-r = $(uname -r)"
3737

3838
# Remove preinstalled packages and Install Neuron Driver and Runtime
3939
sudo dnf remove aws-neuron-dkms -y
@@ -139,7 +139,7 @@
139139
:class-title: sphinx-design-class-title-small
140140
:class-body: sphinx-design-class-body-small
141141
:animate: fade-in
142-
142+
143143
.. include:: /setup/install-templates/launch-inf1.txt
144144

145145
.. dropdown:: Install Drivers
@@ -168,7 +168,7 @@
168168
################################################################################################################
169169

170170
# Install OS headers
171-
sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
171+
sudo dnf install -y "kernel-devel-uname-r = $(uname -r)"
172172

173173
# Install Neuron Driver
174174
sudo dnf install aws-neuron-dkms -y
@@ -246,12 +246,12 @@
246246

247247
::
248248

249-
+--------------+---------+--------+-----------+-----------+------+------+
250-
| PCI BDF | LOGICAL | NEURON | MEMORY | MEMORY | EAST | WEST |
251-
| | ID | CORES | CHANNEL 0 | CHANNEL 1 | | |
252-
+--------------+---------+--------+-----------+-----------+------+------+
253-
| 0000:00:1f.0 | 0 | 4 | 4096 MB | 4096 MB | 0 | 0 |
254-
+--------------+---------+--------+-----------+-----------+------+------+
249+
+--------------+---------+--------+-----------+-----------+------+------+
250+
| PCI BDF | LOGICAL | NEURON | MEMORY | MEMORY | EAST | WEST |
251+
| | ID | CORES | CHANNEL 0 | CHANNEL 1 | | |
252+
+--------------+---------+--------+-----------+-----------+------+------+
253+
| 0000:00:1f.0 | 0 | 4 | 4096 MB | 4096 MB | 0 | 0 |
254+
+--------------+---------+--------+-----------+-----------+------+------+
255255

256256
.. dropdown:: Run Tutorial
257257
:class-title: sphinx-design-class-title-small
@@ -260,4 +260,4 @@
260260

261261
:ref:`tutorial-infer`
262262
:ref:`quickstart_vllm_dlc_deploy`
263-
263+

containers/tutorials/tutorial-docker-env-setup.rst

Lines changed: 29 additions & 29 deletions
@@ -38,7 +38,7 @@ to containers.
3838
3939
4040
# Install OS headers
41-
sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
41+
sudo dnf install -y "kernel-devel-uname-r = $(uname -r)"
4242
4343
# Remove preinstalled packages and Install Neuron Driver and Runtime
4444
sudo dnf remove aws-neuron-dkms -y
@@ -138,34 +138,34 @@ to containers.
138138
139139
.. code:: bash
140140
141-
# Configure Linux for Neuron repository updates
142-
sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
143-
[neuron]
144-
name=Neuron YUM Repository
145-
baseurl=https://yum.repos.neuron.amazonaws.com
146-
enabled=1
147-
metadata_expire=0
148-
EOF
149-
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
150-
151-
# Update OS packages
152-
sudo dnf update -y
153-
154-
################################################################################################################
155-
# To install or update to Neuron versions 1.19.1 and newer from previous releases:
156-
# - DO NOT skip 'aws-neuron-dkms' install or upgrade step, you MUST install or upgrade to latest Neuron driver
157-
################################################################################################################
158-
159-
# Install OS headers
160-
sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
161-
162-
# Install Neuron Driver
163-
sudo dnf install aws-neuron-dkms -y
164-
165-
####################################################################################
166-
# Warning: If Linux kernel is updated as a result of OS package update
167-
# Neuron driver (aws-neuron-dkms) should be re-installed after reboot
168-
####################################################################################
141+
# Configure Linux for Neuron repository updates
142+
sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
143+
[neuron]
144+
name=Neuron YUM Repository
145+
baseurl=https://yum.repos.neuron.amazonaws.com
146+
enabled=1
147+
metadata_expire=0
148+
EOF
149+
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
150+
151+
# Update OS packages
152+
sudo dnf update -y
153+
154+
################################################################################################################
155+
# To install or update to Neuron versions 1.19.1 and newer from previous releases:
156+
# - DO NOT skip 'aws-neuron-dkms' install or upgrade step, you MUST install or upgrade to latest Neuron driver
157+
################################################################################################################
158+
159+
# Install OS headers
160+
sudo dnf install -y "kernel-devel-uname-r = $(uname -r)"
161+
162+
# Install Neuron Driver
163+
sudo dnf install aws-neuron-dkms -y
164+
165+
####################################################################################
166+
# Warning: If Linux kernel is updated as a result of OS package update
167+
# Neuron driver (aws-neuron-dkms) should be re-installed after reboot
168+
####################################################################################
169169
170170
.. dropdown:: Install Docker
171171
:class-title: sphinx-design-class-title-small

dlami/index.rst

Lines changed: 8 additions & 94 deletions
@@ -20,67 +20,9 @@ Neuron Multi Framework DLAMI
2020
Neuron Deep Learning AMI (DLAMI) is a multi-framework DLAMI that supports multiple Neuron framework/libraries. Each DLAMI is pre-installed with Neuron drivers and support all Neuron instance types. Each virtual environment that corresponds to a specific Neuron framework/library
2121
comes pre-installed with all the Neuron libraries including Neuron compiler and Neuron runtime needed for you to easily get started.
2222

23-
24-
.. note::
25-
26-
Tensorflow-neuron 2.10 (inf1) released in SDK v2.20.2 is not compatible with the latest runtime in v2.21 SDK.
27-
Code that compiles will face runtime errors with the latest SDK 2.21.1 version.
28-
29-
Neuron team is aware of this issue and we will ship a single-framework AMI for TF 2.10 inf1 in a future release.
30-
31-
You can use multi-framework DLAMIs from Neuron SDK v2.20.0 for inf1 workloards to avoid this issue. For example:
32-
33-
Deep Learning AMI Neuron (Ubuntu 22.04/AL2023) 20241027
34-
35-
| Ubuntu22: ami-017ff4652165fd617
36-
| AL2023: ami-06fdb253ce8a32239
37-
38-
.. code-block:: shell
39-
40-
aws ec2 run-instances --image-id <ami-id>
41-
42-
43-
Alternatively, you can use the latest Neuron DLAMIs on Ubuntu and run this command as a work-around:
44-
45-
.. code-block:: shell
46-
47-
sudo apt-get remove -y aws-neuronx-dkms aws-neuronx-collectives aws-neuronx-runtime-lib aws-neuronx-tools
48-
sudo apt-get install aws-neuronx-dkms=2.18.* -y
49-
sudo apt-get install aws-neuronx-collectives=2.22.* -y
50-
sudo apt-get install aws-neuronx-runtime-lib=2.22.* -y
51-
sudo apt-get install aws-neuronx-tools=2.19.* -y
52-
53-
https://github.com/aws-neuron/aws-neuron-sdk/issues/1071 for more information on the issue.
54-
55-
5623
.. note::
57-
58-
The AL2023 DLAMI shipped in SDK v2.25 has an issue with the symbolic linking of Python3.10 shared object files which affects PyTorch virtual environments.
59-
This is because AL2023 operating system comes with Python3.9 by default and torch_neuronx requires Python3.10. We have fixed the issue in the upcoming release.
60-
61-
Current workaround:
62-
63-
.. code-block:: shell
64-
65-
sudo ln -sf /usr/lib64/libpython3.9.so.1.0 /usr/lib64/libpython3.10.so.1.0
66-
67-
68-
.. note::
69-
70-
The DLAMIs shipped in SDK v2.25 had an issue with the dependencies within Neuronx Distributed Training.
71-
Specifically, ``setuptools`` had 'xw' extra characters which caused ``setup_nxdt.sh`` script to fail.
72-
This has been removed in the upcoming release.
73-
74-
For a workaround, update the requirements file to remove these extra characters.
75-
76-
.. code-block:: shell
77-
78-
vi /opt/aws_neuronx_venv_pytorch_2_7_nxd_training/bin/requirements.txt
79-
80-
...
81-
setuptools>=70.0xw -> setuptools>=70.0
82-
...
83-
24+
Neuron DLAMIs released with version 2.26.1 no longer support ``Inf1`` instance types due to an incompatibility with the Neuron driver.
25+
If you'd like to run ``Inf1`` workloads, use previous DLAMIs released up to SDK version 2.26.
8426

8527
Multi Framework DLAMIs supported
8628
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -96,11 +38,11 @@ Multi Framework DLAMIs supported
9638
- DLAMI Name
9739

9840
* - Ubuntu 22.04
99-
- Inf1, Inf2, Trn1, Trn1n, Trn2
41+
- Inf2, Trn1, Trn1n, Trn2
10042
- Deep Learning AMI Neuron (Ubuntu 22.04)
10143

10244
* - Amazon Linux 2023
103-
- Inf1, Inf2, Trn1, Trn1n, Trn2
45+
- Inf2, Trn1, Trn1n, Trn2
10446
- Deep Learning AMI Neuron (Amazon Linux 2023)
10547

10648

@@ -134,12 +76,6 @@ Virtual Environments pre-installed
13476
* - Tensorflow 2.10 NeuronX
13577
- /opt/aws_neuronx_venv_tensorflow_2_10
13678

137-
* - Tensorflow 2.10 Neuron (Inf1)
138-
- /opt/aws_neuron_venv_tensorflow_2_10_inf1
139-
140-
* - PyTorch 1.13 Neuron (Inf1)
141-
- /opt/aws_neuron_venv_pytorch_1_13_inf1
142-
14379

14480
Within the PyTorch 2.8 NxD Training virtual environment, we have included a setup script that installs required dependencies for the package. To run this script,
14581
activate the virtual environment and run ``setup_nxdt.sh`` and this will run :ref:`the setup steps here <nxdt_installation_guide>`.
@@ -190,18 +126,7 @@ Single Framework DLAMIs supported
190126
* - Tensorflow 2.10
191127
- Ubuntu 22.04
192128
- Inf2, Trn1, Trn1n, Trn2
193-
- Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 22.04)
194-
195-
* - Tensorflow 2.10 (Inf1)
196-
- Ubuntu 22.04
197-
- Inf1
198-
- Deep Learning AMI Neuron TensorFlow 2.10 Inf1 (Ubuntu 22.04)
199-
200-
* - PyTorch 1.13 (Inf1)
201-
- Ubuntu 22.04
202-
- Inf1
203-
- Deep Learning AMI Neuron PyTorch 1.13 Inf1 (Ubuntu 22.04)
204-
129+
- Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 22.04)
205130

206131
Virtual Environments pre-installed
207132
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -231,19 +156,11 @@ Virtual Environments pre-installed
231156
* - Deep Learning AMI Neuron JAX 0.6 (Ubuntu 22.04, Amazon Linux 2023)
232157
- JAX NeuronX 0.6
233158
- /opt/aws_neuronx_venv_jax_0_6
234-
235-
* - Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 22.04)
236-
- Pytorch Neuron (Inf1)
237-
- /opt/aws_neuron_venv_pytorch_1_13_inf1
238159

239160
* - Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 22.04)
240161
- Tensorflow Neuronx
241162
- /opt/aws_neuronx_venv_tensorflow_2_10
242-
243-
* - Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 22.04)
244-
- Tensorflow Neuron (Inf1)
245-
- /opt/aws_neuron_venv_tensorflow_2_10_inf1
246-
163+
247164

248165
You can easily get started with the single framework DLAMI through AWS console by following one of the corresponding setup guides . If you are looking to
249166
use the Neuron DLAMI in your cloud automation flows , Neuron also supports :ref:`SSM parameters <ssm-parameter-neuron-dlami>` to easily retrieve the latest DLAMI id.
@@ -267,11 +184,11 @@ Base DLAMIs supported
267184
- DLAMI Name
268185

269186
* - Amazon Linux 2023
270-
- Inf1, Inf2, Trn1n, Trn1, Trn2
187+
- Inf2, Trn1n, Trn1, Trn2
271188
- Deep Learning Base Neuron AMI (Amazon Linux 2023)
272189

273190
* - Ubuntu 22.04
274-
- Inf1, Inf2, Trn1n, Trn1, Trn2
191+
- Inf2, Trn1n, Trn1, Trn2
275192
- Deep Learning Base Neuron AMI (Ubuntu 22.04)
276193

277194

@@ -333,9 +250,6 @@ SSM Parameter Prefix
333250
* - Deep Learning AMI Neuron JAX 0.6 (Amazon Linux 2023)
334251
- /aws/service/neuron/dlami/jax-0.6/amazon-linux-2023
335252

336-
* - Deep Learning AMI Neuron PyTorch 1.13 Inf1 (Ubuntu 22.04)
337-
- /aws/service/neuron/dlami/pytorch-1.13-inf1/ubuntu-22.04
338-
339253
* - Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 22.04)
340254
- /aws/service/neuron/dlami/tensorflow-2.10/ubuntu-22.04
341255
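The SSM parameter prefixes listed in this file are meant for retrieving the latest DLAMI id from automation. A minimal sketch with `boto3` could look like the following. The `/latest/image_id` suffix and the `us-east-1` region are assumptions that do not appear in this diff, and the function names are hypothetical; only `get_parameter(Name=...)` and its `Parameter.Value` response field are taken from the SSM client API.

```python
def image_id_parameter(prefix):
    """Join a DLAMI prefix with the assumed '/latest/image_id' suffix."""
    return prefix.rstrip("/") + "/latest/image_id"


def latest_dlami_id(ssm_client, prefix):
    """Look up the AMI id stored under the given SSM parameter prefix."""
    response = ssm_client.get_parameter(Name=image_id_parameter(prefix))
    return response["Parameter"]["Value"]


def main():
    # boto3 is a third-party package; imported lazily so the helpers above
    # can be used and tested without it.
    import boto3

    ssm = boto3.client("ssm", region_name="us-east-1")  # region is an assumption
    prefix = "/aws/service/neuron/dlami/jax-0.6/amazon-linux-2023"
    print(latest_dlami_id(ssm, prefix))
```

Passing the client in as an argument keeps the lookup easy to exercise with a stub, and `main()` shows one way to wire it to a real SSM client with valid AWS credentials.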
