- Added support for more flexible tensor-parallel configurations to GPT2, OPT, and BLOOM. Previously, there were two constraints on `tp_degree`: 1) the number of attention heads must be evenly divisible by `tp_degree`, and 2) `tp_degree` must satisfy the runtime topology constraints for collective communication (e.g., AllReduce). For more details on supported topologies, see [Tensor-parallelism support](README.md#tensor-parallelism-support) and https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/neuron-features/collective-communication.html. Constraint 1) has now been removed by using 1-axis padding.
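
  As a rough sketch of the padding arithmetic (the `pad_heads` helper below is hypothetical, not part of the library API): the attention-head axis is padded up to the next multiple of `tp_degree`, so that each rank receives an equal share of heads even when the original head count is not evenly divisible.

  ```python
  import math

  def pad_heads(num_heads: int, tp_degree: int) -> int:
      # Hypothetical helper: round the head count up to the
      # nearest multiple of tp_degree (1-axis padding).
      return math.ceil(num_heads / tp_degree) * tp_degree

  # Example: GPT-2 small has 12 attention heads, which a
  # tp_degree of 8 does not divide evenly.
  num_heads, tp_degree = 12, 8
  padded = pad_heads(num_heads, tp_degree)   # 16
  heads_per_rank = padded // tp_degree       # 2 heads per rank
  print(padded, heads_per_rank)
  ```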