Update rag-framework.md

ilya-kolchinsky · web-flow · commit e35d2d9c02c8 · 2024-12-11T15:29:03.000+02:00
Added details regarding Haystack licensing and maintenance.

Signed-off-by: Ilya Kolchinsky &lt;58424190+ilya-kolchinsky@users.noreply.github.com&gt;
diff --git a/docs/retrieval-augmented-generation/rag-framework.md b/docs/retrieval-augmented-generation/rag-framework.md
@@ -25,17 +25,19 @@ All of the above offer a variation of a modular pipeline architecture, where use
 
 Out of those, we propose to use Haystack for the following reasons:
 
-1. Focus on RAG. Haystack is a framework specifically targeting RAG use cases and sophisticated RAG indexing and retrieval pipelines. While Langchain and LlamaIndex shine in their own areas, the former is a generalist framework and the latter has a different focus, namely building custom indices over data. Haystack provides functionality that is strongly tailored for RAG and includes a comprehensive library of out-of-the-box solutions for advanced RAG scenarios. As a result, many essential or soon-to-be-essential RAG capabilities can be implemented in a few lines in Haystack but require considerable work to be supported over Langchain or LlamaIndex. Some examples include hybrid retrieval, iterative RAG, HyDE, combining multiple ingestion sources, custom data preprocessing and metadata augmentation. As the decision discussed in this document involves only the RAG component of RHEL AI, we believe that choosing the best RAG framework, as opposed to the best general LLM serving framework, would be more strategically correct.
+1. **Focus on RAG.** Haystack is a framework specifically targeting RAG use cases and sophisticated RAG indexing and retrieval pipelines. While Langchain and LlamaIndex shine in their own areas, the former is a generalist framework and the latter has a different focus, namely building custom indices over data. Haystack provides functionality that is strongly tailored for RAG and includes a comprehensive library of out-of-the-box solutions for advanced RAG scenarios. As a result, many essential or soon-to-be-essential RAG capabilities can be implemented in a few lines in Haystack but require considerable work to be supported over Langchain or LlamaIndex. Some examples include hybrid retrieval, iterative RAG, HyDE, combining multiple ingestion sources, custom data preprocessing and metadata augmentation. As the decision discussed in this document involves only the RAG component of RHEL AI, we believe that choosing the best RAG framework, as opposed to the best general LLM serving framework, would be more strategically correct.
 
-1. Maturity and stability. Haystack is the most mature, established and stable product among the considered alternatives. It has been around for more time overall (since 2017) and accumulated more mileage. Haystack has an active, sizable and steadily growing community.
+2. **Maturity and stability.** Haystack is the most mature, established and stable product among the considered alternatives. It has been around for more time overall (since 2017) and accumulated more mileage. Haystack has an active, sizable and steadily growing community.
 
-2. Extensive vendor support. Haystack natively supports all currently popular vector DBs and provides dedicated backends for incorporating them into its pipelines. Additionally, Haystack supports multiple models and model providers out-of-the-box.
+3. **Extensive vendor support.** Haystack natively supports all currently popular vector DBs and provides dedicated backends for incorporating them into its pipelines. Additionally, Haystack supports multiple models and model providers out-of-the-box.
 
-3. Enterprise-level performance. Haystack is designed for production-grade scalability, supporting distributed systems and high-throughput applications. Moreover, and in contrast to the alternatives (of which only LlamaIndex showcases similar performance and scalability), Haystack is specifically optimized for efficient search and retrieval in the RAG setting.
+4. **Enterprise-level performance.** Haystack is designed for production-grade scalability, supporting distributed systems and high-throughput applications. Moreover, and in contrast to the alternatives (of which only LlamaIndex showcases similar performance and scalability), Haystack is specifically optimized for efficient search and retrieval in the RAG setting.
 
-4. Ease of use and documentation. Being strictly focused on RAG as opposed to taking a generalist approach, the learning curve of Haystack is less steep than that of Langchain. At the same time, Haystack offers extensive documentation and tutorials which are more well-organized and easy to use than those of LlamaIndex.
+5. **Ease of use and documentation.** Being strictly focused on RAG as opposed to taking a generalist approach, the learning curve of Haystack is less steep than that of Langchain. At the same time, Haystack offers extensive documentation and tutorials which are more well-organized and easy to use than those of LlamaIndex.
 
-1. Extending the previous point, Haystack can be seen as a middle ground between Langchain and LlamaIndex, sharing their benefits while only partially inheriting their drawbacks. Like the former, Haystack enables building custom flows and pipelines. Unlike Langchain though, Haystack does not try to be too abstract and general, strictly focusing on RAG and document search instead. As a result, Haystack is more straightforward to use, especially for users looking to implement custom and highly non-standard scenarios. On the other hand, like LlamaIndex, Haystack's performance is optimized towards data retrieval and indexing, but it offers a higher degree of flexibility and better interfaces for custom use cases.
+6. **Architecture.** Extending the previous point, Haystack can be seen as a middle ground between Langchain and LlamaIndex, sharing their benefits while only partially inheriting their drawbacks. Like the former, Haystack enables building custom flows and pipelines. Unlike Langchain though, Haystack does not try to be too abstract and general, strictly focusing on RAG and document search instead. As a result, Haystack is more straightforward to use, especially for users looking to implement custom and highly non-standard scenarios. On the other hand, like LlamaIndex, Haystack's performance is optimized towards data retrieval and indexing, but it offers a higher degree of flexibility and better interfaces for custom use cases.
+   
+7. **Actively maintained open source project under permissive license.** Haystack is very [actively](https://github.com/deepset-ai/haystack/pulse/monthly) [maintained](https://github.com/deepset-ai/haystack/issues?q=is%3Aissue+is%3Aclosed) and [supported](https://github.com/deepset-ai/haystack/discussions). [Tagged versions](https://github.com/deepset-ai/haystack/releases) are released on a regular basis and [trusted publishing automation](https://github.com/deepset-ai/haystack/actions/workflows/pypi_release.yml) is used. Haystack is licensed under Apache 2.0, and all of its dependencies (jinja2, lazy-imports, more-itertools, networkx, numpy, openai, pandas, posthog, python-dateutil, pyyaml, requests, tenacity, tqdm, typing-extensions) are licensed under Apache, MIT, BSD or PSFL.
 
 ## Goals