This pull request updates the documentation in the project's `README.md`
and `libs/README.md` files. It fixes a link and several file paths so
that the documentation reflects the correct structure and paths for the
project's components and services.
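Several of these fixes correct relative links whose targets do not match the repository layout (for example `./admin_api_lib/src/...` instead of `./admin-api-lib/src/...`). As a hedged illustration, and not something this PR adds, a small script could flag such links automatically. The function name `broken_relative_links` is hypothetical, the sketch assumes relative targets resolve against the README's own directory, and its regular expression only handles inline `[text](target)` links.

```python
import re
from pathlib import Path

# Inline markdown links of the form [text](target); the target ends at whitespace or ")".
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# A URI scheme prefix such as "https:" or "diffhunk:".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def broken_relative_links(markdown: str, base_dir: Path) -> list[str]:
    """Return the relative link targets in `markdown` that do not exist under `base_dir`.

    External URLs and same-page anchors (targets starting with "#") are skipped.
    """
    broken = []
    for target in LINK_RE.findall(markdown):
        if target.startswith("#") or SCHEME_RE.match(target):
            continue  # external URL or in-page anchor: nothing to check on disk
        path = target.split("#", 1)[0]  # drop a "#section" fragment, if any
        if not (base_dir / path).exists():
            broken.append(target)
    return broken


if __name__ == "__main__":
    # Hypothetical usage: check libs/README.md against its own directory.
    readme = Path("libs/README.md")
    if readme.exists():
        for target in broken_relative_links(readme.read_text(encoding="utf-8"), readme.parent):
            print(f"broken link: {target}")
```

External links such as the Tilt guide URL are deliberately skipped: checking them would need network access, and a missing `.html` suffix may still resolve through a redirect.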
### Documentation Updates:
* **Corrected URL in `README.md`:** Updated the Tilt guide URL to
include the `.html` file extension for proper navigation. (`README.md`,
[README.mdL41-R41](diffhunk://#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L41-R41))
* **Updated paths in `libs/README.md`:**
  - Fixed the file paths for `file_service`, `chunker`, and `document_extractor`
    so that they point to the correct modules. (`libs/README.md`,
    [libs/README.mdL154-R158](diffhunk://#diff-34194a117b05d75d22ca968cdb7d540839dc7a0eb33960fbca668b5a6ade87cbL154-R158))
  - Adjusted the paths for extractors, converters, and related components
    under the "Replaceable parts" section to reflect the updated directory
    structure, replaced the `general_extractor` and `file_extractor` entries
    with a single `general_file_extractor` entry, and fixed the "esxtractor"
    typo in the `confluence_extractor` description. (`libs/README.md`,
    [libs/README.mdR226-R238](diffhunk://#diff-34194a117b05d75d22ca968cdb7d540839dc7a0eb33960fbca668b5a6ade87cbR226-R238))
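The entries in the "Replaceable parts" tables follow a visible convention: a dotted name such as `admin_api_lib.impl.chunker.text_chunker.TextChunker` links to `./admin-api-lib/src/admin_api_lib/impl/chunker/text_chunker.py`. In other words, the package directory uses dashes, then comes `src/`, then the module path. A check based on that convention could catch mismatches like the ones fixed here. The sketch below is a minimal illustration and is not part of this PR. The function names and the regular expression are hypothetical, and the sketch assumes that every linked `.py` target follows this layout.

```python
import re

# A table cell such as
# [`admin_api_lib.impl.chunker.text_chunker.TextChunker`](./admin-api-lib/src/admin_api_lib/impl/chunker/text_chunker.py)
CELL_RE = re.compile(r"\[`([\w.]+)`\]\(([^)\s]+)\)")


def expected_path(dotted: str) -> str:
    """Derive the path implied by a dotted `package.module.Name` reference.

    Assumes the layout ./<package-with-dashes>/src/<package>/<submodules>.py and
    treats the last dotted component as the class, function, or constant name.
    """
    *modules, _name = dotted.split(".")
    package_dir = modules[0].replace("_", "-")
    return f"./{package_dir}/src/" + "/".join(modules) + ".py"


def mismatched_cells(markdown: str) -> list[tuple[str, str, str]]:
    """Return (dotted name, linked path, expected path) for each disagreeing link."""
    mismatches = []
    for dotted, path in CELL_RE.findall(markdown):
        if "." not in dotted or not path.endswith(".py"):
            continue  # not a module reference, e.g. [`Tiltfile`](./Tiltfile)
        want = expected_path(dotted)
        if path != want:
            mismatches.append((dotted, path, want))
    return mismatches
```

Applied to the rows added in this PR, the check would still report the `ms_docs_extractor` and `xml_extractor` entries. Their dotted class names omit the `impl.` segment that their linked paths contain.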
`README.md` (1 addition, 1 deletion)
@@ -38,7 +38,7 @@ The template supports multiple LLM (Large Language Model) providers, such as STA
 ## 1. Getting Started
-A [`Tiltfile`](./Tiltfile) is provided to get you started :rocket:. If Tilt is new for you, and you want to learn more about it, please take a look at the [Tilt guides](https://docs.tilt.dev/tiltfile_authoring).
+A [`Tiltfile`](./Tiltfile) is provided to get you started :rocket:. If Tilt is new for you, and you want to learn more about it, please take a look at the [Tilt guides](https://docs.tilt.dev/tiltfile_authoring.html).
-| file_service |[`admin_api_lib.file_services.file_service.FileService`](./admin-api-lib/src/admin_api_lib/file_services/file_service.py)|[`admin_api_lib.impl.file_services.s3_service.S3Service`](./admin_api_lib/src/admin_api_lib/impl/file_services/s3_service.py)| Handles operations on the connected storage. |
+| file_service |[`admin_api_lib.file_services.file_service.FileService`](./admin-api-lib/src/admin_api_lib/file_services/file_service.py)|[`admin_api_lib.impl.file_services.s3_service.S3Service`](./admin-api-lib/src/admin_api_lib/impl/file_services/s3_service.py)| Handles operations on the connected storage. |
 | large_language_model |`langchain_core.language_models.llms.BaseLLM`|`langchain_community.llms.vllm.VLLMOpenAI` or `langchain_community.llms.Ollama`| The LLm that is used for all LLM tasks. The default depends on the value of `rag_core_lib.impl.settings.rag_class_types_settings.RAGClassTypeSettings.llm_type`|
 | key_value_store |[`admin_api_lib.impl.key_db.file_status_key_value_store.FileStatusKeyValueStore`](./admin-api-lib/src/admin_api_lib/impl/key_db/file_status_key_value_store.py)|[`admin_api_lib.impl.key_db.file_status_key_value_store.FileStatusKeyValueStore`](./admin-api-lib/src/admin_api_lib/impl/key_db/file_status_key_value_store.py)| Is used for storing the available sources and their current state. |
-| chunker |[`admin_api_lib.impl.chunker.chunker.Chunker`](./admin-api-lib/src/admin_api_lib/impl/chunker/chunker.py)|[`admin_api_lib.impl.chunker.text_chunker.TextChunker`](./admin-api-lib/src/admin_api_lib/impl/chunker/text_chunker.py)| Used for splitting the documents in managable chunks. |
-| document_extractor |[`admin_api_lib.extractor_api_client.openapi_client.api.extractor_api.ExtractorApi`](./admin-api-lib/src/admin_api_lib/extractor_api_client/openapi_client/api/extractor_api.py)|[`admin_api_lib.extractor_api_client.openapi_client.api.extractor_api.ExtractorApi`](./admin-api-lib/src/admin_api_lib.extractor_api_client/openapi_client/api/extractor_api.py)| Needs to be replaced if adjustments to the `extractor-api` is made. |
+| chunker |[`admin_api_lib.chunker.chunker.Chunker`](./admin-api-lib/src/admin_api_lib/chunker/chunker.py)|[`admin_api_lib.impl.chunker.text_chunker.TextChunker`](./admin-api-lib/src/admin_api_lib/impl/chunker/text_chunker.py)| Used for splitting the documents in managable chunks. |
+| document_extractor |[`admin_api_lib.extractor_api_client.openapi_client.api.extractor_api.ExtractorApi`](./admin-api-lib/src/admin_api_lib/extractor_api_client/openapi_client/api/extractor_api.py)|[`admin_api_lib.extractor_api_client.openapi_client.api.extractor_api.ExtractorApi`](./admin-api-lib/src/admin_api_lib/extractor_api_client/openapi_client/api/extractor_api.py)| Needs to be replaced if adjustments to the `extractor-api` is made. |
 | rag_api |[`admin_api_lib.rag_backend_client.openapi_client.api.rag_api.RagApi`](./admin-api-lib/src/admin_api_lib/rag_backend_client/openapi_client/api/rag_api.py)|[`admin_api_lib.rag_backend_client.openapi_client.api.rag_api.RagApi`](./admin-api-lib/src/admin_api_lib/rag_backend_client/openapi_client/api/rag_api.py)| Needs to be replaced if changes to the `/information_pieces/remove` or `/information_pieces/upload` of the [`rag-core-api`](#rag-core-api) are made. |
 | summarizer_prompt |`str`|[`admin_api_lib.prompt_templates.summarize_prompt.SUMMARIZE_PROMPT`](./admin-api-lib/src/admin_api_lib/prompt_templates/summarize_prompt.py)| The prompt used of the summarization. |
 | langfuse_manager |[`rag_core_lib.impl.langfuse_manager.langfuse_manager.LangfuseManager`](./rag-core-lib/src/rag_core_lib/impl/langfuse_manager/langfuse_manager.py)|[`rag_core_lib.impl.langfuse_manager.langfuse_manager.LangfuseManager`](./rag-core-lib/src/rag_core_lib/impl/langfuse_manager/langfuse_manager.py)| Retrieves additional settings, as well as the prompt from langfuse if available. |
@@ -225,16 +225,15 @@ Technically, all parameters of the `SitemapLoader` from LangChain can be provide
-| file_service |[`extractor_api_lib.file_services.file_service.FileService`](./extractor-api-lib/src/extractor_api_lib/file_services/file_service.py)|[`extractor_api_lib.file_services.s3_service.S3Service`](./extractor-api-lib/src/extractor_api_lib/file_services/s3_service.py)| Handles operations on the connected storage. |
-| database_converter |[`extractor_api_lib.document_parser.table_converters.dataframe_converter.DataframeConverter`](./extractor-api-lib/src/extractor_api_lib/document_parser/table_converters/dataframe_converter.py)|[`extractor_api_lib.document_parser.table_converters.dataframe2markdown.DataFrame2Markdown`](./extractor-api-lib/src/extractor_api_lib/document_parser/table_converters/dataframe2markdown.py)| Converts the extracted table from *pandas.DataFrame* to markdown. If you want the table to have another format, this would need to be adjusted. |
-| pdf_extractor |[`extractor_api_lib.document_parser.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/document_parser/information_extractor.py)|[`extractor_api_lib.document_parser.pdf_extractor.PDFExtractor`](./extractor-api-lib/src/extractor_api_lib/document_parser/pdf_extractor.py)| Extractor used for extracting information from PDF documents. |
-| ms_docs_extractor |[`extractor_api_lib.document_parser.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/document_parser/information_extractor.py)|[`extractor_api_lib.document_parser.ms_docs_extractor.MSDocsExtractor`](./extractor-api-lib/src/extractor_api_lib/document_parser/ms_docs_extractor.py)| Extractor used for extracting information from Microsoft Documents like *.docx, etc. |
-| xml_extractor |[`extractor_api_lib.document_parser.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/document_parser/information_extractor.py)|[`extractor_api_lib.document_parser.xml_extractor.XMLExtractor`](./extractor-api-lib/src/extractor_api_lib/document_parser/xml_extractor.py)| Extractor used for extracting content from XML documents. |
-| all_extractors |`dependency_injector.providers.List[extractor_api_lib.document_parser.information_extractor.InformationExtractor]`|`dependency_injector.providers.List(pdf_extractor, ms_docs_extractor, xml_extractor)`| List of all available extractors. If you add a new type of extractor you would have to add it to this list. |
-| general_extractor |[`extractor_api_lib.document_parser.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/document_parser/information_extractor.py)|[`extractor_api_lib.document_parser.general_extractor.GeneralExtractor`](./extractor-api-lib/src/extractor_api_lib/document_parser/general_extractor.py)| Combines multiple extractors and decides which one to use for the given file format. |
-| file_extractor |[`extractor_api_lib.api_endpoints.file_extractor.FileExtractor`](./extractor-api-lib/src/extractor_api_lib/api_endpoints/file_extractor.py)|[`extractor_api_lib.impl.api_endpoints.default_file_extractor.DefaultFileExtractor`](./extractor-api-lib/src/extractor_api_lib/impl/api_endpoints/default_file_extractor.py)| Implementation of the `/extract_from_file` endpoint. Uses *general_extractor*. |
+| file_service |[`extractor_api_lib.file_services.file_service.FileService`](./extractor-api-lib/src/extractor_api_lib/file_services/file_service.py)|[`extractor_api_lib.impl.file_services.s3_service.S3Service`](./extractor-api-lib/src/extractor_api_lib/impl/file_services/s3_service.py)| Handles operations on the connected storage. |
+| database_converter |[`extractor_api_lib.table_converter.dataframe_converter.DataframeConverter`](./extractor-api-lib/src/extractor_api_lib/table_converter/dataframe_converter.py)|[`extractor_api_lib.impl.table_converter.dataframe2markdown.DataFrame2Markdown`](./extractor-api-lib/src/extractor_api_lib/impl/table_converter/dataframe2markdown.py)| Converts the extracted table from *pandas.DataFrame* to markdown. If you want the table to have another format, this would need to be adjusted. |
+| pdf_extractor |[`extractor_api_lib.extractors.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/extractors/information_extractor.py)|[`extractor_api_lib.impl.extractors.file_extractors.pdf_extractor.PDFExtractor`](./extractor-api-lib/src/extractor_api_lib/impl/extractors/file_extractors/pdf_extractor.py)| Extractor used for extracting information from PDF documents. |
+| ms_docs_extractor |[`extractor_api_lib.extractors.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/extractors/information_extractor.py)|[`extractor_api_lib.extractors.file_extractors.ms_docs_extractor.MSDocsExtractor`](./extractor-api-lib/src/extractor_api_lib/impl/extractors/file_extractors/ms_docs_extractor.py)| Extractor used for extracting information from Microsoft Documents like *.docx, etc. |
+| xml_extractor |[`extractor_api_lib.extractors.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/extractors/information_extractor.py)|[`extractor_api_lib.extractors.file_extractors.xml_extractor.XMLExtractor`](./extractor-api-lib/src/extractor_api_lib/impl/extractors/file_extractors/xml_extractor.py)| Extractor used for extracting content from XML documents. |
+| all_extractors |`dependency_injector.providers.List[extractor_api_lib.extractors.information_extractor.InformationExtractor]`|`dependency_injector.providers.List(pdf_extractor, ms_docs_extractor, xml_extractor)`| List of all available extractors. If you add a new type of extractor you would have to add it to this list. |
+| general_file_extractor |[`extractor_api_lib.api_endpoints.file_extractor.FileExtractor`](./extractor-api-lib/src/extractor_api_lib/api_endpoints/file_extractor.py)|[`extractor_api_lib.impl.api_endpoints.general_file_extractor.GeneralFileExtractor`](./extractor-api-lib/src/extractor_api_lib/impl/api_endpoints/general_file_extractor.py)| Combines multiple file extractors and decides which one to use for the given file format. |
 | general_source_extractor |[`extractor_api_lib.api_endpoints.source_extractor.SourceExtractor`](./extractor-api-lib/src/extractor_api_lib/api_endpoints/source_extractor.py)|[`extractor_api_lib.impl.api_endpoints.general_source_extractor.GeneralSourceExtractor`](./extractor-api-lib/src/extractor_api_lib/impl/api_endpoints/general_source_extractor.py)| Implementation of the `/extract_from_source` endpoint. Will decide the correct extractor for the source. |
-| confluence_extractor |[`extractor_api_lib.extractors.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/extractors/information_extractor.py)|[`extractor_api_lib.impl.extractors.confluence_extractor.ConfluenceExtractor`](./extractor-api-lib/src/extractor_api_lib/extractors/confluence_extractor.py)| Implementation of an esxtractor for the source `confluence`. |
+| confluence_extractor |[`extractor_api_lib.extractors.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/extractors/information_extractor.py)|[`extractor_api_lib.impl.extractors.confluence_extractor.ConfluenceExtractor`](./extractor-api-lib/src/extractor_api_lib/impl/extractors/confluence_extractor.py)| Implementation of an extractor for the source `confluence`. |
 | sitemap_extractor |[`extractor_api_lib.extractors.information_extractor.InformationExtractor`](./extractor-api-lib/src/extractor_api_lib/extractors/information_extractor.py)|[`extractor_api_lib.impl.extractors.sitemap_extractor.SitemapExtractor`](./extractor-api-lib/src/extractor_api_lib/impl/extractors/sitemap_extractor.py)| Implementation of an extractor for the source `sitemap`. Supports XML sitemap crawling with configurable parameters including URL filtering, custom headers, and crawling depth. Uses LangChain's SitemapLoader with support for custom parsing and meta functions via dependency injection. |
 | sitemap_parsing_function |`dependency_injector.providers.Factory[Callable]`|[`extractor_api_lib.impl.utils.sitemap_extractor_utils.custom_sitemap_parser_function`](./extractor-api-lib/src/extractor_api_lib/impl/utils/sitemap_extractor_utils.py)| Custom parsing function for sitemap content extraction. Used by the sitemap extractor to parse HTML content from web pages. Can be replaced to customize how web page content is processed and extracted. |
 | sitemap_meta_function |`dependency_injector.providers.Factory[Callable]`|[`extractor_api_lib.impl.utils.sitemap_extractor_utils.custom_sitemap_meta_function`](./extractor-api-lib/src/extractor_api_lib/impl/utils/sitemap_extractor_utils.py)| Custom meta function for sitemap content processing. Used by the sitemap extractor to extract metadata from web pages. Can be replaced to customize how metadata is extracted and structured from web content. |