* WIP Add ChromaDB support and update dependencies
* WIP Add ChromaDB configuration support and update UI components
* WIP Enhance ChromaDB integration by adding SSL, token, tenant, and database configurations
* fix: mypy errors
* feat: Enhance ChromaDB support by adding database configuration and updating local development script
* feat: Disable input fields and switch for ChromaDB in case of enableModification as False
* Use ChromaDB constants for default tenant and database
* fix: Removed SSL configuration and infer SSL from host URL, add chromadb to localhost script
* fix: mypy errors
* feat: Add SSL configuration for ChromaDB client with conditional settings based on host URL
* refactor: Comment out SSL certificate configuration for ChromaDB client, update instructions for future implementation
* feat: Enhance ChromaDB configuration in .env and README, add SSL cert path support in settings
* Remove https check for setting port and ssl_verify
* Log port parsing errors
* More cleanups
* Fix my check oops
* Update publish_release.yml workflow to trigger on bs/chromadb branch, refine startup_app.sh script comments for clarity
* Update release version to dev-chromadb
* Implement support for ChromaDB as an alternative local vector DB provider
* Update release version to dev-chromadb
* Add .cursor to gitignore, update startup_app.sh to use uvx to start chroma
* Update release version to dev-chromadb
* Add support for controlling anonymized telemetry in ChromaDB client
* Update release version to dev-chromadb
* Remove ChromaDB anonymized telemetry configuration option from UI
* Update release version to dev-chromadb
* bug fix remove undefined arg from chromadb_config
* Fix: Refactor ChromaVectorStore visualize
* fix: ruff and mypy errors
* Flatten metadata in EmbeddingIndexer when vector store has flat_metadata enabled
* Move flat_metadata to VectorStore and flatten metadata for summary indexer
---------
Co-authored-by: Michael Liu <[email protected]>
Co-authored-by: actions-user <[email protected]>
README.md
47 additions & 9 deletions
@@ -52,6 +52,32 @@ RAG Studio can utilize the local file system or an S3 bucket for storing documen

 S3 will also require providing the AWS credentials for the bucket.

+### Vector Database Options
+
+RAG Studio supports Qdrant (default), OpenSearch (Cloudera Semantic Search), and ChromaDB.
+
+- To choose the vector DB, set `VECTOR_DB_PROVIDER` to one of `QDRANT`, `OPENSEARCH`, or `CHROMADB` in your `.env`.
+
+#### ChromaDB Setup
+
+If you select ChromaDB, configure the following environment variables in `.env`:
+
+- `CHROMADB_HOST` - Hostname or URL for ChromaDB. Use `localhost` for local Docker.
+- `CHROMADB_PORT` - Port for ChromaDB (default `8000`). Not required if `CHROMADB_HOST` starts with `https://` and the server infers the port.
+- `CHROMADB_TENANT` - Optional. Defaults to the Chroma default tenant.
+- `CHROMADB_DATABASE` - Optional. Defaults to the Chroma default database.
+- `CHROMADB_TOKEN` - Optional. Include if your Chroma server requires an auth token.
+- `CHROMADB_SERVER_SSL_CERT_PATH` - Optional. Path to PEM bundle for TLS verification when using HTTPS with a private CA.
+- `CHROMADB_ENABLE_ANONYMIZED_TELEMETRY` - Optional. Enables anonymized telemetry in the ChromaDB client; defaults to `false`.
+
+Notes:
+
+- The local-dev script will automatically start a ChromaDB Docker container when `VECTOR_DB_PROVIDER=CHROMADB` and `CHROMADB_HOST=localhost`, on `CHROMADB_PORT=8000`.
+- ChromaDB collections are automatically namespaced using the tenant and database values to avoid conflicts between different RAG Studio instances.
+- For production deployments, consider using a dedicated ChromaDB server with authentication enabled via `CHROMADB_TOKEN`.
+- When using HTTPS endpoints, ensure your certificate chain is properly configured or provide the CA bundle path via `CHROMADB_SERVER_SSL_CERT_PATH`.
+- Anonymized telemetry is disabled by default. You can enable it by setting `CHROMADB_ENABLE_ANONYMIZED_TELEMETRY=true`.
+
 ### Enhanced Parsing Options:

 RAG Studio can optionally enable enhanced parsing by providing the `USE_ENHANCED_PDF_PROCESSING` environment variable. Enabling this will allow RAG Studio to parse images and tables from PDFs. When enabling this feature, we strongly recommend using this with a GPU and at least 16GB of memory.
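
The `CHROMADB_*` variables documented in the README hunk above could be read along the following lines. This is a minimal Python sketch under stated assumptions, not the code from this commit: `ChromaSettings` and `load_chroma_settings` are hypothetical names, the fallback host of `localhost` is assumed, the default tenant and database strings are assumed to match the chromadb package's defaults, and leaving the port unset for `https://` hosts is one possible reading of "the server infers the port".

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional

logger = logging.getLogger(__name__)

# Assumed to match the chromadb package's default tenant/database names.
DEFAULT_TENANT = "default_tenant"
DEFAULT_DATABASE = "default_database"
DEFAULT_PORT = 8000  # default documented in the README


@dataclass(frozen=True)
class ChromaSettings:
    """Hypothetical container for the README's CHROMADB_* variables."""

    host: str
    port: Optional[int]
    ssl: bool
    tenant: str
    database: str
    token: Optional[str]
    ssl_cert_path: Optional[str]
    anonymized_telemetry: bool


def load_chroma_settings(env: Mapping[str, str] = os.environ) -> ChromaSettings:
    """Build settings from an environment-like mapping."""
    host = env.get("CHROMADB_HOST", "localhost")  # assumed fallback
    ssl = host.startswith("https://")  # SSL inferred from the host URL

    raw_port = env.get("CHROMADB_PORT")
    port: Optional[int]
    if raw_port:
        try:
            port = int(raw_port)
        except ValueError:
            # Log the parsing error and fall back to the documented default.
            logger.warning("Invalid CHROMADB_PORT %r; using %d", raw_port, DEFAULT_PORT)
            port = DEFAULT_PORT
    else:
        # Assumption: no explicit port for https:// hosts, else the default.
        port = None if ssl else DEFAULT_PORT

    telemetry = env.get("CHROMADB_ENABLE_ANONYMIZED_TELEMETRY", "false")
    return ChromaSettings(
        host=host,
        port=port,
        ssl=ssl,
        tenant=env.get("CHROMADB_TENANT") or DEFAULT_TENANT,
        database=env.get("CHROMADB_DATABASE") or DEFAULT_DATABASE,
        token=env.get("CHROMADB_TOKEN") or None,
        ssl_cert_path=env.get("CHROMADB_SERVER_SSL_CERT_PATH") or None,
        anonymized_telemetry=telemetry.strip().lower() == "true",
    )
```

Passing a plain dict instead of `os.environ`, for example `load_chroma_settings({"CHROMADB_HOST": "https://chroma.example.com"})`, keeps the parsing easy to exercise without touching the real environment.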
@@ -82,7 +108,7 @@ This variable can be set from the project settings for the AMP in CML.
 ## Air-gapped Environments

 If you are using an air-gapped environment, you will need to whitelist at the minimum the following domains in order to use the AMP.
-There may be other domains that need to be whitelisted depending on your environment and the model service provider you select.
+There may be other domains that need to be whitelisted depending on your environment and the model service provider you select.

 - `https://github.com`
 - `https://raw.githubusercontent.com`
@@ -150,17 +176,29 @@ the Node service locally, you can do so by following these steps:
 docker run -p 6333:6333 -p 6334:6334 -v $(pwd)/databases/qdrant_storage:/qdrant/storage:z qdrant/qdrant