@philipithomas
Member

Supersedes #5043

@github-actions

github-actions bot commented Aug 1, 2025

Reviewer Checklist

Please use this checklist to ensure your code review is thorough before approving.

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use cases in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

Member Author

philipithomas commented Aug 1, 2025

Summary: 1 successful workflow, 1 pending workflow

Last updated: 2025-08-01 21:34:08 UTC

@propel-code-bot
Contributor

propel-code-bot bot commented Aug 1, 2025

Add Morph Embedding Functions (Python & TypeScript) with Full Integration

This PR introduces Morph as a first-class embedding function in both the Python and TypeScript clients in Chroma. It provides implementations, integration into the existing embedding registries, schema validation, tests, and documentation updates for Morph, an OpenAI-compatible, code-focused embedding model. The change also includes build and configuration plumbing, package registration, and support for supplying the API key either through an environment variable or directly. Documentation and examples are included for both languages.

Key Changes

• Implements MorphEmbeddingFunction for Python and TypeScript, using Morph's OpenAI-compatible embedding API.
• Adds Morph entry to embedding function schemas, registration tables, and package bundles (e.g., @chroma-core/all).
• Python: Embedding function supports config roundtripping, schema validation, and config update policy (prevents model_name changes post-init). Uses OpenAI Python SDK targeting Morph's API.
• TypeScript: Full implementation in new-js, with configuration validation, OpenAI SDK usage, and tests for all config/code paths.
• Adds Morph to all relevant docs tables, docs pages, guides, API reference (Markdoc, .txt, README, config).
• Adds integration and E2E tests for Morph in both clients, skipping appropriately if required dependencies are missing.
• Package setup: New Morph package with npm scripts, tsconfig, build config, Jest setup, and lockfile entries.
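
The config roundtripping and the environment-variable fallback for the API key mentioned in the list above can be sketched roughly as follows. This is an illustrative stand-in only, not the MorphEmbeddingFunction added by this PR: the class name `SketchEmbeddingConfig`, the variable name `SKETCH_API_KEY`, and the model name `example-model` are all hypothetical, and no network call is made.

```python
import os
from typing import Any, Dict, Optional


class SketchEmbeddingConfig:
    """Hypothetical stand-in; not the MorphEmbeddingFunction from this PR."""

    def __init__(
        self,
        model_name: str,
        api_key: Optional[str] = None,
        api_key_env_var: str = "SKETCH_API_KEY",  # hypothetical variable name
    ) -> None:
        self.model_name = model_name
        self.api_key_env_var = api_key_env_var
        # Prefer an explicitly passed key, then fall back to the environment.
        self.api_key = api_key or os.environ.get(api_key_env_var)
        if not self.api_key:
            raise ValueError(f"No API key passed and {api_key_env_var} is not set.")

    def get_config(self) -> Dict[str, Any]:
        # The key itself is deliberately left out of the serialized config.
        return {
            "model_name": self.model_name,
            "api_key_env_var": self.api_key_env_var,
        }

    @classmethod
    def build_from_config(cls, config: Dict[str, Any]) -> "SketchEmbeddingConfig":
        # Rebuild from the serialized config; the key is re-read from the environment.
        return cls(
            model_name=config["model_name"],
            api_key_env_var=config["api_key_env_var"],
        )


# Roundtrip: serialize the config, rebuild from it, and compare.
os.environ["SKETCH_API_KEY"] = "placeholder-key"
original = SketchEmbeddingConfig(model_name="example-model")
restored = SketchEmbeddingConfig.build_from_config(original.get_config())
```

In this sketch only the name of the environment variable is persisted, so a restored instance picks the key up from its own environment rather than from stored configuration.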

Affected Areas

• Python: chromadb/utils/embedding_functions/morph_embedding_function.py
• Python: chromadb/utils/embedding_functions/__init__.py
• Python: test coverage and builtin configs
• TypeScript: clients/new-js/packages/ai-embeddings/morph and all/*
• Documentation: Markdoc content, public .txt docs, README
• Schema validation and config utilities in common package
• pnpm lockfile, build configs

This summary was automatically generated by @propel-code-bot

@philipithomas changed the title from "Add Morph embedding functions" to "[ENH] Add Morph embedding functions" on Aug 1, 2025
Comment on lines +128 to +135
    def validate_config_update(
        self, old_config: Dict[str, Any], new_config: Dict[str, Any]
    ) -> None:
        if "model_name" in new_config:
            raise ValueError(
                "The model name cannot be changed after the embedding function has been initialized."
            )

Contributor

[BestPractice]

The current implementation of validate_config_update prevents any update that includes model_name, even if it's the same value. This should be changed to only raise an error if the model_name is different from the existing one.
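
One way the suggested check could look is sketched below. It is written as a standalone function rather than a method so it can run on its own, and it assumes the existing model name is available under the `model_name` key of `old_config`; that assumption is not confirmed by the diff shown above.

```python
from typing import Any, Dict


def validate_config_update(
    old_config: Dict[str, Any], new_config: Dict[str, Any]
) -> None:
    # Reject the update only when model_name is present and actually differs
    # from the value already stored (assumed to live in old_config).
    if (
        "model_name" in new_config
        and new_config["model_name"] != old_config.get("model_name")
    ):
        raise ValueError(
            "The model name cannot be changed after the embedding function has been initialized."
        )
```

With this version, re-sending the same `model_name` alongside other settings passes validation, while a genuinely different name still raises `ValueError`.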

// use directly
const embeddings = embedder.generate(["function calculate(a, b) { return a + b; }", "class User { constructor(name) { this.name = name; } }"])

// pass documents to query for .add and .query
Contributor

[Documentation]

Rephrase for clarity: change this comment to "// Pass documents to the .add and .query methods".

@philipithomas philipithomas merged commit ebdcc90 into main Aug 1, 2025
59 checks passed
Inventrohyder pushed a commit to Inventrohyder/chroma that referenced this pull request Aug 5, 2025
Supersedes chroma-core#5043

---------

Co-authored-by: bhaktatejas922 <[email protected]>
Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>
Co-authored-by: Jeffrey Huber <[email protected]>
5 participants