diff --git a/README.md b/README.md
index 412c0ac73..0304d9f56 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,13 @@ Options**
 
 ## Architecture
 
-![Sequence diagram of Edge Runtime request flow](assets/edge-runtime-diagram.svg?raw=true)
+<picture>
+  <source media="(prefers-color-scheme: dark)" srcset="assets/edge-runtime-diagram-dark.svg" />
+  <source srcset="assets/edge-runtime-diagram.svg" />
+  <img
+    alt="Sequence diagram of Edge Runtime request flow"
+    src="assets/edge-runtime-diagram.svg" />
+</picture>
 
 The edge runtime can be divided into two runtimes with different purposes.
@@ -30,7 +36,18 @@ The edge runtime can be divided into two runtimes with different purposes.
 - User runtime:
   - An instance for the _user runtime_ is responsible for executing users' code.
   - Limits are required to be set such as: Memory and Timeouts.
-  - Has access to environment variables explictly allowed by the main runtime.
+  - Has access to environment variables explicitly allowed by the main runtime.
+
+### Edge Runtime in Depth
+
+#### Conceptual
+
+- [EdgeRuntime Base](/crates/base/README.md): An overview of how EdgeRuntime builds on Deno.
+
+#### Extension Modules
+
+- [AI](/ext/ai/README.md): Implements AI-related features.
+- [Node.js](/ext/node/README.md) & [Node.js Polyfills](/ext/node/polyfills/README.md): Implement the Node.js compatibility layer.
 
 ## Developers
diff --git a/assets/docs/ai/onnx-backend-dark.svg b/assets/docs/ai/onnx-backend-dark.svg
new file mode 100644
index 000000000..146c97388
--- /dev/null
+++ b/assets/docs/ai/onnx-backend-dark.svg
@@ -0,0 +1,4 @@
+[SVG (dark variant): "Onnx Backend (nlp tasks + ort rust)" illustration. Models load from shipped assets or are downloaded and cached; a shared instance serves Worker 1 and Worker 2 (Warm: model inference; Init: model lazy loading). Execution time with per-worker sessions: 1st request 1s/5s, 2nd request 400ms; with the shared instance: 400ms for both. Workers serve /api endpoints.]
\ No newline at end of file
diff --git a/assets/docs/ai/onnx-backend.svg b/assets/docs/ai/onnx-backend.svg
new file mode 100644
index 000000000..9a2397d4d
--- /dev/null
+++ b/assets/docs/ai/onnx-backend.svg
@@ -0,0 +1,4 @@
+[SVG (light variant): same "Onnx Backend" illustration as the dark variant above]
\ No newline at end of file
diff --git a/assets/edge-runtime-diagram-dark.svg b/assets/edge-runtime-diagram-dark.svg
new file mode 100644
index 000000000..f19e8b5d2
--- /dev/null
+++ b/assets/edge-runtime-diagram-dark.svg
@@ -0,0 +1,4 @@
+[SVG (dark variant): Edge Runtime request-flow diagram. An incoming HTTP request enters through Hyper, is handled by the Main Worker with Sb extensions and Sb. Interceptors on top of Deno Core, and is routed to user workers (/endpoint-A, /endpoint-B) in Sb. Worker Isolates before the outgoing HTTP response; layers are labeled rust and typescript/javascript.]
\ No newline at end of file
diff --git a/assets/edge-runtime-diagram.svg b/assets/edge-runtime-diagram.svg
index 63cfa5ffe..bc4036c70 100644
--- a/assets/edge-runtime-diagram.svg
+++ b/assets/edge-runtime-diagram.svg
@@ -1,21 +1,4 @@
-[previous SVG markup]
+[SVG (light variant): same Edge Runtime request-flow diagram as the dark variant above]
\ No newline at end of file
diff --git a/crates/base/README.md b/crates/base/README.md
new file mode 100644
index 000000000..a1374b042
--- /dev/null
+++ b/crates/base/README.md
@@ -0,0 +1,14 @@
+# Supabase EdgeRuntime base
+
+This crate is part of the Supabase Edge Runtime stack and implements the
+core runtime features.
+
+## Architecture
+
+<picture>
+  <source media="(prefers-color-scheme: dark)" srcset="/assets/edge-runtime-diagram-dark.svg" />
+  <source srcset="/assets/edge-runtime-diagram.svg" />
+  <img
+    alt="Sequence diagram of Edge Runtime request flow"
+    src="/assets/edge-runtime-diagram.svg" />
+</picture>
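+
+A minimal sketch of how these pieces fit together from the main worker's side.
+The `EdgeRuntime.userWorkers` option names below are illustrative assumptions
+taken from the examples and may differ between versions:
+
+```typescript
+// main/index.ts (sketch): the main runtime accepts a request and spawns a
+// user worker to execute the user's code under explicit limits.
+Deno.serve(async (req: Request) => {
+  const worker = await EdgeRuntime.userWorkers.create({
+    servicePath: "./examples/serve",      // path to the user's code
+    memoryLimitMb: 150,                   // user runtime memory limit
+    workerTimeoutMs: 5 * 60 * 1000,       // user runtime wall-clock timeout
+    envVars: [["ALLOWED_VAR", "value"]],  // env vars explicitly allowed for the user runtime
+  });
+
+  // Relay the request to the user worker and return its response.
+  return await worker.fetch(req);
+});
+```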
diff --git a/ext/ai/README.md b/ext/ai/README.md
new file mode 100644
index 000000000..7633f327e
--- /dev/null
+++ b/ext/ai/README.md
@@ -0,0 +1,105 @@
+# Supabase AI module
+
+This crate is part of the Supabase Edge Runtime stack and implements AI-related
+features for the `Supabase.ai` namespace.
+
+## Model Execution Engine
+
+<picture>
+  <source media="(prefers-color-scheme: dark)" srcset="/assets/docs/ai/onnx-backend-dark.svg" />
+  <source srcset="/assets/docs/ai/onnx-backend.svg" />
+  <img
+    alt="ONNX Backend illustration"
+    src="/assets/docs/ai/onnx-backend.svg" />
+</picture>
+
+`Supabase.ai` uses [onnxruntime](https://onnxruntime.ai/) as its internal model
+execution engine, backed by the [ort pyke](https://ort.pyke.io/) Rust bindings.
+
+The **onnxruntime** API is available from `globalThis` and follows a spec
+similar to [onnxruntime-common](https://github.com/microsoft/onnxruntime/tree/main/js/common).
+
+The available items are:
+
+- `Tensor`: Represents a basic tensor with specified dimensions and data type ("the AI input/output").
+- `InferenceSession`: Represents the inner model session ("the AI model itself").
+
+<details>
+<summary>Usage</summary>
+
+It can be used from the exported `globalThis[Symbol.for("onnxruntime")]`,
+but manipulating it directly is not trivial, so in the future you may use the
+[Inference API #501](https://github.com/supabase/edge-runtime/pull/501) for a
+more user-friendly API.
+
+```typescript
+const { InferenceSession, Tensor } = globalThis[Symbol.for("onnxruntime")];
+
+// 'create()' accepts a model URL (encoded here as a byte buffer) or the raw model binary
+const modelUrlBuffer = new TextEncoder().encode("https://huggingface.co/Supabase/gte-small/resolve/main/onnx/model_quantized.onnx");
+const session = await InferenceSession.create(modelUrlBuffer);
+
+// Example only: in a real 'feature-extraction' task, tensors must be created from the tokenizer step.
+const inputs = {
+  input_ids: new Tensor('float32', [1, 2, 3...], [1, 384]),
+  attention_mask: new Tensor('float32', [...], [1, 384]),
+  token_type_ids: new Tensor('float32', [...], [1, 384])
+};
+
+const { last_hidden_state } = await session.run(inputs);
+console.log(last_hidden_state);
+```
+
+</details>
+
+### Third-party libs
+
+Originally this backend was created to integrate implicitly with
+[transformers.js](https://github.com/huggingface/transformers.js/). This way
+users can keep consuming a high-level library while still benefiting from all
+of Supabase's Model Execution Engine features, like model optimization and caching.
+For further information please check [PR #436](https://github.com/supabase/edge-runtime/pull/436)
+as well as the [tests folder](/crates/base/test_cases/ai-ort-rust-backend/transformers-js).
+
+> [!WARNING]
+> At this moment users need to explicitly target `device: 'auto'` to enable the platform compatibility.
+
+```typescript
+import { env, pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.1';
+
+// Browser cache is now supported for `onnx` models
+env.useBrowserCache = true;
+env.allowLocalModels = false;
+
+const pipe = await pipeline('feature-extraction', 'supabase/gte-small', { device: 'auto' });
+
+const output = await pipe("This embed will be generated from rust land", {
+  pooling: 'mean',
+  normalize: true
+});
+```
+
+### Self-Hosting
+
+**Caching filepath**:
+The `EXT_AI_CACHE_DIR` environment variable can be used to set a custom cache path.
+
+**Memory clean-up**:
+For self-hosting users an extra method is available in the `main/index.ts` scope
+and should be used to clean up unused sessions; consider adding it to your main
+entrypoint file:
+
+```typescript
+// clean up unused sessions every 30s
+setInterval(async () => {
+  try {
+    const cleanupCount = await EdgeRuntime.ai.tryCleanupUnusedSession();
+    if (cleanupCount === 0) {
+      return;
+    }
+    console.log('EdgeRuntime.ai.tryCleanupUnusedSession', cleanupCount);
+  } catch (e) {
+    console.error(e.toString());
+  }
+}, 30 * 1000);
+```
+
+## The `Session` class
+
+Prior versions [introduced](https://supabase.com/blog/ai-inference-now-available-in-supabase-edge-functions)
+the `Session` class as an alternative to `transformers.js` for the *gte-small*
+model, and it was later used to provide an [LLM interface](https://supabase.com/docs/guides/functions/ai-models?queryGroups=platform&platform=ollama#using-large-language-models-llm)
+for Ollama and some other providers.
+
+Since the **Model Execution Engine** was created, the `Session` class can now
+focus on the LLM interface, while `Session('gte-small')` is kept for
+compatibility purposes only.
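+
+As a rough sketch of that compatibility path (the `Supabase.ai.Session` option
+names below follow the original announcement and are assumptions, not a stable
+contract):
+
+```typescript
+// Embeddings through the legacy gte-small session.
+const session = new Supabase.ai.Session('gte-small');
+
+const embedding = await session.run('The quick brown fox', {
+  mean_pool: true,  // average token embeddings into a single vector
+  normalize: true,  // L2-normalize the output vector
+});
+
+console.log(embedding); // number[] of length 384 for gte-small
+```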
+
+> [!WARNING]
+> Docs for the `Session` class end here. There's an open [PR #539](https://github.com/supabase/edge-runtime/pull/539) that may change a lot of things about it.
diff --git a/ext/node/README.md b/ext/node/README.md
index d154d8cb6..87d08a664 100644
--- a/ext/node/README.md
+++ b/ext/node/README.md
@@ -1,3 +1,11 @@
-# deno_node
+# Supabase Node module
+
+This crate is part of the Supabase Edge Runtime stack and implements
+Node.js-related features.
+
+To see all compatible features, please check the
+[Node.js Polyfills](/ext/node/polyfills/README.md) section.
+
+## deno_node
 
 `require` and other node related functionality for Deno.
diff --git a/ext/node/polyfills/README.md b/ext/node/polyfills/README.md
index 26527278e..1e6e82bdc 100644
--- a/ext/node/polyfills/README.md
+++ b/ext/node/polyfills/README.md
@@ -1,6 +1,7 @@
-# Deno Node.js compatibility
+# Supabase Node.js compatibility module
 
-This module is meant to have a compatibility layer for the
+This crate is part of the Supabase Edge Runtime stack and implements a
+compatibility layer for the
 [Node.js standard library](https://nodejs.org/docs/latest/api/).
 
 **Warning**: Any function of this module should not be referred anywhere in the
@@ -59,7 +60,7 @@ Deno standard library as it's a compatibility module.
 - [x] worker_threads
 - [ ] zlib
 
-* [x] node globals _partly_
+- [x] node globals _partly_
 
 ### Deprecated