
Commit 636c880

Merge pull request #143 from terhunej/master
zETL workshop content update
2 parents: 15ba0ee + 838243c

48 files changed: +251 -207 lines

content/dynamodb-opensearch-zetl/integrations/index.en.md

Lines changed: 3 additions & 1 deletion
@@ -4,4 +4,6 @@ menuTitle: "Integrations"
 date: 2024-02-23T00:00:00-00:00
 weight: 30
 ---
-In this section, you will configure integrations between services. You'll first set up ML and Pipeline connectors in OpenSearch Service followed by a zero ETL connector to move data written to DynamoDB to OpenSearch. Once these integrations are set up, you'll be able to write records to DynamoDB as your source of truth and then automatically have that data available to query in other services.
+In this section, you will configure integrations between services. First you will set up machine learning (ML) and Pipeline connectors in OpenSearch Service. Then you will set up a zero-ETL connector to move data stored in DynamoDB into OpenSearch for indexing. Once both of these integrations are set up, you'll be able to write records to DynamoDB as your source of truth and then automatically have that data available to query in the other services.
+
+![Integrations](/static/images/connectionsandpipelines.png)

content/dynamodb-opensearch-zetl/integrations/os-connectors.en.md

Lines changed: 33 additions & 24 deletions
@@ -4,19 +4,21 @@ menuTitle: "Load DynamoDB Data"
 date: 2024-02-23T00:00:00-00:00
 weight: 20
 ---
-In this section you'll configure ML and Pipeline connectors in OpenSearch Service. These configurations are set up by a series of POST and PUT requests that are authenticated with AWS Signature Version 4 (sig-v4). Sigv4 is the standard authentication mechanism used by AWS services. While in most cases an SDK abstracts away sig-v4 but in this case we will be building the requests ourselves with curl.
+In this section you'll configure OpenSearch so it will preprocess and enrich data as it is written to its indexes, by connecting to an externally hosted machine learning embeddings model. This is a simpler application design than having your application write the embeddings as an attribute on the item within DynamoDB. Instead, the data is kept as text in DynamoDB, and when it arrives in OpenSearch, OpenSearch will connect out to Bedrock to generate and store the embeddings.
 
-Building a sig-v4 signed request requires a session token, access key, and secret access key. You'll first retrieve these from your Cloud9 Instance metadata with the provided "credentials.sh" script which exports required values to environmental variables. In the following steps, you'll also export other values to environmental variables to allow for easy substitution into listed commands.
+More information on this design can be found at [ML and Pipeline connectors in OpenSearch Service](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/index/).
 
-1. Run the credentials.sh script to retrieve and export credentials. These credentials will be used to sign API requests to the OpenSearch cluster. Note the leading "." before "./credentials.sh", this must be included to ensure that the exported credentials are available in the currently running shell.
-```bash
-. ./credentials.sh
-```
-1. Next, export an environmental variable with the OpenSearch endpoint URL. This URL is listed in the CloudFormation Stack Outputs tab as "OSDomainEndpoint". This variable will be used in subsequent commands.
-```bash
-export OPENSEARCH_ENDPOINT="https://search-ddb-os-xxxx-xxxxxxxxxxxxx.us-west-2.es.amazonaws.com"
-```
-1. Execute the following curl command to create the OpenSearch ML model connector.
+We will perform these configurations using a series of POST and PUT requests made to OpenSearch endpoints. The calls will be made using the IAM role that was previously mapped to the OpenSearch "all_access" role.
+
+The calls are authenticated with AWS Signature Version 4 (SigV4). SigV4 is the standard authentication mechanism used by AWS services. In most cases an SDK abstracts away the SigV4 details, but in this case we will be building the requests ourselves with curl.
+
+Building a SigV4-signed request requires a session token, access key, and secret access key. These are available to your VS Code instance as metadata. These values were retrieved by the "credentials.sh" script you ran during setup, which pulled the required values and exported them as environment variables for your use. In the following steps, you'll also export other values to environment variables to allow for easy substitution into the various commands.
+
+If any of the following commands fail, try re-running the credentials.sh script in the :link[Environment Setup]{href="/setup/step1"} step.
+
+As you run these steps, be very careful about typos. Also remember the Copy icon in the corner of each code block.
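For reference, every signed request in the numbered steps below follows roughly the same shape. A minimal sketch is shown here, assuming a curl build with `--aws-sigv4` support, the us-west-2 region, and that credentials.sh exported the session token as `METADATA_AWS_SESSION_TOKEN` (an assumed variable name); the exact flags used in the workshop are the ones shown in each step's full command.

```bash
# Sketch of the SigV4-signed curl pattern used throughout this section.
# Assumes OPENSEARCH_ENDPOINT and the METADATA_AWS_* variables are exported.
curl --request GET \
  ${OPENSEARCH_ENDPOINT}'/_cluster/health' \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
  --user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}"
```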
+
+1. Execute the following curl command to **create the OpenSearch ML model connector**. You can use ML connectors to connect OpenSearch Service to a model hosted on Bedrock or a model hosted on a third-party platform. Here we are connecting to the Titan embedding model hosted on Bedrock.
 ```bash
 curl --request POST \
 ${OPENSEARCH_ENDPOINT}'/_plugins/_ml/connectors/_create' \
@@ -53,11 +55,11 @@ Building a sig-v4 signed request requires a session token, access key, and secre
 ]
 }'
 ```
-1. Note the "connector_id" returned in the previous command. Export it to an environmental variable for convenient substitution in future commands.
+1. Note the **"connector_id"** returned in the previous command. **Export it to an environment variable** for convenient substitution in future commands.
 ```bash
 export CONNECTOR_ID='xxxxxxxxxxxxxx'
 ```
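If you prefer not to copy the ID by hand, one option is to save the JSON response from the `_create` call to a file and pull the field out with `jq`. This is a sketch, assuming `jq` is installed and that the response was saved as `connector.json` (a hypothetical file name); the same pattern works for the model group and model IDs in the later steps.

```bash
# Extract "connector_id" from a saved _create response and export it.
export CONNECTOR_ID=$(jq -r '.connector_id' connector.json)
echo "CONNECTOR_ID=${CONNECTOR_ID}"
```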
-1. Run the next curl command to create the model group.
+1. Run the next curl command to **create the model group**.
 ```bash
 curl --request POST \
 ${OPENSEARCH_ENDPOINT}'/_plugins/_ml/model_groups/_register' \
@@ -71,7 +73,7 @@ Building a sig-v4 signed request requires a session token, access key, and secre
 "description": "This is an example description"
 }'
 ```
-1. Note the "model_group_id" returned in the previous command. Export it to an environmental variable for later substitution.
+1. Note the **"model_group_id"** returned in the previous command. **Export it to an environment variable** for later substitution.
 ```bash
 export MODEL_GROUP_ID='xxxxxxxxxxxxx'
 ```
@@ -92,15 +94,17 @@ Building a sig-v4 signed request requires a session token, access key, and secre
 "connector_id": "'${CONNECTOR_ID}'"
 }'
 ```
-1. Note the "model_id" and export it.
+1. Note the **"model_id"** (NOT the task_id) and export it.
 ```bash
 export MODEL_ID='xxxxxxxxxxxxx'
 ```
-1. Run the following command to verify that you have successfully exported the connector, model group, and model id.
+1. Run the following command to **verify that you have successfully exported the connector, model group, and model ID**.
 ```bash
 echo -e "CONNECTOR_ID=${CONNECTOR_ID}\nMODEL_GROUP_ID=${MODEL_GROUP_ID}\nMODEL_ID=${MODEL_ID}"
 ```
-1. Next, we'll deploy the model with the following curl.
+
+::alert[_Make sure all three environment variables are exported correctly. Otherwise, the next commands will fail._]
+
+1. Next, we'll **deploy the model** with the following curl.
 ```bash
 curl --request POST \
 ${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/'${MODEL_ID}'/_deploy' \
@@ -111,11 +115,13 @@ Building a sig-v4 signed request requires a session token, access key, and secre
 --user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}"
 ```
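Deployment runs asynchronously, so before moving on you may want to confirm the model actually reached a deployed state. Here is a sketch using the ML Commons get-model API, with the same signed-request flags assumed as in the sketch above; look for `"model_state": "DEPLOYED"` in the response.

```bash
# Fetch the model's metadata and check its model_state field.
curl --request GET \
  ${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/'${MODEL_ID} \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
  --user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}"
```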
 
-With the model created, OpenSearch can now use Bedrock's Titan embedding model for processing text. An embeddings model is a type of machine learning model that transforms high-dimensional data (like text or images) into lower-dimensional vectors, known as embeddings. These vectors capture the semantic or contextual relationships between the data points in a more compact, dense representation.
+With the model created, **OpenSearch can now use Bedrock's Titan embedding model** for processing text.
 
-The embeddings represent the semantic meaning of the input data, in this case product descriptions. Words with similar meanings are represented by vectors that are close to each other in the vector space. For example, the vectors for "sturdy" and "strong" would be closer to each other than to "warm".
+**An embeddings model** is a type of machine learning model that transforms high-dimensional data (like text or images) into lower-dimensional vectors, known as embeddings. These vectors capture the semantic or contextual relationships between the data points in a more compact, dense representation.
 
-1. Now we can test the model. If you recieve results back with a "200" status code, everything is working properly.
+The embeddings represent the semantic meaning of the input data, in this case product descriptions. Words with similar meanings are represented by vectors that are close to each other in the vector space. For example, the vectors for "sturdy" and "strong" would be closer to each other than to "stringy".
+
+1. Now we can **test the model**. With the command below, we send some text to OpenSearch and ask it to return the vector embeddings using the configured "MODEL_ID". If you receive results back with a "200" status code, everything is working properly.
 ```bash
 curl --request POST \
 ${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/'${MODEL_ID}'/_predict' \
@@ -130,7 +136,9 @@ Building a sig-v4 signed request requires a session token, access key, and secre
 }
 }'
 ```
-1. Next, we'll create the Details table mapping pipeline.
+::alert[_The output also contains the vector embeddings, so look for the status code field to confirm the call succeeded._]
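Because a successful response is dominated by the embedding vector itself, it can be hard to spot the status by eye. As an alternative check, you could print only the HTTP response code. This is a sketch: it reports the HTTP code rather than the status_code field inside the JSON body, assumes the same signed-request flags as the sketch above, and uses Titan's `inputText` parameter as an assumed request body; match whatever body the predict command in the step above uses.

```bash
# Re-run the predict call but print only the HTTP status code ("200" on success).
curl --silent --output /dev/null --write-out "%{http_code}\n" \
  --request POST \
  ${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/'${MODEL_ID}'/_predict' \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
  --header 'Content-Type: application/json' \
  --data '{"parameters": {"inputText": "Test sentence for the embedding model"}}' \
  --user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}"
```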
+
+1. Next, we'll create the **ProductDetails table mapping ingest pipeline**. An **ingest pipeline** is a sequence of processors that are applied to documents as they are ingested into an index. This pipeline uses the configured model to generate the embeddings. Once it is created, as new data arrives in OpenSearch from the DynamoDB "ProductDetails" table, the embeddings will be created and indexed.
 ```bash
 curl --request PUT \
 ${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-en-nlp-ingest-pipeline' \
@@ -158,7 +166,8 @@ Building a sig-v4 signed request requires a session token, access key, and secre
 ]
 }'
 ```
-1. Followed by the Reviews table mapping pipeline. We won't use this in this version of the lab, but in a real system you will want to keep your embeddings indexes separate for different queries.
+::alert[_This creates the processor that takes the source text and generates an embedding, which is stored under 'product_embedding'._]
+1. Followed by the **Reviews table mapping pipeline**. We won't use this in this version of the lab, but in a real system you will want to keep your embeddings indexes separate for different queries. Note the different pipeline path in the endpoint.
 ```bash
 curl --request PUT \
 ${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-reviews-nlp-ingest-pipeline' \
@@ -177,7 +186,7 @@ Building a sig-v4 signed request requires a session token, access key, and secre
 },
 {
 "text_embedding": {
-"model_id": "m6jIgowBXLzE-9O0CcNs",
+"model_id": "'${MODEL_ID}'",
 "field_map": {
 "combined_field": "product_reviews_embedding"
 }
@@ -187,4 +196,4 @@ Building a sig-v4 signed request requires a session token, access key, and secre
 }'
 ```
 
-These pipelines allow OpenSearch to preprocess and enrich data as it is written to the index by adding embeddings through the Bedrock connector.
+**These pipelines allow OpenSearch to preprocess and enrich data as it is written to the index by adding embeddings through the Bedrock connector**.
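Before moving on to the zero-ETL setup, you can confirm both ingest pipelines were registered by fetching them back from OpenSearch. This is a sketch using the standard get-ingest-pipeline API, with the same signed-request flags assumed as above; a 404 here means the corresponding PUT did not succeed.

```bash
# Fetch both pipelines created above by name.
curl --request GET \
  ${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-en-nlp-ingest-pipeline,product-reviews-nlp-ingest-pipeline' \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
  --user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}"
```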

content/dynamodb-opensearch-zetl/integrations/zetl.en.md

Lines changed: 80 additions & 96 deletions
Original file line numberDiff line numberDiff line change
@@ -6,112 +6,96 @@ weight: 30
 ---
 Amazon DynamoDB offers a zero-ETL integration with Amazon OpenSearch Service through the DynamoDB plugin for OpenSearch Ingestion. Amazon OpenSearch Ingestion offers a fully managed, no-code experience for ingesting data into Amazon OpenSearch Service.
 
-1. Open [OpenSearch Service Ingestion Pipelines](https://us-west-2.console.aws.amazon.com/aos/home?region=us-west-2#opensearch/ingestion-pipelines)
-1. Click "Create pipeline"
-
-![Create pipeline](/static/images/ddb-os-zetl13.jpg)
-
-1. Name your pipeline, and include the following for your pipeline configuration. The configuration contains multiple values that need to be updated. The needed values are provided in the CloudFormation Stack Outputs as "Region", "Role", "S3Bucket", "DdbTableArn", and "OSDomainEndpoint".
-```yaml
-version: "2"
-dynamodb-pipeline:
-source:
-dynamodb:
-acknowledgments: true
-tables:
-# REQUIRED: Supply the DynamoDB table ARN
-- table_arn: "{DDB_TABLE_ARN}"
-stream:
-start_position: "LATEST"
-export:
-# REQUIRED: Specify the name of an existing S3 bucket for DynamoDB to write export data files to
-s3_bucket: "{S3BUCKET}"
-# REQUIRED: Specify the region of the S3 bucket
-s3_region: "{REGION}"
-# Optionally set the name of a prefix that DynamoDB export data files are written to in the bucket.
-s3_prefix: "pipeline"
-aws:
-# REQUIRED: Provide the role to assume that has the necessary permissions to DynamoDB, OpenSearch, and S3.
-sts_role_arn: "{ROLE}"
-# REQUIRED: Provide the region
-region: "{REGION}"
-sink:
-- opensearch:
-hosts:
-# REQUIRED: Provide an AWS OpenSearch endpoint, including https://
-[
-"{OS_DOMAIN_ENDPOINT}"
-]
-index: "product-details-index-en"
-index_type: custom
-template_type: "index-template"
-template_content: |
-{
-"template": {
-"settings": {
-"index.knn": true,
-"default_pipeline": "product-en-nlp-ingest-pipeline"
-},
-"mappings": {
-"properties": {
-"ProductID": {
-"type": "keyword"
-},
-"ProductName": {
-"type": "text"
-},
-"Category": {
-"type": "text"
-},
-"Description": {
-"type": "text"
-},
-"Image": {
-"type": "text"
-},
-"combined_field": {
-"type": "text"
-},
-"product_embedding": {
-"type": "knn_vector",
-"dimension": 1536,
-"method": {
-"engine": "nmslib",
-"name": "hnsw",
-"space_type": "l2"
-}
-}
-}
-}
-}
-}
-aws:
-# REQUIRED: Provide the role to assume that has the necessary permissions to DynamoDB, OpenSearch, and S3.
-sts_role_arn: "{ROLE}"
-# REQUIRED: Provide the region
-region: "{REGION}"
-```
-1. Under Network, select "Public access", then click "Next".
-
-![Create pipeline](/static/images/ddb-os-zetl14.jpg)
-
-1. Click "Create pipeline".
+Please follow these steps to set up zero-ETL. Here we use the AWS Console instead of curl commands:
+
+1. Open [OpenSearch Service](https://us-west-2.console.aws.amazon.com/aos/home?region=us-west-2#opensearch) within the Console.
+
+2. Select **Pipelines** from the left pane and click on **"Create pipeline"**.
+![Create pipeline](/static/images/ddb-os-zetl13.jpg)
+
+3. Select **"Blank"** from the Ingestion pipeline blueprints.
+![Blueprint selection](/static/images/CreatePipeline.png)
+
+4. Configure the source by selecting **"Amazon DynamoDB"** as the source and filling in the details as shown below. Once done, click "Next".
+![Configure source](/static/images/configure_source.png)
+
+5. Skip the **Processor** configuration.
+
+![Skip processor](/static/images/processor_blank.png)
+
+6. Configure the sink by filling in the OpenSearch details as shown below:
+![Configure Sink](/static/images/configure_sink.png)
+
+7. Use the following content under **Schema mapping**:
+
+```json
+{
+"template": {
+"settings": {
+"index.knn": true,
+"default_pipeline": "product-en-nlp-ingest-pipeline"
+},
+"mappings": {
+"properties": {
+"ProductID": {
+"type": "keyword"
+},
+"ProductName": {
+"type": "text"
+},
+"Category": {
+"type": "text"
+},
+"Description": {
+"type": "text"
+},
+"Image": {
+"type": "text"
+},
+"combined_field": {
+"type": "text"
+},
+"product_embedding": {
+"type": "knn_vector",
+"dimension": 1536
+}
+}
+}
+}
+}
+```
+
+Once done, click on **"Next"**.
+
+8. Configure the pipeline and then click "Next".
+
+![Configure pipeline](/static/images/ddb-os-zetl14.jpg)
+
+
+9. Click "Create pipeline".
 
 ![Create pipeline](/static/images/ddb-os-zetl15.jpg)
 
-1. **Wait until the pipeline has finished creating**. This will take 5 minutes or more.
+10. **Wait until the pipeline has finished creating and its status is "Active"**. This will take 5 minutes or more.
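If you prefer to poll from the terminal instead of refreshing the console, the OpenSearch Ingestion CLI can report the pipeline status. This is a sketch, assuming the AWS CLI is available and that you named the pipeline `dynamodb-pipeline` (substitute whatever name you chose):

```bash
# Print the current status of the ingestion pipeline (e.g. CREATING, ACTIVE).
aws osis get-pipeline \
  --pipeline-name dynamodb-pipeline \
  --region us-west-2 \
  --query 'Pipeline.Status' \
  --output text
```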
 
 
-After the pipeline is created, it will take some additional time for the initial export from DynamoDB and import into OpenSearch Service. After you have waited several more minutes, you can check if items have replicated into OpenSearch by making a query in Dev Tools in the OpenSearch Dashboards.
+After the pipeline is created, it will take some additional time for the initial export from DynamoDB and import into OpenSearch Service. After you have waited several more minutes, you can check if items have replicated into OpenSearch by making a query using the OpenSearch Dashboards feature called Dev Tools.
 
-To open Dev Tools, click on the menu in the top left of OpenSearch Dashboards, scroll down to the `Management` section, then click on `Dev Tools`. Enter the following query in the left pane, then click the "play" arrow.
+- To open Dev Tools, click on the menu in the top left of OpenSearch Dashboards, scroll down to the `Management` section, then click on `Dev Tools`.
+
+![Devtools](/static/images/Devtools.png)
+
+- Enter the following query in the left pane, then click the "play" arrow to execute it.
 
 ```text
 GET /product-details-index-en/_search
 ```
-You may encounter a few types of results:
-- If you see a 404 error of type *index_not_found_exception*, then you need to wait until the pipeline is `Active`. Once it is, this exception will go away.
-- If your query does not have results, wait a few more minutes for the initial replication to finish and try again.
+
+- The output will be the list of documents containing all of the fields defined in the zero-ETL pipeline mapping.
+
+You may encounter a few types of results:
+- If you see a 404 error of type *index_not_found_exception*, then you need to wait until the pipeline is `Active`. Once it is, this exception will go away.
+- If your query does not have results, wait a few more minutes for the initial replication to finish and try again.
 
 ![Create pipeline](/static/images/ddb-os-zetl16.jpg)
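If you would rather verify from the terminal than from Dev Tools, the same search can be issued as a signed request from the shell session used in the previous section. This is a sketch, assuming OPENSEARCH_ENDPOINT and the METADATA_AWS_* variables are still exported and that curl has `--aws-sigv4` support.

```bash
# Search the index created by the zero-ETL pipeline. An empty hit list means the
# initial export/import has not finished; a 404 means the pipeline is not Active yet.
curl --request GET \
  ${OPENSEARCH_ENDPOINT}'/product-details-index-en/_search' \
  --aws-sigv4 "aws:amz:us-west-2:es" \
  --header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
  --user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}"
```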
