Clickstream Analytics on AWS Guidance

Table of Contents

  1. Overview
  2. Prerequisites
  3. Deployment Steps
  4. Deployment Validation
  5. Running the Guidance
  6. Next Steps
  7. Cleanup
  8. Test
  9. Local development for the web console
  10. Build the Spark ETL jar locally
  11. FAQ, known issues, additional considerations, and limitations
  12. Revisions
  13. Notices

Overview

This solution collects, ingests, analyzes, and visualizes clickstream events from your websites and mobile applications. Clickstream data is critical for online business analytics use cases such as user behavior analysis, customer data platforms, and marketing analysis. The data provides insights into the patterns of user interactions on a website or application, helping businesses understand user navigation, preferences, and engagement levels to drive product innovation and optimize marketing investments.

With this solution, you can quickly configure and deploy a data pipeline that fits your business and technical needs. It provides purpose-built software development kits (SDKs) that automatically collect common events and easy-to-use APIs to report custom events, enabling you to easily send your customers’ clickstream data to the data pipeline in your AWS account. The solution also offers pre-assembled dashboards that visualize key metrics about user lifecycle, including acquisition, engagement, activity, and retention, and adds visibility into user devices and geographies. You can combine user behavior data with business backend data to create a comprehensive data platform and generate insights that drive business growth.
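
For example, recording a custom event from a web application takes only a few lines. The snippet below is a minimal sketch, assuming the @aws/clickstream-web package and its init/record API; the app ID and endpoint URL are placeholders that you obtain from the web console after registering an app, and the exact API may vary by SDK version.

// Minimal sketch of Web SDK usage; appId and endpoint are placeholders
// obtained from the web console after registering an app.
import { ClickstreamAnalytics } from '@aws/clickstream-web';

ClickstreamAnalytics.init({
  appId: 'your-app-id',
  endpoint: 'https://your-ingestion-endpoint.example.com/collect',
});

// Common events such as page views and sessions are collected automatically;
// custom events are reported through the record API.
ClickstreamAnalytics.record({
  name: 'add_to_cart',
  attributes: { item_id: 'sku-1234', price: 19.99 },
});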

For more information, refer to the documentation.

Architecture Overview

architecture diagram

  1. Amazon CloudFront distributes the frontend web UI assets hosted in the Amazon S3 bucket, and the backend APIs hosted with Amazon API Gateway and AWS Lambda.
  2. An Amazon Cognito user pool or an OpenID Connect (OIDC) provider is used for authentication.
  3. The web UI console uses Amazon DynamoDB to store persistent data.
  4. AWS Step Functions, AWS CloudFormation, AWS Lambda, and Amazon EventBridge are used for orchestrating the lifecycle management of data pipelines.
  5. The data pipeline is provisioned in the Region specified by the system operator. It consists of Application Load Balancer (ALB), Amazon ECS, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams, Amazon S3, Amazon EMR Serverless, Amazon Redshift, and Amazon QuickSight.

For more information, refer to the documentation.

Cost

The Clickstream Analytics costs are primarily driven by the data pipeline, with the following main components:

  • Ingestion module: cost varies based on the ingestion server size and the selected data sink type
  • Data processing and modeling module (optional): cost is determined by whether the module is enabled and how it is configured
  • Dashboards (optional): cost is based on whether the module is enabled and the selected configuration options
  • Additional features

Cost estimates are provided for various data throughput levels (10, 100, 1,000, and 10,000 requests per second) across different pipeline configurations. Details are shown in the Cost section.

Prerequisites

  • At least four vacant S3 buckets.

Operating System

These deployment instructions work on macOS, Linux, or Windows. The following packages and tools are required: Git, Node.js (with npm), pnpm, and the AWS CLI.

AWS account requirements

You need an AWS account with AdministratorAccess-equivalent permissions to deploy this solution.

aws cdk bootstrap

This Guidance uses the AWS CDK (aws-cdk). If you are using the AWS CDK in the target account and Region for the first time, perform the bootstrapping:

cdk bootstrap --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess

Supported Regions

Clickstream Analytics uses services that may not currently be available in all AWS Regions. Launch this solution in an AWS Region where the required services are available.

Supported AWS Regions

Deployment Steps

  1. Clone the repository:

    git clone https://github.com/aws-solutions/clickstream-analytics-on-aws.git
    cd clickstream-analytics-on-aws
  2. Install pnpm and dependencies:

    npm install -g pnpm
    pnpm install && pnpm projen && pnpm nx build @aws/clickstream-base-lib
  3. Bootstrap CDK (if not done before):

    npx cdk bootstrap
  4. Deploy the stack:

    To deploy the solution from your local machine to your AWS account, run the deployment script:

    cd deployment
    sh solution-deploy.sh --region <AWS Region> --profile <AWS Profile Name> --email <User Email> --template-deploy
  5. Note the CloudFront URL from the outputs to access the web console.

Deployment Validation

After deploying the solution, you can validate the deployment by:

  1. Open the AWS CloudFormation console and verify the status of the stack is CREATE_COMPLETE.
  2. Navigate to the CloudFront URL provided in the CloudFormation outputs to access the web console.
  3. Sign in with the credentials sent to the email address you provided during deployment.
  4. Verify you can access the dashboard and create a new data pipeline.
  5. For CDK deployment, you can validate by checking the CloudFormation stacks in the console or by running the command below (a programmatic alternative is sketched after this list):
    aws cloudformation describe-stacks --stack-name cloudfront-s3-control-plane-stack-global
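
If you prefer to script this check, the sketch below uses the AWS SDK for JavaScript v3 to verify the stack status and read the web console URL from the stack outputs. The stack name matches the CLI command above; the output key is an assumption, so match it against the outputs shown in your CloudFormation console.

import {
  CloudFormationClient,
  DescribeStacksCommand,
} from '@aws-sdk/client-cloudformation';

// Stack name matches the CLI example above; adjust if you deployed under a different name.
const STACK_NAME = 'cloudfront-s3-control-plane-stack-global';

async function validateDeployment(): Promise<void> {
  const client = new CloudFormationClient({});
  const { Stacks } = await client.send(
    new DescribeStacksCommand({ StackName: STACK_NAME })
  );
  const stack = Stacks?.[0];
  console.log(`Stack status: ${stack?.StackStatus}`); // expect CREATE_COMPLETE
  // The exact output key may differ; this looks for any output containing "Url".
  const urlOutput = stack?.Outputs?.find((o) => o.OutputKey?.includes('Url'));
  console.log(`Web console URL: ${urlOutput?.OutputValue}`);
}

validateDeployment().catch(console.error);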

Running the Guidance

After successful deployment, you can start using the Clickstream Analytics solution by following these steps:

  1. Access the Web Console:

    • Navigate to the CloudFront URL provided in the CloudFormation outputs
    • Sign in with the credentials sent to the email address you provided during deployment
  2. Create a Data Pipeline:

    • In the web console, navigate to the "Pipelines" section
    • Click "Create pipeline"
    • Follow the wizard to configure your data pipeline based on your requirements
    • Choose the appropriate sink (S3, Kinesis, or MSK)
    • Configure data processing options if needed
  3. Create an App:

    • After creating a pipeline, navigate to the "Apps" section
    • Click "Create app"
    • Configure your app settings and associate it with the created pipeline
    • Note the app ID and write key for SDK integration
  4. Integrate SDK with Your Application:

    • Choose the appropriate SDK for your platform (Android, iOS, Web, Flutter, etc.)
    • Follow the SDK integration guide to implement the SDK in your application
    • Configure the SDK with the app ID and endpoint URL
  5. View Analytics:

    • Once data starts flowing, you can view analytics in the pre-built dashboards
    • Access user lifecycle metrics, engagement data, and other insights

For step-by-step instructions on building a serverless data pipeline that collects application data, please refer to the implementation guide.

Next Steps

After successfully deploying and running the Clickstream Analytics, consider these next steps to enhance your implementation:

  1. Customize Data Collection:

    • Implement custom events in your applications to capture specific user interactions
    • Use the SDK's API to send custom attributes with events
  2. Enhance Data Processing:

    • Configure data transformation rules in the ETL process
    • Implement custom plugins for data enrichment
  3. Integrate with Other AWS Services:

    • Connect your clickstream data with Amazon Personalize for recommendation engines
    • Use Amazon SageMaker for advanced analytics and machine learning on your clickstream data
  4. Scale Your Implementation:

    • Monitor performance metrics and adjust the pipeline configuration as your traffic grows
    • Consider implementing multi-region deployments for global applications
  5. Develop Custom Dashboards:

    • Create custom QuickSight dashboards for specific business needs
    • Integrate with your existing business intelligence tools
  6. Implement Data Governance:

    • Set up data retention policies (a sketch follows this list)
    • Implement data privacy controls in accordance with regulations like GDPR or CCPA
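
As a concrete starting point for data retention, the sketch below applies an S3 lifecycle rule that expires objects under a given prefix after a fixed number of days. The bucket name and prefix are hypothetical; point them at the sink or data bucket that your pipeline actually uses, and adjust the retention period to your policy.

import {
  S3Client,
  PutBucketLifecycleConfigurationCommand,
} from '@aws-sdk/client-s3';

// Bucket name and prefix are hypothetical; use your pipeline's data bucket.
async function applyRetentionPolicy(): Promise<void> {
  const client = new S3Client({});
  await client.send(
    new PutBucketLifecycleConfigurationCommand({
      Bucket: 'your-clickstream-data-bucket',
      LifecycleConfiguration: {
        Rules: [
          {
            ID: 'expire-raw-clickstream-data',
            Status: 'Enabled',
            Filter: { Prefix: 'clickstream/raw/' },
            Expiration: { Days: 365 }, // adjust to your retention policy
          },
        ],
      },
    })
  );
}

applyRetentionPolicy().catch(console.error);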

For more information, refer to the Pipeline Management section in the implementation guide.

Cleanup

To clean up all resources deployed by this solution, follow these steps:

  1. Delete Data Pipelines First:

    • In the web console, navigate to the "Pipelines" section
    • Select each pipeline and click "Delete"
    • Wait for all pipeline deletions to complete before proceeding
  2. Delete Apps:

    • In the web console, navigate to the "Apps" section
    • Delete all apps you've created
  3. Delete CloudFormation Stacks:

    aws cloudformation delete-stack --stack-name <stack-name>
  4. Delete S3 Buckets (optional): empty and remove the solution's S3 buckets if you no longer need the stored data (a sketch follows this list)
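
S3 buckets must be empty before they can be deleted. The sketch below, using the AWS SDK for JavaScript v3, empties and removes a single non-versioned bucket; the bucket name is hypothetical, and buckets with versioning enabled additionally require removing object versions and delete markers.

import {
  S3Client,
  ListObjectsV2Command,
  DeleteObjectsCommand,
  DeleteBucketCommand,
} from '@aws-sdk/client-s3';

// Bucket name is hypothetical; repeat for each solution bucket you want to remove.
async function emptyAndDeleteBucket(bucket: string): Promise<void> {
  const client = new S3Client({});
  let token: string | undefined;
  do {
    // List up to 1,000 objects per page and delete them in a single batch call.
    const page = await client.send(
      new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
    );
    const objects = (page.Contents ?? []).map((o) => ({ Key: o.Key! }));
    if (objects.length > 0) {
      await client.send(
        new DeleteObjectsCommand({ Bucket: bucket, Delete: { Objects: objects } })
      );
    }
    token = page.NextContinuationToken;
  } while (token);
  await client.send(new DeleteBucketCommand({ Bucket: bucket }));
}

emptyAndDeleteBucket('your-clickstream-bucket').catch(console.error);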

Test

pnpm test

Local development for the web console

  • Step 1: Deploy the solution control plane (this creates the DynamoDB tables, the Step Functions state machine, and other resources).
  • Step 2: Open the Amazon Cognito console, select the corresponding user pool, choose the App integration tab, select the app client from the App client list, edit the Hosted UI settings, and add http://localhost:3000/signin to the Allowed callback URLs.
  • Step 3: Go to the folder src/control-plane/local and start the servers:
cd src/control-plane/local
# run the backend server locally
bash start.sh -s backend
# run the frontend server locally
bash start.sh -s frontend

Build the Spark ETL jar locally

  • Step 1: Build the ETL common module
cd src/data-pipeline/etl-common 
./gradlew clean build install
  • Step 2: Build the Spark ETL jar
cd src/data-pipeline/spark-etl

# build with unit tests
./gradlew clean build 

# or build only the jar and skip all unit tests
./gradlew clean build -x test -x :coverageCheck

# check the jar file
ls -l ./build/libs/spark-etl-*.jar

FAQ, known issues, additional considerations, and limitations

  • When deploying in regions with limited availability for some AWS services, you may encounter deployment failures. Always check the supported regions documentation before deployment.
  • Large data volumes may require adjustments to the default configuration settings for optimal performance.
  • The first-time ETL job may take longer to complete as it initializes the processing environment.
  • For detailed troubleshooting information, refer to the troubleshooting guide.

Revisions

Check the CHANGELOG.md file in the repo to see all notable changes and updates to the software. The changelog provides a clear record of improvements and fixes for each version.

Notices

Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.