Formatted as: Requirement: My solution
- Problem description: Overview section
- Cloud: Google Cloud Platform (GCP)
- Data ingestion:
- Data warehouse: GCP BigQuery
- Transformations: Data Build Tool (DBT)
- Dashboard: Looker Studio
- Reproducibility: Instructions start at this section
- Tests:
  - Added to start-dagster.sh
  - Included in the start-airbyte.sh from the source template
  - Added to
- Instructions were written and tested in VS Code; adapt them to your IDE if it differs
- Access to a bash command line
- Basic familiarity with GitHub
- Ability to access cloud service providers
- A computer capable of running applications in this project
- For grader reproducibility, some values that would not normally be hardcoded (host names, secrets, etc.) are hardcoded. The best practice is to define these in environment variables or store them encrypted.
- The operational applications are scripts that simulate real application output.
- A single service account was created. In production, each service would have its own service account.
- Batch processing is run manually for demonstration. In production, it would run on a schedule.
- This document is written in GitHub Markdown, so it may render better when read on GitHub.
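As a rough illustration of the note above about hardcoded host names and secrets, the sketch below reads settings from environment variables instead. It is not taken from this project: the variable names `OAKEN_DB_HOST` and `OAKEN_DB_PASSWORD` and both helper functions are hypothetical.

```python
import os
from typing import Dict, Optional


def get_setting(name: str, default: Optional[str] = None) -> str:
    """Return a setting from the environment, failing loudly if it is missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_settings() -> Dict[str, str]:
    """Collect the settings a service needs (variable names are hypothetical)."""
    return {
        # A host name can have a safe default for local runs.
        "db_host": get_setting("OAKEN_DB_HOST", "localhost"),
        # A secret has no default, so a missing value stops the program early.
        "db_password": get_setting("OAKEN_DB_PASSWORD"),
    }
```

Giving secrets no default means a misconfigured environment fails at startup rather than silently running with a placeholder value.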
Oaken Spirits is an alcohol distributor that has gained popularity from its private selection of whiskey. That popularity has spurred growth, and the company recently signed a deal with several large vendors to expand nationally. The applications supporting the business currently handle Iowa sales only. The CEO is concerned that they will not support future growth and wants a system that can handle national-level sales.
Oaken Spirits management does not want to spend money on more modern, but expensive, applications. You have been directed to work with the vendors to find a solution. The sales application stores data locally and does not integrate with external databases. Working with the vendors, we can get a JSON message from each application. The shipping and accounting applications can integrate with an external database.
- Not scalable: some data entry and transfers are manual. This could cause delays or errors, more so as we scale nationally, and the current processes may not remain practical.
- Does not integrate or deliver real-time updates between the applications. This can lead to delays in shipping and accounting.
- Has data in multiple locations; sometimes duplicated. Lack of integration and centralization has led to departments such as sales and shipping recording duplicate data.
- Lacks an analytics solution. There are no options for leadership to make data-informed decisions.
- Needs integration options that allow adding or replacing applications. As the business grows and technology improves, new applications may be added and current applications replaced.
- Create a single database as the source of truth.
- Create a data pipeline that integrates the systems and provides real-time updates.
- Ensure the system is scalable.
- Provide an analytics solution.
- Clone the repository locally
- Requirements
- Choose Local
- Cloud Analytics
- Clean up
- On-site data center: LOCAL_DOCKER.md, streaming data pipeline for business services.
or
- Full Cloud - not available yet
- Run analytics services: see ANALYTICS_PIPELINE.md, batch data pipeline for analytics.
- Dashboard: DASHBOARD.md



