16 changes: 16 additions & 0 deletions .gitignore
@@ -205,3 +205,19 @@ cython_debug/
marimo/_static/
marimo/_lsp/
__marimo__/


# Terraform / OpenTofu
.terraform/
.terraform.lock.hcl
*.tfstate
*.tfstate.*
*.plan
crash.log
# Vars
*.tfvars
*.tfvars.json
*.auto.tfvars
# Keys
*.pem
oci_api_key.pem
12 changes: 12 additions & 0 deletions README.md
@@ -1,2 +1,14 @@
# lake-bazaar
A data lakehouse platform that ingests, transforms, loads, and serves order and product data from an e-commerce platform to help derive business insights
## Wiki

* [Key analysis metrics](https://github.com/f-lab-edu/lake-bazaar/wiki/analysis-indicators)
* [Raw data schema analysis and processing strategy](https://github.com/f-lab-edu/lake-bazaar/wiki/raw-data-schema-analysis-and-processing-strategy)
* [Medallion architecture (Bronze → Silver → Gold) design](https://github.com/f-lab-edu/lake-bazaar/wiki/Medallion-Architecture)
Binary file added docs/image.png
31 changes: 31 additions & 0 deletions docs/network_firewall_template.md
@@ -0,0 +1,31 @@
# Network/Firewall Template (VPC/FW) Design

## Purpose
- Define a security policy that opens only the minimum ports needed to operate the 5-node cluster (Hadoop, Hive, Spark, Zookeeper, etc.)
- Block unnecessary external access; allow only the ports each service requires

## 1. Required Open Ports

| Service | Port | Description |
| -------------- | ------- | ------------------------------------ |
| SSH | 22 | SSH access for operations/administration |
| Zookeeper | 2181 | Client connections (cluster coordination) |
| Zookeeper | 2888 | Follower <-> Leader communication |
| Zookeeper | 3888 | Leader election |
| Hadoop NN | 9870 | NameNode Web UI |
| Hadoop DN | 9864 | DataNode Web UI |
| YARN RM | 8088 | ResourceManager Web UI |
| YARN NM | 8042 | NodeManager Web UI |
| Spark History | 18080 | Spark History Server |
| MapReduce JHS | 10020 | JobHistory Server |
| HiveServer2 | 10000 | Hive JDBC |
| Hive Metastore | 9083 | Hive Metastore (Thrift) |
| Airflow | 8080 | Airflow Web UI |
| MySQL | 3306 | Hive Metastore backend |

> Note: the actual open ports can be adjusted to match the deployment environment and security policy.

## 2. Network/Firewall Design Direction
- External (internet) → allow SSH (22) only (restricting to operator IPs is recommended)
- Intra-cluster traffic (within the subnet): allow only the required ports above
- Deny all other inbound traffic
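
The policy above could be expressed as an OCI security list. The sketch below is a minimal example, not code from this repo: `var.vcn_id`, `var.operator_cidr`, and the `10.0.0.0/24` subnet CIDR are assumed names/values.

```hcl
# Hypothetical security list implementing the minimal-port policy above.
# var.vcn_id, var.operator_cidr, and the 10.0.0.0/24 subnet CIDR are assumptions.
resource "oci_core_security_list" "cluster" {
  compartment_id = var.compartment_ocid
  vcn_id         = var.vcn_id
  display_name   = "cluster-minimal-ports"

  # External: SSH only, restricted to the operator's IP range
  ingress_security_rules {
    protocol = "6" # TCP
    source   = var.operator_cidr
    tcp_options {
      min = 22
      max = 22
    }
  }

  # Internal: one rule per required service port, sourced from the cluster subnet
  dynamic "ingress_security_rules" {
    for_each = [2181, 2888, 3888, 9870, 9864, 8088, 8042, 18080, 10020, 10000, 9083, 8080, 3306]
    content {
      protocol = "6"
      source   = "10.0.0.0/24" # cluster subnet (assumed CIDR)
      tcp_options {
        min = ingress_security_rules.value
        max = ingress_security_rules.value
      }
    }
  }

  # Allow all egress
  egress_security_rules {
    protocol    = "all"
    destination = "0.0.0.0/0"
  }
}
```

Security lists are default-deny for inbound traffic, so any port not listed is blocked without an explicit deny rule.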
40 changes: 40 additions & 0 deletions docs/oci_cluster_design.md
@@ -0,0 +1,40 @@
# OCI Always Free Cluster Design

## Purpose
- Build a 5-node cluster (Hadoop, Hive, Spark, Airflow, etc.) within the Oracle Cloud Always Free limits
- Clearly define each node's role, instance type, and the quota/resource plan

---

## 1. Cluster Nodes and Instance Types

| Node | Role Summary | Instance Type | CPU/Memory |
|------|-----------|--------------|------------|
| M1 | NameNode, ResourceManager, ZK, JournalNode, ZKFC | A1 Flex | 1 OCPU / 6GB RAM |
| M2 | NameNode, ResourceManager, ZK, JournalNode, ZKFC | A1 Flex | 1 OCPU / 6GB RAM |
| C1 | ZK, JournalNode, JobHistory, Airflow, Spark History, HAProxy, DataNode, NodeManager | A1 Flex | 1 OCPU / 6GB RAM |
| D1 | HiveServer2, Spark Worker, DataNode, NodeManager | E2 Micro | 1/8 OCPU / 1GB RAM |
| D2 | Hive Metastore, Spark Worker, DataNode, NodeManager | E2 Micro | 1/8 OCPU / 1GB RAM |

---

## 2. OCI Always Free Limits
- **A1 Flex:** up to 4 OCPUs, 24GB RAM (this design uses 3 OCPUs, 18GB RAM)
- **E2 Micro:** up to 2 instances (1/8 OCPU, 1GB RAM each)
- **Boot volumes:** 200GB total free limit across all instances; note that 5 instances at 50GB each (250GB) would exceed it, so boot volume sizing must be adjusted
- **VCN (network):** up to 2
- **Public IPs:** assign only to the nodes that need them
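
The A1 Flex portion of these limits could be guarded with an OpenTofu/Terraform `check` block (available since 1.5), so a `plan` warns if the design drifts past the cap. The numbers below simply mirror the design above and are hard-coded assumptions:

```hcl
# Hypothetical guard: warn during plan if A1 usage exceeds the Always Free cap.
locals {
  a1_ocpus_used  = 3  # M1 + M2 + C1 at 1 OCPU each
  a1_memory_used = 18 # 6 GB each
}

check "always_free_a1_limits" {
  assert {
    condition     = local.a1_ocpus_used <= 4 && local.a1_memory_used <= 24
    error_message = "A1 Flex usage exceeds the Always Free limit (4 OCPU / 24 GB RAM)."
  }
}
```
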

---

## 3. Network/Firewall Policy
- External (internet): allow SSH (22) only; restricting to operator IPs is recommended
- Internal (subnet): allow only the required service ports (2181, 2888, 3888, 9870, 9864, 8088, 8042, 18080, 10020, 10000, 9083, 8080, 3306, etc.)

---

## 4. Images and OS
- All instances use Always Free eligible images (Oracle Linux, Ubuntu, etc.)
- Mixed ARM (A1 Flex) and x86 (E2 Micro) environment; most open-source big data / distributed systems support ARM

---
15 changes: 15 additions & 0 deletions docs/project_cluster.md
@@ -0,0 +1,15 @@
## 5-Node Cluster Topology (Hadoop · Hive · Spark · Airflow)

> This project runs as a **5-node setup**. (This may change later depending on free-instance availability.)

<details>
<summary>Nodes & Roles</summary>

- **M1** — Apache ZooKeeper, Apache Hadoop JournalNode, Apache Hadoop NameNode, YARN ResourceManager, ZKFailoverController
- **M2** — Apache ZooKeeper, Apache Hadoop JournalNode, Apache Hadoop NameNode, YARN ResourceManager, ZKFailoverController
- **C1** — Apache ZooKeeper, Apache Hadoop JournalNode, MapReduce JobHistory Server, Apache Airflow, Apache Spark History Server, HAProxy, HDFS DataNode, YARN NodeManager
- **D1** — Apache HiveServer2, Apache Spark Worker, HDFS DataNode, YARN NodeManager
- **D2** — Apache Hive Metastore (mysqlDB backend), Apache Spark Worker, HDFS DataNode, YARN NodeManager
</details>

![5-node cluster topology](image.png)
84 changes: 84 additions & 0 deletions infra/tofu/envs/prod/main.tf
@@ -0,0 +1,84 @@
provider "oci" {
region = var.region # defaults to ap-chuncheon-1
config_file_profile = "DEFAULT" # section name in ~/.oci/config
}

module "network" {
source = "../../modules/network"
compartment_id = var.compartment_ocid
}

locals {
ssh_key = chomp(file(var.ssh_public_key_abs_path))
}



# M1: NameNode, ResourceManager, ZK, JournalNode, ZKFC (A1 Flex)
module "compute_m1" {
source = "../../modules/compute_instances"
tenancy_ocid = var.tenancy_ocid
compartment_id = var.compartment_ocid
subnet_id = module.network.public_subnet_id
shape = "VM.Standard.A1.Flex"
ocpus = 1
memory_in_gbs = 6
display_name = "M1"
hostname_label = "m1"
ssh_authorized_keys = local.ssh_key
}

# M2: NameNode, ResourceManager, ZK, JournalNode, ZKFC (A1 Flex)
module "compute_m2" {
source = "../../modules/compute_instances"
tenancy_ocid = var.tenancy_ocid
compartment_id = var.compartment_ocid
subnet_id = module.network.public_subnet_id
shape = "VM.Standard.A1.Flex"
ocpus = 1
memory_in_gbs = 6
display_name = "M2"
hostname_label = "m2"
ssh_authorized_keys = local.ssh_key
}

# C1: ZK, JournalNode, JobHistory, Airflow, Spark History, HAProxy, DataNode, NodeManager (A1 Flex)
# module "compute_c1" {
# source = "../../modules/compute_instances"
# tenancy_ocid = var.tenancy_ocid
# compartment_id = var.compartment_ocid
# subnet_id = module.network.public_subnet_id
# shape = "VM.Standard.A1.Flex"
# ocpus = 1
# memory_in_gbs = 1
# display_name = "C1"
# hostname_label = "c1"
# ssh_authorized_keys = local.ssh_key
# }

# D1: HiveServer2, Spark Worker, DataNode, NodeManager (E2 Micro)
module "compute_d1" {
source = "../../modules/compute_instances"
tenancy_ocid = var.tenancy_ocid
compartment_id = var.compartment_ocid
subnet_id = module.network.public_subnet_id
shape = "VM.Standard.E2.1.Micro"
memory_in_gbs = 1
display_name = "D1"
hostname_label = "d1"
ssh_authorized_keys = local.ssh_key
}

# D2: Hive Metastore, Spark Worker, DataNode, NodeManager (E2 Micro)
module "compute_d2" {
source = "../../modules/compute_instances"
tenancy_ocid = var.tenancy_ocid
compartment_id = var.compartment_ocid
subnet_id = module.network.public_subnet_id
shape = "VM.Standard.E2.1.Micro"
memory_in_gbs = 1
display_name = "D2"
hostname_label = "d2"
ssh_authorized_keys = local.ssh_key
}

16 changes: 16 additions & 0 deletions infra/tofu/envs/prod/outputs.tf
@@ -0,0 +1,16 @@

output "m1_public_ip" {
value = module.compute_m1.public_ip
}
output "m2_public_ip" {
value = module.compute_m2.public_ip
}
# output "c1_public_ip" {
# value = module.compute_c1.public_ip
# }
output "d1_public_ip" {
value = module.compute_d1.public_ip
}
output "d2_public_ip" {
value = module.compute_d2.public_ip
}
11 changes: 11 additions & 0 deletions infra/tofu/envs/prod/terraform.tfvars
@@ -0,0 +1,11 @@
tenancy_ocid = "ocid1.tenancy.oc1..aaaaaaaa4aucoyxgqh3n4mlbwy7f32ua436e45wczoa3ja3blrgtzvqm2gaa"
compartment_ocid = "ocid1.tenancy.oc1..aaaaaaaa4aucoyxgqh3n4mlbwy7f32ua436e45wczoa3ja3blrgtzvqm2gaa"
ssh_public_key_abs_path = "/Users/chanyong/.ssh/id_rsa.pub" # must be an absolute path

# Default: free-tier Micro
shape = "VM.Standard.E2.1.Micro"

# (Switch to the settings below if E2 Micro resources run short)
# shape = "VM.Standard.A1.Flex"
# a1flex_ocpus = 1
# a1flex_memory_gb = 6
34 changes: 34 additions & 0 deletions infra/tofu/envs/prod/variables.tf
@@ -0,0 +1,34 @@
variable "region" {
type = string
default = "ap-chuncheon-1"
}

variable "compartment_ocid" {
type = string
}

variable "tenancy_ocid" {
type = string
}

variable "ssh_public_key_abs_path" {
description = "Absolute path to the SSH public key (e.g. /home/you/.ssh/id_ed25519.pub)"
type = string
}

# Free tier first: E2 Micro. Switch to A1 Flex if it falls short.
variable "shape" {
type = string
default = "VM.Standard.E2.1.Micro"
}

# Used only for A1 Flex (ignored for E2 Micro)
variable "a1flex_ocpus" {
type = number
default = 1
}

variable "a1flex_memory_gb" {
type = number
default = 6
}
9 changes: 9 additions & 0 deletions infra/tofu/envs/prod/versions.tf
@@ -0,0 +1,9 @@
terraform {
required_version = ">= 1.6.0"
required_providers {
oci = {
source = "oracle/oci"
version = "~> 6.0"
}
}
}
59 changes: 59 additions & 0 deletions infra/tofu/modules/compute_instances/main.tf
@@ -0,0 +1,59 @@
# Look up availability domains (Chuncheon region has a single AD)
data "oci_identity_availability_domains" "ads" {
compartment_id = var.tenancy_ocid
}

locals {
ad_name = data.oci_identity_availability_domains.ads.availability_domains[0].name
}

# Latest Ubuntu image for the given shape
data "oci_core_images" "ubuntu" {
compartment_id = var.compartment_id
operating_system = var.os
operating_system_version = var.os_version
shape = var.shape
sort_by = "TIMECREATED"
sort_order = "DESC"
}

resource "oci_core_instance" "vm" {
availability_domain = local.ad_name
compartment_id = var.compartment_id
display_name = var.display_name
shape = var.shape

# Applied only for A1 Flex
dynamic "shape_config" {
for_each = var.shape == "VM.Standard.A1.Flex" ? [1] : []
content {
ocpus = var.ocpus
memory_in_gbs = var.memory_in_gbs
}
}

create_vnic_details {
subnet_id = var.subnet_id
assign_public_ip = true
hostname_label = var.hostname_label
}

source_details {
source_type = "image"
source_id = data.oci_core_images.ubuntu.images[0].id
}

metadata = {
ssh_authorized_keys = var.ssh_authorized_keys
}
}

# Public IP of the primary VNIC
data "oci_core_vnic_attachments" "va" {
compartment_id = var.compartment_id
instance_id = oci_core_instance.vm.id
}

data "oci_core_vnic" "primary" {
vnic_id = data.oci_core_vnic_attachments.va.vnic_attachments[0].vnic_id
}
2 changes: 2 additions & 0 deletions infra/tofu/modules/compute_instances/outputs.tf
@@ -0,0 +1,2 @@
output "public_ip" { value = data.oci_core_vnic.primary.public_ip_address }
output "private_ip" { value = data.oci_core_vnic.primary.private_ip_address }
50 changes: 50 additions & 0 deletions infra/tofu/modules/compute_instances/variables.tf
@@ -0,0 +1,50 @@
variable "compartment_id" {
type = string
}

variable "subnet_id" {
type = string
}

variable "shape" {
type = string
default = "VM.Standard.E2.1.Micro"
}

variable "tenancy_ocid" {
type = string
}

variable "ocpus" {
type = number
default = 1
}

variable "memory_in_gbs" {
type = number
default = 6
}

variable "display_name" {
type = string
default = "dev-vm-1"
}

variable "hostname_label" {
type = string
default = "devvm1"
}

variable "ssh_authorized_keys" {
type = string
}

variable "os" {
type = string
default = "Canonical Ubuntu"
}

variable "os_version" {
type = string
default = "22.04"
}