diff --git a/.DS_Store b/.DS_Store deleted file mode 100644 index 5de9ae27..00000000 Binary files a/.DS_Store and /dev/null differ diff --git a/dockerfiles/.DS_Store b/dockerfiles/.DS_Store deleted file mode 100644 index eca036a3..00000000 Binary files a/dockerfiles/.DS_Store and /dev/null differ diff --git a/dockerfiles/blogging-application/assets/.DS_Store b/dockerfiles/blogging-application/assets/.DS_Store deleted file mode 100644 index f318299a..00000000 Binary files a/dockerfiles/blogging-application/assets/.DS_Store and /dev/null differ diff --git a/dockerfiles/blogging-application/assets/img/portfolio/.DS_Store b/dockerfiles/blogging-application/assets/img/portfolio/.DS_Store deleted file mode 100644 index 0f42f1df..00000000 Binary files a/dockerfiles/blogging-application/assets/img/portfolio/.DS_Store and /dev/null differ diff --git a/dockerfiles/blogging-application/assets/vendor/.DS_Store b/dockerfiles/blogging-application/assets/vendor/.DS_Store deleted file mode 100644 index 9986f7bf..00000000 Binary files a/dockerfiles/blogging-application/assets/vendor/.DS_Store and /dev/null differ diff --git a/docs/monitoring/alertmanager.md b/docs/monitoring/alertmanager.md index 7251eeb8..0141154d 100644 --- a/docs/monitoring/alertmanager.md +++ b/docs/monitoring/alertmanager.md @@ -5,85 +5,241 @@ sidebar_id: "alertmanager" sidebar_position: 3 --- -# Alertmanager: Managing Alerts in Kubernetes +# ⎈ A Hands-On Guide: Setting Up Prometheus and AlertManager in Kubernetes with Custom Alerts 🛠️ -Alertmanager is a component of the Prometheus ecosystem that handles alerts generated by Prometheus. It routes, deduplicates, and manages alerts, ensuring that critical issues are brought to the attention of the right people. This guide provides an overview of Alertmanager, its benefits, and how to set it up in a Kubernetes cluster. +#### *⇢ Understanding Prometheus & AlertManager Setup in Kubernetes with Custom Rules: A Comprehensive Guide* ---- -
-🚧 Work in Progress
-This page is currently under construction. Please check back later for detailed information about Alertmanager setup and usage in Kubernetes.
---- +![img](./img/alertmanager.png.webp) -## Table of Contents -- [Introduction](#introduction) -- [Why Use Alertmanager?](#why-use-alertmanager) -- [Architecture](#architecture) -- [Installation](#installation) -- [Configuration](#configuration) -- [Best Practices](#best-practices) +Monitoring your Kubernetes cluster is crucial for maintaining the health and performance of your applications. In this guide, we’ll walk through setting up Prometheus and Alertmanager using Helm and configuring custom alert rules to monitor your cluster effectively. +If you haven’t already, I recommend checking out my previous blog post on Kubernetes monitoring using Prometheus and Grafana for a comprehensive overview of setting up Prometheus and Grafana. ---- +### Prerequisites -## Introduction -Alertmanager is a critical component for managing alerts in Kubernetes. It works with Prometheus to ensure that alerts are routed to the appropriate channels, such as email, Slack, or PagerDuty, and provides mechanisms for silencing and grouping alerts. +Before we start, ensure you have the following: ---- +- A running Kubernetes cluster. +- Helm installed on your local machine. -## Why Use Alertmanager? -- **Centralized Alert Management**: Consolidates alerts from multiple Prometheus instances. -- **Routing and Notification**: Sends alerts to the right people or systems based on defined rules. -- **Deduplication**: Prevents duplicate alerts from overwhelming notification channels. -- **Silencing**: Temporarily suppresses alerts during maintenance or known issues. +![img](./img/custom-alerts.png.gif) ---- +### Step 1: Install Prometheus and Alertmanager using Helm -## Architecture -Alertmanager works as follows: -1. **Prometheus**: Generates alerts based on defined rules. -2. **Alertmanager**: Receives alerts from Prometheus and processes them. -3. **Notification Channels**: Sends alerts to configured channels like email, Slack, or PagerDuty. +We’ll use the kube-prometheus-stack Helm chart from the Prometheus community. This chart includes Prometheus, Alertmanager, and Grafana, along with several pre-configured dashboards and alerting rules. ---- -## Installation -> **Note:** Detailed installation steps will be added soon. +First, create a custom-values.yaml file to specify our custom configurations: + +```yaml +# custom-values.yaml +prometheus: + service: + type: NodePort +grafana: + service: + type: NodePort +alertmanager: + service: + type: NodePort +``` +Next, install the kube-prometheus-stack using Helm: + +```yaml +helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f custom-values.yaml +``` +This command will deploy Prometheus, Alertmanager, and Grafana to your cluster with the services exposed as NodePort. + +![img](./img/alert-manager-architecture.png.webp) + +### Step 2: Verifying the Setup +To verify that Prometheus and Alertmanager are running correctly, you can access their web UIs. Since we exposed their services as NodePort, you can use kubectl port-forward to access them locally or you can use external IP of the cluster and nodeport of the respective service. 
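If you prefer port-forwarding, the commands below are a minimal sketch. They assume the default service names created when the Helm release is named kube-prometheus-stack; if you used a different release name, confirm the actual service names with `kubectl get svc` first.

```bash
# Prometheus UI -> http://localhost:9090
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090

# Alertmanager UI -> http://localhost:9093
kubectl port-forward svc/kube-prometheus-stack-alertmanager 9093:9093

# Grafana UI -> http://localhost:3000 (the Grafana service listens on port 80)
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80
```

Each command blocks the terminal while the forward is active, so run them in separate terminals or append `&` to background them.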
+ +For Prometheus: + +![img](./img/prometheus-ui.png.webp) + + +For Alertmanager: + +![img](./img/alertmanager-ui.png.webp) + +For Grafana: + +![img](./img/grafana-dashboard.png.webp) + +Access the default Alertmanager rules: +To access the alertmanager rules/alerts, navigate to Alerts section on prometheus UI: + + +![img](./img/alerts-in-prometheus-ui.png.webp) + + +Here we can see that three alerts are in Firing state, so these alerts we can see in AlertManager UI to manage: + +![img](./img/alerts-fired.png.webp) + +### Step 3: Configuring Custom Alert Rules +From the above steps we can see that the default alerts are configured in prometheus and alertmanager. Now, let’s add custom alert rules to monitor our Kubernetes cluster. We’ll create a PrometheusRule manifest to define these alerts. + +Create a file named custom-alert-rules.yaml with the following content: + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: PrometheusRule +metadata: + labels: + app: kube-prometheus-stack + app.kubernetes.io/instance: kube-prometheus-stack + release: kube-prometheus-stack + name: kube-pod-not-ready +spec: + groups: + - name: my-pod-demo-rules + rules: + - alert: KubernetesPodNotHealthy + expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0 + for: 1m + labels: + severity: critical + annotations: + summary: Kubernetes Pod not healthy (instance {{ $labels.instance }}) + description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: KubernetesDaemonsetRolloutStuck + expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100 or kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0 + for: 10m + labels: + severity: warning + annotations: + summary: Kubernetes DaemonSet rollout stuck (instance {{ $labels.instance }}) + description: "Some Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled or not ready\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: ContainerHighCpuUtilization + expr: (sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, container) / sum(container_spec_cpu_quota{container!=""}/container_spec_cpu_period{container!=""}) by (pod, container) * 100) > 80 + for: 2m + labels: + severity: warning + annotations: + summary: Container High CPU utilization (instance {{ $labels.instance }}) + description: "Container CPU utilization is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: ContainerHighMemoryUsage + expr: (sum(container_memory_working_set_bytes{name!=""}) BY (instance, name) / sum(container_spec_memory_limit_bytes > 0) BY (instance, name) * 100) > 80 + for: 2m + labels: + severity: warning + annotations: + summary: Container High Memory usage (instance {{ $labels.instance }}) + description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: KubernetesContainerOomKiller + expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1 + for: 0m + labels: + severity: warning + annotations: + summary: Kubernetes Container oom killer (instance {{ $labels.instance }}) + description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has 
been OOMKilled {{ $value }} times in the last 10 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: KubernetesPodCrashLooping + expr: increase(kube_pod_container_status_restarts_total[1m]) > 3 + for: 2m + labels: + severity: warning + annotations: + summary: Kubernetes pod crash looping (instance {{ $labels.instance }}) + description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" +``` +Apply the manifest to your Kubernetes cluster: + +```yaml +kubectl apply -f custom-alert-rules.yaml +``` +Once the PromethuesRule is created then check the newly created alerts on Prometheus UI. + +![img](./img/promethues-rule.png.webp) + +That’s it we have successfully added our new custom alerts on alertmanager. + +### Step 4: Test the custom rules: +To ensure our custom alert rules are working correctly, we’ll simulate a failure by creating a pod with an incorrect image tag. This will help us verify if the alerts are triggered and properly reported in Alertmanager. KubernetesPodNotHealthy alert is responsible to report this alert. + + +1. **Create a Pod with an Invalid Image** + +This will simulate a failure by using an incorrect image tag: + +```yaml +kubectl run nginx-pod --image=nginx:lates3 +``` +Note: The correct tag is latest, so lates3 is intentionally incorrect to cause the pod to fail. + +2. **Verify the Pod Status** + +Check the status of the pod to confirm that it is failing: + +```yaml +kubectl get pods nginx-pod +NAME READY STATUS RESTARTS AGE +nginx-pod 0/1 ImagePullBackOff 0 5m35s +``` + +You should see the pod in a ErrImagePull state. You can also describe the pod for more details ---- -## Configuration -Alertmanager configuration involves defining alert routing, grouping, and notification channels. Example configuration: +```yaml +kubectl describe pod nginx-pod +``` +This will provide information about why the pod is failing. + +3. **Check for Alerts in Alertmanager** + +Since you have set up custom alert rules, these should trigger an alert when the pod fails. Look for alerts related to pod failures. The custom alerts you configured should appear in the Alertmanager interface. + +![img](./img/alert-triggered-on-prometheus.png.webp) +![img](./img/alert-triggered-on-alertmanager.png.webp) + +This process ensures that your custom alerting rules are working correctly and that you are notified when a pod fails. + +### Step 5: Understanding Custom Alert Rules +To better understand how to create and customize alert rules, let’s break down one of the alert rules defined in our custom-alert-rules.yaml. We'll use the KubernetesPodNotHealthy alert as an example: + +```yaml +- alert: KubernetesPodNotHealthy + expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0 + for: 1m + labels: + severity: critical + annotations: + summary: Kubernetes Pod not healthy (instance {{ $labels.instance }}) + description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" +``` +## Alert Fields Breakdown + +**alert:** The name of the alert (KubernetesPodNotHealthy). +**expr:** The Prometheus expression to evaluate. This alert triggers if any pod in a Pending, Unknown, or Failed state is detected. +**for:** The duration for which the condition should be true before the alert fires (1m or 1 minute). +**labels:** Additional labels to categorize the alert. 
In this case, we label it with a severity of critical. +**annotations:** Descriptive information about the alert. These fields can provide context when the alert is triggered: +**— — summary:** A brief description of the alert (Kubernetes Pod not healthy (instance {{ $labels.instance }})). +**— — description:** A detailed description that includes dynamic values from the alert labels (Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}). + +These fields help to provide clarity and context when an alert is triggered, making it easier to diagnose and respond to issues in your cluster. + + +For more examples of custom Prometheus alert rules, you can refer to this Awesome Prometheus Alerts repository. +### Step 5: Cleanup +If you want to remove Prometheus, Alertmanager, and Grafana from your Kubernetes cluster, you can do so with the following commands: + +1. **Uninstall the Helm Chart:** +```yaml +helm uninstall kube-prometheus-stack +``` +2. **Verify Resources Are Deleted:** +Check that the Prometheus, AlertManager, and Grafana resources have been removed: ```yaml -global: - resolve_timeout: 5m - -route: - receiver: 'email-alerts' - group_by: ['alertname', 'cluster', 'service'] - group_wait: 30s - group_interval: 5m - repeat_interval: 3h - -receivers: - - name: 'email-alerts' - email_configs: - - to: 'alerts@example.com' - from: 'alertmanager@example.com' - smarthost: 'smtp.example.com:587' - auth_username: 'user' - auth_password: 'password' +kubectl get all -l release=kube-prometheus-stack ``` +## Conclusion -## Best Practices -- Use grouping to consolidate similar alerts into a single notification. -- Define silences for planned maintenance windows to avoid unnecessary alerts. -- Integrate Alertmanager with multiple notification channels for redundancy. -- Monitor Alertmanager itself to ensure it is functioning correctly. +In this guide, we have successfully set up Prometheus and Alertmanager in a Kubernetes cluster using Helm and configured custom alert rules to monitor the cluster’s health. We also explored the components of an alert rule to better understand how they work. This setup provides a robust monitoring solution that can be further extended and customized to suit your needs. For more examples of custom Prometheus alert rules, you can refer to this Awesome Prometheus Alerts repository. ---- -Stay tuned for updates as we continue to enhance this guide! \ No newline at end of file diff --git a/docs/monitoring/elk-stack.md b/docs/monitoring/elk-stack.md index f24288cf..575c28e9 100644 --- a/docs/monitoring/elk-stack.md +++ b/docs/monitoring/elk-stack.md @@ -5,94 +5,354 @@ sidebar_id: "elk-stack" sidebar_position: 4 --- -# ELK Stack: Centralized Logging for Kubernetes +# ⎈ A Hands-On Guide to Kubernetes Logging Using ELK Stack & Filebeat ⚙️ -The ELK Stack (Elasticsearch, Logstash, and Kibana) is a popular solution for centralized logging and log analysis. It allows you to collect, process, and visualize logs from Kubernetes clusters, making it easier to monitor and troubleshoot applications. This guide provides an overview of the ELK Stack, its benefits, and how to set it up in a Kubernetes environment. +#### *⇢ A Comprehensive Guide to Setting Up the ELK Stack on Kubernetes with Helm with Practical Example* ---- +![img](./img/elk-and-filebeat.png.webp) -
-🚧 Work in Progress
-This page is currently under construction. Please check back later for detailed information about ELK Stack setup and usage in Kubernetes.
+In this blog post, we’ll guide you through setting up the ELK stack (Elasticsearch, Logstash, and Kibana) on a Kubernetes cluster using Helm. Helm simplifies the deployment and management of applications on Kubernetes, making it an excellent tool for deploying complex stacks like ELK. We’ll also configure Filebeat to collect and forward logs to Logstash. ---- -## Table of Contents -- [Introduction](#introduction) -- [Why Use the ELK Stack?](#why-use-the-elk-stack) -- [Architecture](#architecture) -- [Installation](#installation) -- [Configuration](#configuration) -- [Best Practices](#best-practices) +### Prerequisites +Before we get started, make sure you have: ---- +- A Kubernetes cluster up and running. -## Introduction -The ELK Stack is a powerful tool for managing and analyzing logs in Kubernetes. It consists of: -- **Elasticsearch**: A distributed search and analytics engine for storing and querying logs. -- **Logstash**: A data processing pipeline that ingests, transforms, and forwards logs to Elasticsearch. -- **Kibana**: A visualization tool for exploring and analyzing logs stored in Elasticsearch. +- Helm installed and configured. ---- +- kubectl installed and configured. -## Why Use the ELK Stack? -- **Centralized Logging**: Collect logs from all Kubernetes pods and nodes in one place. -- **Powerful Querying**: Elasticsearch provides advanced search and analytics capabilities. -- **Visualization**: Kibana offers customizable dashboards for log analysis. -- **Scalability**: The ELK Stack can handle large-scale Kubernetes clusters. +![img](./img/animated-elk-and-filebeat.png.gif) ---- +### Step 1: Install Elasticsearch -## Architecture -The ELK Stack works as follows: -1. **Logstash**: Collects logs from Kubernetes pods and nodes, processes them, and forwards them to Elasticsearch. -2. **Elasticsearch**: Stores the logs and makes them searchable. -3. **Kibana**: Visualizes the logs and provides an interface for querying and analyzing them. +Elasticsearch is the core component of the ELK stack, responsible for storing and indexing logs. We’ll use the official Elasticsearch Helm chart for deployment. ---- +1. **Add the Elastic Helm repository:** -## Installation -> **Note:** Detailed installation steps will be added soon. +```yaml +helm repo add elastic https://helm.elastic.co +helm repo update +``` ---- -## Configuration -The ELK Stack requires configuration for each component: -1. **Logstash**: Define input sources, filters, and output destinations. -2. **Elasticsearch**: Configure storage, indexing, and cluster settings. -3. **Kibana**: Set up dashboards and connect to Elasticsearch. +2. **Create a elasticsearch-values.yaml file with the following content:** -Example Logstash configuration: ```yaml -input { - file { - path => "/var/log/*.log" - type => "kubernetes-logs" - } -} +resources: + requests: + cpu: "200m" + memory: "200Mi" + limits: + cpu: "1000m" + memory: "2Gi" + +antiAffinity: "soft" +``` + + +antiAffinity: "soft": Configures soft anti-affinity, allowing pods to be scheduled on the same node if necessary, but preferring to spread them across nodes when possible. + +3. **Install Elasticsearch:** + +```yaml +helm install elasticsearch elastic/elasticsearch -f elasticsearch-values.yaml +``` + +This command installs Elasticsearch with the specified configurations. + +### Step 2: Configure and Install Filebeat + +Filebeat is a lightweight shipper for forwarding and centralizing log data. 
We’ll configure Filebeat to collect logs from containerized applications and forward them to Logstash. + + +1. **Create a filebeat-values.yaml file with the following content:** + + +```yaml +filebeatConfig: + filebeat.yml: | + filebeat.inputs: + - type: container + paths: + - /var/log/containers/*.log + processors: + - add_kubernetes_metadata: + host: ${NODE_NAME} + matchers: + - logs_path: + logs_path: "/var/log/containers/" + + output.logstash: + hosts: ["logstash-logstash:5044"] +``` + + +**Explanation:** + + +**filebeat.inputs:** Configures Filebeat to collect logs from container directories. The path /var/log/containers/*.log is where Kubernetes stores container logs. +**processors:** Adds Kubernetes metadata to the logs to provide context, such as pod names and namespaces. + +**output.logstash:** Configures Filebeat to send logs to Logstash at port 5044. + +2. **Install Filebeat using Helm:** + + +```yaml +helm install filebeat elastic/filebeat -f filebeat-values.yaml +``` + +This command installs Filebeat with the specified configuration, ensuring that logs are collected from containers and forwarded to Logstash. + +### Step 3: Configure and Install Logstash + + +Logstash processes and transforms logs before indexing them in Elasticsearch. We’ll set up Logstash to receive logs from Filebeat and send them to Elasticsearch. + +1. **Create a logstash-values.yaml file with the following content:** +```yaml +extraEnvs: + - name: "ELASTICSEARCH_USERNAME" + valueFrom: + secretKeyRef: + name: elasticsearch-master-credentials + key: username + - name: "ELASTICSEARCH_PASSWORD" + valueFrom: + secretKeyRef: + name: elasticsearch-master-credentials + key: password + +logstashConfig: + logstash.yml: | + http.host: 0.0.0.0 + xpack.monitoring.enabled: false + +logstashPipeline: + logstash.conf: | + input { + beats { + port => 5044 + } + } + + output { + elasticsearch { + hosts => ["https://elasticsearch-master:9200"] + cacert => "/usr/share/logstash/config/elasticsearch-master-certs/ca.crt" + user => '${ELASTICSEARCH_USERNAME}' + password => '${ELASTICSEARCH_PASSWORD}' + } + } + +secretMounts: + - name: "elasticsearch-master-certs" + secretName: "elasticsearch-master-certs" + path: "/usr/share/logstash/config/elasticsearch-master-certs" + +service: + type: ClusterIP + ports: + - name: beats + port: 5044 + protocol: TCP + targetPort: 5044 + - name: http + port: 8080 + protocol: TCP + targetPort: 8080 + +resources: + requests: + cpu: "200m" + memory: "200Mi" + limits: + cpu: "1000m" + memory: "1536Mi" + +``` + + +**Explanation:** + +**extraEnvs:** Sets environment variables for Elasticsearch authentication using Kubernetes secrets. +**logstashConfig:** Configures Logstash settings, including enabling HTTP and disabling monitoring. +**logstashPipeline:** Configures Logstash to listen on port 5044 for incoming logs from Filebeat and forward them to Elasticsearch. +**secretMounts:** Mounts the Elasticsearch CA certificate for secure communication between Logstash and Elasticsearch. +**service:** Configures Logstash’s service type as ClusterIP, making it accessible only within the cluster. + -filter { - grok { - match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{GREEDYDATA:message}" } - } -} -output { - elasticsearch { - hosts => ["http://elasticsearch:9200"] - index => "kubernetes-logs-%{+YYYY.MM.dd}" - } +2. 
**Install Logstash using Helm:** + +```yaml +helm install logstash elastic/logstash -f logstash-values.yaml +``` +This command installs Logstash with the specified configuration, ensuring that it can receive logs from Filebeat and forward them to Elasticsearch. + + +### Step 4: Configure and Install Kibana + +Kibana provides a user interface for visualizing and interacting with Elasticsearch data. + +1. **Create a kibana-values.yaml file with the following content:** + +```yaml +service: + type: NodePort + port: 5601 + +resources: + requests: + cpu: "200m" + memory: "200Mi" + limits: + cpu: "1000m" + memory: "2Gi" +``` + +***Explanation:*** + +**service.type: NodePort:** Exposes Kibana on a specific port on all nodes in the Kubernetes cluster. This makes it accessible from outside the cluster for development and testing purposes. + + +**port: 5601:** The default port for Kibana, which is exposed for accessing the Kibana web interface. + +2. **Install Kibana using Helm:** + +```yaml +helm install kibana elastic/kibana -f kibana-values.yaml +``` +This command installs Kibana with the specified configuration, allowing you to access it through the exposed port. + +### Step 5: Access Kibana and View Logs + +Now that Kibana is installed and running, you can access it to visualize and analyze the logs collected by Filebeat and processed by Logstash. + + 1. **Find the NodePort assigned to Kibana:** + + ```yaml + kubectl get svc kibana-kibana -n elk -o jsonpath="{.spec.ports[0].nodePort}" + ``` + + + This command retrieves the NodePort assigned to Kibana, which you will use to access the Kibana web interface. + + 2. **Access Kibana:** + + Open your web browser and navigate to: +```yaml + http://: + ``` + + Replace with the IP address of your Kubernetes cluster and with the NodePort value obtained in step 1. + + ![img](./img/kibana-login-page.png.webp) + + 3. **Log in to Kibana:** + + You can get the login credentials for Kibana from the elastic secrets using the below commands. + +```yaml +$ kubectl get secret elasticsearch-master-credentials -o jsonpath="{.data.username}" | base64 --decode + +$ kubectl get secret elasticsearch-master-credentials -o jsonpath="{.data.password}" | base64 --decode +``` +Once you access Kibana, you can start exploring your log data. + +![img](./img/kibana-dashboard.png.webp) + + +Access the logs + +![img](./img/kibana-logs.png.webp) + + +### Step 6: Check Elasticsearch Cluster Health + +To ensure that your Elasticsearch cluster is functioning correctly, you need to verify its health. Here’s how you can check the health of your Elasticsearch cluster: + +**Check Cluster Health:** + +Execute the below command to check the health of your Elasticsearch cluster by querying the _cluster/health endpoint: +```yaml +kubectl exec -it -- curl -XGET -u elastic -vk 'https://elasticsearch-master:9200/_cluster/health?pretty' +``` +**Output:** + +```yaml +$ kubectl exec -it elasticsearch-master-0 -- curl -XGET -u elastic -vk 'https://elasticsearch-master:9200/_cluster/health?pretty' + +Defaulted container "elasticsearch" out of: elasticsearch, configure-sysctl (init) +Enter host password for user 'elastic': +Note: Unnecessary use of -X or --request, GET is already inferred. +* Trying 10.245.158.126:9200... 
+* TCP_NODELAY set +* Connected to elasticsearch-master (10.245.158.126) port 9200 (#0) +* ALPN, offering h2 +* ALPN, offering http/1.1 +* successfully set certificate verify locations: +* CAfile: /etc/ssl/certs/ca-certificates.crt + CApath: /etc/ssl/certs +* TLSv1.3 (OUT), TLS handshake, Client hello (1): +* TLSv1.3 (IN), TLS handshake, Server hello (2): +* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): +* TLSv1.3 (IN), TLS handshake, Certificate (11): +* TLSv1.3 (IN), TLS handshake, CERT verify (15): +* TLSv1.3 (IN), TLS handshake, Finished (20): +* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): +* TLSv1.3 (OUT), TLS handshake, Finished (20): +* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 +* ALPN, server did not agree to a protocol +* Server certificate: +* subject: CN=elasticsearch-master +* start date: Sep 11 00:42:27 2024 GMT +* expire date: Sep 11 00:42:27 2025 GMT +* issuer: CN=elasticsearch-ca +* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway. +* Server auth using Basic with user 'elastic' +> GET /_cluster/health?pretty HTTP/1.1 +> Host: elasticsearch-master:9200 +> Authorization: Basic ZWxhc3RpYzp6a3J6Z2lqd3NDUWlLaDJW +> User-Agent: curl/7.68.0 +> Accept: */* +> +* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): +* Mark bundle as not supporting multiuse +< HTTP/1.1 200 OK +< X-elastic-product: Elasticsearch +< content-type: application/json +< content-length: 468 +< +{ + "cluster_name" : "elasticsearch", + "status" : "green", + "timed_out" : false, + "number_of_nodes" : 3, + "number_of_data_nodes" : 3, + "active_primary_shards" : 13, + "active_shards" : 26, + "relocating_shards" : 0, + "initializing_shards" : 0, + "unassigned_shards" : 0, + "delayed_unassigned_shards" : 0, + "number_of_pending_tasks" : 0, + "number_of_in_flight_fetch" : 0, + "task_max_waiting_in_queue_millis" : 0, + "active_shards_percent_as_number" : 100.0 } -``` +* Connection #0 to host elasticsearch-master left intact +``` + + + +Review the output to understand the cluster’s health status. + +**Conclusion** -## Best Practices -- Use Kubernetes labels and annotations to organize logs effectively. -- Monitor the resource usage of Elasticsearch and Logstash to ensure they scale with your cluster. -- Set up retention policies in Elasticsearch to manage log storage. -- Regularly back up Elasticsearch data to prevent data loss. -- Use Kibana's visualization features to create dashboards for monitoring application performance and troubleshooting issues. +You’ve now set up the ELK stack on Kubernetes using Helm with the provided configurations! Your setup includes Elasticsearch for storing and indexing logs, Logstash for processing and forwarding logs, Filebeat for collecting and shipping logs, and Kibana for visualizing and analyzing your data. This powerful stack will help you monitor and analyze logs from your containerized applications. ---- -Stay tuned for more detailed information on setting up and using the ELK Stack in Kubernetes! \ No newline at end of file +Feel free to customize these configurations based on your specific requirements and environment. Happy logging! 
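If you later want to remove the ELK stack, uninstalling the four Helm releases is usually enough. The commands below are a sketch that assumes the release names used in this guide (elasticsearch, filebeat, logstash, kibana):

```bash
# Remove the components in roughly the reverse order of installation
helm uninstall kibana
helm uninstall logstash
helm uninstall filebeat
helm uninstall elasticsearch

# Confirm that the related pods and services are gone
kubectl get pods,svc | grep -E 'elasticsearch|logstash|filebeat|kibana'
```

PersistentVolumeClaims created by the Elasticsearch StatefulSet may survive the uninstall; if you no longer need the data, delete them explicitly with kubectl delete pvc.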
diff --git a/docs/monitoring/grafana-loki.md b/docs/monitoring/grafana-loki.md index 040e9b1f..9bae26e7 100644 --- a/docs/monitoring/grafana-loki.md +++ b/docs/monitoring/grafana-loki.md @@ -5,78 +5,275 @@ sidebar_id: "grafana-loki" sidebar_position: 1 --- -# Grafana Loki: Log Aggregation for Kubernetes +# ⎈ A Hands-On Guide to Kubernetes Logging Using Grafana Loki ⚙️ -Grafana Loki is a log aggregation system designed for Kubernetes. It is lightweight, cost-effective, and integrates seamlessly with Grafana for log visualization. This document provides an overview of Grafana Loki, its benefits, and how to set it up in a Kubernetes cluster. +#### *⇢ A Comprehensive Guide to Setting Up the Grafana Loki on Kubernetes with Helm: Practical Example* ---- +![img](./img/logging-grafana-loki.png.webp) -
-🚧 Work in Progress
-This page is currently under construction. Please check back later for detailed information about Grafana Loki setup and usage in Kubernetes.
+In a microservices architecture, monitoring and logging are essential to keep track of various components. Kubernetes generates a large number of logs, and managing them effectively is key to running a healthy cluster. **Grafana Loki** is a highly efficient logging solution that integrates seamlessly with **Grafana** for visualizing logs, allowing you to query and explore logs from multiple sources in one place. ---- +In this guide, I’ll walk you through setting up Grafana Loki in a Kubernetes cluster using Helm, a package manager for Kubernetes. We will use the Loki Stack, which comes bundled with Loki, Promtail, and optionally Grafana. -## Table of Contents -- [Introduction](#introduction) -- [Why Use Grafana Loki?](#why-use-grafana-loki) -- [Architecture](#architecture) -- [Installation](#installation) -- [Configuration](#configuration) -- [Querying Logs](#querying-logs) -- [Best Practices](#best-practices) ---- +![img](./img/animated-logging-grafana-loki.png.gif) -## Introduction -Grafana Loki is a log aggregation system optimized for Kubernetes. Unlike traditional log aggregation systems, Loki does not index the content of logs but instead indexes metadata such as labels. This makes it highly efficient and cost-effective for Kubernetes environments. +### Prerequisites ---- +Before starting, make sure you have: -## Why Use Grafana Loki? -- **Kubernetes-Native**: Designed to work seamlessly with Kubernetes labels and metadata. -- **Cost-Effective**: Minimal indexing reduces storage and processing costs. -- **Integration with Grafana**: Provides a unified interface for metrics and logs. -- **Scalable**: Can handle large-scale Kubernetes clusters with ease. ---- +- A Kubernetes cluster up and running +- Helm installed on your system +- kubectl configured to interact with your cluster -## Architecture -Grafana Loki consists of the following components: -1. **Promtail**: A lightweight agent that collects logs from Kubernetes pods and forwards them to Loki. -2. **Loki**: The central log aggregation system that stores and indexes logs. -3. **Grafana**: A visualization tool used to query and display logs from Loki. +## Steps to Set Up Grafana Loki on Kubernetes ---- +Once you have the prerequisites in place, follow the steps below to set up Grafana Loki using Helm. -## Installation -> **Note:** Detailed installation steps will be added soon. +### Step 1: Add the Grafana Helm Repository ---- +The first step is to add the Grafana Helm repository, which contains the Helm chart for deploying Loki. -## Configuration -> **Note:** Configuration details for Promtail, Loki, and Grafana will be added soon. +Run the following command to add the Grafana repo to Helm: + +```yaml + helm repo add grafana https://grafana.github.io/helm-charts +``` + +After adding the repository, it’s a good practice to update the Helm repo to ensure you have the latest chart versions. Use the command: + +```yaml +helm repo update +``` + +Now, list every repository with the word “Loki” in it by running: + +```yaml +helm search repo loki +``` + +You should see several results, but we will be using the grafana/loki-stack repository to deploy Promtail and Grafana, and to configure Loki. + +### Step 2: Customize Helm Chart Configuration Values +Before deploying Loki, you may want to customize some of the default values in the Helm chart. This step is especially important if you want to install Grafana alongside Loki or configure other advanced features like persistent storage. 
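For example, if you want Loki's chunks and index to survive pod restarts, persistence can be turned on under the loki section of the values file. Treat the snippet below as a sketch only — the exact keys (loki.persistence.*) vary between loki-stack chart versions, so cross-check them against the default values you download in the next step:

```yaml
loki:
  persistence:
    enabled: true        # use a PersistentVolume instead of the default emptyDir
    size: 10Gi           # size the volume for your expected log retention
    # storageClassName: standard   # optionally pin a specific StorageClass
```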
+ + +First, download the default values of the Loki Helm chart into a YAML file by running: + +```yaml +helm show values grafana/loki-stack > loki-custom-values.yaml +``` +Now, open the loki-values.yaml file and make the following changes to meet your specific configuration needs. +Here is the custom loki-custom-values.yaml file: + +```yaml +test_pod: + enabled: true + image: bats/bats:1.8.2 + pullPolicy: IfNotPresent + +loki: + enabled: true + isDefault: true + url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }} + readinessProbe: + httpGet: + path: /ready + port: http-metrics + initialDelaySeconds: 45 + livenessProbe: + httpGet: + path: /ready + port: http-metrics + initialDelaySeconds: 45 + datasource: + jsonData: "{}" + uid: "" + + +promtail: + enabled: true + config: + logLevel: info + serverPort: 3101 + clients: + - url: http://{{ .Release.Name }}:3100/loki/api/v1/push + +fluent-bit: + enabled: false + +grafana: + enabled: true + sidecar: + datasources: + label: "" + labelValue: "" + enabled: true + maxLines: 1000 + image: + tag: 10.3.3 + service: + type: NodePort + +prometheus: + enabled: false + isDefault: false + url: http://{{ include "prometheus.fullname" .}}:{{ .Values.prometheus.server.service.servicePort }}{{ .Values.prometheus.server.prefixURL }} + datasource: + jsonData: "{}" + +filebeat: + enabled: false + filebeatConfig: + filebeat.yml: | + # logging.level: debug + filebeat.inputs: + - type: container + paths: + - /var/log/containers/*.log + processors: + - add_kubernetes_metadata: + host: ${NODE_NAME} + matchers: + - logs_path: + logs_path: "/var/log/containers/" + output.logstash: + hosts: ["logstash-loki:5044"] + +logstash: + enabled: false + image: grafana/logstash-output-loki + imageTag: 1.0.1 + filters: + main: |- + filter { + if [kubernetes] { + mutate { + add_field => { + "container_name" => "%{[kubernetes][container][name]}" + "namespace" => "%{[kubernetes][namespace]}" + "pod" => "%{[kubernetes][pod][name]}" + } + replace => { "host" => "%{[kubernetes][node][name]}"} + } + } + mutate { + remove_field => ["tags"] + } + } + outputs: + main: |- + output { + loki { + url => "http://loki:3100/loki/api/v1/push" + #username => "test" + #password => "test" + } + # stdout { codec => rubydebug } + } + +# proxy is currently only used by loki test pod +# Note: If http_proxy/https_proxy are set, then no_proxy should include the +# loki service name, so that tests are able to communicate with the loki +# service. +proxy: + http_proxy: "" + https_proxy: "" + no_proxy: "" +``` + +***Key Points in Custom Configuration:*** + + +- **Loki** is enabled and configured with readiness and liveness probes for health checking. +- **Promtail** is enabled to forward logs from Kubernetes nodes to Loki. +- **Grafana** is enabled with a **NodePort** service to allow access to the Grafana UI from outside the cluster. +- **Prometheus**, **Filebeat**, and **Logstash** are explicitly disabled. + +### Step 3: Deploy the Loki Stack with Custom Values + +After editing the loki-cusomt-values.yaml file, you are ready to deploy the Loki stack. Use the following command to install or upgrade the Helm release: +```yaml + +helm upgrade --install --values loki-custom-values.yaml loki grafana/loki-stack -n grafana-loki --create-namespace +``` +**This command:** +- Deploys the **Loki**, **Promtail**, and **Grafana** components. +- Disables the **Prometheus**, **Filebeat**, and **Logstash** components as per the configuration. 
+- Creates a namespace grafana-loki and deploys all components inside this namespace. + +### Step 4: Access Grafana and Configure Data Source +Once the Helm chart has been successfully deployed, it’s time to access Grafana and verify that everything is working correctly. + +1. ***First, check the pods in the grafana-loki namespace to ensure everything is running:*** +```yaml +$ kubectl get pods -n grafana-loki +NAME READY STATUS RESTARTS AGE +loki-0 1/1 Running 0 19m +loki-grafana-567d65596c-gvt5q 2/2 Running 0 17m +loki-promtail-8jng6 1/1 Running 0 19m +loki-promtail-hb6x2 1/1 Running 0 19m +``` + +2. ***Find the NodePort assigned to Grafana:*** + +```yaml +$ kubectl get svc loki-grafana -n grafana-loki -o jsonpath="{.spec.ports[0].nodePort}" +30287 +``` +This command retrieves the NodePort assigned to Grafana, which you will use to access the Grafana web interface. + +3. ***Access Kibana:*** + +Open your web browser and navigate to: + +```yaml +http://: +``` + +Replace with the IP address of your Kubernetes cluster and with the NodePort value obtained in step 1 + +![img](./img/grafana-ui.png.webp) + +4. ***Log in to Grafana:*** + +You can get the login credentials for Grafana from the loki-grafana secret using the below commands. + +```yaml +$ kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-user}" | base64 --decode +admin +$ kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-password}" | base64 --decode +C43ICy6t22dwI3W93DsDPiiSUeX5Z4aHMwKWkNvq% +``` + +Once you log in you can see the home screen of Grafana, press the three lines at the top left corner you can see the menu then go to **Connections > Data sources** as shown in the below image. + +![img](./img/grafana-home.png.webp) + +In Data sources you can see Loki has been configured as shown below + +![img](./img/grafana-data-sources.png.webp) + + +Now, check if you are getting logs or not. Go to Explore by pressing the Explore button. + + + +To query logs select a Label and Value, Loki will collect every log in your Kubernetes cluster and label it according to container, pod, namespace, deployments, jobs and other objects of Kubernetes. + +![img](./img/grafana-query.png.webp) + +After selecting a Label(namespace) and Value(grafana-loki), press the blue button at the top right corner(Run Query)to query logs. + +![img](./img/grafana-logs.png.webp) + + +Promtail, running as a DaemonSet, will collect logs from all nodes and forward them to Loki. You can query these logs in Grafana, making it easy to monitor your Kubernetes applications. ---- -## Querying Logs -Grafana Loki uses a query language called **LogQL** to filter and analyze logs. Example queries: -- Retrieve logs for a specific pod: - ```logql - {pod="my-app-pod"} - ``` -- Filter logs by a specific label: - ```logql - {app="my-app", level="error"} - ``` - -## Best Practices -- Use Kubernetes labels effectively to organize and query logs. -- Monitor Loki's resource usage to ensure it scales with your cluster. -- Set up retention policies to manage log storage efficiently. -- Integrate Loki with Grafana dashboards for unified monitoring. ---- -Stay tuned for updates as we continue to enhance this guide! \ No newline at end of file +## Conclusion +In this post, we walked through how to deploy Grafana Loki on Kubernetes using Helm with customized values. By enabling Loki, Promtail, and Grafana, and disabling unnecessary components like Prometheus, Filebeat, and Logstash, we tailored the setup to meet specific logging needs. 
+Grafana Loki offers an efficient, scalable solution for Kubernetes log management. With this setup, you can now monitor and explore your Kubernetes logs with ease. \ No newline at end of file diff --git a/docs/monitoring/img/alert-manager-architecture.png.webp b/docs/monitoring/img/alert-manager-architecture.png.webp new file mode 100644 index 00000000..964d628a Binary files /dev/null and b/docs/monitoring/img/alert-manager-architecture.png.webp differ diff --git a/docs/monitoring/img/alert-triggered-on-alertmanager.png.webp b/docs/monitoring/img/alert-triggered-on-alertmanager.png.webp new file mode 100644 index 00000000..022ddb0d Binary files /dev/null and b/docs/monitoring/img/alert-triggered-on-alertmanager.png.webp differ diff --git a/docs/monitoring/img/alert-triggered-on-prometheus.png.webp b/docs/monitoring/img/alert-triggered-on-prometheus.png.webp new file mode 100644 index 00000000..f801f259 Binary files /dev/null and b/docs/monitoring/img/alert-triggered-on-prometheus.png.webp differ diff --git a/docs/monitoring/img/alertmanager-ui.png.webp b/docs/monitoring/img/alertmanager-ui.png.webp new file mode 100644 index 00000000..3424a15e Binary files /dev/null and b/docs/monitoring/img/alertmanager-ui.png.webp differ diff --git a/docs/monitoring/img/alertmanager.png.webp b/docs/monitoring/img/alertmanager.png.webp new file mode 100644 index 00000000..ea8b6f7e Binary files /dev/null and b/docs/monitoring/img/alertmanager.png.webp differ diff --git a/docs/monitoring/img/alerts-fired.png.webp b/docs/monitoring/img/alerts-fired.png.webp new file mode 100644 index 00000000..cf19d9f1 Binary files /dev/null and b/docs/monitoring/img/alerts-fired.png.webp differ diff --git a/docs/monitoring/img/alerts-in-prometheus-ui.png.webp b/docs/monitoring/img/alerts-in-prometheus-ui.png.webp new file mode 100644 index 00000000..b0aa152b Binary files /dev/null and b/docs/monitoring/img/alerts-in-prometheus-ui.png.webp differ diff --git a/docs/monitoring/img/animated-elk-and-filebeat.png.gif b/docs/monitoring/img/animated-elk-and-filebeat.png.gif new file mode 100644 index 00000000..7805b820 Binary files /dev/null and b/docs/monitoring/img/animated-elk-and-filebeat.png.gif differ diff --git a/docs/monitoring/img/animated-logging-grafana-loki.png.gif b/docs/monitoring/img/animated-logging-grafana-loki.png.gif new file mode 100644 index 00000000..2eb86f8b Binary files /dev/null and b/docs/monitoring/img/animated-logging-grafana-loki.png.gif differ diff --git a/docs/monitoring/img/animated-promethes-and-grafana.png.gif b/docs/monitoring/img/animated-promethes-and-grafana.png.gif new file mode 100644 index 00000000..6f9b6bc1 Binary files /dev/null and b/docs/monitoring/img/animated-promethes-and-grafana.png.gif differ diff --git a/docs/monitoring/img/custom-alerts.png.gif b/docs/monitoring/img/custom-alerts.png.gif new file mode 100644 index 00000000..514687a2 Binary files /dev/null and b/docs/monitoring/img/custom-alerts.png.gif differ diff --git a/docs/monitoring/img/dashboard-current-alerts.png.webp b/docs/monitoring/img/dashboard-current-alerts.png.webp new file mode 100644 index 00000000..67ae851e Binary files /dev/null and b/docs/monitoring/img/dashboard-current-alerts.png.webp differ diff --git a/docs/monitoring/img/dashboard-id.png.webp b/docs/monitoring/img/dashboard-id.png.webp new file mode 100644 index 00000000..5a66e7c9 Binary files /dev/null and b/docs/monitoring/img/dashboard-id.png.webp differ diff --git a/docs/monitoring/img/dashboard-in-grafana.png.webp 
b/docs/monitoring/img/dashboard-in-grafana.png.webp new file mode 100644 index 00000000..360e3269 Binary files /dev/null and b/docs/monitoring/img/dashboard-in-grafana.png.webp differ diff --git a/docs/monitoring/img/dashboard-k8s.png.webp b/docs/monitoring/img/dashboard-k8s.png.webp new file mode 100644 index 00000000..b397855a Binary files /dev/null and b/docs/monitoring/img/dashboard-k8s.png.webp differ diff --git a/docs/monitoring/img/dashboard.png.webp b/docs/monitoring/img/dashboard.png.webp new file mode 100644 index 00000000..8743346d Binary files /dev/null and b/docs/monitoring/img/dashboard.png.webp differ diff --git a/docs/monitoring/img/elk-and-filebeat.png.webp b/docs/monitoring/img/elk-and-filebeat.png.webp new file mode 100644 index 00000000..ebd7da82 Binary files /dev/null and b/docs/monitoring/img/elk-and-filebeat.png.webp differ diff --git a/docs/monitoring/img/grafana-dashboard-metrics.png.webp b/docs/monitoring/img/grafana-dashboard-metrics.png.webp new file mode 100644 index 00000000..4709e444 Binary files /dev/null and b/docs/monitoring/img/grafana-dashboard-metrics.png.webp differ diff --git a/docs/monitoring/img/grafana-dashboard.png.webp b/docs/monitoring/img/grafana-dashboard.png.webp new file mode 100644 index 00000000..415b81d1 Binary files /dev/null and b/docs/monitoring/img/grafana-dashboard.png.webp differ diff --git a/docs/monitoring/img/grafana-dashboards.png.webp b/docs/monitoring/img/grafana-dashboards.png.webp new file mode 100644 index 00000000..86fd3d85 Binary files /dev/null and b/docs/monitoring/img/grafana-dashboards.png.webp differ diff --git a/docs/monitoring/img/grafana-data-sources.png.webp b/docs/monitoring/img/grafana-data-sources.png.webp new file mode 100644 index 00000000..b3388073 Binary files /dev/null and b/docs/monitoring/img/grafana-data-sources.png.webp differ diff --git a/docs/monitoring/img/grafana-home.png.webp b/docs/monitoring/img/grafana-home.png.webp new file mode 100644 index 00000000..3891e7c4 Binary files /dev/null and b/docs/monitoring/img/grafana-home.png.webp differ diff --git a/docs/monitoring/img/grafana-kube-state-metrics-v2.png.webp b/docs/monitoring/img/grafana-kube-state-metrics-v2.png.webp new file mode 100644 index 00000000..a7582ca8 Binary files /dev/null and b/docs/monitoring/img/grafana-kube-state-metrics-v2.png.webp differ diff --git a/docs/monitoring/img/grafana-library.png.webp b/docs/monitoring/img/grafana-library.png.webp new file mode 100644 index 00000000..12516962 Binary files /dev/null and b/docs/monitoring/img/grafana-library.png.webp differ diff --git a/docs/monitoring/img/grafana-logs.png.webp b/docs/monitoring/img/grafana-logs.png.webp new file mode 100644 index 00000000..63a1782c Binary files /dev/null and b/docs/monitoring/img/grafana-logs.png.webp differ diff --git a/docs/monitoring/img/grafana-query.png.webp b/docs/monitoring/img/grafana-query.png.webp new file mode 100644 index 00000000..0e5c8782 Binary files /dev/null and b/docs/monitoring/img/grafana-query.png.webp differ diff --git a/docs/monitoring/img/grafana-ui.png.webp b/docs/monitoring/img/grafana-ui.png.webp new file mode 100644 index 00000000..ff890b26 Binary files /dev/null and b/docs/monitoring/img/grafana-ui.png.webp differ diff --git a/docs/monitoring/img/import-dashboard-11455.png.webp b/docs/monitoring/img/import-dashboard-11455.png.webp new file mode 100644 index 00000000..bf93a200 Binary files /dev/null and b/docs/monitoring/img/import-dashboard-11455.png.webp differ diff --git 
a/docs/monitoring/img/import-dashboard-13345.webp b/docs/monitoring/img/import-dashboard-13345.webp new file mode 100644 index 00000000..77d29467 Binary files /dev/null and b/docs/monitoring/img/import-dashboard-13345.webp differ diff --git a/docs/monitoring/img/import-dashboard.png.webp b/docs/monitoring/img/import-dashboard.png.webp new file mode 100644 index 00000000..2be73425 Binary files /dev/null and b/docs/monitoring/img/import-dashboard.png.webp differ diff --git a/docs/monitoring/img/import-grafana-dashboard.png.webp b/docs/monitoring/img/import-grafana-dashboard.png.webp new file mode 100644 index 00000000..eede0b97 Binary files /dev/null and b/docs/monitoring/img/import-grafana-dashboard.png.webp differ diff --git a/docs/monitoring/img/kibana-dashboard.png.webp b/docs/monitoring/img/kibana-dashboard.png.webp new file mode 100644 index 00000000..63db65eb Binary files /dev/null and b/docs/monitoring/img/kibana-dashboard.png.webp differ diff --git a/docs/monitoring/img/kibana-login-page.png.webp b/docs/monitoring/img/kibana-login-page.png.webp new file mode 100644 index 00000000..cf8df6d9 Binary files /dev/null and b/docs/monitoring/img/kibana-login-page.png.webp differ diff --git a/docs/monitoring/img/kibana-logs.png.webp b/docs/monitoring/img/kibana-logs.png.webp new file mode 100644 index 00000000..07963b61 Binary files /dev/null and b/docs/monitoring/img/kibana-logs.png.webp differ diff --git a/docs/monitoring/img/kube-state-metrics-v2.png.webp b/docs/monitoring/img/kube-state-metrics-v2.png.webp new file mode 100644 index 00000000..adbc80d0 Binary files /dev/null and b/docs/monitoring/img/kube-state-metrics-v2.png.webp differ diff --git a/docs/monitoring/img/load-grafana-dashboard.png.webp b/docs/monitoring/img/load-grafana-dashboard.png.webp new file mode 100644 index 00000000..a5e7a009 Binary files /dev/null and b/docs/monitoring/img/load-grafana-dashboard.png.webp differ diff --git a/docs/monitoring/img/logging-and-metrics.png.webp b/docs/monitoring/img/logging-and-metrics.png.webp new file mode 100644 index 00000000..e4dd2dee Binary files /dev/null and b/docs/monitoring/img/logging-and-metrics.png.webp differ diff --git a/docs/monitoring/img/logging-grafana-loki.png.webp b/docs/monitoring/img/logging-grafana-loki.png.webp new file mode 100644 index 00000000..75c38920 Binary files /dev/null and b/docs/monitoring/img/logging-grafana-loki.png.webp differ diff --git a/docs/monitoring/img/pod-dashboard-example.png.webp b/docs/monitoring/img/pod-dashboard-example.png.webp new file mode 100644 index 00000000..13639441 Binary files /dev/null and b/docs/monitoring/img/pod-dashboard-example.png.webp differ diff --git a/docs/monitoring/img/prometheus-alerts.png.webp b/docs/monitoring/img/prometheus-alerts.png.webp new file mode 100644 index 00000000..2dae20f7 Binary files /dev/null and b/docs/monitoring/img/prometheus-alerts.png.webp differ diff --git a/docs/monitoring/img/prometheus-and-grafana-flowchart.png.webp b/docs/monitoring/img/prometheus-and-grafana-flowchart.png.webp new file mode 100644 index 00000000..a003dc72 Binary files /dev/null and b/docs/monitoring/img/prometheus-and-grafana-flowchart.png.webp differ diff --git a/docs/monitoring/img/prometheus-ui.png.webp b/docs/monitoring/img/prometheus-ui.png.webp new file mode 100644 index 00000000..f4f2df6e Binary files /dev/null and b/docs/monitoring/img/prometheus-ui.png.webp differ diff --git a/docs/monitoring/img/promethues-rule.png.webp b/docs/monitoring/img/promethues-rule.png.webp new file mode 100644 index 
00000000..29115ed1 Binary files /dev/null and b/docs/monitoring/img/promethues-rule.png.webp differ diff --git a/docs/monitoring/metrics-loggings-grafana-loki.md b/docs/monitoring/metrics-loggings-grafana-loki.md new file mode 100644 index 00000000..e4534670 --- /dev/null +++ b/docs/monitoring/metrics-loggings-grafana-loki.md @@ -0,0 +1,332 @@ +--- +// filepath: /Users/anveshmuppeda/Desktop/anvesh/tech/git/kubernetes/docs/monitoring/metrics-loggings grafana-loki.md +sidebar_label: "Metrics Loggings" +sidebar_id: "metrics-loggings" +sidebar_position: 5 +--- +# ⎈ A Hands-On Guide to Kubernetes Monitoring: Metrics and Logging with Grafana Loki ⚙️ + +#### *⇢ A Step-by-Step Guide to Setting Up Metrics and Logging in Kubernetes Using the Grafana, Loki, Prometheus, Logstash, and Filebeat for Full Cluster Observability* + +![img](./img/logging-and-metrics.png.webp) + +In a microservices architecture, monitoring both **metrics** and **logs** is critical for ensuring the health and performance of your applications. When running Kubernetes clusters, the ability to efficiently collect and visualize this data can be complex. With tools like **Grafana**, **Loki**, **Prometheus**, **Logstash**, and **Filebeat**, we can set up a powerful monitoring stack that provides complete observability. + + + +This blog will guide you through setting up a comprehensive monitoring solution in Kubernetes, focusing on both metrics and logging. We will use the following tools: + +- **Grafana:** For visualizing metrics and logs. +- **Loki:** For aggregating and storing logs. +- **Prometheus:** For collecting metrics. +- **Logstash:** For log processing and forwarding. +- **Filebeat:** For collecting log files from Kubernetes pods. + + +![img](./img/logging-and-metrics.png.webp) + + +We’ll use Helm to deploy these tools as it simplifies managing Kubernetes applications through charts. This tutorial builds upon the previous setup of Grafana Loki for logging and expands it to include Prometheus for metrics and more robust log collection with Logstash and Filebeat. + +## Prerequisites + +Before starting, ensure you have the following: + + +- A Kubernetes cluster up and running. +- Helm installed on your machine. +- kubectl configured to interact with your cluster. + +### Step 1: Add the Grafana Helm Repository + +To begin, add the Grafana Helm repository, which contains the charts for deploying Loki and other monitoring tools: + +```yaml +helm repo add grafana https://grafana.github.io/helm-charts +helm repo update +``` +Next, search for the Loki chart: + +```yaml +helm search repo loki +``` + +We will be using the grafana/loki-stack chart for this deployment, which includes Grafana, Loki, and additional components. + +### Step 2: Customize Helm Chart Configuration + +We’ll customize the default Helm chart values to enable Prometheus for metrics, configure Filebeat for log collection, and set up Logstash for advanced log processing. 
Below is the updated loki-custom-values.yaml file: + +```yaml + +test_pod: + enabled: true + image: bats/bats:1.8.2 + pullPolicy: IfNotPresent + +loki: + enabled: true + isDefault: true + fullnameOverride: loki + url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }} + readinessProbe: + httpGet: + path: /ready + port: http-metrics + initialDelaySeconds: 45 + livenessProbe: + httpGet: + path: /ready + port: http-metrics + initialDelaySeconds: 45 + datasource: + jsonData: "{}" + uid: "" + + +promtail: + enabled: false + config: + logLevel: info + serverPort: 3101 + clients: + - url: http://{{ .Release.Name }}:3100/loki/api/v1/push + +fluent-bit: + enabled: false + +grafana: + enabled: true + sidecar: + datasources: + label: "" + labelValue: "" + enabled: true + maxLines: 1000 + image: + tag: 10.3.3 + service: + type: NodePort + +prometheus: + enabled: true + isDefault: false + url: http://{{ include "prometheus.fullname" .}}:{{ .Values.prometheus.server.service.servicePort }}{{ .Values.prometheus.server.prefixURL }} + datasource: + jsonData: "{}" + server: + service: + type: NodePort + persistentVolume: + ## If true, Prometheus server will create/use a Persistent Volume Claim + ## If false, use emptyDir + ## + enabled: false + +filebeat: + enabled: true + filebeatConfig: + filebeat.yml: | + # logging.level: debug + filebeat.inputs: + - type: container + paths: + - /var/log/containers/*.log + processors: + - add_kubernetes_metadata: + host: ${NODE_NAME} + matchers: + - logs_path: + logs_path: "/var/log/containers/" + output.logstash: + hosts: ["logstash-loki-headless:5044"] + +logstash: + enabled: true + image: grafana/logstash-output-loki + imageTag: 1.0.1 + + fullnameOverride: logstash-loki + + logstashConfig: + logstash.yml: | + http.host: 0.0.0.0 + xpack.monitoring.enabled: false + + logstashPipeline: + logstash.conf: | + input { + beats { + port => 5044 + } + } + + filter { + if [kubernetes] { + mutate { + add_field => { + "container" => "%{[kubernetes][container][name]}" + "namespace" => "%{[kubernetes][namespace]}" + "pod" => "%{[kubernetes][pod][name]}" + } + replace => { "host" => "%{[kubernetes][node][name]}"} + } + } + mutate { + remove_field => ["tags"] + } + } + + output { + loki { + url => "http://loki:3100/loki/api/v1/push" + } + # stdout { codec => rubydebug } + } + +proxy: + http_proxy: "" + https_proxy: "" + no_proxy: "" +``` + +Key points: + +- **Prometheus** is enabled for metrics collection with NodePort service. +- **Filebeat** is enabled for log collection from Kubernetes pods. +- **Logstash** is enabled and configured to receive logs from Filebeat and forward them to Loki. + +### Step 3: Deploy the Monitoring Stack + +Once the loki-custom-values.yaml file is ready, deploy the stack using Helm: + +```yaml +helm upgrade --install --values loki-custom-values.yaml loki grafana/loki-stack -n grafana-loki --create-namespace +``` +This command: + +- Deploys Loki, Prometheus, Filebeat, Logstash, and Grafana. +- Disable Promtail. +- Configures Prometheus to collect metrics and Filebeat to collect logs. +- Sets up Logstash to forward logs to Loki for central logging. + +### Step 4: Access the Cluster Logs on Grafana +After the deployment, you need to access Grafana and configure data sources for metrics and logs. + +1. 
**Check the Pods:** Verify that all the components are running correctly in the grafana-loki namespace:

```yaml
$ kubectl get pods -n grafana-loki

NAME                                           READY   STATUS    RESTARTS   AGE
logstash-loki-0                                1/1     Running   0          59m
loki-0                                         1/1     Running   0          6h5m
loki-alertmanager-0                            1/1     Running   0          22m
loki-filebeat-6gl8t                            1/1     Running   0          53m
loki-filebeat-jrn5n                            1/1     Running   0          53m
loki-filebeat-p8pl8                            1/1     Running   0          53m
loki-grafana-568895c66-c7pxl                   2/2     Running   0          59m
loki-kube-state-metrics-77ffbdd8db-x64lh       1/1     Running   0          50m
loki-prometheus-node-exporter-2hfgb            1/1     Running   0          50m
loki-prometheus-node-exporter-9qq9c            1/1     Running   0          50m
loki-prometheus-node-exporter-tkctf            1/1     Running   0          50m
loki-prometheus-pushgateway-69d48d6874-hgd7v   1/1     Running   0          50m
loki-prometheus-server-8475684f7c-qh44p        2/2     Running   0          48m
```

2. **Find the NodePort** for Grafana: Retrieve the NodePort assigned to the Grafana service:

```yaml
$ kubectl get svc loki-grafana -n grafana-loki -o jsonpath="{.spec.ports[0].nodePort}"

32181
```

3. **Access the Grafana UI:** Open your browser and navigate to:

```yaml
http://<node-external-ip>:<grafana-nodeport>
```

Replace `<node-external-ip>` with your cluster's node IP address and `<grafana-nodeport>` with the NodePort you retrieved.

![img](./img/grafana-ui.png.webp)

4. **Log in to Grafana:** Retrieve the default login credentials:

```yaml
kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-user}" | base64 --decode
kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-password}" | base64 --decode
```
Once you log in, you will see the Grafana home screen. Click the menu icon (the three lines in the top-left corner) and go to **Connections > Data sources**, as shown in the image below.

![img](./img/grafana-home.png.webp)

Under Data sources you can see that Loki has already been configured, as shown below.

![img](./img/grafana-data-sources.png.webp)


Now, verify that logs are arriving by opening **Explore** from the menu.


To query logs, select a label and a value. Loki collects every log in your Kubernetes cluster and labels it by container, pod, namespace, deployment, job, and other Kubernetes objects.

![img](./img/grafana-query.png.webp)

After selecting a label (namespace) and a value (grafana-loki), press the **Run query** button in the top-right corner to query the logs.

![img](./img/grafana-logs.png.webp)

Filebeat, running as a DaemonSet, collects logs from all nodes and sends them to Logstash, and Logstash forwards them to Loki. You can query these logs in Grafana, making it easy to monitor your Kubernetes applications.


### Step 5: Access Metrics on Grafana by Adding New Dashboards

1. Log in to Grafana by following the same steps as above.
2. Navigate to the **Home > Dashboards** section.

![img](./img/grafana-dashboard-metrics.png.webp)

3. Add/Create new Dashboards

We also have the flexibility to create our own dashboards from scratch or import dashboards from the Grafana library.

To import a Grafana dashboard, follow these steps:

**Step 1:** Access the Grafana library.

**Step 2:** Select the desired dashboard ID to add. Here we consider the kube-state-metrics-v2 dashboard.

![img](./img/kube-state-metrics-v2.png.webp)

Copy the ID of the kube-state-metrics-v2 dashboard, i.e., 13332.


**Step 3:** Import the selected dashboard into Grafana.

Go to the **Home > Dashboards** section and click **Import**.
+ +![img](./img/import-grafana-dashboard.png.webp) + +![img](./img/import-dashboard-13345.webp) + +Now enter the ID of the target new Dashboard i.e., 13332 then click on Load to load the new dashboard into Grafana. + + +![img](./img/load-grafana-dashboard.png.webp) + +Click on Import to import the new Dashboard & Access it. + +![img](./img/grafana-kube-state-metrics-v2.png.webp) + + +These steps allow us to easily integrate any dashboard from the Grafana library. Now that everything is set up, you can start visualizing both metrics and logs in Grafana. + + +## Conclusion + +In this blog, we have built a complete monitoring stack for Kubernetes that includes both metrics and logs. By using Grafana for visualization, Loki for log aggregation, Prometheus for metrics collection, Filebeat for log collection, and Logstash for log processing, you can ensure that your Kubernetes cluster is fully observable. This setup provides a powerful way to monitor and troubleshoot your applications, ensuring better reliability and performance. diff --git a/docs/monitoring/promrtheus-grafana.md b/docs/monitoring/promrtheus-grafana.md index 1cbffc43..20339b12 100644 --- a/docs/monitoring/promrtheus-grafana.md +++ b/docs/monitoring/promrtheus-grafana.md @@ -5,82 +5,376 @@ sidebar_id: "prometheus-grafana" sidebar_position: 2 --- -# Prometheus and Grafana: Monitoring Kubernetes Clusters +# ⎈ A Hands-On Guide to Kubernetes Monitoring Using Prometheus & Grafana🛠️ -Prometheus and Grafana are widely used tools for monitoring and visualizing metrics in Kubernetes clusters. Prometheus collects and stores metrics, while Grafana provides a powerful interface for querying and visualizing these metrics. This guide provides an overview of Prometheus and Grafana, their benefits, and how to set them up in a Kubernetes cluster. +#### *⇢ Understanding Prometheus & Grafana Setup in Kubernetes: A Comprehensive Guide* ---- +![img](./img/prometheus-and-grafana-flowchart.png.webp) -
-<div>
-  <h2>🚧 Work in Progress</h2>
-  <p>This page is currently under construction. Please check back later for detailed information about Prometheus and Grafana setup and usage in Kubernetes.</p>
-</div>
---- +## Introduction +In the dynamic world of containerized applications and microservices, monitoring is indispensable for maintaining the health, performance, and reliability of your infrastructure. Kubernetes, with its ability to orchestrate containers at scale, introduces new challenges and complexities in monitoring. This is where tools like Prometheus and Grafana come into play. +**Prometheus** is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It excels at monitoring metrics and providing powerful query capabilities against time-series data. Meanwhile, **Grafana** complements Prometheus by offering visualization capabilities through customizable dashboards and graphs. +- In this blog post, we will guide you through the process of setting up Prometheus and Grafana on a Kubernetes cluster using Helm. By the end of this tutorial, you will have a robust monitoring solution that allows you to: +- Collect and store metrics from your Kubernetes cluster and applications. +Visualize these metrics through intuitive dashboards. +- Set up alerts based on predefined thresholds or anomalies. +- Gain insights into the performance and resource utilization of your cluster. -## Table of Contents -- [Introduction](#introduction) -- [Why Use Prometheus and Grafana?](#why-use-prometheus-and-grafana) -- [Architecture](#architecture) -- [Installation](#installation) -- [Configuration](#configuration) -- [Creating Dashboards](#creating-dashboards) -- [Best Practices](#best-practices) ---- -## Introduction -Prometheus and Grafana are essential tools for monitoring Kubernetes clusters. Prometheus collects metrics from Kubernetes components, applications, and infrastructure, while Grafana visualizes these metrics in customizable dashboards. +Whether you are deploying your first Kubernetes cluster or looking to enhance your existing monitoring setup, understanding how to leverage Prometheus and Grafana effectively is essential. Let’s dive into the step-by-step process of deploying and configuring these powerful tools on Kubernetes. ---- +### Prerequisites +Before we get started, ensure you have the following: -## Why Use Prometheus and Grafana? -- **Comprehensive Monitoring**: Collects metrics from Kubernetes nodes, pods, and applications. -- **Custom Dashboards**: Grafana allows you to create tailored dashboards for specific use cases. -- **Alerting**: Prometheus supports alerting rules to notify you of critical issues. -- **Scalability**: Both tools can handle large-scale Kubernetes clusters. +- A running Kubernetes cluster. ---- +- kubectl command-line tool configured to communicate with your cluster. +- Helm (the package manager for Kubernetes) installed. -## Architecture -Prometheus and Grafana work together as follows: -1. **Prometheus**: Scrapes metrics from Kubernetes components and stores them in a time-series database. -2. **Grafana**: Queries Prometheus for metrics and visualizes them in dashboards. -3. **Alertmanager**: (Optional) Used with Prometheus to send alerts based on defined rules. +### Setting up Prometheus and Grafana +**Step 1: Adding the Helm Repository** ---- +First, add the Prometheus community Helm repository and update it: -## Installation -> **Note:** Detailed installation steps will be added soon. +```yaml +helm repo add prometheus-community https://prometheus-community.github.io/helm-charts +helm repo update +``` ---- -## Configuration -> **Note:** Configuration details for Prometheus, Grafana, and Alertmanager will be added soon. 
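Optionally, before installing, you can confirm that the repository was added correctly and check which chart version you are about to pull. This is a quick, non-essential check:

```bash
# Confirm the prometheus-community repository is registered
helm repo list

# Show the latest available version of the kube-prometheus-stack chart
helm search repo prometheus-community/kube-prometheus-stack
```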
+**Step 2: Installing Prometheus and Grafana** + +Create a custom-values.yaml file to customize the Helm chart installation. This file will configure Prometheus and Grafana to be exposed via NodePorts. +```yaml +# custom-values.yaml +prometheus: + service: + type: NodePort +grafana: + service: + type: NodePort +``` + + +Then, install the kube-prometheus-stack using Helm: +```yaml +helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f custom-values.yaml +``` +Output: +```yaml +$ helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f custom-values.yaml +Release "kube-prometheus-stack" does not exist. Installing it now. +NAME: kube-prometheus-stack +LAST DEPLOYED: Sun Jun 16 17:04:53 2024 +NAMESPACE: default +STATUS: deployed +REVISION: 1 +NOTES: +kube-prometheus-stack has been installed. Check its status by running: + kubectl --namespace default get pods -l "release=kube-prometheus-stack" + +Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator. +``` +**Step 3: Verifying the Installation** + +After the installation, you can verify that the Prometheus and Grafana services are created and exposed on NodePorts: +```yaml +kubectl get services +``` +You should see output similar to this, showing the services with their respective NodePorts: +```yaml +$ kubectl get services +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +alertmanager-operated ClusterIP None 9093/TCP,9094/TCP,9094/UDP 5m19s +kube-prometheus-stack-alertmanager ClusterIP 10.245.239.151 9093/TCP,8080/TCP 5m22s +kube-prometheus-stack-grafana NodePort 10.245.30.17 80:31519/TCP 5m22s +kube-prometheus-stack-kube-state-metrics ClusterIP 10.245.26.205 8080/TCP 5m22s +kube-prometheus-stack-operator ClusterIP 10.245.19.171 443/TCP 5m22s +kube-prometheus-stack-prometheus NodePort 10.245.151.164 9090:30090/TCP,8080:32295/TCP 5m22s +kube-prometheus-stack-prometheus-node-exporter ClusterIP 10.245.22.30 9100/TCP 5m22s +kubernetes ClusterIP 10.245.0.1 443/TCP 57d +prometheus-operated ClusterIP None 9090/TCP 5m19s +``` +**Step 4: Accessing Prometheus and Grafana** + +To access Prometheus and Grafana dashboards outside the cluster, you need the external IP of any node in the cluster and the NodePorts on which the services are exposed. 
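If your nodes have no reachable external IP (for example, on a local test cluster), you can instead port-forward the services to your workstation. A minimal sketch, assuming the release was installed into the default namespace with the service names shown in the output above:

```bash
# Prometheus at http://localhost:9090
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090

# Grafana at http://localhost:3000 (the service listens on port 80 inside the cluster)
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80
```
Otherwise, continue with the NodePort approach below.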
+ + +Get the external IP addresses of your nodes: +```yaml +kubectl get nodes -o wide +``` + +You should see output similar to this: + +```yaml +$ kubectl get nodes -o wide +NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME +pool-t5ss0fagn-jeb47 Ready 57d v1.29.1 10.124.0.2 146.190.55.222 Debian GNU/Linux 12 (bookworm) 6.1.0-17-amd64 containerd://1.6.28 +``` + +Use the external IP of any node and the NodePorts to access the dashboards: + + +Prometheus: http://146.190.55.222:30090 +Grafana: http://146.190.55.222:31519 + +### Access Prometheus +Use the below link as above to access the Prometheus UI + +[Prometheus-port](http://:) + +![img](./img/prometheus-ui.png.webp) + +![img](./img/prometheus-alerts.png.webp) + +### Access Grafana Default Dashboards +Use the below link as above to access the Grafana UI + + +[Grafana-port](http://:) + +![img](./img/grafana-ui.png.webp) + + +Use the below command to get the Grafana Admin login: + + +**Username:** + +```yaml +$ kubectl get secret --namespace default kube-prometheus-stack-grafana -o jsonpath="{.data.admin-user}" | base64 --decode ; echo +admin +``` +**Password:** +```yaml +$ kubectl get secret --namespace default kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo +prom-operator +``` + + + +![img](./img/grafana-dashboards.png.webp) + +### Default Dashboards + +By default our previous setup will add few dashboards: + +![img](./img/dashboard.png.webp) + +Using these dashboards we can easily monitor our kubernetes cluster + +![img](./img/pod-dashboard-example.png.webp) + +### Add/Create new Dashboards + +We also have the flexibility to create our own dashboards from scratch or import multiple Grafana dashboards from the Grafana library. + + +To import a Grafana dashboard, follow these steps: + + +**Step 1:** Access the Grafana library. + +![img](./img/grafana-library.png.webp) + +**Step 2.** Select the desired dashboard ID to add. + +Considering K8s/Storage/Volumes/Namespace Dashboard + +![img](./img/dashboard-k8s.png.webp) + + +![img](./img/dashboard-id.png.webp) + +Copy the Id of K8s/Storage/Volumes/Namespace Dashboard i.e., 11455 + +**Step 3: Import selected Dashboard in Grafana** + +Access Dashboard section & click on Import section. + +![img](./img/dashboard-in-grafana.png.webp) + +Now enter the ID of the target new Dashboard i.e., 11455. + +![img](./img/import-dashboard-11455.png.webp) + +Click on Load to load the new dashboard into Grafana. + +![img](./img/import-dashboard.png.webp) + +Click on Import to import the new Dashboard & Access it. + +![img](./img/dashboard-current-alerts.png.webp) + + +These steps allow us to easily integrate any dashboard from the Grafana library. + + +### Prometheus Architecture + +Prometheus is a powerful monitoring and alerting toolkit designed for reliability and scalability. Understanding its architecture helps in leveraging its full potential. The Prometheus architecture comprises several key components: + +![img](./img/animated-promethes-and-grafana.png.gif) + +### Prometheus Server + +The Prometheus server is the core component responsible for: + +1. **Data Scraping:** Prometheus periodically scrapes metrics from configured targets, which are typically HTTP endpoints exposing metrics in a specified format. +2. **Data Storage**: It stores all scraped samples locally using a time series database. Prometheus is designed to be efficient with both storage and retrieval of time series data. +3. 
**Querying:** Prometheus allows you to query the time series data via the Prometheus Query Language (PromQL), which enables complex aggregations and calculations. + +### Prometheus Components + +1. **Prometheus Server:** The main component that does the bulk of the work, including scraping metrics from targets, storing the data, and providing a powerful query interface. +2. **Pushgateway:** An intermediary service used for pushing metrics from short-lived jobs that cannot be scraped directly by Prometheus. This is particularly useful for batch jobs and other processes with a finite lifespan. +3. **Exporters:** Exporters are used to expose metrics from third-party systems as Prometheus metrics. For example, Node Exporter collects hardware and OS metrics from a node, while other exporters exist for databases, web servers, and more. +4. **Alertmanager:** This component handles alerts generated by the Prometheus server. It can deduplicate, group, and route alerts to various receivers such as email, Slack, PagerDuty, or other notification systems. +5. **Service Discovery:** Prometheus supports various service discovery mechanisms to automatically find targets to scrape. This includes static configuration, DNS-based service discovery, and integrations with cloud providers and orchestration systems like Kubernetes. +6. **PromQL:** The powerful query language used by Prometheus to retrieve and manipulate time series data. PromQL supports a wide range of operations such as arithmetic, aggregation, and filtering. + +### Data Flow in Prometheus + +1. **Scraping Metrics:** Prometheus scrapes metrics from HTTP endpoints (targets) at regular intervals. These targets can be predefined or discovered dynamically through service discovery. +2. **Storing Metrics:** Scraped metrics are stored as time series data, identified by a metric name and a set of key-value pairs (labels). +3. **Querying Metrics:** Users can query the stored metrics using PromQL. Queries can be executed via the Prometheus web UI, HTTP API, or integrated with Grafana for visualization. +4. **Alerting:** Based on predefined rules, Prometheus can evaluate metrics data and trigger alerts. These alerts are sent to Alertmanager, which then processes and routes them to the appropriate notification channels. + +### Example of Prometheus Workflow + +1. **Service Discovery:** Prometheus discovers targets to scrape metrics from using service discovery mechanisms. For example, in a Kubernetes environment, it discovers pods, services, and nodes. +2. **Scraping:** Prometheus scrapes metrics from discovered targets at defined intervals. Each target is an endpoint exposing metrics in a format Prometheus understands (typically plain text). +3. **Storing:** Scraped metrics are stored in Prometheus’s time series database, indexed by the metric name and labels. +4. **Querying:** Users can query the data using PromQL for analysis, visualization, or alerting purposes. +5. **Alerting:** When certain conditions are met (defined by alerting rules), Prometheus generates alerts and sends them to Alertmanager. +6. **Alertmanager:** Alertmanager processes the alerts, deduplicates them, groups them if necessary, and sends notifications to configured receivers. + +### Understanding the Kubernetes Objects + +The Helm chart deploys various Kubernetes objects to set up Prometheus and Grafana. 
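The listing below comes from a plain `kubectl get all`. If other workloads share the namespace, you can narrow the view to most of the objects created by this release using the label from the chart's install notes (a sketch, assuming the release name kube-prometheus-stack used earlier):

```bash
# List the workloads labelled with this Helm release's name
kubectl get all -l "release=kube-prometheus-stack"
```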
+```yaml +$ kubectl get all +NAME READY STATUS RESTARTS AGE +pod/alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 38m +pod/kube-prometheus-stack-grafana-76858ff8dd-76bn4 3/3 Running 0 38m +pod/kube-prometheus-stack-kube-state-metrics-84958579f9-g44sk 1/1 Running 0 38m +pod/kube-prometheus-stack-operator-554b777575-hgm8b 1/1 Running 0 38m +pod/kube-prometheus-stack-prometheus-node-exporter-cl98x 1/1 Running 0 38m +pod/prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 38m + +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +service/alertmanager-operated ClusterIP None 9093/TCP,9094/TCP,9094/UDP 38m +service/kube-prometheus-stack-alertmanager ClusterIP 10.245.239.151 9093/TCP,8080/TCP 38m +service/kube-prometheus-stack-grafana NodePort 10.245.30.17 80:31519/TCP 38m +service/kube-prometheus-stack-kube-state-metrics ClusterIP 10.245.26.205 8080/TCP 38m +service/kube-prometheus-stack-operator ClusterIP 10.245.19.171 443/TCP 38m +service/kube-prometheus-stack-prometheus NodePort 10.245.151.164 9090:30090/TCP,8080:32295/TCP 38m +service/kube-prometheus-stack-prometheus-node-exporter ClusterIP 10.245.22.30 9100/TCP 38m +service/kubernetes ClusterIP 10.245.0.1 443/TCP 57d +service/prometheus-operated ClusterIP None 9090/TCP 38m + +NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE +daemonset.apps/kube-prometheus-stack-prometheus-node-exporter 1 1 1 1 1 kubernetes.io/os=linux 38m + +NAME READY UP-TO-DATE AVAILABLE AGE +deployment.apps/kube-prometheus-stack-grafana 1/1 1 1 38m +deployment.apps/kube-prometheus-stack-kube-state-metrics 1/1 1 1 38m +deployment.apps/kube-prometheus-stack-operator 1/1 1 1 38m + +NAME DESIRED CURRENT READY AGE +replicaset.apps/kube-prometheus-stack-grafana-76858ff8dd 1 1 1 38m +replicaset.apps/kube-prometheus-stack-kube-state-metrics-84958579f9 1 1 1 38m +replicaset.apps/kube-prometheus-stack-operator-554b777575 1 1 1 38m + +NAME READY AGE +statefulset.apps/alertmanager-kube-prometheus-stack-alertmanager 1/1 38m +statefulset.apps/prometheus-kube-prometheus-stack-prometheus 1/1 38m +``` + +Here’s a brief explanation of each type of object used: + +### Deployments: + +Deployments ensure that a specified number of pod replicas are running at any given time. They manage the creation, update, and deletion of pods. In this setup, Deployments are used for: + +**Grafana:** Manages the Grafana instance, ensuring it is always available. +**Kube-State-Metrics:** Exposes Kubernetes cluster-level metrics. + +Example: + +```yaml +deployment.apps/kube-prometheus-stack-grafana 1/1 1 1 38m +deployment.apps/kube-prometheus-stack-kube-state-metrics 1/1 1 1 38m +``` +### StatefulSets: + +StatefulSets are used for managing stateful applications that require persistent storage and stable network identities. They ensure that the pods are deployed in a specific order and have unique, stable identifiers. In this setup, StatefulSets are used for: + + +**Prometheus:** Ensures Prometheus instances have persistent storage for metric data. +**Alertmanager:** Manages the Alertmanager instances. + +Example: + +```yaml +statefulset.apps/alertmanager-kube-prometheus-stack-alertmanager 1/1 38m +statefulset.apps/prometheus-kube-prometheus-stack-prometheus 1/1 38m +``` +### DaemonSets: + +DaemonSets ensure that a copy of a pod is running on all (or some) nodes in the cluster. They are commonly used for logging and monitoring agents. In this setup, DaemonSets are used for: + +**Node Exporter:** Collects hardware and OS metrics from the nodes. 
+ +Example: + +```yaml +daemonset.apps/kube-prometheus-stack-prometheus-node-exporter 1/1 38m +``` +### Cleanup Section +Use the below command to uninstall the prometheus stack + +```yaml +$ helm uninstall kube-prometheus-stack +``` + +### Advantages of Using Prometheus and Grafana + +Using Prometheus and Grafana together provides a powerful and flexible monitoring solution for Kubernetes clusters. Here are some of the key advantages: + +### Prometheus + +1. **Open Source and Community-Driven:** Prometheus is a widely adopted open-source monitoring solution with a large community, ensuring continuous improvements, support, and a plethora of plugins and integrations. +2. **Dimensional Data Model:** Prometheus uses a multi-dimensional data model with time series data identified by metric name and key/value pairs. This makes it highly flexible and powerful for querying. +3. **Powerful Query Language (PromQL):** Prometheus Query Language (PromQL) allows for complex queries and aggregations, making it easy to extract meaningful insights from the collected metrics. +4. **Efficient Storage:** Prometheus has an efficient storage engine designed for high performance and scalability. It uses a local time series database, making it fast and reliable. + + +5. **Alerting:** Prometheus has a built-in alerting system that allows you to define alerting rules based on metrics. Alerts can be sent to various receivers like email, Slack, or custom webhooks using the Alertmanager component. +6. **Service Discovery:** Prometheus supports multiple service discovery mechanisms, including Kubernetes, which makes it easy to dynamically discover and monitor new services as they are deployed. +### Grafana + +1. **Rich Visualization:** Grafana provides a wide range of visualization options, including graphs, charts, histograms, and heatmaps, allowing you to create comprehensive dashboards. +2. **Customizable Dashboards:** Grafana dashboards are highly customizable, enabling you to create tailored views that meet the specific needs of your team or organization. +3. **Integration with Multiple Data Sources:** While Grafana works seamlessly with Prometheus, it also supports many other data sources such as Elasticsearch, InfluxDB, and Graphite, making it a versatile tool for centralized monitoring. +4. **Alerting:** Grafana offers its own alerting system, allowing you to set up alert rules on dashboard panels and receive notifications via multiple channels, such as email, Slack, and PagerDuty. +5. **Templating:** Grafana allows the use of template variables in dashboards, making them reusable and more interactive. This feature helps in creating dynamic and flexible dashboards. +6. **User Management and Sharing:** Grafana supports user authentication and role-based access control, making it easier to manage access to dashboards. Dashboards can also be easily shared with team members or embedded in other applications. +7. **Plugins and Extensions:** Grafana has a rich ecosystem of plugins for different data sources, panels, and apps, allowing you to extend its functionality to meet your specific monitoring needs. +### Combined Benefits + +1. **Comprehensive Monitoring Solution:** Together, Prometheus and Grafana provide a complete monitoring solution, from metrics collection and storage (Prometheus) to powerful visualization and analysis (Grafana). +2. **Scalability:** Both Prometheus and Grafana are designed to scale with your infrastructure. 
Prometheus can handle millions of time series, while Grafana can manage numerous dashboards and data sources. +3. **Real-Time Monitoring and Alerting:** With Prometheus’s real-time metrics collection and Grafana’s real-time visualization, you can monitor your infrastructure’s health continuously and get alerted to issues promptly. +4. **Ease of Use:** Setting up Prometheus and Grafana is straightforward, especially with tools like Helm for Kubernetes, making it easy to deploy and manage the monitoring stack. +5. **Extensibility:** Both tools are highly extensible, allowing you to integrate them with other systems and customize them to fit your specific requirements. + + +By leveraging the strengths of Prometheus and Grafana, you can ensure that your Kubernetes environment is well-monitored, making it easier to maintain performance, reliability, and efficiency. +## Conclusion +Setting up Prometheus and Grafana on Kubernetes using Helm is straightforward and provides a powerful monitoring solution for your cluster. By exposing the services via NodePorts, you can easily access the dashboards from outside the cluster. This setup allows you to monitor your cluster’s performance, visualize metrics, and set up alerts to ensure your applications run smoothly. ---- -## Creating Dashboards -Grafana allows you to create custom dashboards to visualize metrics. Example steps: -1. Log in to Grafana. -2. Add Prometheus as a data source. -3. Create a new dashboard and add panels for specific metrics. -4. Use PromQL (Prometheus Query Language) to query metrics. - -Example PromQL queries: -- CPU usage of a pod: - ```promql - sum(rate(container_cpu_usage_seconds_total{pod="my-app-pod"}[5m])) - ``` -- Memory usage of a node: - ```promql - sum(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) - ``` -## Best Practices -- Use labels effectively in Prometheus to organize and query metrics. -- Set up retention policies to manage storage usage. -- Use Alertmanager to configure alerts for critical metrics. -- Monitor Prometheus and Grafana resource usage to ensure scalability. - ---- -Stay tuned for updates as we continue to enhance this guide! 
diff --git a/eks/eks-irsa/.DS_Store b/eks/eks-irsa/.DS_Store deleted file mode 100644 index 5008ddfc..00000000 Binary files a/eks/eks-irsa/.DS_Store and /dev/null differ diff --git a/eks/karpenter/images/.DS_Store b/eks/karpenter/images/.DS_Store deleted file mode 100644 index 5008ddfc..00000000 Binary files a/eks/karpenter/images/.DS_Store and /dev/null differ diff --git a/eks/pod-identity/.DS_Store b/eks/pod-identity/.DS_Store deleted file mode 100644 index 5008ddfc..00000000 Binary files a/eks/pod-identity/.DS_Store and /dev/null differ diff --git a/examples/.DS_Store b/examples/.DS_Store deleted file mode 100644 index 85958b3b..00000000 Binary files a/examples/.DS_Store and /dev/null differ diff --git a/examples/RBAC/.DS_Store b/examples/RBAC/.DS_Store deleted file mode 100644 index 079b3651..00000000 Binary files a/examples/RBAC/.DS_Store and /dev/null differ diff --git a/examples/pods/.DS_Store b/examples/pods/.DS_Store deleted file mode 100644 index 5008ddfc..00000000 Binary files a/examples/pods/.DS_Store and /dev/null differ diff --git a/examples/seleniumgrid/testing/java/.DS_Store b/examples/seleniumgrid/testing/java/.DS_Store deleted file mode 100644 index 3fa8a9c2..00000000 Binary files a/examples/seleniumgrid/testing/java/.DS_Store and /dev/null differ diff --git a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/.DS_Store b/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/.DS_Store deleted file mode 100644 index a0174f51..00000000 Binary files a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/.DS_Store and /dev/null differ diff --git a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/src/test/.DS_Store b/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/src/test/.DS_Store deleted file mode 100644 index 8ce869e8..00000000 Binary files a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/src/test/.DS_Store and /dev/null differ diff --git a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/target/.DS_Store b/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/target/.DS_Store deleted file mode 100644 index ed370578..00000000 Binary files a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/target/.DS_Store and /dev/null differ diff --git a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/target/surefire-reports/.DS_Store b/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/target/surefire-reports/.DS_Store deleted file mode 100644 index b8fcb36c..00000000 Binary files a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/target/surefire-reports/.DS_Store and /dev/null differ diff --git a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/test-output/.DS_Store b/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/test-output/.DS_Store deleted file mode 100644 index 6669766f..00000000 Binary files a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/test-output/.DS_Store and /dev/null differ diff --git a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/test-output/old/.DS_Store b/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/test-output/old/.DS_Store deleted file mode 100644 index e11908c4..00000000 Binary files a/examples/seleniumgrid/testing/java/KubernetesEKS_SeleniumGrid_Output/test-output/old/.DS_Store and /dev/null differ diff --git 
a/examples/velero/helm-charts b/examples/velero/helm-charts deleted file mode 160000 index 7564a60a..00000000 --- a/examples/velero/helm-charts +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 7564a60a891199aff32dc4f34ca9f1e882e18e59 diff --git a/ingress/.DS_Store b/ingress/.DS_Store deleted file mode 100644 index 2a9b04d7..00000000 Binary files a/ingress/.DS_Store and /dev/null differ diff --git a/ingress/nginx-ingress/.DS_Store b/ingress/nginx-ingress/.DS_Store deleted file mode 100644 index 47285b19..00000000 Binary files a/ingress/nginx-ingress/.DS_Store and /dev/null differ diff --git a/ingress/nginx-ingress/hands-on/.DS_Store b/ingress/nginx-ingress/hands-on/.DS_Store deleted file mode 100644 index 8787931c..00000000 Binary files a/ingress/nginx-ingress/hands-on/.DS_Store and /dev/null differ diff --git a/monitoring/.DS_Store b/monitoring/.DS_Store deleted file mode 100644 index 9eec6520..00000000 Binary files a/monitoring/.DS_Store and /dev/null differ diff --git a/scaling/.DS_Store b/scaling/.DS_Store deleted file mode 100644 index 7e27ab66..00000000 Binary files a/scaling/.DS_Store and /dev/null differ