Custom Alerting Solutions with Prometheus and CTO.ai

Prometheus, an open-source system monitoring and alerting toolkit, can be the cornerstone of a robust alerting setup. In this post, we will build a custom alerting solution with Prometheus and CTO.ai, so that you are notified of potential issues before they escalate into more serious problems.

Prerequisites

Before you start, make sure you have the following in place:

  • A CTO.ai account
  • A functioning Kubernetes cluster, with kubectl configured to talk to it
  • The Helm package manager installed on your system
  • Prometheus and Grafana in your Kubernetes cluster (installing Prometheus with Helm is covered below)

Setting up CTO.ai EKS Workflow

Start by connecting your GitHub account and installing the EKS EC2 ASG workflow that CTO.ai supports. Choose the repository where your Kubernetes configurations are stored, then clone the workflow:

git clone git@github.com:workflows-sh/aws-eks-ec2-asg-cdk.git

cd aws-eks-ec2-asg-cdk

Next, add your secret keys to the project secret settings in CTO.ai: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ACCOUNT_NUMBER, and a GITHUB_TOKEN with write permissions.

With the repo cloned and the secrets in place, build and run your workflow using ops build -b . and deploy your infrastructure to AWS.

  • Enter a name for your environment, for example dev. Prod or Stage work as well, depending on what you are deploying.
  • The workflow then deploys and creates your resources on AWS using CloudFormation.
  • Once the EC2 and EKS workflow has been deployed, the stacks are visible in AWS CloudFormation, where you can see the resources that were created: Dev-AWS-EKS-ASG-Provider, AWS-EKS-EC2-ASG Resource, Dev-AWS-EKS-EC2-ASG, Sample-App-AWS-EKS, CDKToolkit, Dev-Sample-App-AWS-EKS-EC2-ASG.

Installing Prometheus with Helm

Installing Prometheus with Helm is a straightforward and manageable method. Begin by adding the Prometheus community Helm chart repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts


helm repo update

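Next, install the prometheus chart from that repository. If you keep custom configuration in a values.yaml file, append -f values.yaml to the command. Note that this chart deploys the Prometheus server and Alertmanager directly; the Prometheus Operator itself is packaged in the separate kube-prometheus-stack chart.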

helm install prometheus prometheus-community/prometheus

  • Back in your terminal, list the Prometheus pods with kubectl get pods and wait until they are running.
  • Prometheus is now available to your workloads. Run kubectl get svc to find the prometheus-server service, which you will use to reach the web UI from your machine.
  • Port-forward to that service using kubectl port-forward service/prometheus-server 9090:80, then open http://localhost:9090 in your browser.
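
Once the port-forward is running, the same server can be checked from a script instead of the browser. The sketch below is a minimal example in Python, not an official client: it assumes Prometheus is reachable at http://localhost:9090 (the port-forward above) and uses the standard instant-query endpoint /api/v1/query with the built-in up metric. The helper names build_query_url, down_targets, and fetch_down_targets are hypothetical and exist only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the kubectl port-forward from the previous step is still running.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(response):
    """List the 'instance' label of every series whose `up` value is 0."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response}")
    return [
        series["metric"].get("instance", "<unknown>")
        for series in response["data"]["result"]
        # Instant-query values arrive as [timestamp, "value-as-string"].
        if float(series["value"][1]) == 0.0
    ]


def fetch_down_targets(base_url=PROMETHEUS_URL):
    """Query the live server for `up` and return the instances that are down."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=5) as resp:
        return down_targets(json.load(resp))
```

With the port-forward active, print(fetch_down_targets()) lists the scrape targets that Prometheus currently reports as down; an empty list means every target answered its last scrape.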

Building Custom Alerting Solutions with Prometheus

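In Prometheus, alerting rules are written in YAML rule files, which the main Prometheus configuration loads through its rule_files setting. The condition of each rule is an expression in PromQL, the Prometheus query language. With the Helm chart installed above, one place to put these rules is the serverFiles section of values.yaml, under the alerting_rules.yml key.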

  • An alerting rule consists of a name, a PromQL expression that defines the condition, an optional for duration that the condition must hold before the alert fires, and labels and annotations that travel with the alert. What happens once the alert fires (who is notified, and how) is decided later by Alertmanager. For instance, the following rule alerts when the average CPU usage of an instance stays above 75% for 5 minutes:
groups:
- name: example
  rules:
  - alert: HighCPUUsage
    # rate() is used rather than irate() because brief spikes in irate() can reset the "for" timer
    expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 75
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU usage"
      description: "CPU usage is above 75% for more than 5 minutes"
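
To make the arithmetic in that expression concrete, here is a small sketch in Python of what it computes for one instance: the per-second rate of the idle CPU counter on each core, averaged across cores, and turned into a usage percentage. The sample values and the helper names per_second_rate and cpu_usage_percent are made up for illustration, and the rate calculation is a simplification; Prometheus itself also handles counter resets and extrapolation.

```python
THRESHOLD_PERCENT = 75  # same threshold as the rule above


def per_second_rate(first, last):
    """Per-second increase of a counter between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_samples_by_cpu):
    """Mirror `100 - avg(idle rate) * 100` for a single instance."""
    rates = [per_second_rate(first, last) for first, last in idle_samples_by_cpu.values()]
    avg_idle_fraction = sum(rates) / len(rates)
    return 100 - avg_idle_fraction * 100


# Hypothetical samples: idle CPU-seconds per core, taken 300 seconds apart.
samples = {
    "cpu0": ((0, 1000.0), (300, 1060.0)),  # idle for 60 s of 300 s -> 0.2
    "cpu1": ((0, 2000.0), (300, 2030.0)),  # idle for 30 s of 300 s -> 0.1
}

usage = cpu_usage_percent(samples)
above_threshold = usage > THRESHOLD_PERCENT
print(f"usage={usage:.1f}% above_threshold={above_threshold}")
```

In the real rule, the for: 5m clause additionally requires the condition to stay true for five minutes before the alert moves from pending to firing.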
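  • In the Prometheus web UI, the Status menu shows your rules, targets, service discovery, TSDB status, and runtime and build information, and the Alerts page shows whether each rule is inactive, pending, or firing.
  • Alertmanager handles the alerts sent by the Prometheus server and takes care of grouping, inhibition, and sending notifications through various channels. Set up routes and receivers in Alertmanager to manage alerts efficiently. In the example below, alerts that share the same alertname, cluster, and service labels are batched into one notification: group_wait is how long Alertmanager waits before sending the first notification for a new group, group_interval is how long it waits before notifying about new alerts added to that group, and repeat_interval is how long it waits before re-sending a notification that is still firing: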
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://webhook.url'
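
The url above is a placeholder for whatever service should receive the notifications. As a rough illustration of the receiving side, here is a minimal sketch in Python of an HTTP endpoint that accepts Alertmanager's webhook POST and prints one line per alert. It relies only on the documented payload fields alerts, status, labels, and annotations; the names summarize, AlertHandler, and serve, and the port 5001, are assumptions made for this example rather than anything provided by CTO.ai or Prometheus.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload):
    """Turn an Alertmanager webhook payload into one log line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            f"[{alert.get('status', 'unknown')}] "
            f"{labels.get('alertname', '<no name>')} "
            f"severity={labels.get('severity', 'none')}: "
            f"{annotations.get('summary', '')}"
        )
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            # Reject bodies that are not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        for line in summarize(payload):
            print(line)
        # Any 2xx response tells Alertmanager the notification was delivered.
        self.send_response(200)
        self.end_headers()


def serve(port=5001):
    """Block and listen for Alertmanager notifications (the port is an arbitrary choice)."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

To try it, call serve() and point the url in webhook_configs at an address where Alertmanager can reach this process, for example a Service inside the same cluster.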

Conclusion

Building custom alerting with Prometheus and CTO.ai is a practical way to safeguard your production environments. It lets you detect and address issues promptly, and it gives you the flexibility to tailor routing, thresholds, and notifications to your specific needs, resulting in a more resilient and responsive monitoring system.

Ready to unlock the power of CTO.ai for your team? Schedule a consultation with one of our experts today!