Elastic Stack Log Management

From Chaos to Control: How a Payment Solution Company Transformed Log Management with Elastic Stack

Written by Ashnik Team | Oct 23, 2024

4 MIN READ

In the fast-paced world of digital payments, every second counts. For one of our clients, a leading payment solution company, the stakes couldn’t have been higher. Millions of transactions were happening every day, and behind the scenes, a storm was brewing. The company’s log management system, which was once a well-oiled machine, began to creak under the sheer volume of data. Logs flooded in at an overwhelming 25,000 events per second. And with each passing moment, the cracks grew larger.

Service uptime was being threatened. Fragmented data across disparate systems meant incidents were taking too long to identify, analyze, and resolve. A company that prided itself on reliability was beginning to experience delays and inefficiencies, which were unacceptable in an industry where seconds matter.

Something had to change.

The Challenge: A System on the Brink

The client was facing a perfect storm:

  1. Scalability Issues: Their existing infrastructure could no longer handle the relentless increase in log volume.
  2. Data Silos: Fragmentation of data across multiple systems made comprehensive analysis nearly impossible.
  3. Operational Inefficiencies: Delays in incident response were creeping into their operations, affecting service quality and putting compliance at risk.
  4. UDP Traffic Management: The use of UDP to handle logs posed an additional challenge due to its unreliable, connectionless nature. Ensuring that no log was lost in transit was critical.

For the IT team, this was more than a technical issue—it was a matter of survival. The company needed a system that could scale effortlessly, provide real-time insights, and maintain the high level of service their customers had come to expect.

The Turning Point: Finding the Right Solution

With the system at its breaking point, the payment solution company turned to Ashnik and the Elastic Stack. The decision wasn’t just about adopting new technology – it was about reclaiming control and ensuring the company’s future success.

The solution would need to address several critical areas:

  1. Reliability: The company couldn’t afford missed logs. Every event needed to be captured, processed, and analyzed in real time, even with the challenges posed by UDP traffic.
  2. Scalability: The system had to handle rapidly increasing log volumes without breaking a sweat.
  3. Actionable Insights: The IT team needed the ability to quickly identify patterns, detect anomalies, and respond to incidents before they could escalate.

The Solution: Building a Scalable, Resilient System with Elastic Stack

To regain control over their data, Ashnik’s team designed a solution based on the Elastic Stack that ensured seamless log management, fault tolerance, and real-time analytics. Here’s how Ashnik’s architecture came together to tackle the challenges, especially managing UDP traffic:

  1. VIP and Array Load Balancer:
    The architecture started with an Array Load Balancer and Virtual IP (VIP) handling incoming UDP traffic. This setup ensured high availability and fault tolerance, as the VIP dynamically switched between active and passive servers, keeping log ingestion flowing uninterrupted. Despite UDP’s inherent unreliability, Ashnik was able to guarantee log delivery by pairing it with a highly resilient infrastructure.
  2. Syslog-ng and Filebeat:
    Syslog-ng was configured by Ashnik to capture logs from the VIP with pinpoint accuracy, ensuring that every event—despite using UDP—was reliably written to disk. From there, Filebeat picked up the logs and forwarded them to Logstash for processing.
  3. Kubernetes for Orchestration:
    Deploying Logstash and Elasticsearch within a Kubernetes cluster provided dynamic scaling, which was essential for handling fluctuating log ingestion rates. Kubernetes allowed the system to grow with the client’s data, ensuring scalability and fault tolerance without manual intervention.
  4. Logstash and Elasticsearch:
    Logstash, configured by the Ashnik team, parsed and sanitized the data, forwarding it to Elasticsearch for indexing. With its ability to handle up to 60,000 events per second, Elasticsearch enabled real-time analysis, offering instant insights into operational trends.
  5. Kibana for Actionable Insights:
    Finally, Kibana dashboards transformed raw log data into visually intuitive insights. The IT team could now detect anomalies, visualize trends, and respond to incidents before they impacted performance.
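
To make steps 1 and 2 more concrete, here is a minimal Python sketch of the "receive over UDP, write to disk first" idea. It is not Ashnik's syslog-ng configuration; the function name spool_datagrams, the spool file, and the one-event-per-line layout are assumptions made for illustration. The point it shows is that once an event is on disk, a file shipper such as Filebeat can read it at its own pace.

```python
import socket
from pathlib import Path


def spool_datagrams(sock: socket.socket, spool_path: Path, max_messages: int) -> int:
    """Append up to max_messages UDP datagrams to spool_path, one per line.

    Hypothetical stand-in for the syslog-ng role described above.
    Returns the number of datagrams written.
    """
    written = 0
    with spool_path.open("ab") as spool:
        while written < max_messages:
            try:
                data, _sender = sock.recvfrom(65535)
            except socket.timeout:
                # Nothing arrived within the socket timeout; stop waiting.
                break
            # Normalise to exactly one trailing newline per event.
            spool.write(data.rstrip(b"\n") + b"\n")
            # flush() hands the data to the OS; os.fsync() would be needed
            # to force it onto the physical disk.
            spool.flush()
            written += 1
    return written
```

A caller would bind a UDP socket, set a timeout on it, and pass it in together with the path of the spool file that the shipper is watching.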

This flow allowed the payment solution company to transform its log management from a chaotic flood of data into a streamlined, scalable system capable of handling peak loads while providing actionable real-time insights.
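
The "parsed and sanitized" step performed by Logstash can be illustrated with a short Python sketch. The client's actual Logstash pipeline is not published in this article, so everything here is an assumption: the log line layout, the field names, and the rule that masks anything resembling a card number.

```python
import re

# Assumed line layout (hypothetical): "<timestamp> <host> <app>: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) (?P<app>[^:\s]+): (?P<message>.*)$"
)

# Assumption: any run of 13 to 19 digits is treated as a possible card number.
CARD_RE = re.compile(r"\b\d{13,19}\b")


def mask_card(match: re.Match) -> str:
    """Replace all but the last four digits with asterisks."""
    digits = match.group(0)
    return "*" * (len(digits) - 4) + digits[-4:]


def parse_and_sanitize(line: str):
    """Split a raw line into fields and mask card-like numbers.

    Returns a dict of fields, or None when the line does not match
    the assumed layout.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["message"] = CARD_RE.sub(mask_card, event["message"])
    return event
```

In this sketch, the returned dictionary plays the role of the structured document that would be sent on to Elasticsearch for indexing.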

Overcoming Roadblocks: Tweaking the System for Peak Performance

Like any great journey, this one wasn’t without its bumps. Along the way, we encountered some technical challenges that required fine-tuning:

  1. Managing UDP Traffic:
    One of the key challenges was ensuring reliable log delivery over UDP. The combination of Syslog-ng and VIP, implemented by Ashnik, ensured that we maintained 100% accuracy despite the inherent unreliability of UDP. Logs were captured reliably, written to disk, and processed without loss.
  2. Achieving 100% Log Delivery:
    While Logstash was a strong starting point, Ashnik integrated Syslog-ng to guarantee 100% log availability. This shift provided the client with the reliability they needed to stay ahead of potential issues.
  3. Simplifying Kubernetes Deployment:
    During the Kubernetes deployment, the Ashnik team weighed NodePort against an Ingress Controller. Ultimately, the architecture was streamlined by using Kubernetes Services, reducing complexity without sacrificing performance.
  4. Optimizing Storage for High Performance:
    Initially, the system used Ceph as the distributed storage solution. However, during performance testing at 100,000 events per second, Ashnik identified a dip in performance and recommended a pivot to natively attached storage, which delivered the high-speed performance necessary to meet the client’s needs.
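
To put the figures quoted in this article side by side, the following Python sketch works through the arithmetic. The three rates come from the text above (25,000 events per second of incoming logs, up to 60,000 events per second handled by Elasticsearch, and 100,000 events per second during performance testing). The helper names are illustrative, and the calculation assumes a constant rate, which real traffic will not have.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the article.
INGEST_EPS = 25_000      # incoming log rate
INDEXING_EPS = 60_000    # quoted Elasticsearch handling rate
TEST_EPS = 100_000       # rate used during performance testing


def events_per_day(eps: int) -> int:
    """Events accumulated in one day at a constant rate."""
    return eps * SECONDS_PER_DAY


def headroom(capacity_eps: int, load_eps: int) -> float:
    """How many times the load fits into the capacity."""
    return capacity_eps / load_eps


if __name__ == "__main__":
    print(f"Events per day at ingest rate: {events_per_day(INGEST_EPS):,}")
    print(f"Indexing headroom over ingest: {headroom(INDEXING_EPS, INGEST_EPS):.1f}x")
    print(f"Test rate relative to ingest:  {headroom(TEST_EPS, INGEST_EPS):.1f}x")
```

At a constant 25,000 events per second this comes to 2.16 billion events per day, with the quoted indexing rate at 2.4 times the ingest rate and the test rate at 4 times.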

The Results: A Seamless, Scalable Future

After deploying the Elastic Stack with Ashnik’s expertise, the client saw dramatic improvements:

  • Performance and Scalability: The new system effortlessly handled high-volume log data, providing real-time insights that allowed the client to respond to incidents faster and with greater accuracy.
  • Operational Resilience: With Kubernetes managing dynamic scaling and fault tolerance, the system continued to operate smoothly even during peak loads, meeting the company’s stringent uptime and compliance requirements.
  • Cost Efficiency: By optimizing resource allocation and storage strategies, Ashnik helped the client achieve cost savings without compromising performance.

Conclusion: From Struggle to Success with Ashnik and Elastic Stack

What started as an overwhelming flood of log data was transformed into a smooth, scalable, and resilient log management system. With the Elastic Stack in place, designed and implemented by Ashnik, the payment solution company is now equipped to handle the challenges of tomorrow. Their systems are more reliable, their insights more actionable, and their operations more efficient.

For IT teams facing similar struggles, Elastic Stack isn’t just a tool—it’s a solution, and with Ashnik’s expertise, it brings clarity and control to even the most chaotic data environments.

