When deploying the Elastic Stack, it is key to consider the sizing requirements of your particular use case. Correct sizing is crucial for performance and functionality: it ensures that your deployment can handle the volume of data you need to index and search, and that it provides the performance and availability your requirements call for. Our team recently completed several challenging implementations using the Elastic Stack, and in this blog I would like to share some key considerations based on that experience.
Determining the optimal size of the Elastic cluster was one of the biggest challenges we encountered. The answer is never straightforward, because many factors go into sizing and architecting an Elastic cluster. Before deploying a cluster, I typically begin by asking my customers a set of determining questions, which I have outlined below:
1. What is the use case?
The Elastic Stack can be used for a wide range of use cases, from log analysis to security analytics. Each use case has its own unique requirements and needs a tailored architecture.
2. What are your sources of data?
Metric data through Beats OR log file data through Filebeat OR RDBMS data OR application logs OR data from the web, and others.
3. How frequently is data ingested into the cluster?
Companies receive continuous data from multiple devices, files, streams etc.
4. Do you know the high, low, and average volume of data ingestion?
To tune Elasticsearch performance for different volumes of data ingestion, monitor the system’s resource utilization and adjust the indexing settings and hardware configuration as needed. Proper monitoring and logging help identify performance bottlenecks and optimize the system over time.
5. What output are you expecting from the Elastic cluster?
Search OR Dashboards OR Alerts OR Machine Learning OR Analytics, etc.
6. What response time is acceptable for that output?
Sub-second (real time) OR under a minute OR a few minutes
7. How many users are going to access the Elastic Cluster?
Designing an Elasticsearch cluster to support many users requires careful consideration of a variety of factors, including hardware and resource requirements, query and indexing performance, security and access controls, and monitoring and analytics. With the right planning and configuration, Elasticsearch can become a powerful and scalable solution concurrently supporting many users.
8. What time frame of data archiving are you looking at?
7 days OR 1 month OR 6 months
9. Do you know the size of each document, event, or record that is ingested into Elastic?
Together with the number of events per second, the document size determines the ingestion rate, which is the rate at which data is indexed into the system. This is an important metric to consider when designing and scaling Elasticsearch clusters, as it affects the overall performance and responsiveness of the system.
10. Data volume?
The amount of data you need to process and store has a significant impact on the size and configuration of your Elastic Stack. You need to consider both the size of your data sets and the rate at which they are generated.
11. Data retention for a number of days or months?
Retaining data for a longer period also affects your sizing and architecture decisions. The longer you need to keep your data, the more storage and processing power you will require.
12. Query complexity level?
The complexity of your queries impacts the amount of processing power you require. If you are performing complex aggregations or searching across multiple fields, you will need a more powerful system.
13. What are the availability requirements for your cluster?
If high availability and failover capabilities are required, the architecture design changes accordingly. This may involve setting up multiple Elasticsearch nodes, using load balancers, and configuring automatic failover.
14. How much budget is considered for the implementation?
The budget plays a factor in your sizing and architecture decisions. You will need to balance your requirements against the cost of hardware, software licenses, and maintenance.
15. Are you aware of computing resource basics?
Performance is contingent on how you’re using Elasticsearch as well as what you’re running it on, so you should review some fundamentals around computing resources. Each search or indexing operation involves the following resources:
- Storage: Where data persists
SSDs are recommended whenever possible for nodes running search and index operations. Because SSD storage costs more, a hot-warm architecture is recommended to reduce expenses. When operating on bare metal, a local disk is the best option. Elasticsearch does not need redundant storage (RAID 1/5/10 is not necessary), because logging and metrics use cases typically keep at least one replica shard, which is the minimum needed for fault tolerance while minimizing the number of writes.
- Memory: Where data is buffered
The JVM heap stores metadata about the cluster, indices, shards, segments, and field data. It is ideally set to 50% of available RAM. The remainder of available memory serves as OS cache, which improves performance dramatically by avoiding disk reads during full-text search, aggregations on doc values, and sorts.
- Compute: Where data is processed
Elasticsearch nodes have thread pools and thread queues that use the available compute resources. The quantity and performance of CPU cores govern the average speed and peak throughput of data operations in Elasticsearch.
- Network: Where data is transferred
Network performance, both bandwidth and latency, can affect inter-node communication and inter-cluster features like cross-cluster search and cross-cluster replication.
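The memory guidance in the list above can be sketched as a small calculation. This is a minimal illustration, not an official formula: the function name `split_memory` is hypothetical, and the 30 GB upper bound on heap is an assumption added here (the 50% rule alone would otherwise keep growing with RAM), so adjust it to your JVM and Elasticsearch version.

```python
def split_memory(ram_gb: float, heap_cap_gb: float = 30.0) -> tuple[float, float]:
    """Return (heap_gb, os_cache_gb) for a node with ram_gb of RAM.

    Heap is set to 50% of RAM, as described above. The heap_cap_gb
    ceiling is an assumption of this sketch, not a value from the text.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    heap_gb = min(ram_gb * 0.5, heap_cap_gb)
    # Whatever is not given to the heap is left to the OS file system cache.
    return heap_gb, ram_gb - heap_gb


if __name__ == "__main__":
    for ram in (16, 64, 128):
        heap, cache = split_memory(ram)
        print(f"{ram} GB RAM: heap {heap:g} GB, OS cache {cache:g} GB")
```

With the assumed cap, a 128 GB node gets a 30 GB heap and leaves the rest to the OS cache, which is where large nodes gain most of their search performance.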
We have observed that the answers to these questions are often not available at the start of a project. If you begin the implementation without clarity on them, you are more likely to hit a bottleneck once data starts flowing into the Elastic cluster.
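Once answers to the questions on event size, ingestion rate, retention, and replicas are available, they can be combined into a rough storage estimate. The sketch below is a back-of-the-envelope calculation under stated assumptions: the function names are hypothetical, the indexed size is assumed to equal the raw size, and the 25% `overhead` (headroom for disk watermarks and estimation error) is an illustrative default rather than a measured value.

```python
import math


def daily_volume_gb(events_per_second: float, avg_event_bytes: float) -> float:
    """Raw data volume per day in GB (1 GB = 10**9 bytes)."""
    return events_per_second * avg_event_bytes * 86_400 / 1e9


def estimate_storage_gb(raw_gb_per_day: float, retention_days: int,
                        replicas: int = 1, overhead: float = 0.25) -> float:
    """Total disk needed: every primary copy plus its replicas, plus headroom.

    Assumes indexed size is about the same as raw size; measure your own
    data, since mappings and compression can move this in either direction.
    """
    total_data = raw_gb_per_day * retention_days * (replicas + 1)
    return total_data * (1 + overhead)


def data_nodes_needed(total_storage_gb: float, disk_per_node_gb: float) -> int:
    """Number of data nodes required to hold the estimated storage."""
    return math.ceil(total_storage_gb / disk_per_node_gb)


if __name__ == "__main__":
    per_day = daily_volume_gb(events_per_second=5000, avg_event_bytes=500)
    total = estimate_storage_gb(per_day, retention_days=30, replicas=1)
    print(f"{per_day:.0f} GB/day, {total:.0f} GB total, "
          f"{data_nodes_needed(total, 2000)} nodes with 2 TB disks")
```

Running the estimate for the low, average, and high ingestion volumes gives a range rather than a single number, which is usually a more honest starting point for a sizing discussion.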
Sizing your Elasticsearch nodes
In terms of architecture, the Elastic Stack is typically deployed in a distributed fashion. Elasticsearch nodes can be clustered together for increased performance and availability, and Logstash and Beats can be used to collect and process data from multiple sources. Kibana is used for visualization and dashboarding.
When it comes to sizing your Elasticsearch nodes, the number of shards a data node can hold is proportional to the node’s heap memory, at roughly 20 shards per GB of heap. For example, a node with 30GB of heap memory should have at most 600 shards. The further below this limit you can keep your nodes, the better. If you find your nodes exceeding 20 shards per GB of heap, consider adding another node. You will also need to ensure that you have enough storage to handle your data sets, taking any replication and backup requirements into account.
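The shard guideline above is simple arithmetic, sketched here so it can be checked against a planned shard count. The function names are hypothetical, and the calculation covers only the heap-based shard limit; storage and throughput may call for more nodes than this returns.

```python
import math

SHARDS_PER_GB_HEAP = 20  # guideline quoted above


def max_shards(heap_gb: float) -> int:
    """Upper bound on shards for a data node with heap_gb of JVM heap."""
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    return int(heap_gb * SHARDS_PER_GB_HEAP)


def nodes_for_shards(total_shards: int, heap_gb: float) -> int:
    """Minimum data nodes needed to stay at or under the shard guideline."""
    return math.ceil(total_shards / max_shards(heap_gb))


if __name__ == "__main__":
    print(max_shards(30))              # 600, matching the example above
    print(nodes_for_shards(1500, 30))  # 3
```

Because the guideline is an upper bound, it makes sense to plan for a node count comfortably above the minimum this returns.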
Deciding on your archiving plan
Elasticsearch is renowned for its real-time data search and analytics capabilities, with users typically expecting results in under a second. However, achieving this requires proper planning. For freshly ingested data, such as the past hour or day, it is reasonable to expect real-time results. Older data should either be archived to another, slower system or deleted altogether if it is no longer required. Deciding on an archiving policy for Elasticsearch data should be one of the first steps in any project. Elasticsearch parameters can be used to construct a HOT-WARM architecture, where critical fresh data is stored on HOT nodes and less important data that can tolerate slower access is stored on WARM nodes. It is crucial to identify which data is critical for sub-second/second search and which data can be searched on a slower platform.
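One way to express such an archiving plan is an index lifecycle management (ILM) policy. The sketch below only builds and prints a policy body of the kind sent to `PUT _ilm/policy/<policy_name>`; it does not talk to a cluster. The function name, all ages and size limits, and the custom node attribute `data` (which would have to be set on your nodes) are illustrative assumptions, not recommendations.

```python
import json


def build_hot_warm_policy(warm_after: str = "1d",
                          delete_after: str = "30d",
                          node_attr: str = "data") -> dict:
    """Build an ILM policy body: hot, then warm, then delete.

    node_attr is a hypothetical custom node attribute used to mark
    WARM nodes; replace it with whatever your cluster actually uses.
    """
    return {
        "policy": {
            "phases": {
                # Fresh data: keep writing to the hot index until it rolls over.
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": "1d",
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                # Older data: relocate shards to nodes tagged as warm.
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "allocate": {"require": {node_attr: "warm"}}
                    },
                },
                # Data past retention: remove the index.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_hot_warm_policy(), indent=2))
```

Keeping the retention values as parameters makes it easy to generate one policy per data set, for example 7 days for debug logs and 6 months for audit data.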
Many parameters go into designing an Elastic cluster, and you need to consider the points above carefully before you start deploying one.
Overall, sizing and architecture for the Elastic Stack will depend on your specific requirements and use case. It’s important to carefully consider these factors to ensure that your system is both performant and cost-effective.