
How to Configure Highly Available Elasticsearch?

Written by Tanu Rawat | Mar 16, 2021


The Elasticsearch tagline says it all: “You know, for search (and analysis)”. It gives you near real-time search and analysis of your data. It is fast, distributed, scalable and resilient. Flip through the testimonials and you can sense the power it delivers and the peace of mind that comes from your data being highly available, i.e. available all the time (well, almost). Still, your cluster may occasionally suffer a hardware failure or a power loss. To help you plan for this, Elasticsearch offers several features that maintain high availability despite failures.
One of the most powerful of these features is cross-cluster replication (CCR), which lets you replicate data to a remote follower cluster that may be in a different data centre, or even on a different continent, from the leader cluster. The follower cluster acts as a hot standby, ready for you to fail over to in the event of a disaster so severe that the leader cluster fails. CCR is part of the X-Pack suite; it can be tried for 30 days with a trial license and is included in the Platinum and Enterprise subscriptions of Elastic.
You register the leader cluster as a remote cluster on the follower cluster, which then pulls changes from the leader. Whether you work through API calls or the Kibana Console, the setup is as smooth as a glide.
You can find detailed CCR setup information here.
Cross-cluster replication uses a pub-sub (publish/subscribe) model. You index to a leader index, and the data is replicated to one or more read-only follower indices. Ideally, the remote follower cluster should be located at a geographic distance from the leader cluster.
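To make the setup concrete, here is a minimal Python sketch that only builds the two requests involved: the remote-cluster setting and the follow request, both sent to the follower cluster. The alias `cluster_a`, the seed address, and the index names are hypothetical placeholders, not values from the project described here.

```python
import json


def remote_cluster_settings(alias, seeds):
    """Body for PUT /_cluster/settings, sent to the FOLLOWER cluster.

    `alias` is a name you choose for the leader cluster; `seeds` are
    transport addresses (host:port) of nodes in the leader cluster.
    """
    return {"persistent": {"cluster": {"remote": {alias: {"seeds": list(seeds)}}}}}


def follow_request(follower_index, remote_alias, leader_index):
    """(method, path, body) that creates a follower index on the follower cluster."""
    path = f"/{follower_index}/_ccr/follow?wait_for_active_shards=1"
    body = {"remote_cluster": remote_alias, "leader_index": leader_index}
    return ("PUT", path, body)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    settings = remote_cluster_settings("cluster_a", ["10.0.0.1:9300"])
    print("PUT /_cluster/settings")
    print(json.dumps(settings, indent=2))

    method, path, body = follow_request("orders-follower", "cluster_a", "orders")
    print(method, path)
    print(json.dumps(body, indent=2))
```

The printed output can be pasted into the Kibana Console on the follower cluster, or the same bodies can be sent with any HTTP client.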
CCR is mainly implemented for high availability and disaster recovery. Fortunately, I had the opportunity to work on a project for a leading bank where CCR was configured for both. We had configured it for high availability, and the steps for disaster recovery were neatly summarized, so I am here to give you pointers for the same.
Suppose I have a data centre A, which I call my leader cluster. This is the centre where all microservices push data and where users run their searches. I never want to miss a beat, and I want to ensure high availability for the users.
So, I have another data centre B that follows data centre A, and I call it my follower cluster. This cluster replicates the inserts, updates, and deletes of records from the leader.

Failover

The LTM (the load balancer in front of both clusters) directs all traffic to cluster A by default. In case of a power failure or a cluster-down situation, I stop the traffic and perform the steps below to redirect it to cluster B.

  • 1. Close the follower index.
    POST /<follower_index>/_close
  • 2. Pause following the leader, then unfollow.
    POST /<follower_index>/_ccr/pause_follow
    POST /<follower_index>/_ccr/unfollow

Once you close, pause, and unfollow, the document count of the follower index shows as null, because a closed index does not report a count.

  • 3. Open the index. Once opened, it is a regular writable index and all previously replicated documents are available again.
    POST /<follower_index>/_open

Now I can configure my LTM to direct traffic to cluster B.
Meanwhile, we investigate the issues cluster A faces. Users can search in near real time, and data from the microservices is being consumed again, with almost zero downtime. Yay…
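The failover steps can be sketched as a small Python helper that produces the request sequence for a given follower index. This is only an illustration: the function names and the index name `orders-follower` are hypothetical. The sketch issues the pause before the close, which is the order Elastic's documentation uses for promoting a follower; the end state should be the same as in the steps above, namely an open, unfollowed, writable index.

```python
def failover_requests(follower_index):
    """Ordered (method, path) pairs that turn a follower into a regular index."""
    return [
        ("POST", f"/{follower_index}/_ccr/pause_follow"),  # stop pulling from the leader
        ("POST", f"/{follower_index}/_close"),             # unfollow needs a closed index
        ("POST", f"/{follower_index}/_ccr/unfollow"),      # drop the follower metadata
        ("POST", f"/{follower_index}/_open"),              # reopen as a writable index
    ]


def to_console(requests):
    """Render the requests as lines that can be pasted into the Kibana Console."""
    return "\n".join(f"{method} {path}" for method, path in requests)


if __name__ == "__main__":
    # Hypothetical index name, for illustration only.
    print(to_console(failover_requests("orders-follower")))
```

Run the requests against cluster B only after traffic to cluster A has been stopped, and switch the LTM once the last request succeeds.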

Failback

But it is not over yet. I cannot let cluster B keep consuming data forever. It is my backup space, not the consuming hub. So I fix the leader cluster as soon as possible and, once it is back online, plan to revert to my original setup.
Here is how:

  • 1. Stop the online traffic, then make the cluster B index the leader and the cluster A index the follower. The follower on A will be a new index (you can rename your original leader index and delete it thereafter).
  • 2. Once the documents on A are in sync with B, make cluster A the leader and cluster B the follower again.
  • 3. Start the traffic to Cluster A.
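For step 2, one way to judge whether the temporary follower on cluster A has caught up is to compare the checkpoints reported by the follower stats API (GET /<follower_index>/_ccr/stats). The Python sketch below assumes the parsed JSON response is already in hand; the function name and the sample data are hypothetical. Because the leader checkpoint in the stats is the value last known to the follower, treat the result as a reasonable signal once traffic has been stopped rather than as a guarantee.

```python
def follower_caught_up(follow_stats, index_name):
    """Return True if every shard of `index_name` has reached the leader's
    last known global checkpoint, according to a parsed _ccr/stats response."""
    for entry in follow_stats.get("indices", []):
        if entry.get("index") != index_name:
            continue
        shards = entry.get("shards", [])
        # An index with no shard stats is treated as "not caught up".
        return bool(shards) and all(
            shard["follower_global_checkpoint"] >= shard["leader_global_checkpoint"]
            for shard in shards
        )
    return False  # the index was not present in the stats at all


if __name__ == "__main__":
    # Hypothetical, trimmed-down stats response, for illustration only.
    sample = {
        "indices": [
            {
                "index": "orders-restored",
                "shards": [
                    {"leader_global_checkpoint": 1200, "follower_global_checkpoint": 1200},
                    {"leader_global_checkpoint": 980, "follower_global_checkpoint": 975},
                ],
            }
        ]
    }
    print(follower_caught_up(sample, "orders-restored"))
```

Comparing document counts on both clusters is a simpler cross-check before swapping the roles back.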

So here it is. Simple and no fuss.

