An overview of how we built our own ‘Elasticsearch as a Service’ to power all site search and our centralized logging Elasticsearch cluster.
At Rivigo, multiple applications use Elasticsearch as a core infrastructure engine to solve numerous problems: centralized logging, search capability inside applications, and storage of consignment and audit-log time-series data. All of these use cases involve querying structured as well as unstructured data on specific keywords and getting results with low latency, and Elasticsearch has proved very promising for them.
Why Elasticsearch as a Service?
Previously, we built our POC clusters manually, but since the Elasticsearch cluster architecture can change based on use case and team, we would have ended up doing heavy ops work creating Elasticsearch clusters repeatedly.
Considering this, and to reduce that drudgery, we started looking at options like AWS Elasticsearch Service. Though AWS Elasticsearch Service was great — easy to deploy, operate, and scale — we soon realized its shortcomings in our context and decided to move away from it and build our own Elasticsearch as a Service. There were several reasons for us to discard AWS and build our own; some of them were:
How we built our own Elasticsearch as a Service:
To build our own Elasticsearch service, we evaluated Terraform and CloudFormation. Considering the potentially destructive nature of Terraform (a single change can result in loss of the entire data set), we chose to build our Elasticsearch service on CloudFormation. We are currently working on a solid solution for controlled data management; once we crack that, we will move to Terraform as our tool of choice.
During the initial phase of building the Elasticsearch service, we wrote YAML CloudFormation templates by hand, but we soon realized this was not very productive and cost us a lot of time. Since we realized this early on, we started researching better alternatives. It was then that we came across a Python library, “troposphere”, that can generate CloudFormation templates from plain Python code. It was a game changer: just a few lines of Python and we could generate templates in JSON or YAML.
One of the constraints of CloudFormation templates is that they don’t support dynamic requirements (whether dedicated master nodes are needed, and if so how many; a single-node Elasticsearch cluster for devs versus a multi-node production cluster), but we wanted a single command that could deploy any type of cluster. It was here that the shift to Python played a pivotal role: we wrote the code so that any type of cluster can be created with one command.
We were not looking at CloudFormation just to launch a bunch of servers with certain configurations; we wanted to create a complete stack, from launching servers all the way to Elasticsearch cluster deployment. This was made possible by an easy integration of cloud-init into the template using CFN Init.
Once the code was ready, it was time to move it to production and, most importantly, to test it. After battle-testing the code numerous times, we launched our production cluster, which is now the backbone of our centralized logging.
The Final Architecture:
It was crucial to keep high availability and scalability in mind while building the centralized logging architecture. After a lot of discussion, and drawing on our past experience, we decided to build a hot-warm architecture. The architecture keeps two types of nodes in the Elasticsearch cluster (tagged using node attributes):
- Hot nodes: handle all indexing and hold the most recent, most frequently queried indices, on faster storage.
- Warm nodes: hold older, read-only indices that are queried less often, on denser, cheaper storage.
Below is a sample elasticsearch.yml snippet that decides whether a node is hot or warm.
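As an illustration (the attribute name `box_type` is a common convention for hot-warm setups on Elasticsearch 5.x+, not necessarily the exact name we used), each node declares its role via a custom node attribute:

```yaml
# elasticsearch.yml on a hot node
node.attr.box_type: hot

# elasticsearch.yml on a warm node would instead carry:
# node.attr.box_type: warm
```

New indices can then be pinned to hot nodes through an index template setting such as `index.routing.allocation.require.box_type: hot`, and later relocated to warm nodes by flipping that setting to `warm`.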
With high availability and scalability taken into consideration, we broke our cluster into more scalable components and made it production ready.
Elasticsearch index lifecycle management:
The last piece of work was to set up lifecycle management of the indices in the Elasticsearch cluster. This was done keeping the following things in mind:
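One common mechanic behind such lifecycle management for time-based indices can be sketched as below. This is a hedged illustration, not our actual tooling: the `logs-YYYY.MM.DD` naming, the retention windows, and the `plan_actions` helper are all assumptions made for the example.

```python
# Sketch: decide which date-stamped indices to move to warm nodes and which
# to delete. Index naming scheme and retention windows are illustrative.
from datetime import date, datetime

def plan_actions(index_names, today, warm_after_days=2, delete_after_days=30):
    """Return (to_warm, to_delete) lists for indices named 'logs-YYYY.MM.DD'."""
    to_warm, to_delete = [], []
    for name in index_names:
        day = datetime.strptime(name.split("-", 1)[1], "%Y.%m.%d").date()
        age = (today - day).days
        if age >= delete_after_days:
            to_delete.append(name)
        elif age >= warm_after_days:
            to_warm.append(name)
    return to_warm, to_delete

to_warm, to_delete = plan_actions(
    ["logs-2018.07.01", "logs-2018.07.28", "logs-2018.07.30"],
    today=date(2018, 7, 31),
)
# Applying the plan with the official Python client could then look like:
#   es.indices.put_settings(index=idx,
#       body={"index.routing.allocation.require.box_type": "warm"})
#   es.indices.delete(index=idx)
```

Tools such as Elasticsearch Curator automate exactly this kind of age-based policy, which keeps hot nodes small and fast while bounding total storage cost.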