RIVIGO’s ‘Everything Continuous’ Architecture: Integrate-Deploy-Monitor-Secure

21 July 2018
Technology

An overview of RIVIGO’s ‘everything continuous’ architecture that enables swift feature development and production deployments while ensuring continuous security and monitoring.

Introduction

For any dynamic organization that aspires to build agile engineering practices, a framework that enables swift feature development and production deployments becomes increasingly important. While the primary objective of such a framework is to enable fast-paced releases, it should also provide near real-time monitoring and alerting. Further, since compromised security is often an unintended cost of rapid product development, it is critical to embed security as a continuous integration step in release cycles.

The ‘everything continuous’ architecture

Security, integration, deployment and monitoring are the four pillars of an architecture which enables continuous releases. The diagram below gives a basic overview of each one of them.

There are several compelling reasons why this architecture is important. Some of these are listed below.

  • Ability to spin up multiple test environments on Docker without a linear increase in cost
  • Artifacts are merged only after passing quality control gates, and fully automated testing of features, performance and security becomes possible
  • Artifacts and releases are promoted through test, staging, pre-prod and prod via a controlled, automated pipeline
  • Any release in pre-prod and prod is tracked by the monitoring framework in near real time, which gives quick insight into the ongoing deployment and its result
  • More control over deployments, with quick rollback triggers provisioned on Jenkins; these can grow into a control loop when integrated with the Service Health Scoring Service (more details on this here)
  • Security of services and infrastructure is monitored continuously in all environments, so smart alerts can be raised and decisions taken quickly to reduce the probability of security incidents
  • If something is going to break in production, it fails first in the test, staging or pre-prod environments
  • Provision to alter deployment strategies as and when needed, since the mechanics of this architecture are governed by the service auto-discovery and self-healing capabilities of the infrastructure

Continuous Security

The importance of creating a robust and continuous security framework cannot be overemphasized. The comic strip below perfectly summarizes the situation nobody would want to be in.

Security should be a culture; DevOps should not have to extend itself into SecDevOps. Security should be silent, agreed upon and deeply embedded in the way products are built. Continuous security means taking a proactive approach rather than a reactive one. When security is reactive, the number of incidents increases, which not only causes delays but also poses a significant threat to overall security and credibility. With continuous security we hit the security nail right on the head by introducing security checks at the application and infrastructure level in the delivery pipeline. The continuous security testing framework runs 24x7 to identify any security breaches and misses.

Continuous integration and deployment (CI/CD)

Smartly designed CI/CD pipelines not only enable fast-paced development and introduce automated security and quality gates but also improve developer and QA efficiency manifold. They also translate into savings through lower hardware cost, higher engineering efficiency and fewer downtimes.

A good CI/CD pipeline is like a state-of-the-art manufacturing plant assembly line. It is fully automated, increases quality, eliminates bad artifacts and minimizes wastage. You can get a high-level view of the CI/CD pipeline of RIVIGO here.

Below are the major components of the CI/CD pipeline.

  • JIRA: For tracking tasks and stories while maintaining the mapping between stories and their sub-tasks. Stories and sub-tasks in JIRA are associated with unique JIRA IDs, which can optionally be used to create repos and branches in Bitbucket.
  • Bitbucket: For backward mapping to JIRA tasks, multi-level code reviews, granular access control and triggering builds on Jenkins for any action taken on any branch of a repo
  • Jenkins: Build, integrate and deploy to AWS
  • AWS ECS: Containers are launched whenever an artifact, commit or merge needs to be tested
  • AWS ECR: Artifacts are stored and promoted from feature -> story -> staging -> pre-prod -> prod
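
The promotion path for artifacts can be pictured as a simple ordered list of stages. The snippet below is a minimal sketch of that idea in Python, not RIVIGO's actual Jenkins logic. It assumes the stage order is feature, story, staging, pre-prod, prod, and the names `STAGES`, `next_stage` and `promote` are hypothetical, introduced only for illustration.

```python
from typing import Optional

# Assumed stage order; the real pipeline may name or split stages differently.
STAGES = ["feature", "story", "staging", "pre-prod", "prod"]


def next_stage(current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None when already at prod."""
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


def promote(current: str, gates_passed: bool) -> str:
    """Advance one stage only when all quality gates passed; otherwise stay put."""
    if not gates_passed:
        return current
    return next_stage(current) or current
```

With this sketch, `promote("staging", gates_passed=True)` moves an artifact to `"pre-prod"`, while a failed gate leaves it in `"staging"`, so a bad artifact never advances.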

The diagram below shows how code gets integrated, promoted and deployed.

The design principles involved here are given below.

  • Fully automated integration and deployment
  • Ability to spin up multiple environments without a linear increase in cost
  • Automatic discovery of services
  • Automatic healing of services
  • All components of the infrastructure are deployed with high availability (HA) and failover

The diagram below gives an overview of the infrastructure where code gets automatically tested and deployed.

The following workflow describes how things work in this service discovery architecture.

  • Each task has a unique JIRA_ID
  • Each repo or branch has an associated JIRA_ID
  • Every developer and QA has a JIRA_ID
  • For a developer or QA, the environment is set up as per their JIRA_ID. For example, a service called a.rivigo.com will be set up as JIRA_ID-a.rivigo.com, so that each developer and QA has their own isolated environment to test and iterate in
  • Each service can be spawned numerous times, using different JIRA_IDs for different individuals
  • The service is registered with Consul, and private DNS and NGINX are updated to reflect the latest services present in the ECS cluster; this happens automatically, without any manual intervention
  • The services with JIRA_ID-prefixed endpoints are now available for use inside the virtual private network of RIVIGO
  • After testing, the environment associated with a particular JIRA_ID is torn down automatically, resources are freed, and services are deregistered from Consul, NGINX and private DNS
  • Load balancing is configured on NGINX through consul-template, and all containers with a particular JIRA_ID are added as upstreams
  • Update strategies (canary, rolling etc.) are made possible by changing the weights of the containers registered in NGINX
  • Service A reaches Service B by discovering Service B’s endpoint through the combination of private DNS and NGINX (both are populated with Service B’s endpoint automatically using consul-template and Jenkins)
  • This integrate-deploy cycle continues until all functional and non-functional test cases are satisfied and the build is fit to be promoted to prod
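
Two pieces of this workflow are easy to picture in code: the JIRA_ID-prefixed hostname and the weighted NGINX upstream that makes canary or rolling updates possible. The Python below is only an illustrative stand-in for what consul-template renders in the real setup. The function names, the sample JIRA ID and the container addresses are all hypothetical; the only NGINX syntax relied on is the standard `server <address> weight=<n>;` directive inside an `upstream` block.

```python
from typing import Dict


def env_hostname(jira_id: str, service_host: str) -> str:
    """Prefix a service hostname with a JIRA ID, e.g. JIRA_ID-a.rivigo.com."""
    return f"{jira_id}-{service_host}"


def render_upstream(name: str, weights: Dict[str, int]) -> str:
    """Render an NGINX upstream block with one weighted server per container.

    `weights` maps a container address ("host:port") to its NGINX weight.
    Servers are sorted by address so the rendered output is deterministic.
    """
    lines = [f"upstream {name} {{"]
    for address, weight in sorted(weights.items()):
        lines.append(f"    server {address} weight={weight};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example: 90% of requests to the old container, 10% to the new.
    print(env_hostname("ABC-123", "a.rivigo.com"))
    print(render_upstream("abc_123_a", {"10.0.0.1:8080": 9, "10.0.0.2:8080": 1}))
```

Shifting a canary from 10 to 50 percent is then just a matter of re-rendering the block with new weights and reloading NGINX, which is why weights are a convenient lever for update strategies.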

Continuous Monitoring

The phase from deployment to production is critical, as it can be full of uncertainties. The idea behind continuous monitoring is to ensure visibility into deployed applications, with critical insights about their performance. Tracking key application and infrastructure metrics as the build is promoted from staging to pre-prod to prod, along with a neat 360-degree view, lets engineering teams learn very early about the deployment status and health of their newly deployed application.

The diagram below depicts what continuous monitoring looks like.

The following are some of the key advantages of continuous monitoring.

  • In canary deployments, when a fraction of traffic is forwarded to the new version in production, it becomes crucial to have continuous, near real-time monitoring in place. A traffic shift of 10-20-n percent should give quick insights into how the newer version of the application is expected to perform under full load.
  • It gives a 360-degree view, at any time, of resources, services, latency, DB connections etc. for a particular service, thereby helping with timely diagnosis should that service face any issue
  • Prod may have its log level set to a less verbose value, but because continuous monitoring is configured in exactly the same way in staging, pre-prod and prod, with full logs in staging and pre-prod, it becomes very difficult for bugs to creep into production without being detected in lower environments
  • A production deployment is not considered a success unless the continuous monitoring systems give consistent results for 10-20-50-n percent of traffic, thereby ensuring a high bar on quality
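
The staged traffic shift described above can be sketched as a small gate that advances only while monitoring stays healthy. This is a minimal Python illustration under stated assumptions, not the actual RIVIGO implementation: `is_healthy` is a hypothetical callback standing in for whatever the monitoring systems report at a given traffic percentage, and the step values are examples.

```python
from typing import Callable, Sequence


def run_canary(steps: Sequence[int], is_healthy: Callable[[int], bool]) -> int:
    """Shift traffic step by step and return the last percentage that stayed healthy.

    Returns 0 when the very first step is unhealthy.
    """
    last_good = 0
    for percent in steps:
        if not is_healthy(percent):
            break  # stop shifting traffic at the first unhealthy step
        last_good = percent
    return last_good


def deployment_succeeded(steps: Sequence[int], is_healthy: Callable[[int], bool]) -> bool:
    """Count a deployment as a success only if every step up to 100 percent passed."""
    return bool(steps) and run_canary(steps, is_healthy) == steps[-1] == 100
```

For example, with steps `[10, 20, 50, 100]` and a health check that starts failing at 50 percent, `run_canary` returns `20` and `deployment_succeeded` returns `False`, which is the point at which a rollback trigger would fire.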

Conclusion

With so many advances in available tools and methodologies, it’s an exciting time to be working in the area of release engineering. Continuous release cycles reduce the need for hotfixes, allow better support for distributed engineering teams and create the foundation for the next generation of tools, automation and processes that an organization needs in order to scale.

Parijat Rathore, Director of Engineering at RIVIGO, has also contributed to this article.