Vehicle Continuum: Advanced Data Science Applications to Drive Business Intelligence

10 November 2018




Trucks and pilots (drivers) are the two most fundamental elements of any full truckload business. Almost all goals of operational efficiency and business performance can be achieved by optimizing and leveraging these two resources. At RIVIGO Labs, we make every decision, whether big or small, using data. For that to happen well, we need accurate data sources for these two core entities.



The key objectives of Vehicle Continuum are:

  • To create and maintain a universal source of truth for a vehicle’s life journey – accurate, reliable and available
  • To enable data-driven reporting for all vehicle-related KPIs and enable smart RCA when targets are not met
  • To create the data inputs required by other services, like the ETA service and sales forecasting, based on past data – for example, detention, sectional TAT etc.
  • To maintain an accurate nodes dataset, since the accuracy of Vehicle Continuum is directly proportional to the accuracy of its nodes


What is Vehicle Continuum?

The continuum captures events occurring in succession, ranging from past to the present and even into the future. It understands and predicts the behaviour of an entity and maps it to intelligent events. In RIVIGO’s context, Vehicle Continuum (VC) is business intelligence over GPS data.


Node identification

To build a continuum of a vehicle, we needed to know the various nodes involved in the ecosystem. We categorized nodes into two categories – RIVIGO nodes and client nodes. RIVIGO nodes can be our pitstops (establishments roughly 300 km apart where relay pilot changeover happens), parking spots, fuel pumps, toll booths, borders, workshops etc., whereas client nodes can be warehouses, client parking spots etc. Identifying the correct set of toll booths, fuel pumps (the ones we use for refuelling), client warehouses etc. was a challenge. Since the inherent nature of these nodes is different, we had to come up with different detection logic for each node type.

Let’s consider the toll booth dataset. We used NHAI data and FASTag transaction details provided by the bank partner for RIVIGO vehicles. The idea was to use a vehicle’s GPS location at the time of the FASTag transaction, but it was not so straightforward. The FASTag transaction stream had a delay of up to 15 minutes, during which a vehicle can travel around 10 km away from the toll booth. Therefore, we needed to filter out the noise of stoppage points from the window of -15 to +15 minutes around each FASTag transaction. On this dataset, we ran the density-based spatial clustering of applications with noise (DBSCAN) algorithm to identify the correct location of a toll booth. To test accuracy, we used Google Maps (satellite view) to check whether the identified location looks like a toll booth; approximately 98% of toll booths were identified correctly with the help of the DBSCAN algorithm. Fuel pumps were located similarly from fuel sensor data with the same clustering algorithm.
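As a rough illustration of the approach, here is how DBSCAN can pick a booth location out of noisy stoppage points. The coordinates, eps and min_samples values below are hypothetical, not our production settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical stoppage points (lat, lon in degrees) recorded in the
# -15/+15 minute window around FASTag transactions, plus unrelated noise.
points = np.array([
    [28.4595, 77.0266], [28.4596, 77.0267], [28.4594, 77.0265],
    [28.4597, 77.0266], [28.4595, 77.0268],   # dense cluster ~ toll booth
    [28.5355, 77.3910], [28.7041, 77.1025],   # isolated noise stoppages
])

# DBSCAN with the haversine metric expects radians; eps is in radians too.
EARTH_RADIUS_KM = 6371.0
eps_km = 0.2  # points within ~200 m are treated as the same booth
db = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=3, metric="haversine")
labels = db.fit_predict(np.radians(points))

# The centroid of the densest cluster is our estimate of the booth location.
cluster_ids = [l for l in set(labels) if l != -1]  # -1 marks noise
booth = points[labels == cluster_ids[0]].mean(axis=0)
print(labels)                # noise points get label -1
print(booth.round(4))        # estimated booth coordinates
```

Because DBSCAN needs no preset cluster count and labels sparse points as noise, it suits this problem better than, say, k-means.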



Vehicle Continuum data is currently collected and then processed via batch jobs in the following layers:


Chronos – Collects raw data

At RIVIGO, every vehicle is fitted with several IoT sensors like GPS, fuel and temperature sensors. Our proprietary pilot app also streams data from GPS, gyro sensors etc. The Chronos layer collects data from these various sensors, stores it in MongoDB after basic sanity checks and generates the necessary events. You can read more about this in our earlier post on IoT sensor data collection.

Athena – Processes raw data

Athena converts raw data streams into RUNNING, STOPPED or UNKNOWN legs. It also processes GPS data, fuel data etc. to attribute distance, time and fuel consumption to each leg.
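A minimal sketch of such leg classification, assuming illustrative speed and gap thresholds (the real Athena pipeline works on richer sensor streams):

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions, not RIVIGO's actual values):
STOP_SPEED_KMPH = 3      # below this the vehicle is considered stationary
MAX_GAP_SEC = 600        # larger gaps between pings mean data loss -> UNKNOWN

@dataclass
class Ping:
    ts: int              # epoch seconds
    speed_kmph: float

def classify_legs(pings):
    """Collapse a GPS ping stream into (state, start_ts, end_ts) legs."""
    legs = []
    for prev, cur in zip(pings, pings[1:]):
        if cur.ts - prev.ts > MAX_GAP_SEC:
            state = "UNKNOWN"
        elif max(prev.speed_kmph, cur.speed_kmph) < STOP_SPEED_KMPH:
            state = "STOPPED"
        else:
            state = "RUNNING"
        # merge with the previous leg if the state is unchanged
        if legs and legs[-1][0] == state:
            legs[-1] = (state, legs[-1][1], cur.ts)
        else:
            legs.append((state, prev.ts, cur.ts))
    return legs

pings = [Ping(0, 40), Ping(60, 45), Ping(120, 0), Ping(180, 1),
         Ping(1500, 50)]  # 22-minute gap -> UNKNOWN leg
print(classify_legs(pings))
```

Per-leg distance and fuel attribution would then sum the corresponding sensor readings over each leg's time range.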

Base Processing Layer

This layer takes the output of the Athena layer as input and attaches trip context to it. It also detects the node, if any, for every stoppage. Node detection is a bit tricky. Nodes, as mentioned earlier, are categorized as client nodes and RIVIGO nodes. RIVIGO nodes can be identified anywhere in any time interval, but client nodes can only be identified in the time window relevant to the trip. Let’s look at the picture below:



Here, only C1’s and C2’s nodes will be identified between the two trips. We then use these detected nodes to predict loading/unloading and dry runs to/from the client warehouse.

At the beginning, we used a linear search to find the nearest node for each compressed data leg. However, as our dataset grew, we needed a faster approach to node identification to reduce overall processing time. Hence, we switched to a k-nearest neighbour lookup for better performance.
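The switch can be sketched with a ball tree built over node coordinates; the node list, search radius and helper function below are hypothetical:

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0

# Hypothetical node registry: (lat, lon) of pitstops, tolls, warehouses.
nodes = np.radians([
    [28.4595, 77.0266],   # node 0
    [26.9124, 75.7873],   # node 1
    [19.0760, 72.8777],   # node 2
])

# A BallTree with the haversine metric answers nearest-neighbour queries in
# roughly O(log n) per lookup, versus O(n) for a linear scan over all nodes.
tree = BallTree(nodes, metric="haversine")

def nearest_node(lat, lon, max_km=1.0):
    """Return (node_index, distance_km), or None if no node is within max_km."""
    dist, idx = tree.query(np.radians([[lat, lon]]), k=1)
    dist_km = dist[0][0] * EARTH_RADIUS_KM
    return (int(idx[0][0]), dist_km) if dist_km <= max_km else None

print(nearest_node(28.4600, 77.0270))   # a stoppage ~70 m from node 0
print(nearest_node(22.5726, 88.3639))   # no node nearby -> None
```

The tree is built once per batch run and queried for every stoppage leg, so the one-time construction cost amortizes quickly.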

Advanced Processing Layer

This layer produces the polished data required for all analysis. It adds business intelligence over the data from the previous layer so that reporting of various KPIs becomes easy. It also aggregates data from various microservices within our technology architecture, like Fuel Desk, the Ticketing system etc., validates it to a certain level of accuracy and attaches it to the relevant legs. Let’s go through some utilities of VC with a few examples:

  • Vehicle Detention: A major share of a vehicle’s time is lost at warehouses during loading/unloading or while waiting for the next trip. More detention time implies lower vehicle utilization, which implies lower revenue. Hence it becomes an important business KPI to measure: which clients take longer to load/unload the vehicle? Which locations account for the longest load-related waits? Vehicle detention is generally calculated based on client warehouse (CWH) detection, but for trips starting from new CWHs, we often do not have the correct location of the CWH node. For such cases, we came up with a heuristic function that detects detention for new trips to a certain accuracy. We further extrapolated this CWH detention to nearby stoppages within a 1 km radius, after removing noise from the dataset, to get more accurate results.
  • Dry run: A dry run is any vehicle movement that does not generate revenue. It has three disadvantages. Firstly, it generates no revenue while consuming the vehicle’s time. Secondly, it needs a pilot to move the vehicle. Thirdly, it consumes fuel, which is again a wastage; in transportation, fuel is one of the largest components of overall cost. Hence it becomes important to do an RCA on which business vertical, location or client trip contributes the most dry run. VC provides the data to analyse various dry runs. We analyse the time between two client trips and read data from various sources, including microservices within our technology architecture, to identify abnormalities and their root cause.
  • Stoppages during the trip: RIVIGO’s SLAs are the best in the industry. Hence it becomes important to measure and act upon all stoppages during a trip. A few stoppages in any trip are planned, like pitstop waiting for pilot changeover, fuelling stoppages etc. Unwanted stoppages are categorised as unscheduled stoppages (UNS); the fewer the UNS, the better the adherence to plan. VC (along with Pilot Continuum) helps in finding which pilots make more UNS, which routes cause more traffic stoppages etc. VC, with the help of the Ticketing service and smart node detection, identifies the stoppages that were not planned during the trip and the probable reasons for each. For example, it identifies traffic-prone areas, tolls, vehicle breakdowns etc. to determine whether a stoppage was deliberate or unavoidable.
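As an illustration of the detention heuristic described above, this sketch sums stoppage time within a 1 km radius of a CWH after dropping short noisy stops. All thresholds, coordinates and durations are invented for the example:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def detention_minutes(stoppages, cwh, radius_km=1.0, min_stop_min=5):
    """Sum stoppage time within radius_km of the client warehouse (CWH).

    Stoppages shorter than min_stop_min are treated as noise (traffic,
    gate queues) and dropped -- both thresholds are illustrative.
    """
    return sum(
        minutes
        for (lat, lon, minutes) in stoppages
        if minutes >= min_stop_min and haversine_km((lat, lon), cwh) <= radius_km
    )

cwh = (28.4595, 77.0266)
stoppages = [
    (28.4598, 77.0264, 90),   # at the warehouse gate
    (28.4640, 77.0290, 45),   # ~550 m away, extrapolated into detention
    (28.4597, 77.0265, 2),    # 2-minute blip, filtered as noise
    (28.7041, 77.1025, 60),   # far away, unrelated stoppage
]
print(detention_minutes(stoppages, cwh))  # 135
```

The same radius-based extrapolation generalizes to other node types once their locations are known.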

There are many more utilities of VC, like the mileage performance of a vehicle, pilot driving behaviour causing low mileage, the mileage performance of a route etc.

Given such impactful use cases, VC is used to measure almost every vehicle’s KPI at RIVIGO.


Technology stack



  • Java (Spring), MongoDB: Processing of raw data streams from IoT sensors is done in Java, and the data is stored in MongoDB. These sensors send data at a regular frequency, so over time the raw data becomes huge in size, and it is non-relational in nature. Hence, MongoDB best fits the requirements.


  • Python and MySQL: Once raw data streams are converted into compressed data legs, we use Python (with Pandas and NumPy) to process them further (Layer 1 and Layer 2) and store the results in MySQL. MySQL gives good flexibility for easy reporting. Also, most of our data is relational, and MySQL makes it easy to join other data sources wherever needed.
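A small Pandas sketch of the kind of per-trip rollup this stack enables before the results land in MySQL; the column names and figures are invented:

```python
import pandas as pd

# Hypothetical compressed legs emitted by Athena for one vehicle.
legs = pd.DataFrame({
    "trip_id": ["T1", "T1", "T1", "T2"],
    "state":   ["RUNNING", "STOPPED", "RUNNING", "RUNNING"],
    "km":      [120.0, 0.0, 80.0, 150.0],
    "minutes": [150, 45, 95, 180],
    "fuel_l":  [30.0, 0.5, 20.0, 37.5],
})

# Per-trip rollup of distance, time and fuel, split by leg state --
# the kind of relational table we persist to MySQL for KPI reporting.
rollup = (legs.groupby(["trip_id", "state"])[["km", "minutes", "fuel_l"]]
              .sum()
              .reset_index())
print(rollup)
```

From here, a `to_sql` call (or a bulk insert) would persist the rollup table for downstream reporting.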



For any data-driven problem, having a proper, sanitized source is a big win. The continuum delivers the facts. This source of truth is helping us drive insights that lead to intelligent business decisions every day, and it is driving the future of intelligence and optimisation for us at RIVIGO.