Self-learning Predictive Engine for User Profile Creation and Order Fulfillment on Marketplace

7 August 2018

An overview of our self-learning recommendation engine, which has been built to enable unparalleled ease of discovery for fleet suppliers and thereby accelerate the digitization journey of the Indian trucking ecosystem.



RIVIGO freight marketplace is digitizing Indian trucking by enabling a hassle-free discovery and transaction experience for shippers and carriers (truck suppliers/fleet owners). For any shipper, fast order fulfillment is the most critical expectation for good customer experience and fast product adoption. To meet this expectation, our supply-side product (RIVIGO Fleet), where truck suppliers and fleet owners can view and accept favorable orders, has been designed to make it as easy as possible for carriers to find relevant orders.

To drive an effective recommendation feed for every supply user and to deliver fast fill TAT (turnaround time), the following information is required.

  • Supplier profile (lanes and vehicle types that the supplier operates in)
  • Extent of acceptable deviation from standard routes (dry/empty runs). For example, consider a fleet owner who operates from Ludhiana. If demand is not available at Ludhiana, is s/he open to going to Ambala to pick up the load?



One of the key operating choices we’ve made is not to collect user profile information directly from the user. There were two reasons for this. Firstly, we wanted to create delight for our users by predicting their profile correctly. Secondly, user-reported profiles are inflated more than 90% of the time, as users try to position themselves for business growth. At the same time, we needed to ensure that this choice did not come at the cost of a low fill rate.

We defined an initial rule-based system, based on user feedback, to create an instantaneous profile at the time of app download using GPS and IP permissions and user engagement activities on the app such as card view duration, card clicks, card accepts and searches.
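A rule-based profile of this kind can be sketched as a weighted tally of engagement events per lane and vehicle type. The event weights, the `Event` fields, the vehicle type labels and the `top_n` cut-off below are illustrative assumptions, not the rules used in production.

```python
from collections import Counter
from typing import NamedTuple

# Hypothetical weights: stronger signals of intent count for more.
EVENT_WEIGHTS = {"view": 1.0, "click": 2.0, "search": 3.0, "accept": 5.0}


class Event(NamedTuple):
    kind: str          # one of the keys in EVENT_WEIGHTS
    origin: str
    destination: str
    vehicle_type: str


def rule_based_profile(events, top_n=3):
    """Return the top_n ODVTs ranked by weighted engagement."""
    scores = Counter()
    for e in events:
        odvt = (e.origin, e.destination, e.vehicle_type)
        # Unknown event kinds contribute nothing to the score.
        scores[odvt] += EVENT_WEIGHTS.get(e.kind, 0.0)
    return [odvt for odvt, _ in scores.most_common(top_n)]


events = [
    Event("view", "Ludhiana", "Delhi", "32ft"),
    Event("accept", "Ludhiana", "Mumbai", "32ft"),
    Event("click", "Ludhiana", "Delhi", "32ft"),
]
profile = rule_based_profile(events)
```

Here a single accept outweighs a view plus a click, so the Ludhiana to Mumbai lane ranks first in `profile`.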

As user activity increased beyond a certain threshold, advanced engines built using machine learning models replaced the output of the original rule-based (heuristic) model, since we then had sufficient information to create user profiles with a high degree of confidence. The threshold for user activity is determined using another machine learning engine which takes into account model accuracy and false positives. For all such models, users who have consistently done trips on the platform were used as the ‘seed’.

Our machine learning models predict user-level ODVTs (origin-destination-vehicle type). They also predict the extent of route deviation any user will find acceptable. The theory for the deviation is inspired by Heisenberg’s Uncertainty Principle. User behavior shows that the total uncertainty that a user will choose to operate in is limited. This means that the total uncertainty, summed across all sources such as dry run at source, dry run at destination and variance in truck type, has to be finite. These models also help us identify truck inventory signals from user behavior. We will cover more on this in another article soon.
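One way to read the finite-uncertainty idea is as a budget shared across deviation sources: a load is acceptable only if the normalized deviations, added together, stay within the user's budget. The additive form, the `KM_SCALE` normalization and the budget value below are assumptions made for illustration; the article does not specify the actual formulation.

```python
from dataclasses import dataclass

# Hypothetical scale that turns kilometres into a unitless quantity.
KM_SCALE = 100.0


@dataclass
class Deviation:
    dry_run_source_km: float       # empty run to reach the loading point
    dry_run_destination_km: float  # empty run after unloading
    vehicle_type_mismatch: float   # 0.0 = exact type, 1.0 = very different


def total_uncertainty(d: Deviation) -> float:
    """Sum the normalized deviation from every source."""
    return (
        d.dry_run_source_km / KM_SCALE
        + d.dry_run_destination_km / KM_SCALE
        + d.vehicle_type_mismatch
    )


def within_budget(d: Deviation, budget: float) -> bool:
    """A load is acceptable only if its total deviation fits the budget."""
    return total_uncertainty(d) <= budget


near = Deviation(dry_run_source_km=50.0, dry_run_destination_km=0.0,
                 vehicle_type_mismatch=0.0)
far = Deviation(dry_run_source_km=110.0, dry_run_destination_km=60.0,
                vehicle_type_mismatch=0.5)
```

Under this reading, a user who accepts a long dry run at source has less room left for a dry run at destination or a different truck type.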


Load Recommendation Engine Architecture

In the following sections we will explain the stages of load recommendation for the user.

Data collection

User activities on the app are captured to profile the user. These include searches, accepts, clicks and views on a particular load, and notifications. These activities are captured via our event handler system. In addition, the ODVT-level data for a user captured by local field engagement teams (through our in-house sales effectiveness product called ‘Goal’) was used for comparisons and model testing.

The RIVIGO Fleet (supply) app, the RIVIGO Goal app and the Event Handler system push these events to SNS (Simple Notification Service), where a queue (SQS, Simple Queue Service) is subscribed to capture all events. A worker stores the events from the queue in the MongoDB cluster, in collections named after the SNS topic. These events from the data warehouse are used by our recommendation system.
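The routing step of such a worker can be sketched without any AWS calls. The sketch assumes raw message delivery is disabled on the subscription, so each SQS message body is the SNS JSON envelope carrying `TopicArn` and `Message`; it also assumes the published message is itself JSON. The topic name, the event fields and the in-memory `collections` mapping (a stand-in for MongoDB collections) are hypothetical.

```python
import json
from collections import defaultdict


def topic_name(topic_arn: str) -> str:
    """Extract the name from an ARN of the form arn:aws:sns:region:account:name."""
    return topic_arn.rsplit(":", 1)[-1]


def store_event(sqs_body: str, collections) -> None:
    """Route one queue message into the collection named after its topic."""
    envelope = json.loads(sqs_body)            # SNS envelope delivered to SQS
    payload = json.loads(envelope["Message"])  # event published by the app
    collections[topic_name(envelope["TopicArn"])].append(payload)


# Stand-in for the database: one list per collection.
collections = defaultdict(list)

body = json.dumps({
    "Type": "Notification",
    "TopicArn": "arn:aws:sns:ap-south-1:123456789012:fleet-app-events",
    "Message": json.dumps(
        {"user_id": "u1", "event": "card_click", "load_id": "l9"}
    ),
})
store_event(body, collections)
```

In a real worker, the body would come from polling the queue and the append would be an insert into the corresponding collection.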

User Profile Update Worker

The user profile update worker reads raw data directly from the data warehouse. The worker is deployed on an AWS EC2 instance. The flow of data in the worker is shown below.

The worker does the following regularly.

  • Extract features from the raw data
  • Train the models to predict user ODVTs for a supply user
  • Store trained models on S3 after checking the performance with the previously trained model
  • Store the predicted ODVTs of the user in the MongoDB cluster
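The "store after checking the performance" step above can be sketched as a promotion gate: the newly trained model replaces the stored one only if it scores at least as well. The `store` dictionary stands in for S3, and the scoring metric, the `min_gain` margin and the key name are assumptions for illustration.

```python
def promote_if_better(candidate_score, previous_score, min_gain=0.0):
    """Return True when the candidate should replace the stored model."""
    if previous_score is None:  # nothing has been stored yet
        return True
    return candidate_score >= previous_score + min_gain


def run_training_cycle(train_fn, evaluate_fn, store, key="odvt_model"):
    """Train, evaluate, and store the model only if it beats the previous one."""
    model = train_fn()
    score = evaluate_fn(model)
    previous = store.get(key)
    previous_score = previous["score"] if previous else None
    if promote_if_better(score, previous_score):
        store[key] = {"model": model, "score": score}
        return True
    return False


store = {}  # stand-in for the model bucket
first = run_training_cycle(lambda: "model-v1", lambda m: 0.70, store)
second = run_training_cycle(lambda: "model-v2", lambda m: 0.65, store)
```

After these two cycles the store still holds `model-v1`, because the second model scored lower on the (hypothetical) evaluation metric.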

A MongoDB cluster was used because it provides the following.

  • Schema-less design
  • Scalability in managing terabytes of data
  • Replica sets that provide high availability
  • Support for geospatial queries to get relevant users
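To make the geospatial requirement concrete, the sketch below finds users within a radius of a load's source using the haversine formula. It is a plain-Python stand-in: in MongoDB the same lookup would typically run in the database, using a 2dsphere index and an operator such as `$nearSphere`. The user records, the approximate city coordinates and the 150 km radius are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def users_near(users, lat, lon, radius_km):
    """Return ids of users whose base location lies within radius_km."""
    return [
        u["user_id"]
        for u in users
        if haversine_km(lat, lon, u["lat"], u["lon"]) <= radius_km
    ]


# Approximate city coordinates, for illustration only.
users = [
    {"user_id": "u_ludhiana", "lat": 30.90, "lon": 75.85},
    {"user_id": "u_ambala", "lat": 30.38, "lon": 76.78},
    {"user_id": "u_delhi", "lat": 28.61, "lon": 77.21},
]
nearby = users_near(users, lat=30.90, lon=75.85, radius_km=150.0)
```

With these inputs the Ludhiana and Ambala users fall inside the radius and the Delhi user does not.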

Matching Probability Model Worker

User ODVT-level data is read by the worker from the data warehouse. Features are extracted from the raw data to train a model that calculates the probability of a user serving a load on a particular ODVT. The worker is deployed on an AWS EC2 instance. The flow of data in the worker is shown below.

The matching probability model worker does the following regularly.

  • Get the certainty score calculated for orders for all users (explained in the next section)
  • Use ODVT level features to train the model
  • Store the final model on S3 after comparing the performance of the models
  • Calculate the probability scores for input ODVTs from the user matching daemon
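The article does not name the model family used for the matching probability. Purely as an illustration, the sketch below swaps in a small logistic regression trained by stochastic gradient descent on two hypothetical ODVT-level features (the certainty score and the user's share of past trips on the lane); the feature choice, training data and hyperparameters are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias with per-sample gradient steps on the log loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_proba(model, x):
    """Probability that the user serves a load with feature vector x."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical features: [certainty_score, past_trip_share_on_lane]
rows = [[0.9, 0.8], [0.8, 0.6], [0.2, 0.1], [0.1, 0.0]]
labels = [1, 1, 0, 0]  # 1 = user served a load on this ODVT

model = train_logistic(rows, labels)
likely = predict_proba(model, [0.85, 0.7])
unlikely = predict_proba(model, [0.15, 0.05])
```

The trained model assigns a higher serving probability to the ODVT with a strong certainty score and trip history than to the weak one.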

User Matching Daemon

This daemon reads directly from the queue (SQS), which is subscribed to an SNS topic where the load creation packets are pushed by the demand backend. The packets from the queue contain information about the demand, such as source, destination, loading time and required vehicle type. For each demand, the daemon tries to match the relevant users based on a certainty score.

The process of matching is explained below.

  • The certainty score is calculated on the basis of user profile (calculated ODVTs from user profile update worker)
  • The certainty score is used to filter the relevant ODVTs for calculating the probability of a user serving the particular order
  • The certainty scores and the probability calculated for each user based on different factors are stored in the MongoDB cluster and a MySQL database
  • MongoDB is used for auditing and for storing detailed information about the calculated scores, whereas MySQL is used to serve real-time requests from the RIVIGO Fleet app. The audited data helps in analyzing and improving the scoring mechanism.
  • Data needed to calculate the scores is kept in different types of stores. The use case for each of those is explained below.
      • MongoDB: Stores user profiles and supports geospatial queries to find users on nearby lanes to consider in score calculation
      • Memory (RAM): Recent searches and trips data are cached in memory, which saves a network call for each load. These are refreshed at a fixed time interval.
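The two-stage matching described above (filter by certainty, then score probability) can be sketched as follows. The certainty formula, the `min_certainty` cut-off, the field names and the placeholder `probability_fn` are hypothetical; the article does not describe how the actual scores are computed.

```python
ODVT_FIELDS = ("origin", "destination", "vehicle_type")


def certainty_score(load, odvt):
    """Hypothetical score: fraction of ODVT fields that match the load."""
    matches = sum(load[f] == odvt[f] for f in ODVT_FIELDS)
    return matches / len(ODVT_FIELDS)


def match_users(load, profiles, probability_fn, min_certainty=0.6):
    """Keep users whose best ODVT is certain enough, then score probability."""
    results = []
    for user_id, odvts in profiles.items():
        best = max((certainty_score(load, o) for o in odvts), default=0.0)
        if best < min_certainty:
            continue  # not relevant enough to run the probability model
        results.append({
            "user_id": user_id,
            "certainty": best,
            "probability": probability_fn(user_id, load, best),
        })
    return results


load = {"origin": "Ludhiana", "destination": "Delhi", "vehicle_type": "32ft"}
profiles = {
    "u1": [{"origin": "Ludhiana", "destination": "Delhi", "vehicle_type": "32ft"}],
    "u2": [{"origin": "Ludhiana", "destination": "Mumbai", "vehicle_type": "32ft"}],
    "u3": [{"origin": "Pune", "destination": "Chennai", "vehicle_type": "20ft"}],
}

# Placeholder for the trained matching probability model.
matches = match_users(load, profiles, lambda uid, ld, c: round(0.9 * c, 3))
```

Each entry in `matches` corresponds to one record that would be written for auditing and for serving the app.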

Recommendation API and Notification

The RIVIGO Fleet backend services call this API to get the recommended loads for a user. We have hosted this service on AWS Elastic Beanstalk. Load balancers are used to distribute traffic across the servers.

For a user, active loads are sorted in descending order of serving probability and returned.
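A minimal sketch of that ordering step, assuming the serving probabilities for the user have already been computed and are available as a mapping from load id to probability (the field names and the `limit` parameter are hypothetical):

```python
def recommend_loads(active_loads, probabilities, limit=20):
    """Return active loads sorted by descending serving probability."""
    scored = [
        (probabilities.get(load["load_id"], 0.0), load)
        for load in active_loads
    ]
    # Sort on the probability only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [load for _, load in scored[:limit]]


active_loads = [
    {"load_id": "l1", "source": "Ludhiana", "destination": "Delhi"},
    {"load_id": "l2", "source": "Ludhiana", "destination": "Mumbai"},
    {"load_id": "l3", "source": "Ambala", "destination": "Delhi"},
]
probabilities = {"l1": 0.2, "l2": 0.9, "l3": 0.5}
feed = recommend_loads(active_loads, probabilities)
```

Loads without a stored probability default to 0.0 here and therefore appear last; a real service might exclude them instead.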

Notification Service

For every load created, the top users are notified about the availability of the load via the RIVIGO Fleet backend. A notification packet is pushed to the topic (SNS) to which a queue is subscribed. Based on the information in the packet, the relevant details (load source, destination and loading time) are communicated to the relevant users.
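Selecting the top users and building their notification packets can be sketched as below. The packet fields, the `top_k` value and the sample timestamp are assumptions; publishing the packets to the topic is left out.

```python
import heapq
import json


def build_notifications(load, user_probabilities, top_k=2):
    """Build one JSON packet per top-ranked user for a newly created load."""
    top = heapq.nlargest(
        top_k, user_probabilities.items(), key=lambda kv: kv[1]
    )
    return [
        json.dumps({
            "user_id": user_id,
            "source": load["source"],
            "destination": load["destination"],
            "loading_time": load["loading_time"],
        })
        for user_id, _ in top
    ]


load = {
    "source": "Ludhiana",
    "destination": "Delhi",
    "loading_time": "2018-08-07T10:00:00+05:30",
}
user_probabilities = {"u1": 0.9, "u2": 0.4, "u3": 0.7}
packets = build_notifications(load, user_probabilities)
```

`heapq.nlargest` avoids sorting every user when only a handful are notified per load.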



We measure the effectiveness of the recommendation system by the proportion of orders in the system that converted to transactions through matchmaking. The graph below shows a clear upward trend in this percentage.

We want our engine to learn in complete autonomy. As the number of transactions increases, we are continuously innovating in machine learning and feature formulation to further improve performance and accelerate the digitization journey of the Indian trucking ecosystem.


Shiv Shankar Verma and Bharathi Rajan from RIVIGO’s Data Science Team have also contributed to this article.