Dynamic Prediction of Trucking Freight Prices to Digitize the Indian Trucking Economy (Part 1)

30 May 2018
Technology

Our overall approach and architecture of the Freight Pricing Engine, which has been conceptualized to make the process of price discovery in the trucking marketplace hassle-free and transparent.

 

Introduction

‘RIVIGO Freight’ is a data and technology driven freight marketplace that aims to digitize trucking, making it exceedingly simple with everything available at a single click.

Pricing is one of the biggest enablers for the success of any marketplace. Yet there is negligible price transparency in the trucking ecosystem today. In addition, price discovery and finalization are tedious for the customer, given the low level of technology adoption and the complexities within the sector.

Freight Rate Exchange was conceptualized to make the process of price discovery hassle-free and transparent. We will cover this concept through a series of articles. In this first article, we introduce our overall approach and specifically cover the architecture of the Freight Pricing Engine.

To follow this series, it helps to know some basic terminology of the logistics marketplace sector. The important terms are:

  • Demand – Customer order for material movement from origin (O) to destination (D)
  • Supply – Availability of trucks with the truck suppliers
  • Vehicle type – Classifications are typically made with regard to body type, length and carrying capacity of a truck
  • ODVT – A combination of origin (O), destination (D) and vehicle type (VT)
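
As a minimal illustration of the ODVT concept, the sketch below models it as an immutable key that can index a price table. The class name, field names and the sample values are hypothetical and are not taken from the actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ODVT:
    """Hypothetical key type: origin, destination and vehicle type."""
    origin: str
    destination: str
    vehicle_type: str


# A frozen dataclass is hashable, so an ODVT can be used as a dictionary key.
# The lane, vehicle type and price below are illustrative values only.
prices = {ODVT("Delhi", "Mumbai", "32ft container"): 85000.0}

lookup = prices[ODVT("Delhi", "Mumbai", "32ft container")]
```

Two ODVT instances with the same three fields compare equal, which is why the lookup above finds the stored price.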

Further, multiple factors affect the pricing of trucks, and the dependence on so many variables makes price prediction a difficult problem. Some of these factors are:

  • Trip distance
  • Material type and weight, as more expensive goods such as electronics, or over-dimensional goods, are shipped at a premium
  • Vehicle type
  • Truck demand and supply at origin and destination, which is typically influenced by factors like annual demand, seasonality, cyclicality etc.
  • Number of loading/unloading points as the trip duration is linked to this
  • Delivery TAT (turnaround time)

 

Approach

When we started our work on this project, we had a static list of N source and destination pairs with price points for all truck types. This translated to over 1 million data points.

Most of this data was created using basic market research and extrapolation logic between lanes. A web and mobile interface enabled the business team to report prices for any ODVT in real time. Based on certain empirical formulae established through market research, every newly collected price was used to predict prices for related ODVTs. These formulae were codified into multiple heuristics-based rule books so that the system could update prices in real time on any new input.

Freight Rate Exchange was created to enable easy and reliable price discovery for users. It was also created to enable crowd-sourcing of price inputs directly from business users, especially for illiquid lanes where business volumes are limited but varying. Because the reliability of inputs was a very important aspect of the system, a ‘Reliability Engine’ was created. Using machine learning, we assigned a reliability rating to each user per ODVT, and used it to decide whether to incorporate a new price input into our system or to reject it. This is important in the initial stages of a marketplace, when liquidity (critical mass in terms of users and activity on the platform) is limited.

Once data liquidity and user footfall on the platform increased, a system was created to auto-update lane prices. This was done using historical price data and corresponding user traffic on our platform to determine market velocity (demand-supply equilibrium state). This system is named after the great Indian mathematician, Srinivasa Ramanujan. His achievements in Mathematics inspire the work on our pricing engine.

The Ramanujan System uses machine learning to train multiple models that predict prices on similar lanes. It uses various factors and data points captured from our application, along with historical data, to do so. These models compete with each other based on the ‘Reliability Engine’ scores, and the outcome is dynamically updated prices on the system.

We will share more details about the Ramanujan System and its applications in our upcoming posts.

 

Snapshot: Published rates for multiple lanes

 

Freight Pricing Engine Architecture

The diagram below shows how data flows between the various components of the pricing engine.

The stages described below explain how we calculate prices on lanes.

Data Collection

User activities are critical data points for us. These include how much time users spend monitoring prices on the platform, how many times they quote on a particular ODVT, and so on. These user activities are captured via the Event Handler system.

The Event Handler system publishes these events to SNS (Simple Notification Service) topics, to which SQS (Simple Queue Service) queues are subscribed. A daemon worker reads the events from the queue and stores them in the Mongo database cluster.

Other events like load posted, load accepted etc. are captured by backend systems (supply and demand) and saved directly in our MySQL Database clusters.

These database sources are used by our pricing engine as input sources.

Price Update Collecting Worker

The price update collecting worker reads directly from SQS queues, which are subscribed to different SNS topics. It is deployed on an AWS EC2 instance. The main purpose of the update collecting worker is to:

  • Integrate the data from the Queue
  • Transform the data into a consistent format so that downstream systems can use it easily
  • Store the data in a Mongo Database Cluster
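
The transform step above can be sketched as follows. This is a minimal illustration, not the worker's actual code: the payload field names (`origin`, `destination`, `vehicle_type`, `price`, `user_id`) and the function name are assumptions. It relies on one documented behaviour: when an SQS queue is subscribed to an SNS topic without raw message delivery, the SQS message body is a JSON envelope that carries the original payload as a string under the `Message` key.

```python
import json
from datetime import datetime, timezone


def normalize_price_event(sqs_body: str) -> dict:
    """Unwrap an SNS-delivered SQS body and return a consistently shaped record.

    All payload field names here are hypothetical.
    """
    envelope = json.loads(sqs_body)
    # With raw message delivery enabled there is no envelope, so fall back
    # to treating the body itself as the payload.
    payload = json.loads(envelope["Message"]) if "Message" in envelope else envelope
    return {
        "origin": payload["origin"].strip().upper(),
        "destination": payload["destination"].strip().upper(),
        "vehicle_type": payload["vehicle_type"].strip().upper(),
        "price": float(payload["price"]),
        "user_id": str(payload["user_id"]),
        # Prefer the SNS timestamp; otherwise stamp the record on receipt.
        "received_at": envelope.get("Timestamp")
        or datetime.now(timezone.utc).isoformat(),
    }
```

The returned dictionary is a plain document, so it could be passed as-is to a MongoDB driver call such as pymongo's `insert_one`.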

We chose a Mongo database cluster because it provides:

  • Schema-less design
  • Scalability in managing Terabytes of data
  • Replica sets that provide high availability

Besides using Mongo database clusters, we also use a PostgreSQL database cluster for storing lane and location information. We have stored lane information in PostgreSQL because of PostGIS, which enables fast querying on geospatial data.

The update collecting worker also attaches lane information to each record, using the (src lat, src lon, dst lat, dst lon) combination.
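
To make the lane lookup concrete, here is a pure-Python stand-in for the geospatial query. The production system uses PostGIS for this; the sketch below instead scans an in-memory list and uses the haversine formula. The matching rule (smallest sum of origin and destination distances) and the lane record layout are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_lane(lanes, src_lat, src_lon, dst_lat, dst_lon):
    """Return the lane whose two endpoints are jointly closest to the query.

    `lanes` is a non-empty list of dicts with the hypothetical keys
    src_lat, src_lon, dst_lat and dst_lon.
    """
    def cost(lane):
        return (haversine_km(src_lat, src_lon, lane["src_lat"], lane["src_lon"])
                + haversine_km(dst_lat, dst_lon, lane["dst_lat"], lane["dst_lon"]))
    return min(lanes, key=cost)
```

A linear scan like this is fine for a handful of lanes; a spatial index, which is what PostGIS provides, is what keeps the same lookup fast over a large lane table.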

Derived Price Updates Generating Worker

The derived price updates generating worker generates price updates for linked ODVTs from a single ODVT update. It is deployed on an AWS EC2 instance.

This worker reads updates from the Mongo cluster, creates new lanes, calculates the reliability of the updates and derives related price updates. It also saves price snapshots in AWS S3 at regular intervals; these are fetched at boot time to speed up initialization of the worker. Data related to recent price points and user reliability is stored in Redis, which enables real-time reliability calculations.

Pricing Engine API

The Pricing Engine API is the server that backend services (the supply and demand backends) call to get the price of a particular ODVT. We host this service on AWS Elastic Beanstalk, with load balancers distributing requests across the servers.

Pricing Engine API connects with the following systems:

  • PostgreSQL Database Cluster: This is used to find the nearest lane data from (src lat, src lon, dst lat, dst lon).
  • Redis Database Cluster: This saves the data related to load posts on our system. We save a <price_reference_id, price> pair in the Redis cluster to reduce the latency of requests coming from the supply app.
  • Mongo Database Cluster: A separate thread runs in the background to update lane prices in real time, using the updates added to the Mongo database cluster by the ‘derived updates generating worker’. In this way, the Pricing Engine API regularly syncs itself with the inflowing price updates.
  • In-memory price snapshot: The price snapshot is downloaded into memory for faster API responses. An alternative was to serve it from the Redis cluster, but that was throttled at around 300 TPS, which would have resulted in slower API response times.
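
The combination of an in-memory snapshot and a background sync thread can be sketched as below. This is an illustrative pattern only, not the actual API code: the class name, the `fetch_updates` callable (standing in for a query against the Mongo cluster) and the sync interval are all assumptions.

```python
import threading


class InMemoryPriceCache:
    """Serve prices from memory while a background thread applies updates."""

    def __init__(self, snapshot, fetch_updates, interval_s=1.0):
        self._prices = dict(snapshot)        # e.g. loaded from a snapshot at boot
        self._fetch_updates = fetch_updates  # hypothetical: returns {key: price}
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Only valid after start(); signals the loop and waits for it to exit.
        self._stop.set()
        self._thread.join()

    def sync_once(self):
        updates = self._fetch_updates()
        with self._lock:
            self._prices.update(updates)

    def _run(self):
        while not self._stop.is_set():
            self.sync_once()
            # Sleep until the next sync, waking early if stop() is called.
            self._stop.wait(self._interval_s)

    def get(self, key):
        with self._lock:
            return self._prices.get(key)
```

Reads take the lock only for a dictionary lookup, so request handling stays in process memory and avoids a network round trip per price query.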

Impact

We measure the accuracy of our pricing engine by the pricing offset per trip, defined as the percentage difference between the sourcing price and the predicted price. In the graph below, one can see a clear trendline of decreasing pricing offset per trip. There has been close to a 10x improvement.
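
As a worked example of this metric, the sketch below computes the offset for a single trip and the mean across trips. The article does not state which price is the denominator or whether the difference is signed; this sketch assumes the absolute difference relative to the sourcing price. The function names are hypothetical.

```python
def pricing_offset_pct(sourcing_price, predicted_price):
    """Absolute percentage gap between sourcing and predicted price.

    Assumption: the sourcing price is used as the denominator.
    """
    if sourcing_price <= 0:
        raise ValueError("sourcing_price must be positive")
    return 100.0 * abs(sourcing_price - predicted_price) / sourcing_price


def mean_pricing_offset_pct(trips):
    """Average offset over a non-empty list of (sourcing, predicted) pairs."""
    offsets = [pricing_offset_pct(s, p) for s, p in trips]
    return sum(offsets) / len(offsets)
```

Under these assumptions, a trip sourced at 100 and predicted at 90 has an offset of 10 percent, and so does one predicted at 110.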

Certain peaks are visible intermittently. These are caused by externalities such as quarter ends or floods in a city, which the pricing engine still needs to handle better.

Freight Pricing Engine has been able to bring in much-needed price transparency in the sector. This offering has led to an exponential increase in the number of trips, making RIVIGO Freight the leading trucking marketplace in India in a very short span of time. However, we believe this is just the beginning and we have a long way to go.

In the subsequent articles in this series, we will talk about:

  • Freight Rate Exchange – Approach and Architecture
  • Ramanujan System – Approach and Architecture
  • Application of Machine Learning in Pricing Engine

We are continuously innovating and improving the engine's performance through machine learning as data liquidity continues to improve, with the aim of revolutionizing the USD 100 billion Indian trucking marketplace.