How Did I Make a Report 150x Faster

Working at Governo do Estado de Roraima, I frequently need to work with reports, whether to create a dashboard or to modify how data is processed/presented.

Given this context, there was this specific report that took WAY TOO LONG to be generated, and it was driving me insane. It had several issues:

  • The queries cross-referenced a ton of data, distributed through tables with millions of lines in the database
  • It could only be generated monthly, instead of at once, for my desired period, which was 25 months
  • Only some of the data that it returned was actually relevant to the dashboard I was implementing

This report, for the whole 25 months, took 15 minutes. You may think this is not that much, but this work was done every time someone accessed the dashboard, and until the data was actually loaded, the user spent 15 minutes looking at a loading window.

I found this unviable, so I took steps to make it more efficient.

1. Lower the Number of Columns Queried From the Database

Select less, query faster 🤯

Usually, since a much larger government system, with many more use cases, uses this SQL query, it is MASSIVE. However, I don't need most of the data it queried, and overselecting really slowed the requests down. So, my first step was to remove all unnecessary columns from the query.

Before:

SELECT A,
       UNNECESSARY_COLUMN_B,
       UNNECESSARY_COLUMN_C,
       ...
FROM TABLE_A,
     UNECESSARY_TABLE_B,
     SLOW_SUBQUERY_C,
     ...

After:

SELECT A,
       ...
FROM TABLE_A,
     SLOW_SUBQUERY_C,
     ...

This simple change was crucial regarding performance. The column count went from 30+ to 2, making the query go from 15 minutes to 2 minutes.

Lowering the number of columns resulted in an 85% performance gain.

2. Run All Queries at Once

Select multiple times at once

Even after lowering the report query time to 2 minutes, it was still too slow. This happened because the queries for the 25 months were executed SEQUENTIALLY. This raises the question: why not query them all at once?

The solution was relatively simple: use multiple available server threads to query all 25 months CONCURRENTLY, and then aggregate the results. This way, the report time went from 2 minutes to 30 seconds.

This meant a 75% performance gain.

3. Caching Results

Why query every time?

Looking for patterns in the reports to increase efficiency, I've noted that the data from queries older than 2 months rarely changes.

This meant I could cache any results that were at least 2 months old. The outcome was blazingly fast reports, with data gathered in under 1 second. This made the report time go from 30 seconds to 6 seconds.

This meant an 80% performance gain.

Last Thoughts

With far faster reports

A 15-minute-to-6-second performance gain is a whopping 150x. This may not be the best user experience, but it still beats waiting for 15 minutes while cursing the system.

I've gathered several insights from developing this dashboard, mostly performance-related. The goal now is to make newer reports faster and to refactor the queries, which remain vast and undocumented.

Wish me luck 😁.

PS: Messing around with the SQL query, I may find a way to gather the whole 25 months at once, which means no need for concurrent queries and faster results, I hope.