Mongo Read Optimisation: Write Concern

Sree Venkat / 2023-07-14

Why Read Optimisation?

To provide a good user experience, one of the key concerns is ensuring that the application is fast. Beyond optimisations done at the application level, database optimisations are the bread and butter of almost all applications, especially user-facing ones. Depending on the nature of your application and product, you might choose to optimise for reads, writes, or both.

There are also some infrastructure metrics to watch in order to ensure that the database instance has enough capacity to handle requests, such as the number of connections and CPU, disk and memory utilisation.

What is write concern?

Write concern is a configuration that decides which conditions must be met for a write operation (insert/update) to be considered successful, in particular how many nodes of the replica set must acknowledge the write.
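As a rough illustration, write concern can be carried as options on the MongoDB connection string. The sketch below only builds such a string; the host names, the database name and the helper `build_mongo_uri` are hypothetical and not taken from the platform described here, while `w`, `wtimeoutMS` and `journal` are standard connection string options.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, database, write_concern="majority",
                    wtimeout_ms=5000, journal=True):
    """Build a MongoDB connection string that carries write concern options.

    Hypothetical helper for illustration only.
    """
    options = {
        "w": write_concern,         # how many nodes must acknowledge a write
        "wtimeoutMS": wtimeout_ms,  # how long to wait for those acknowledgements
        "journal": "true" if journal else "false",  # wait for the on-disk journal
    }
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(options)}"


# Assumed hosts for a 3 node replica set; replace with real ones.
uri = build_mongo_uri(
    ["db0.example.net:27017", "db1.example.net:27017", "db2.example.net:27017"],
    "ticketing",
)
print(uri)
```

Passing `write_concern=1` instead would ask for acknowledgement from the primary only, which is faster but gives a weaker guarantee.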

Scenario

Let us take a scenario where you’re building a platform that allows theatres to distribute tickets to their shows. Such an application will need to perform some aggregate queries in order to allow its users to find shows across theatres and the various shows in a given theatre.

This platform is using MongoDB on Atlas as a dedicated cluster with a 3 node setup consisting of 1 Primary and 2 Secondary nodes (P-S-S).

The Optimisation

Now, with growing data size and user traffic, both the queries and the database usage will need to be optimised. After optimising the queries, it was evident from an infrastructure perspective that it would be more efficient to move all the reads to the secondary nodes of our MongoDB cluster.

Inadvertently, both the aggregate queries and some primary-key-based queries were modified to read from the secondary nodes using the readPreference configuration.

The booking flow had some side effects for analytics purposes, which queried for a booking record by id. Because of the change, some of the data recorded in the analytics tool did not match the bookings that had actually taken place.

When we noticed the anomaly and went debugging, we could see that the records in the database were reflecting the correct terminal state.

So how did this happen?

The database connection had the default implicit write concern, which is “majority”. This means that in a 3 node cluster, once 2 nodes (1 Primary and 1 Secondary) acknowledge the write request, the operation is considered successful, even if the third node has not yet applied it.
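The arithmetic behind “majority” can be sketched as follows. This is a simplification that assumes every member is a data-bearing voting member, as in the P-S-S setup above; the function name `majority` is ours.

```python
def majority(voting_members: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return voting_members // 2 + 1


for nodes in (3, 5, 7):
    needed = majority(nodes)
    # The remaining nodes may still be catching up when the write is acknowledged.
    print(f"{nodes} nodes: {needed} must acknowledge, {nodes - needed} may lag")
```

For the 3 node cluster this gives 2 acknowledgements, which leaves 1 node that may not have the write yet.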

When the read queries were moved to the secondaries, a given read could go to either of the 2 secondary nodes. The anomaly occurred when the read query from the side effect went to the secondary that was not part of the acknowledging majority, which had not yet received the replicated changes because of slightly increased replication lag at the time.
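The stale read can be reproduced with a toy in-memory model. This is not MongoDB code: the `Node` class, the `write_with_majority` function and the booking id are made up purely to show how a majority-acknowledged write can still be invisible on one secondary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


def write_with_majority(primary, secondaries, key, value, lagging):
    """Apply a write on the primary and on every secondary that is not lagging.

    Returns the number of acknowledgements, or raises if a majority is not reached.
    """
    primary.apply(key, value)
    acks = 1
    for node in secondaries:
        if node.name not in lagging:
            node.apply(key, value)
            acks += 1
    needed = (1 + len(secondaries)) // 2 + 1
    if acks < needed:
        raise RuntimeError("majority not reached")
    return acks


primary, s1, s2 = Node("P"), Node("S1"), Node("S2")
for node in (primary, s1, s2):
    node.apply("booking-42", "PENDING")  # state already replicated everywhere

# S2 is lagging, yet the write succeeds because P and S1 form a majority.
acks = write_with_majority(primary, [s1, s2], "booking-42", "CONFIRMED", lagging={"S2"})

print(acks)                   # 2
print(s1.read("booking-42"))  # CONFIRMED
print(s2.read("booking-42"))  # PENDING (stale)
```

A read by id that happens to land on S2 sees the old state, which matches what the analytics side effect observed.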

Mitigation

Ensuring that all the non-aggregate queries read from the primary assured that the data they see would be consistent. The write concern, however, was not changed from majority, since a stricter setting would slow down writes and some replication lag seemed affordable.
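One way to keep this rule explicit in application code is a small routing helper that picks a read preference per query type. This is a sketch under assumptions: the `QueryKind` enum and `read_preference_for` function are hypothetical, and `secondaryPreferred` is assumed as the mode for aggregates since the exact mode used is not stated above. Both returned strings are standard MongoDB read preference modes.

```python
from enum import Enum


class QueryKind(Enum):
    AGGREGATE = "aggregate"
    LOOKUP_BY_ID = "lookup_by_id"


def read_preference_for(kind: QueryKind) -> str:
    """Return the read preference mode to use for a given kind of query."""
    if kind is QueryKind.AGGREGATE:
        # Aggregates can tolerate some replication lag, so offload them.
        return "secondaryPreferred"
    # Everything else must see the latest acknowledged write.
    return "primary"


print(read_preference_for(QueryKind.AGGREGATE))
print(read_preference_for(QueryKind.LOOKUP_BY_ID))
```

Defaulting to the primary means a new query type has to opt in to reading from a secondary, rather than ending up there by accident.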