Trainee Profile database indexing baseline tests

This template is brought to you by Optimizely, a leading experimentation platform.

Experiment plan and results	Trainee Self Service performance: with and without database indexing
Experiment owner	Reuben Roberts
Reviewers	Andy Nash (Unlicensed)
Approver
Optimizely link
Jira ticket(s)	https://hee-tis.atlassian.net/browse/TIS21-1274
Status	IN REVIEW / IN PROGRESS / COMPLETE
On this page

Stakeholder summary

Adding an index to the Trainee Details database improved overall speed to retrieve a trainee profile by a factor of two.

📋 Experiment planning

Overview

This was a baselining experiment to determine the performance impact of database indexing for the Trainee Details component of TIS Self-Service.

Improved performance would mean that trainee users would be able to see their data more quickly, as this would reduce the load time of the Trainee Details interface.
Improved performance would increase the rate at which changes from other parts of the TIS system can be synchronised with the Trainee Self-Service component, reducing the possibility of confusion resulting from contradictory trainee details appearing in different parts of the TIS system.

Hypothesis

We hypothesize that adding a database index

will decrease the time taken to retrieve a Trainee Profile from the database

Metrics

Time to retrieve a complete Trainee Profile using the Trainee Details APIPRIMARY METRIC
Database-level processing time for the above SECONDARY METRIC

Targeting

This experiment would impact

Trainee sync (tis-trainee-sync)
Trainee User Interface (tis-trainee-ui)

Environment

Profiling was conducted on a Dell running Windows 10 and equipped with an Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz and 32.0GB RAM. No specific standardisation was applied to system resources available to the software, since the profiling was intended to be comparative (i.e. A/B).

The MongoDB database server was running locally, with 79 602 TraineeProfile records, equivalent in number to the Trainee Self-Service staging database at the point the tests were run.

Test design

Testing was conducted with the database state modified:

without the traineeTisId field indexed (test A) and
with a regular ascending index applied to the traineeTisId field (test B).

200 trainee TIS Ids were selected at random from the complete set, and the Trainee Details API was called iteratively to retrieve trainee details for each trainee TIS Id. (Result 1, below)

CPU profiling using Visual VM was used to identify wait time, to identify when the application was waiting for a response from the database. (Result 2, below)

MongoDB Query Performance was assessed using the 'Explain Plan' output in MongoDB Compass for a query to retrieve a single TraineeProfile document using traineeTisId as the filter. (Result 3, below)

Notes

It is necessary to allow a newly created database index to stabilise for a minute or two before running tests, otherwise performance metrics will be misleading.

📊 Results

Test A: (Without index)

Retrieving 200 random trainee profiles via the API: 87.4s (+/- 4.8s) (n=5)
getTraineeProfileByTraineeTisId waiting time: 985ms
Query fetch execution time: 5ms (documents examined: 79602)

Test B: (With index)

Retrieving 200 random trainee profiles via the API: 38.6s (+/- 6.3s) (n=5)
getTraineeProfileByTraineeTisId waiting time: 75ms
Query fetch execution time: 0ms (documents examined: 1)

	A: Control	B: Variation	Change
Retrieving profiles via API	87.4s	38.6s	-56%
getTraineeProfileByTraineeTisId waiting time	985ms	75ms	-92%
Query fetch execution time	5ms	0ms	-100%

✨ Conclusions

Highlights

Overall speed to retrieve a trainee profile using the API using the indexed database was over twice that of the unindexed database.
Database-level metrics of performance were even better than this: the difference reflects the fact that the API performs other work apart from database access. This work is unaffected by the indexing improvements.

Takeaways

Given the low amount of effort required, adding an index to the database is a cost-effective way of improving performance.

Follow-up

Roll-out database index to staging and production environments