2020-04-14 20,000 Post Specialities Missing

 

Date

Apr 14, 2020

Authors

@Philip Wilsdon

Status

In Progress

Summary

20,000 Post Specialities Missing

Loading data from a backup caused an additional issue.

We used the most recent backup, which included Apr 13, 2020, and copied 60,000 PostSpecialty records from the data as of Apr 9, 2020.



Impact

Users were unable to modify data through Tuesday PM.

Data was incomplete between Thursday PM and Tuesday evening.

 

 

Root Cause(s)

5 Whys

  1. We didn't fix a bug because we didn't realise its impact / severity

  2. We didn't prioritise it / put it in a sprint

  3. Some specialty groups were updated using the front-end interface

  4. Tech debt / a bug with specialty groups (cascade type ALL) meant that the post specialities attached to specialities were updated to null

  5. Built with the wrong logic and testing (TDD / unhappy path)

 

Trigger

Updating a number of specialties via the front end

 

Timeline

 

Where we got lucky

  • Got away with a bug for ages

Where we were unfortunate

  • No one updated the specialty and reported the error

  • So when we did update in bulk, it became a massive problem

  • A user found the bug

Resolution

  • Restore the PostSpecialty data using the backup DB data, and tell users to stop updating Specialty

  • Fix the PostSpecialty deletion bug as soon as possible, and run a script to fix the remaining data

PostSpecialty bug Analysis:

We reproduced the bug locally and found that on the Admin/Specialty page, no matter which field is updated, the related PostSpecialty data is removed from the DB.

Going through the code, we noticed that the Specialty entity has a OneToMany relationship to the PostSpecialty table, but the cascade level was set to CascadeType.ALL, which means that when the Specialty is updated/saved/deleted, the PostSpecialty is updated on cascade. However, when the Specialty is updated on the front end, it doesn't send any PostSpecialty to the backend, so the PostSpecialty set is set to empty.

When we update Specialty, we don't actually need to update anything in PostSpecialty, so the cascade level should be limited.
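The mechanism described above can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: it does not use JPA or Hibernate, and the class `CascadeSketch`, the in-memory `Specialty` stand-in, and the methods `saveWithCascadeAll` and `saveWithoutCascade` are hypothetical names introduced for illustration, not the TIS code. It models a cascading save as "copy everything from the payload, children included", and a limited cascade as "copy only the Specialty's own fields".

```java
import java.util.HashSet;
import java.util.Set;

public class CascadeSketch {

    /** Hypothetical stand-in for a stored Specialty and its linked PostSpecialty ids. */
    static class Specialty {
        String name;
        Set<String> postSpecialties;

        Specialty(String name, Set<String> postSpecialties) {
            this.name = name;
            this.postSpecialties = new HashSet<>(postSpecialties);
        }
    }

    /** Models a cascading save: the child set is overwritten by whatever the payload carries. */
    static void saveWithCascadeAll(Specialty stored, Specialty payload) {
        stored.name = payload.name;
        stored.postSpecialties = new HashSet<>(payload.postSpecialties);
    }

    /** Models a save with the cascade limited: only the Specialty's own fields change. */
    static void saveWithoutCascade(Specialty stored, Specialty payload) {
        stored.name = payload.name;
    }

    public static void main(String[] args) {
        // The front end sends the Specialty without any PostSpecialty children.
        Specialty payload = new Specialty("Cardiology (renamed)", Set.of());

        Specialty cascaded = new Specialty("Cardiology", Set.of("post-1", "post-2"));
        saveWithCascadeAll(cascaded, payload);
        System.out.println("cascade all: " + cascaded.postSpecialties.size() + " links left");

        Specialty limited = new Specialty("Cardiology", Set.of("post-1", "post-2"));
        saveWithoutCascade(limited, payload);
        System.out.println("limited:     " + limited.postSpecialties.size() + " links left");
    }
}
```

In the first case the two links are lost even though only the name changed; in the second they survive. The real behaviour depends on how the entity mapping and the save path are written, so this should be read as an illustration of the failure mode rather than a reproduction of it.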

Detection

Reported on teams

 

Post Mortem Call

Initial investigation updated Apr 16, 2020

  • Specialty Java class: cascade type ALL means that when we update the specialty, the post specialty is updated as well

  • When saving the post, it saves the post specialty as an “empty set”

  • If we update anything on the specialty, the post specialities will be removed

  • Lots of work that needs doing for reference: access, moving and refactoring etc.

  • Limit the cascade to remove and refresh, so that when we update the specialty page we do not need to update the post specialty

  • How to test: integration test, and ask James to repeat what he did on Thursday morning

  • Which other areas of TIS use this cascade approach?

  • Specialty groups are valid and should be stored

  • API being called: need to provide parent and child entities
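For the question of which other areas use the same cascade setting, a first pass could be a simple text search of the source tree. The following is a minimal sketch, assuming the code lives in `.java` files under a single root directory; the class name `CascadeAllFinder` and its methods are hypothetical and not part of TIS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CascadeAllFinder {

    /** Returns every .java file under root whose text contains "CascadeType.ALL". */
    static List<Path> findCascadeAll(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .filter(CascadeAllFinder::mentionsCascadeAll)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Plain substring check; wraps read failures so it can be used inside a stream. */
    static boolean mentionsCascadeAll(Path file) {
        try {
            return Files.readString(file).contains("CascadeType.ALL");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Root directory is taken from the first argument, defaulting to the current directory.
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        findCascadeAll(root).forEach(System.out::println);
    }
}
```

A substring search like this misses statically imported constants (a bare `ALL`) and does not tell you whether a given mapping is actually harmful, so each hit would still need to be reviewed by hand.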

 

 

Action Items

Action Item: Fix and Test
Details: Code test and redo the same scenario on stage to make sure the same issue doesn't happen again
Type: Bugfix
Owner: Vulgabee
Issue: https://hee-tis.atlassian.net/browse/TISNEW-3854

Action Item: Search Codebase
Details: Find anywhere in the codebase that might have cascade all. Ticket up, identify areas. Assess if critical and discuss actions
Type: Prevent elsewhere
Owner: Vulgabee
Issue: https://hee-tis.atlassian.net/browse/TISNEW-4268

Action Item: Move Specialty to Reference Service
Owner: Vulgabee

Action Item: Restrict Access to Reference tables
Owner: Vulgabee
Issue: https://hee-tis.atlassian.net/browse/TISNEW-3887

Action Item: Run Automated ‘e2e’ tests
Details: Look at running automated e2e tests every night that report any issues for the team to look at first thing in the morning.
Type: Prevention
Owner: Vulgabee

Action Item: Use a different partition for temporary backup files
Type: Prevention
Owner: All Teams / DevOps
Issue: https://hee-tis.atlassian.net/browse/TISNEW-4251

Action Item: Make database restores easier (Script it?)
Type: Task
Owner: All Teams / DevOps
Issue: https://hee-tis.atlassian.net/browse/TISNEW-4266?searchSessionId=223caf6b-35e2-4947-b358-5fd28a4e9483&searchContainerId=10805&searchContentType=issue&searchObjectId=44956

Action Item: Resolve issue with DB server config not being used
Type: Reliability
Owner: All Teams / DevOps
Issue: https://hee-tis.atlassian.net/browse/TISNEW-4267?searchSessionId=9051ccf6-8d28-4fdf-b21b-2dd2d845c034&searchContainerId=10805&searchContentType=issue&searchObjectId=44957