...
Non-technical Description
The Problem
When a placement is deleted in TIS, a message is sent to TSS so that it deletes its own record of that data. On deletion, the trainee's profile would be updated to exclude that placement, and any placement onboarding notifications scheduled for the future would be removed.
In some instances, TSS is failing to correctly delete the placement, or is deleting and immediately reinserting it, resulting in trainee profiles featuring, and notifications being sent for, these non-existent placements.
When syncing TIS data to TSS, there are relationships between data that we have to be aware of; for example, a Site or Post will have many Placements associated with it. Around 20,000 placements isn't unusual for a site, and around 1,000 for a post, though there are some big outliers introduced by various workarounds, such as a post with 64,000 placements.
...
When we receive updates to a site or post we need to update all related placements, but because of the large number of placements associated with some sites and posts, it is challenging to process them all as part of the same update. Instead, we identify all placements that need to be updated and queue them to be processed one by one.
...
As shown, the related placements are identified by checking which placements we already have in the TSS database and this is where the problem potentially starts!
When a Placement is deleted, the associated Placement Specialty record is also deleted. To understand how the queuing becomes a problem, let's focus on how that delete may look to TSS.
...
As we can see, something unexpected happens. The Placement Specialty deletion is received first and because we have no way of knowing the placement was already deleted, we grab the current data of the associated placement and re-queue it in order to trigger the trainee profile to be updated.
Next, we receive the placement deletion and perform a few different actions:
The placement is removed from the trainee profile
Any scheduled notifications associated with the placement are removed
Any required actions associated with the placement are removed
We remove the copy of placement from the TSS sync database
At this point, everything is as it should be. But when the next message (the re-queued placement data) is received, we treat it as an update to the placement, which triggers a few different actions:
The placement is added to the trainee profile
Scheduled notifications associated with the placement are created
Required actions associated with the placement are created
We add a copy of the placement to the TSS sync database
We’ve just undone the deletion!
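The sequence above can be reproduced in a small simulation. The following Python sketch is illustrative only: `sync_db`, `profile`, `queue` and the handler names are hypothetical stand-ins for the TSS sync database, trainee profile and message queue, not the actual TSS implementation.

```python
from collections import deque

# Hypothetical stand-ins for the TSS sync database, trainee profile and queue.
sync_db = {42: {"id": 42, "site": "Site A"}}
profile = {42}
queue = deque()


def on_placement_specialty_deleted(placement_id):
    # Old behaviour: copy the placement data as it is right now and queue it.
    snapshot = sync_db.get(placement_id)
    if snapshot is not None:
        queue.append(("placement_update", dict(snapshot)))


def on_placement_deleted(placement_id):
    # Remove the placement from the profile and from the sync database.
    profile.discard(placement_id)
    sync_db.pop(placement_id, None)


def on_placement_update(data):
    # Handled like any other update, so the placement is (re)created.
    sync_db[data["id"]] = data
    profile.add(data["id"])


on_placement_specialty_deleted(42)  # 1. specialty deletion arrives first
on_placement_deleted(42)            # 2. placement deletion is applied
deleted_correctly = 42 not in sync_db and 42 not in profile

_, payload = queue.popleft()        # 3. the stale snapshot is processed
on_placement_update(payload)
reinserted = 42 in sync_db and 42 in profile
```

After step 2 the placement is gone (`deleted_correctly` is true), but step 3 replays the queued copy and brings it back (`reinserted` is true).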
This same situation can also occur in update scenarios when multiple changes are made in quick succession, or when related records are updated together. The original data gets queued behind the updated data and we end up “reverting” the TIS update.
...
The Solution
The change required to fix this is a subtle but significant change to how we queue the original placement. We now queue a reference to the Placement (its ID) instead of using the original data.
...
When the reference is received we now get the latest version of the placement data.
In our delete scenario the placement is already gone, so after making one last request to TIS to send us any missing data, we discard the request to re-process the placement, leaving the deletion intact.
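A minimal sketch of the reference-based approach, under the same caveats: the names below are hypothetical, and it assumes the "latest version" is read from the TSS sync database and that the final request to TIS can be modelled as a simple list of requested IDs.

```python
from collections import deque

# Hypothetical stand-ins; not the actual TSS data structures.
sync_db = {42: {"id": 42, "site": "Site A"}}
profile = {42}
queue = deque()
tis_resync_requests = []  # assumed model of "ask TIS to send any missing data"


def on_placement_specialty_deleted(placement_id):
    # New behaviour: queue only a reference (the ID), not a copy of the data.
    queue.append(("placement_refresh", placement_id))


def on_placement_deleted(placement_id):
    profile.discard(placement_id)
    sync_db.pop(placement_id, None)


def on_placement_refresh(placement_id):
    # Look up the latest data at processing time instead of trusting old data.
    latest = sync_db.get(placement_id)
    if latest is None:
        # Placement is gone: request any missing data once, then discard.
        tis_resync_requests.append(placement_id)
        return False
    profile.add(latest["id"])
    return True


on_placement_specialty_deleted(42)  # 1. specialty deletion queues the ID
on_placement_deleted(42)            # 2. placement deletion is applied
_, ref = queue.popleft()            # 3. the queued reference is processed
processed = on_placement_refresh(ref)
```

Because the queued message carries no placement data, there is nothing stale to replay: `processed` is false, and the placement stays out of both `sync_db` and `profile`.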
...
While there is no guarantee that this particular problem is the source of all data consistency problems, it does explain many of the synchronisation issues we’ve observed and had reported, such as “orphaned” programme and placement data which requires manual intervention.
...
Trigger
Near-concurrent deletion events for placements and placement specialties are not always handled in the correct sequence, resulting in the reinsertion of just-deleted placement records.
...
...
5 Whys (or other analysis of Root Cause)
...