Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 4 Current »

Date

Authors

Reuben Roberts

Status

In process

Summary

TSS sometimes fails to delete placements correctly, resulting in obsolete notifications being sent to trainees.

Impact

Onboarding notifications were sent to trainees for placements to which they were no longer assigned. Deleted placements were retained in their TSS profiles.

Non-technical Description

The Problem

When syncing TIS data to TSS there are relationships between data that we have to be aware of, for example a Site or Post will have many Placements associated with it. Around 20,000 isn’t unusually for sites, and around 1000 for posts though there are some big outliers introduced for various workarounds, like a post with 64,000 placements.

image-20241119-160128.png

When we receive updates to a site or post we need to update all related placements, but due to the large number of placements associated with some sites /posts it is challenging to process them all as part of the same update. Instead, we identify all placements that need to be updated and queue them to be processed one by one.

image-20241119-162804.png

As shown, the related placements are identified by checking which placements we already have in the TSS database and this is where the problem potentially starts!

When a Placement is deleted, the associated Placement Specialty record is also deleted. To understand how the queuing becomes a problem lets focus on how that delete may look to TSS.

image-20241119-164702.png

As we can see, something unexpected happens. The Placement Specialty deletion is received first and because we have no way of knowing the placement was already deleted, we grab the current data of the associated placement and re-queue it in order to trigger the trainee profile to be updated.

Next, we receive the placement deletion and we perform a few different actions

  1. The placement is removed from the trainee profile

  2. Any scheduled notifications associated with the placement are removed

  3. Any required actions associated with the placement are removed

  4. We remove the copy of placement from the TSS sync database

At this point, everything is as it should be. But, when the next message is received we treat it as an update to the placement, which triggers a few different actions

  1. The placement is added to the trainee profile

  2. Scheduled notifications associated with the placement are created

  3. Required actions associated with the placement are created

  4. We add a copy of the placement to the TSS sync database

We’ve just undone the deletion!
This same situation can also occur with update scenarios when multiple changes are made quickly, or related records are updated together. The original data, gets queued behind the updated data and we end up “reverting” the TIS update.

The Solution

The change required to fix this is a subtle but significant change to how we queue the original placement. We now queue a reference to the Placement (its ID) instead of using the original data.

image-20241119-170938.png

When the reference is received we now get the latest version of the placement data.

In our delete scenario the placement is already gone, so after making one last request to TIS to send us any missing data we discard the request to re-process the placement. Leaving the deletion intact.

image-20241119-172436.png

While there is no guarantee that this particular problem is the source of all data consistency problems, it does explain many of the synchronisation issues we’ve observed and had reported. Such as “orphaned” programme and placement data which requires manual intervention.


Trigger

Almost concurrent deletion events for placements and placement specialties are not always being handled in the correct sequence, resulting in the reinsertion of just-deleted placement records.


Detection

Issue reported by LOs 13 Nov 2024.

image-20241114-143103.png


5 Whys (or other analysis of Root Cause)

  • The example given is for placement 2568353 for trainee 319330. We sent a ‘zombie’ 12-week placement notification for this:
    2024-11-13T01:24:45.607Z INFO 1 --- [eduler_Worker-1] u.n.t.t.n.service.NotificationService : Sent DEFAULT.PLACEMENT_UPDATED_WEEK_12-2568353 notification for 2568353 (In Post, starting 2025-02-05) to mubarakdanyaro@gmail.com using template v1.0.0

  • We still have that placement in our sync MongoDb, though it no longer exists in TIS.

  • We received a delete instruction in tis-trainee-sync for the placement specialty and then the placement at roughly2024-10-25T09:31:11.3 . After the placement specialty delete, the service tried to resync the placement, apparently retrieving it at 2024-10-25T09:31:11.369Z before it was deleted from the db/cache, and then reinserting it.

    The key tis-trainee-sync log lines show this (full logs attached):

    2024-10-25T09:31:11.368Z DEBUG 1 --- [ontainer#2-1234] .t.t.s.e.PlacementSpecialtyEventListener : After placement specialty delete, search for placement 2568353 to re-sync.
    
    2024-10-25T09:31:11.369Z DEBUG 1 --- [ontainer#2-1234] .t.t.s.e.PlacementSpecialtyEventListener : Placement Record(tisId=2568353, data={placementAddedDate=2024-05-24T08:50:40Z, lifecycleState=APPROVED, gradeId=306, siteCode=J81052, postId=233420, dateFrom=2025-02-05, placementWholeTimeEquivalent=1, traineeId=319330, placementType=In Post, dateTo=2025-08-05, siteId=16489, id=2568353}, metadata={}, type=null, operation=null, schema=tcs, table=Placement) found, queuing for re-sync.

    Code lines from: https://github.com/Health-Education-England/tis-trainee-sync/blob/958dd07fb543a2a9c304b4c93a039a399e582f4c/src/main/java/uk/nhs/hee/tis/trainee/sync/event/PlacementSpecialtyEventListener.java#L123

    Even though the subsequent request for the placement specialty from TIS will not succeed, this does not affect the ‘enrichment’ status of the re-inserted placement record, which will then be synced to the details service. We are unable to verify at this point that the request ended up in the corresponding DLQ (tis-trainee-sync-prod-request-interim-dlq.fifo) as the retention period is only 4 days.

  • The deleted and resurrected placement seems to have been deleted and then reinserted into the trainee profile as expected. Logs as follows:

    2024-10-25T09:31:11.444Z  INFO 1 --- [io-8203-exec-79] u.n.h.t.details.api.PlacementResource    : Delete placement with TIS ID 2568353 of trainee with TIS ID 319330
    
    2024-10-25T09:31:11.466Z  INFO 1 --- [io-8203-exec-95] u.n.h.t.details.api.PlacementResource    : Update placement of trainee with TIS ID 319330 with PlacementDto PlacementDto(tisId=2568353, startDate=2025-02-05, endDate=2025-08-05, site=SiteDto(name=LIFEBOAT QUAY MEDICAL CENTRE, knownAs=LIFEBOAT QUAY MEDICAL CENTRE (J81052), location=UNIT A, LIFEBOAT QUAY, POOLE ), otherSites=[], grade=ST1, specialty=null, subSpecialty=null, postAllowsSubspecialty=false, otherSpecialties=[], placementType=In Post, employingBody=University Hospitals Dorset NHS Foundation Trust, trainingBody=NHS DORSET ICB - 11J, wholeTimeEquivalent=1, status=null, signature=null)

  • The first delete instruction was also actioned by the notification service:

    2024-10-25T09:31:11.529Z  INFO 1 --- [ntContainer#6-6] u.n.t.t.n.event.PlacementListener        : Handling placement delete event PlacementEvent[tisId=2568353, recrd=RecordDto(data={placementAddedDate=2024-05-24T08:50:40Z, lifecycleState=APPROVED, gradeId=306, siteCode=J81052, postId=233420, dateFrom=2025-02-05, placementWholeTimeEquivalent=1, traineeId=319330, placementType=In Post, dateTo=2025-08-05, siteId=16489, id=2568353}, metadata={})].
    
    2024-10-25T09:31:11.577Z  INFO 1 --- [ntContainer#6-6] u.n.t.t.n.service.NotificationService    : Removed any stale notification scheduled for PLACEMENT_UPDATED_WEEK_12-2568353
    
    2024-10-25T09:31:11.586Z  INFO 1 --- [ntContainer#6-6] u.n.t.t.n.service.NotificationService    : Removed any stale notification scheduled for PLACEMENT_ROLLOUT_2024_CORRECTION-2568353

    and then the reinserted record was in turn rescheduled a minute later, leading to the incorrect notification mentioned above:

    2024-10-25T09:32:11.509Z  INFO 1 --- [Container#5-216] u.n.t.t.n.event.PlacementListener        : Handling placement update event PlacementEvent[tisId=2568353, recrd=RecordDto(data={placementAddedDate=2024-05-24T08:50:40Z, lifecycleState=APPROVED, gradeId=306, siteCode=J81052, postId=233420, dateFrom=2025-02-05, placementWholeTimeEquivalent=1, traineeId=319330, placementType=In Post, dateTo=2025-08-05, siteId=16489, id=2568353, employingBodyName=University Hospitals Dorset NHS Foundation Trust, trainingBodyName=NHS DORSET ICB - 11J, owner=Wessex, postAllowsSubspecialty=false, site=LIFEBOAT QUAY MEDICAL CENTRE, siteLocation=UNIT A, LIFEBOAT QUAY, POOLE , siteKnownAs=LIFEBOAT QUAY MEDICAL CENTRE (J81052), otherSites=[], gradeAbbreviation=ST1, otherSpecialties=[]}, metadata={})].
    
    2024-10-25T09:32:11.569Z  INFO 1 --- [Container#5-216] u.n.t.t.n.service.NotificationService    : Notification for PLACEMENT_UPDATED_WEEK_12-2568353 scheduled for Wed Nov 13 00:00:00 UTC 2024
    

Stakeholder Demo


Resolution


Timeline

All times GMT unless otherwise indicated.

  • 15:05 Incorrect notifications reported by LOs. In addition, there has been a steady stream of requests for manual profile re-syncs which may well reflect these incorrect placements retained in TSS.

  • 15:53 Issue reported on trainee-team-general Slack channel.

  • 11:30 Initial investigation into specific example identifies potential issue.


Action Items

Action Items

Owner

See also:


Lessons Learned

  • No labels

0 Comments

You are not logged in. Any changes you make will be marked as anonymous. You may want to Log In if you already have an account.