Date

16

Authors

Reuben Roberts, Cai Willis, Steven Howard

Status

Done

Summary

The notification service for Form R updates was sending duplicate emails to some doctors.

Impact

Over the course of 3 days, some doctors would have received several duplicate confirmation emails within a short period (~15 min) of submitting their FormRs.

Non-technical Description

When a FormR is submitted, an email is sent to the PGDiT confirming that their form has been received. Similar emails are sent if the LO un-submits or deletes the FormR. The queue that holds the messages waiting to be sent was misconfigured, so in some instances the same message was processed several times, resulting in duplicate emails to the PGDiT.

...

Trigger

...

Detection

  • A survey response complained of duplicate messages.

  • At most 122 doctors were affected, this being the number who received more than one email (some legitimately submit both a FormR Part A and a Part B within a fairly short period of time, triggering two confirmations). 25 doctors were more definitely affected, having received 3 or more emails. The largest number of emails received by an individual doctor was 10, which happened to two PGDiTs.

...

Resolution

  • Queue message visibility window was increased.
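As an illustration of this kind of change, the sketch below sets a queue's default visibility timeout through an SQS-style client. It is a minimal sketch, not the change that was actually applied: the helper name set_visibility_timeout is hypothetical, the report does not state the new value, and it assumes a client exposing set_queue_attributes in the way the boto3 SQS client does.

```python
# Hypothetical helper: not taken from the notification service's code.
MAX_VISIBILITY_TIMEOUT = 43200  # SQS upper limit: 12 hours, in seconds


def set_visibility_timeout(sqs_client, queue_url, seconds):
    """Set the default visibility timeout on a queue.

    sqs_client is assumed to behave like boto3.client("sqs").
    """
    if not 0 <= seconds <= MAX_VISIBILITY_TIMEOUT:
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    # SQS expects attribute values as strings.
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"VisibilityTimeout": str(seconds)},
    )
```

The timeout would need to comfortably exceed the slowest expected processing time, including a full rebuild of the user cache. With boto3 installed and AWS credentials configured, a call would look like set_visibility_timeout(boto3.client("sqs"), queue_url, 300), where both the queue URL and the value 300 are placeholders.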


...

Timeline

All times in BST unless indicated

  • 16 : 15:02 - Notifications service functionality added to respond to FormR updates.

  • : 18:11 - Survey response mentions duplicate emails.

  • : 19:30 - Survey response noted by TSS team.

  • : 08:45 - Fix applied to affected queues (form-updated.fifo and credential-revoked.fifo)

Root Cause(s)

...

...

Form update messages were being processed more than once.

...

The message visibility window on the form-updated queue was set to the default of 30 seconds, but the notification service periodically rebuilds its cache of user account details, and that rebuild takes longer than 30 seconds.

...

When a message's visibility window expired before processing had finished, the message became available on the queue again and was reprocessed, producing a duplicate email.
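The mechanism can be reproduced with a toy in-memory model. This is a sketch under stated assumptions rather than the notification service's code: the class SimQueue, the function simulate, the message id, and the 75-second processing time are all hypothetical, and it assumes that a further consumer poll can pick up the message as soon as it becomes visible again.

```python
class SimQueue:
    """Toy queue mimicking an SQS-style visibility timeout (illustrative only)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._visible_at = {}  # message id -> time it next becomes visible

    def send(self, message_id, now=0):
        self._visible_at[message_id] = now

    def receive(self, now):
        """Return one visible message id (or None) and hide it for the timeout."""
        for message_id, visible_at in self._visible_at.items():
            if visible_at <= now:
                self._visible_at[message_id] = now + self.visibility_timeout
                return message_id
        return None

    def delete(self, message_id):
        self._visible_at.pop(message_id, None)


def simulate(visibility_timeout, processing_seconds, horizon=600):
    """Count emails sent for a single message, polling once per second."""
    queue = SimQueue(visibility_timeout)
    queue.send("form-updated-1")
    in_flight = []  # (finish time, message id) for handlers still running
    emails_sent = 0
    for now in range(horizon):
        still_running = []
        for finish_time, message_id in in_flight:
            if finish_time <= now:
                emails_sent += 1          # handler finishes: email goes out
                queue.delete(message_id)  # and only now is the message deleted
            else:
                still_running.append((finish_time, message_id))
        in_flight = still_running
        message_id = queue.receive(now)
        if message_id is not None:
            in_flight.append((now + processing_seconds, message_id))
    return emails_sent


print(simulate(visibility_timeout=30, processing_seconds=75))   # 3
print(simulate(visibility_timeout=120, processing_seconds=75))  # 1
```

In this model, a 30-second window with 75 seconds of processing lets the message be received at 0, 30 and 60 seconds before the first handler deletes it, so three emails are sent. Once the window exceeds the processing time, the message is handled once.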

Action Items

Action Items

Owner

...

Lessons Learned

...

This issue only manifested in production, because the much larger user pool in that environment made caching take longer.

...

Users who were already cached did not require the cache to be rebuilt, so they did not experience the duplication, and it was not noticed in testing.

...