Field | Value |
---|---|
Date | |
Authors | |
Status | Documenting |
Summary | |
Impact | No user impact determined. |
Non-technical Description
We have a job, running up to once per minute, that follows changes in the revalidation database. It keeps track of the most recent change it processed so that it knows where to pick up next time. We received an alert that the reference to the latest recommendation change could not be used because it was no longer available. Fortunately, changes to doctors continued to be processed.
We moved the old reference and changed the code so that the job could restart tracking changes.
...
There were about 2,800 Sentry errors, and it is unclear whether any changes were missed. It doesn’t look like any recommendations were made during this period, so fortunately nothing appears to have been affected.
The cause appears to be a combination of the 3-hour window having elapsed and the transaction log rolling over.
...
Detection
...
Resolution
...
Timeline
...
18:07 - Last recommendation submitted.
05:00-05:29 - Backup window
05:29 - Earliest Slack message identifying an issue with the production CDC Lambda
~06:00 - Renamed the attribute in the database collection. This caused other issues, which were resolved by applying a hotfix that generates a new reference until a more robust solution is implemented.
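The hotfix behaviour (generate a new reference when the stored one can no longer be used) can be sketched as a resume-with-fallback pattern. This is a minimal sketch under stated assumptions, not the code that was deployed: `run_once`, `open_stream`, `load_reference`, `save_reference`, `handle`, the `ReferenceUnavailable` error and the `"ref"` key are all hypothetical names standing in for the real driver call, reference store, change handler and driver error.

```python
from typing import Any, Callable, Iterable, Optional


class ReferenceUnavailable(Exception):
    """Hypothetical error: the stored reference is no longer in the change log."""


def run_once(
    open_stream: Callable[[Optional[Any]], Iterable[dict]],
    load_reference: Callable[[], Optional[Any]],
    save_reference: Callable[[Any], None],
    handle: Callable[[dict], None],
) -> bool:
    """Process the pending changes; return True if tracking had to restart.

    Assumes open_stream(ref) yields only the changes currently available
    after ref, and that open_stream(None) starts from the present.
    """
    restarted = False
    try:
        # Materialise the stream so an expired reference fails here,
        # before any change has been handled.
        changes = list(open_stream(load_reference()))
    except ReferenceUnavailable:
        # Changes between the old reference and now may have been missed,
        # so the caller should log or alert when this returns True.
        restarted = True
        changes = list(open_stream(None))
    for change in changes:
        handle(change)
        # Persist the position after each change so the next run resumes here.
        save_reference(change["ref"])
    return restarted
```

Returning a flag rather than silently restarting keeps the gap visible: a restart means changes made between the lost reference and the new one were not seen by the job.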
Root Cause(s)
Sentry errors were caused by a reference which could not be found
The reference was to the change stream for the last change in the recommendation collection, which was presumably cleaned up
The change stream was cleaned up*
The change stream is configured to hold references for 3 hours (the default)
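As a rough check of the timeline against this retention window, the gap between the last recommendation (18:07) and the first alert (05:29) can be computed directly. This assumes the 18:07 entry was on the evening before the alert. The dates are not recorded in the timeline, so the ones below are placeholders and only the times of day come from this document.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=3)  # default retention stated above

# Placeholder dates (assumption): only the times of day come from the timeline.
last_change = datetime(2000, 1, 1, 18, 7)
first_alert = datetime(2000, 1, 2, 5, 29)

gap = first_alert - last_change
print(gap)              # 11:22:00
print(gap > RETENTION)  # True
```

Under that assumption the stored reference was more than 11 hours old when the alert fired, well beyond the 3-hour window.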
...
Action Items
Action Items | Owner | |
---|---|---|
Investigating why the position in the capped collection was deleted (see errors on weekend): | | |
Increase the change stream retention period. | | Not right now. We can review this if the database is upgraded as expected. |
...