Date |
|
Authors | |
Status | Documenting |
Summary | We received a large number of Sentry errors |
Impact |
Table of Contents |
---|
Non-technical Description
We have a {Lambda} job running up to once per minute that follows changes in the revalidation database. It keeps track of the most recent change it processed in order to know where it should pick up next time. We received an alert that the reference to that latest change couldn’t be used, because the reference was no longer available.
We moved the old reference and made a change to the code to enable the {Lambda} job to restart tracking changes.
Between Saturday morning and Monday morning, changes in the Reval Database would not have been reflected on Reval UI; however, there were no changes during this period.
Trigger
There were a lot of Sentry errors (about 2,800). It is unclear whether anything was missed.
...
Detection
...
Resolution
...
Timeline
All times in GMT unless indicated
18:07 - Last recommendation submitted
05:00-05:29 - Backup window
05:29 - Earliest Slack message identifying an issue with the production CDC Lambda
~06:00 - Renamed attribute in the database collection. This caused other issues, which were resolved by applying a hotfix to generate a new reference.
-
-
Root Cause(s)
...
Action Items
Action Items | Owner | |
---|---|---|
Investigate why the position in the capped collection was deleted (see errors on weekend): | ||
Increase the change stream retention period. | ||
Make the resume token check work if there isn’t one… | ||
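
As a starting point for the resume token action item, the check could look roughly like the sketch below. It is illustrative only: the function name `open_change_stream`, the `ResumeTokenUnusable` exception, and the choice of Python are assumptions, not a description of the existing {Lambda} code. The `watch` argument stands in for whatever opens the change stream; with PyMongo that would typically be `collection.watch`, which accepts a `resume_after` argument, and the unusable-token case would surface as a driver error rather than the placeholder exception used here.

```python
from typing import Any, Callable, Optional, Tuple


class ResumeTokenUnusable(Exception):
    """Hypothetical error: the stored token no longer points at a retained change."""


def open_change_stream(
    watch: Callable[..., Any],
    stored_token: Optional[dict],
) -> Tuple[Any, bool]:
    """Open a change stream, resuming from stored_token when possible.

    Returns (stream, resumed). resumed is False when the stream had to
    start from the current position, meaning changes may have been skipped
    and should be reported or backfilled by the caller.
    """
    if stored_token is None:
        # No token saved (first run, or the stored reference was deleted).
        return watch(), False
    try:
        return watch(resume_after=stored_token), True
    except ResumeTokenUnusable:
        # A token exists but is no longer available in the retained history.
        return watch(), False
```

Returning the `resumed` flag keeps the "did we skip anything?" question explicit, so the caller can raise a single alert instead of failing on every invocation.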
...