Date

Authors

Status

Working on it

Summary

Impact

...

- 09:48 AM

- 10:25 AM

Created ticket and incident page https://hee-tis.atlassian.net/browse/TISNEW-5728

2020-11-18 Reval Legacy/Old GMC Sync

Root Causes

  • An accidental major version upgrade of one of our core infrastructure tools caused a dependent tool to fail. Containers that were already running were unaffected, but no new containers could launch, i.e. ETLs and newly deployed software versions.

...

  • NDW ETL (Prod) failure alert on Slack

  • Reval / GMC sync ETLs failure alert on Slack

Actions

  • [insert actions to take to mitigate this happening in future]

  • e.g.

  • keep everything more up to date so that future upgrades have less impact

  • ensure no one person is a single point of failure - require code reviews for infrastructure changes

  • specific changes to the architecture (list them) to improve resilience:

    • Use of ‘serverless’ technology: ECS, RDS, DocumentDB

  • check that STAGE matches PROD when upgrading
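
As an illustration of the last action, the check could be automated roughly as follows. This is only a minimal sketch: the tool names and version numbers are hypothetical placeholders, not the tools involved in this incident, and it assumes versions are plain dotted strings such as `1.9.3`.

```python
from typing import Dict, List


def major(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.lstrip("v").split(".")[0])


def is_major_upgrade(current: str, proposed: str) -> bool:
    """True when the proposed version has a higher major number than the current one."""
    return major(proposed) > major(current)


def version_drift(stage: Dict[str, str], prod: Dict[str, str]) -> List[str]:
    """List tools whose version differs (or is missing) between STAGE and PROD."""
    tools = sorted(set(stage) | set(prod))
    return [tool for tool in tools if stage.get(tool) != prod.get(tool)]


if __name__ == "__main__":
    # Hypothetical inventories; real values would come from each environment.
    stage = {"tool-a": "2.0.1", "tool-b": "1.4.0"}
    prod = {"tool-a": "1.9.3", "tool-b": "1.4.0"}

    for tool in version_drift(stage, prod):
        print(f"{tool}: STAGE={stage.get(tool)} PROD={prod.get(tool)}")
        if tool in stage and tool in prod and is_major_upgrade(prod[tool], stage[tool]):
            print(f"  major upgrade for {tool}: needs review before reaching PROD")
```

A check like this could run before an infrastructure change is applied, so that a major version jump is flagged for review instead of being picked up accidentally.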

...