Date

Authors

Status

Documenting

Summary

DMS tasks had been down for about two months (since mid-October) and were only brought back up today.

Impact

No data changes were synchronized to TSS during those two months.

Non-technical Description

  • DMS was not running on either prod or preprod. Changes in the TIS databases that should have been captured and synchronized to TSS will not have been captured while DMS was down.

...

Trigger

  • The MySQL server was not updated to account for DMS’s change of address

...

Detection

  • Andy Dingley noticed that the tasks were not running on either preprod or prod

...

Resolution

TODO: .

...

  • Whitelisting of DMS’s new address on the MySQL server

...

Timeline

  • (approximate) - DMS tasks stop working

  • 14:10 GMT - Andy finds that the DMS tasks are not running

  • 15:30 GMT - Tasks restarted successfully after the whitelisting of DMS addresses

  • 14:20 GMT - Ticket opened with AWS Support to get information on why the DMS addresses were changed

  • 18:50 GMT - Response from AWS Support

...

Root Cause(s)

  • DMS’s address change

    • Why did the DMS address change (& when might it happen again?)

  • Does this mean it’s a firewall thing? MySQL user too? Is there a more dynamic way that we can set this?

    • Use AWS Secrets Manager instead?

  • AWS Support mentioned that a host replacement occurred on both preprod/prod Replication Instances on - AWS Support was unable to access the records related to our DMS service issues back in October, as the process logs are only kept for a limited time. However, there is a good chance that a host replacement also occurred in October. The public IPs would have changed when the hosts were replaced.
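
One possible way to notice a public IP change early is to compare the addresses that DMS currently reports against the addresses we have whitelisted. The sketch below is illustrative only: it assumes the whitelist is available as a plain list of IP strings (how it is actually stored on the MySQL side is not covered here), and the function names are hypothetical. The boto3 call `describe_replication_instances` and the `ReplicationInstancePublicIpAddresses` response field are part of the AWS DMS API.

```python
from typing import Dict, Iterable, List


def unlisted_public_ips(response: dict, allowed_ips: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per replication instance, the public IPs missing from the whitelist.

    `response` has the shape returned by the DMS DescribeReplicationInstances call.
    `allowed_ips` is an assumed input: the addresses currently whitelisted.
    """
    allowed = set(allowed_ips)
    missing: Dict[str, List[str]] = {}
    for instance in response.get("ReplicationInstances", []):
        name = instance.get("ReplicationInstanceIdentifier", "<unknown>")
        public_ips = instance.get("ReplicationInstancePublicIpAddresses", [])
        not_allowed = [ip for ip in public_ips if ip not in allowed]
        if not_allowed:
            missing[name] = not_allowed
    return missing


def fetch_replication_instances() -> dict:
    """Fetch the first page of replication instances from AWS DMS.

    boto3 is imported lazily so the helper above can be exercised offline.
    Pagination (the Marker field) is ignored in this sketch.
    """
    import boto3

    return boto3.client("dms").describe_replication_instances()
```

A scheduled job could call `unlisted_public_ips(fetch_replication_instances(), whitelist)` and raise an alert whenever the result is non-empty, which would flag a host replacement before the tasks have been down for long.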

...

Action Items

Action Items

Owner

Add monitoring to DMS

https://hee-tis.atlassian.net/browse/TIS21-2443

Mitigate to prevent this from happening in the future

https://hee-tis.atlassian.net/browse/TIS21-2515
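
As a starting point for the monitoring action item, a periodic check could list any DMS task that is not in the `running` state. This is a minimal sketch, not the agreed design: the function names are hypothetical, and where the result gets reported (chat, email, a dashboard) is left open. The boto3 call `describe_replication_tasks` and the `Status` and `ReplicationTaskIdentifier` response fields are part of the AWS DMS API.

```python
from typing import List, Tuple


def tasks_not_running(response: dict) -> List[Tuple[str, str]]:
    """Return (task identifier, status) for every task whose status is not 'running'.

    `response` has the shape returned by the DMS DescribeReplicationTasks call.
    """
    problems = []
    for task in response.get("ReplicationTasks", []):
        status = task.get("Status", "<unknown>")
        if status != "running":
            problems.append((task.get("ReplicationTaskIdentifier", "<unknown>"), status))
    return problems


def fetch_replication_tasks() -> dict:
    """Fetch the first page of replication tasks from AWS DMS.

    boto3 is imported lazily so the helper above can be exercised offline.
    Pagination (the Marker field) is ignored in this sketch.
    """
    import boto3

    return boto3.client("dms").describe_replication_tasks()
```

Running `tasks_not_running(fetch_replication_tasks())` on a schedule against both preprod and prod, and alerting on a non-empty result, would have surfaced this outage within one check interval instead of two months.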

...

Lessons Learned