2021-12-09 DMS tasks not running

Date

Dec 9, 2021

Authors

 

Status

Documenting

Summary

DMS tasks had been down for about two months (since mid-October) and were brought back up only today.

Impact

No data changes were synchronized to TSS during those two months.

Non-technical Description

  • DMS was not running on either prod or preprod. Changes in the TIS databases that DMS should have captured and synchronized to TSS were therefore not captured while it was down.


Trigger

  • The MySQL server's whitelist was not updated when DMS's IP address changed


Detection

  • @Andy Dingley noticed that the tasks were not running on either preprod or prod


Resolution

  • Whitelisted DMS’s new IP addresses on the MySQL server


Timeline

  • Oct 15, 2021 (approximate) - DMS tasks stop working

  • Dec 9, 2021 14:10 GMT - Andy finds that the DMS tasks are not running

  • Dec 9, 2021 15:30 GMT - Tasks restarted successfully after the whitelisting of DMS addresses

  • Dec 23, 2021 14:20 GMT - Ticket opened with AWS Support to get information on why the DMS addresses were changed

  • Dec 23, 2021 18:50 GMT - Response from AWS Support


Root Cause(s)

  • DMS’s public IP address changed

    • Why did the DMS address change (& when might it happen again?)

  • Does this mean the whitelist is a firewall rule, a MySQL user host restriction, or both? Is there a more dynamic way to configure it?

    • Use AWS Secrets Manager instead?

  • AWS Support said that a host replacement occurred on both the preprod and prod replication instances on Dec 13, 2021. They could not access the records for our October DMS issues because process logs are kept only for a limited time, but there is a good chance that a host replacement also occurred in October. The public IPs would have changed when the hosts were replaced.
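Because a host replacement can change the public IPs without warning, a periodic drift check could flag the problem before replication silently stops. The sketch below is a minimal illustration under stated assumptions, not existing team tooling: `ALLOWLIST` holds placeholder documentation-range addresses, the function names are hypothetical, and it assumes boto3 is installed with credentials allowed to call the DMS `describe_replication_instances` API.

```python
"""Hypothetical drift check: are the DMS replication instances' current
public IPs still covered by the MySQL server's whitelist?"""
import ipaddress
from typing import Iterable, List

# ASSUMPTION: placeholder values; replace with the CIDRs actually whitelisted
# for the MySQL server (firewall rule and/or MySQL user hosts).
ALLOWLIST = ["203.0.113.10/32", "203.0.113.11/32"]


def uncovered_ips(ips: Iterable[str], allowlist: Iterable[str]) -> List[str]:
    """Return the IPs that fall outside every whitelisted network."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowlist]
    return [
        ip for ip in ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]


def current_dms_public_ips() -> List[str]:
    """Collect the public IPs of all DMS replication instances via boto3."""
    import boto3  # imported lazily so uncovered_ips() works without boto3

    client = boto3.client("dms")
    ips: List[str] = []
    kwargs = {}
    while True:
        resp = client.describe_replication_instances(**kwargs)
        for inst in resp.get("ReplicationInstances", []):
            ips.extend(inst.get("ReplicationInstancePublicIpAddresses", []))
        marker = resp.get("Marker")  # present only when more pages remain
        if not marker:
            return ips
        kwargs["Marker"] = marker


def main() -> int:
    """Print any uncovered IPs; return 1 if drift was found, else 0."""
    missing = uncovered_ips(current_dms_public_ips(), ALLOWLIST)
    if missing:
        print("DMS public IPs not covered by the MySQL whitelist:", missing)
        return 1
    return 0
```

A scheduled job could call `sys.exit(main())` so that a non-zero exit raises an alert. This only detects drift after it happens; whether the whitelist should instead be managed more dynamically is still the open question above.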


Action Items

  • Add monitoring to DMS: https://hee-tis.atlassian.net/browse/TIS21-2443

  • Put mitigations in place to prevent this from happening again: https://hee-tis.atlassian.net/browse/TIS21-2515
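For the monitoring action item, one possible starting point is a scheduled status check over all replication tasks. This is a hedged sketch, not the approach chosen in TIS21-2443: it assumes boto3 with permission to call the DMS `describe_replication_tasks` API, treats `running` as the only healthy status (tasks that are stopped on purpose would need excluding), and the function names are hypothetical.

```python
"""Hypothetical DMS task health check for the monitoring action item."""
from typing import Dict, Iterable, List

# ASSUMPTION: "running" is the only healthy state for these tasks.
HEALTHY_STATUSES = {"running"}


def unhealthy_tasks(tasks: Iterable[Dict]) -> List[str]:
    """Return 'identifier (status)' for every task not in a healthy state."""
    return [
        f"{t.get('ReplicationTaskIdentifier')} ({t.get('Status')})"
        for t in tasks
        if t.get("Status") not in HEALTHY_STATUSES
    ]


def fetch_tasks() -> List[Dict]:
    """Fetch all replication tasks, following the Marker for pagination."""
    import boto3  # imported lazily so unhealthy_tasks() works without boto3

    client = boto3.client("dms")
    tasks: List[Dict] = []
    kwargs = {}
    while True:
        resp = client.describe_replication_tasks(**kwargs)
        tasks.extend(resp.get("ReplicationTasks", []))
        marker = resp.get("Marker")
        if not marker:
            return tasks
        kwargs["Marker"] = marker


def main() -> int:
    """Print unhealthy tasks; return 1 if any were found, else 0."""
    bad = unhealthy_tasks(fetch_tasks())
    for line in bad:
        print("DMS task not running:", line)
    return 1 if bad else 0
```

Run from a scheduler (for example via `sys.exit(main())`), a check like this would have surfaced the stopped tasks within one interval instead of roughly two months.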


Lessons Learned

  •