Date |
|
Authors | |
Status | Documenting |
Summary | DMS tasks were down for a couple of months (since mid-October) and were only brought back up today |
Impact | No data changes were synchronized to TSS during those two months |
Non-technical Description
DMS was not running on either prod or preprod. Changes in the TIS databases that were meant to be captured and synchronized to TSS were not captured while DMS was down.
...
Trigger
The whitelist on the MySQL server was not updated to account for DMS’s change of address
...
Detection
Andy Dingley noticed that the tasks were not running on either preprod or prod
...
Resolution
TODO: .
...
Whitelisting DMS’s new address on the MySQL server
...
Timeline
(approximate) - DMS tasks stop working
14:10 GMT - Andy finds that the DMS tasks are not running
14:20 GMT - Ticket opened with AWS Support to get information on why the DMS addresses were changed
15:30 GMT - Tasks restarted successfully after the whitelisting of DMS addresses
18:50 GMT - Response from AWS Support
...
Root Cause(s)
DMS’s address change
Why did the DMS address change (& when might it happen again?)
Does this mean it’s a firewall thing? MySQL user too? Is there a more dynamic way that we can set this?
Use AWS Secrets Manager instead?
AWS Support mentioned that a host replacement occurred on both the preprod and prod Replication Instances on - AWS Support was unable to access the records related to our DMS service issues back in October, as the process logs are only kept for a limited time. However, there is a good chance that a host replacement also occurred in October. The public IPs would have changed when the hosts were replaced.
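One possible answer to the "more dynamic way" question above is to compare the public IPs that DMS currently reports against the addresses we have whitelisted. The following is a minimal sketch, not the team's actual tooling: the function names and the shape of the allowlist are hypothetical. It assumes the response comes from boto3's `describe_replication_instances` call, and that we have few enough replication instances that pagination can be ignored.

```python
from typing import Dict, Iterable, List


def missing_from_allowlist(response: dict, allowlist: Iterable[str]) -> Dict[str, List[str]]:
    """Map each replication instance identifier to its public IPs
    that are NOT present in the allowlist.

    `response` is expected to look like the dict returned by
    boto3's dms.describe_replication_instances().
    """
    allowed = set(allowlist)
    missing: Dict[str, List[str]] = {}
    for instance in response.get("ReplicationInstances", []):
        ips = instance.get("ReplicationInstancePublicIpAddresses") or []
        not_allowed = sorted(ip for ip in ips if ip not in allowed)
        if not_allowed:
            missing[instance["ReplicationInstanceIdentifier"]] = not_allowed
    return missing


def check_live(allowlist: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical wrapper: query AWS and run the comparison.

    Requires boto3 and AWS credentials; imported lazily so the pure
    comparison above can be used and tested without them.
    """
    import boto3  # third-party dependency

    client = boto3.client("dms")
    return missing_from_allowlist(client.describe_replication_instances(), allowlist)
```

Run on a schedule, a non-empty result would indicate that a replication instance has a public IP that the MySQL side does not yet allow, which is the condition that caused this outage. How the allowlist itself is read (firewall rule, MySQL user host, or something else) is left open here, as it is in the questions above.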
...
Action Items
Action Items | Owner |
---|---|
Add monitoring to DMS | |
Mitigate to prevent this from happening in the future | |
...
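As a starting point for the "Add monitoring to DMS" action item, the check below flags any replication task whose status is not `running`. It is a sketch under stated assumptions rather than a finished monitor: the function names are hypothetical, the response is assumed to come from boto3's `describe_replication_tasks` call, and it assumes our tasks are ongoing replication tasks that should always be running (a task that is expected to stop after a full load would need to be excluded).

```python
from typing import List, Tuple


def tasks_not_running(response: dict) -> List[Tuple[str, str]]:
    """Return (task identifier, status) for every task that is not running.

    `response` is expected to look like the dict returned by
    boto3's dms.describe_replication_tasks().
    """
    problems: List[Tuple[str, str]] = []
    for task in response.get("ReplicationTasks", []):
        status = task.get("Status", "unknown")
        if status != "running":
            identifier = task.get("ReplicationTaskIdentifier", "unknown")
            problems.append((identifier, status))
    return problems


def check_live() -> List[Tuple[str, str]]:
    """Hypothetical wrapper: query AWS and list tasks that need attention.

    Requires boto3 and AWS credentials; imported lazily so the pure
    check above can be used and tested without them.
    """
    import boto3  # third-party dependency

    client = boto3.client("dms")
    return tasks_not_running(client.describe_replication_tasks())
```

A scheduled job could call `check_live()` in each environment and raise an alert when the list is non-empty, so that a stopped task is noticed within minutes rather than months.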