Date

Authors

John Simmons (Deactivated), Joseph (Pepe) Kelly

Status

Done

Summary

Jira: TIS21-4563

A configuration defect meant that when a server was restarted, the address for the Northern Ireland TIS service no longer pointed to TIS.

Impact

Northern Ireland users were unable to access TIS until the configuration was corrected.

Non-technical Description

NIMDTA TIS became unavailable after its servers were downsized to save costs. Although the new, smaller servers started and responded to our internal testing, external access failed.


Trigger

  • Resizing the NIMDTA apps and database servers to save money


Detection

  • User report (Slack)


Resolution

  • Added the server's new public IP address to DNS so that the service could be used again as quickly as possible.

  • Added a permanent (Elastic) IP address for the NIMDTA web server so that any further stopping and starting of the server results in the same IP address being used each time.


Timeline

BST unless otherwise stated

  • - Machines resized and restarted. Once they became available, an SSH login confirmed access to both servers. (This procedure tested only the private addresses, not the public address.)

  • - 9:52 am Mark Oliver messages on Slack to say there is no connection to TIS

  • - 10:05 am Problem identified and correction started

  • - 10:15 am Correction applied

  • - 10:20 am DNS changes took effect after the 600-second window

  • - 10:21 am Service restored and Mark was asked to test connections

  • - 11:16 am Mark Oliver confirms all is working as expected

  • - 11:45 am Elastic IP address assigned to the NIMDTA apps server, and DNS updated to stop this happening again


Root Cause(s)

  • Trying to reach Admins UI took too long and resulted in an error.

  • The website address referred to an IP address which was not reachable.

  • We should also have had an alert from UptimeRobot to say that the NIMDTA service was not available. This would have alerted us to the problem before the end users found it, but all of our external monitoring had been removed from UptimeRobot without our knowledge.

  • The web server no longer had the IP address that the website address referred to.

  • When the server was initially built, an Elastic IP address was not assigned to it. Reboots would therefore probably have kept the original public IP address, but a full stop and then start of the server would definitely have resulted in a new public IP address being assigned to the VM.
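The failure mode described above (the website address referring to an IP address the server no longer holds) can be checked for directly. The following is a minimal sketch in Python, not the check the team used: the function names, the hostname `tis.example.org` and the addresses are hypothetical placeholders, and the resolver is injectable so the comparison can be exercised without a live DNS lookup.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def dns_points_at(hostname, current_public_ip, resolver=resolve_ipv4):
    """True if the DNS name resolves to the server's current public IP.

    `current_public_ip` is assumed to come from the cloud console or API
    after the stop/start; this sketch does not look it up itself.
    """
    return current_public_ip in resolver(hostname)


if __name__ == "__main__":
    # Hypothetical values for illustration only (documentation address range).
    stale_record = lambda _host: {"198.51.100.7"}
    print(dns_points_at("tis.example.org", "203.0.113.10", resolver=stale_record))
```

Run after a stop and start, a `False` result would indicate that DNS still refers to the previous public address.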


Action Items

  • Add Elastic IP address creation and assignment to the Terraform config for servers that need public IP addresses. Owner: John Simmons (Deactivated)

  • Add external monitoring of public-facing websites. Owner: John Simmons (Deactivated)

Lessons Learned

  • Do not rely on the private IP address alone to confirm that a server is back up after a restart. That check does not show whether the service is reachable on its public address.
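One way to act on this lesson is to finish every restart with a connection attempt against the public address, made from outside the private network. The sketch below uses only the Python standard library; the function name, default port and timeout are assumptions chosen for illustration, not part of the existing procedure.

```python
import socket


def public_endpoint_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    `host` should be the public DNS name (or public IP) that end users use,
    not the private address used for SSH inside the network.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

A successful TCP connection shows only that something is listening on the public address. It does not replace an external monitor that requests a real page and checks the response.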