Date

Authors

John Simmons (Deactivated) Joseph (Pepe) Kelly

Status

Done

Summary

TIS21-4563

A configuration defect meant that when a machine was restarted, the web address for the Northern Ireland service no longer pointed to TIS.

Impact

Northern Ireland users were unable to access TIS until the configuration was corrected.

Non-technical Description

NIMDTA TIS became unavailable after the servers were downsized to save costs. Although the new, smaller servers started and responded to our internal testing, external access failed.


Trigger

  • Resizing the NIMDTA apps and database servers to save money


Detection

  • User report (Slack)


Resolution

  • Added the new server's public IP address to DNS so that the service could be used again as quickly as possible

  • Added a permanent IP address for the NIMDTA web server so that any further stopping and starting of the server will result in the same IP address being used each time.


Timeline

BST unless otherwise stated

  • - Machines resized and restarted. Once they became available, an SSH login confirmed access to both servers. (This procedure only tested the private addresses, not the public address.)

  • - 9.52 am Mark Oliver messages on Slack to say there is no connection to TIS

  • - 10.05 am Problem identified and correction started

  • - 10.15 am Correction applied

  • - 10.20 am DNS changes took effect after the 600-second window

  • - 10.21 am Service restored and Mark was asked to test connections

  • - 11.16 am Mark Oliver confirms all is working as expected

  • - 11.45 am Elastic IP address assigned to the NIMDTA apps server and DNS updated, to stop this happening again


Root Cause(s)

  • Trying to reach the Admins UI took too long and resulted in an error.

  • The website address referred to an IP address which was not reachable.

  • We should also have had an alert from UptimeRobot to say that the NIMDTA service was not available. This would have alerted us to the problem before the end users found it but, unbeknownst to us, all of our external monitoring had been removed from UptimeRobot.

  • The web server no longer had the IP address that the website address referred to assigned to it.

  • When the server was initially built, an elastic IP address was not assigned to it. Reboots would therefore probably have kept the original public IP address, but a full stop and then start of the server would definitely have resulted in a new public IP address being assigned to the VM.
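
The mismatch described above (the website address referring to an IP address the server no longer holds) can be detected with a small check. The following is a minimal sketch in Python using only the standard library. The function names are illustrative and are not part of any TIS tooling, and the server's current public IP is assumed to be read separately, for example from the cloud console.

```python
import socket
from typing import Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the set of IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def dns_matches_server(resolved: Set[str], server_public_ip: str) -> bool:
    """True when DNS points at the public address the server actually holds."""
    return server_public_ip in resolved
```

Calling `dns_matches_server(resolve_ipv4(hostname), current_ip)` after a stop and start would return `False` if the server had been given a new public IP address while DNS still held the old one.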


Action Items

Action Items | Comments | Owner
Add elastic IP address creation/assignment to Terraform config for servers that need public IP addresses. | | John Simmons (Deactivated)
Add external monitoring of public-facing websites | | John Simmons (Deactivated)
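
The first action item targets Terraform. Purely as an illustration of the two steps involved (allocate a fixed address, then associate it with the server), here is a hedged Python sketch rather than Terraform configuration. It assumes the servers run on AWS EC2, which the term "elastic IP" suggests but this page does not state, and that `ec2_client` is a boto3 EC2 client such as `boto3.client("ec2")`. The function name and instance ID are hypothetical.

```python
def assign_elastic_ip(ec2_client, instance_id: str) -> str:
    """Allocate an elastic IP and attach it to the given instance.

    Returns the public IP, which stays the same across stop/start cycles
    for as long as the address remains associated with the instance.
    """
    # Step 1: reserve a fixed public address in the account.
    allocation = ec2_client.allocate_address(Domain="vpc")

    # Step 2: bind that address to the server.
    ec2_client.associate_address(
        InstanceId=instance_id,
        AllocationId=allocation["AllocationId"],
    )
    return allocation["PublicIp"]
```

The returned address is the one that the DNS record would then need to reference. Expressing the same two steps in Terraform, as the action item proposes, keeps the address under version control with the rest of the server definition.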

Lessons Learned

  • Do not rely only on the private IP address to confirm that a server is back up after a restart, as that says nothing about whether the public address that users rely on is reachable
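
A post-restart check along these lines could include a connection attempt to the public address as well as the private one. The following is a minimal sketch in Python using only the standard library. The function name, the default port of 443 and the timeout are assumptions for illustration, not values taken from the TIS setup.

```python
import socket


def is_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Running this against the public hostname from a machine outside the private network exercises the path that end users take, which the SSH login over the private addresses did not.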
