2020-04-16 Users unable to log in: too many DB connections
Date | Apr 16, 2020 |
Authors | @Simon Meredith |
Status | Complete |
Summary | TIS was unavailable to all users. The MySQL database was rejecting connections with a “too many connections” error, and its storage was almost full. |
Impact | Users were unable to log in until 08:40. |
Root Cause(s)
The previous firefighting incident, 2020-04-14 20,000 Post Specialities Missing, required the TCS database backup to be restored to production.
A restart reset the MAX_CONNECTIONS setting to a lower value, which was too low for normal load.
Restoring the TCS DB also caused the MySQL binlog to grow until the disk ran out of space.
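The report does not say how MAX_CONNECTIONS had originally been raised. One common way to lose such a setting is to change it at runtime with `SET GLOBAL max_connections = ...`: that value lives only in memory, so after a restart the server falls back to the option file or to the compiled-in default of 151. Below is a minimal Python sketch of a config review check, assuming the server settings live in a `my.cnf`-style option file with a `[mysqld]` section. The function name `check_mysql_config` and the required minimum are hypothetical; only the standard library `configparser` is used.

```python
import configparser
from typing import List

# MySQL's compiled-in default for max_connections (5.7 and 8.0).
DEFAULT_MAX_CONNECTIONS = 151


def check_mysql_config(text: str, min_connections: int) -> List[str]:
    """Return the problems found in a my.cnf-style option file (empty list if none).

    Hypothetical helper for illustration; it checks only the two settings
    involved in this incident.
    """
    # allow_no_value: my.cnf has bare flags such as `skip-name-resolve`.
    # strict=False: tolerate repeated options (the last one wins, as in MySQL).
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section: server settings fall back to defaults"]

    # MySQL treats dashes and underscores in option names as equivalent.
    opts = {k.replace("-", "_"): v for k, v in parser["mysqld"].items()}
    problems = []

    raw = opts.get("max_connections")
    if raw is None:
        # A value set only with SET GLOBAL is not in the file and is lost on restart.
        problems.append(
            f"max_connections not set: a restart falls back to the default "
            f"({DEFAULT_MAX_CONNECTIONS})"
        )
    elif int(raw) < min_connections:
        problems.append(
            f"max_connections={raw} is below the required {min_connections}"
        )

    if "binlog_expire_logs_seconds" not in opts and "expire_logs_days" not in opts:
        # Server default applies: no expiry on MySQL 5.7, 30 days on 8.0.
        problems.append("binlog retention not set explicitly: server default applies")
    return problems
```

The sketch reads a single file and does not follow `!include`/`!includedir` directives, so any files pulled in that way would need to be checked separately.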
Trigger
Reported on Teams at 07:30
Resolution
Restarted MySQL
To prevent immediate recurrence:
Removed a tmp file
Purged the MySQL binlog
Detection
Users reported the problem on Teams
The monitoring channel reported that the disk was out of space
Action Items
Action Item | Type | Owner | Issue |
---|---|---|---|
Review the config | Reliability | | |
Add more disk space / review backup locations | Prevention | All Teams / DevOps | Backup location changed as part of https://hee-tis.atlassian.net/browse/TISNEW-4251 |
Monitor and alert on disk space | Prevention | All Teams / DevOps | Add a timeout to the disk space alert so that a persistent alert is not dismissed as the backups running and resolving on their own |
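The disk space action item asks for a timeout on the alert so that a persistent alert is not written off as a backup in progress. One way to read that, sketched in Python: warn as soon as usage crosses a threshold, but escalate only if usage is still above it after a grace period long enough to cover a normal backup. The class name `DiskAlert`, the 90% threshold and the two-hour grace period are all hypothetical; `shutil.disk_usage` is the standard library call for reading usage.

```python
import shutil
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DiskAlert:
    """Escalate only when disk usage stays above a threshold past a grace period.

    Hypothetical defaults: 90% usage and a 2-hour grace period, assumed to
    cover a normal backup run. Tune both to the real backup window.
    """

    threshold: float = 0.90
    grace_seconds: float = 2 * 60 * 60
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _high_since: Optional[float] = field(default=None, init=False)

    def check(self, used_fraction: float) -> str:
        """Return "ok", "warn" or "escalate" for one usage reading (0.0 to 1.0)."""
        now = self.clock()
        if used_fraction < self.threshold:
            # Usage recovered (e.g. the backup finished and cleaned up): reset the timer.
            self._high_since = None
            return "ok"
        if self._high_since is None:
            # First high reading: start the timeout.
            self._high_since = now
        if now - self._high_since >= self.grace_seconds:
            # Still high after the grace period: assume it is not "just the backups".
            return "escalate"
        return "warn"


def used_fraction(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total
```

The object keeps the start of the high-usage period in memory, so it is meant to be called on a schedule from one long-lived process, for example `alert.check(used_fraction("/var/lib/mysql"))` every few minutes, with `"escalate"` routed to the monitoring channel.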
Timeline
Reported on Teams at 07:30
Resolved at 08:44: message posted to users on Teams, and users confirmed they could log in
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213