Date |
|
Authors | |
Status | Resolved |
Summary | The database ran out of disk space, which caused a system failure. |
Impact | Users were unable to log into TIS for approximately 20 minutes |
https://hee-tis.atlassian.net/browse/TISNEW-5190
Root Cause(s)
Database ran out of space
Slow logs appear to have taken up a disproportionate amount of space
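To confirm which files are taking up the space, something like the following could be run against the log directory. This is a minimal sketch, not the procedure used during the incident: the function name `largest_files` and the example path are hypothetical, and the real log location on the database host is not recorded here.

```python
import os


def largest_files(root, top_n=10):
    """Return the top_n largest regular files under root as (size_bytes, path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so a linked file is not counted twice.
                if os.path.islink(path):
                    continue
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file may have been rotated or removed while we were walking.
                continue
    # Largest first; ties are ordered by path.
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    # Hypothetical path: replace with the actual database log directory.
    for size, path in largest_files("/path/to/db/logs", top_n=5):
        print(f"{size / 1024 / 1024:10.1f} MiB  {path}")
```

Running this before deleting anything gives a record of what was consuming the disk, which helps when deciding on log rotation or a bigger disk.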
Trigger
BAU (business as usual)? It is not clear that anything in particular caused a jump in usage.
Resolution
Deleted some log files on the database server to free up space
Detection
Users reported the problem at 12:00
Uptime Robot, but only once we took Keycloak down
Timeline
12:00 Users reported being unable to access TIS
12:07 fire fire call started
12:10ish Keycloak is restarted
12:15ish Sachin spots that the SQL DB disk is full
12:20ish Log files are removed from the database server and the system starts working again
Action Items
Action Items | Owner |
---|---|
Fix monitoring: (1) Alertmanager should send to #monitoring-prod rather than #monitoring; (2) Uptime Robot didn’t report the outage until Keycloak was unavailable; (3) error messages need to be clear | |
Look at disk management | |
Decide on bigger disk? | |
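As a starting point for the disk management action, a simple free-space check could look like the sketch below. It is illustrative only: the function names, the 80% threshold, and the checked path are assumptions, and it is not a replacement for alerting through the existing monitoring stack.

```python
import shutil


def usage_percent(total_bytes, used_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def check_disk(path, warn_at=80.0):
    """Return (percent_used, should_alert) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    percent = usage_percent(usage.total, usage.used)
    return percent, percent >= warn_at


if __name__ == "__main__":
    # Hypothetical values: point this at the database data volume
    # and pick a threshold that leaves enough time to react.
    percent, should_alert = check_disk("/", warn_at=80.0)
    status = "ALERT" if should_alert else "ok"
    print(f"{status}: disk is {percent:.1f}% full")
```

A check like this warns before the disk is full, rather than relying on users or on an uptime probe that only fires once a dependent service is already down.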