Date | |
Authors | |
Status | Documenting |
Summary | The AWS monthly SMS spend limit was exceeded, so no codes were sent for SMS MFA or phone number verification |
Impact | TSS users were unable to sign up or sign in for approximately 24 hours (226 users affected, 866 attempts in total) |
Non-technical Description
SMS messages are sent for two actions:
Verifying a phone number during SMS MFA setup
Signing in with SMS MFA
A monthly SMS spend cap must be set on our AWS account. In eu-west-1 this had previously been raised to $300 per month. However, we recently changed our SMS configuration to send from eu-west-2 (London) instead of eu-west-1 (Ireland). The spend cap is set per region, and eu-west-2 still had a $100 limit. We exceeded that limit and SMS messages could no longer be sent.
As a result, neither of the actions above was possible until the limit was raised. Approximate impact:
24 hours of SMS downtime
226 users affected (counted by phone number)
866 failed SMS messages in total
Trigger
SMS spend limit exceeded, following the switch to an SMS region that still had a lower default limit in place
Detection
Identified when a TIS team member was unable to sign in to TSS
Resolution
Increased the monthly SMS spend limit for eu-west-2 from $100 to $300
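The limit actually in force in each region can be read back rather than assumed. A minimal sketch, assuming boto3 is installed and the AWS credentials allow `sns:GetSMSAttributes`: the SNS `get_sms_attributes` call returns the account's `MonthlySpendLimit` (in USD, as a string) for the client's region. The helper names and the $300 expected value are illustrative assumptions, not existing tooling.

```python
from typing import Dict, Iterable

# Assumed target limit, matching the value agreed for eu-west-1 in this report.
EXPECTED_LIMIT_USD = 300.0


def fetch_spend_limits(regions: Iterable[str]) -> Dict[str, float]:
    """Read the SNS MonthlySpendLimit for each region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so regions_below() works without boto3 installed

    limits: Dict[str, float] = {}
    for region in regions:
        sns = boto3.client("sns", region_name=region)
        attrs = sns.get_sms_attributes(attributes=["MonthlySpendLimit"])["attributes"]
        # If the attribute is missing from the response, treat it as 0 so the region is flagged.
        limits[region] = float(attrs.get("MonthlySpendLimit", "0"))
    return limits


def regions_below(limits: Dict[str, float], expected: float = EXPECTED_LIMIT_USD) -> Dict[str, float]:
    """Return only the regions whose configured limit is lower than expected."""
    return {region: limit for region, limit in limits.items() if limit < expected}


if __name__ == "__main__":
    # The limits as they stood before the incident, according to this report.
    before = {"eu-west-1": 300.0, "eu-west-2": 100.0}
    print(regions_below(before))  # {'eu-west-2': 100.0}
```

Against live accounts, `regions_below(fetch_spend_limits(["eu-west-1", "eu-west-2"]))` would list any region still below the expected cap.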
Timeline
GMT unless otherwise stated
12:31 - SMS $100 limit exceeded; messages no longer sent. No alarm was raised because the alarm was configured to trigger only when monthly spend reached 90% of $300, i.e. $270.
11:59 (next day) - Issue noticed by TSS team
12:08 - SMS manually switched back to eu-west-1 while a request was raised with AWS Support to increase the eu-west-2 limit
12:20 - Limit increased to $300 on eu-west-2
12:21 - SMS manually reverted to eu-west-2
Root Cause(s)
SMS costs exceeded the configured limit
The limit was not set appropriately
The eu-west-2 region had not been given the same SMS spend limit as eu-west-1
The alert for reaching 90% of the limit did not trigger
It would only have triggered at $270 of spend, because it was based on an assumed limit of $300 rather than the $100 that actually applied
The switch from eu-west-1 to eu-west-2 for SMS was not thoroughly checked
The SMS limit and alarm configuration were not included in the package of work
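The alerting gap comes down to simple arithmetic: 90% of an assumed $300 limit is $270, which can never be reached when the real cap is $100, because sending stops at the cap. A minimal sketch, assuming the threshold is instead derived from whatever limit is actually configured in the region; the function names are hypothetical. SNS reports month-to-date SMS spend to CloudWatch as the `SMSMonthToDateSpentUSD` metric, which is the natural metric for such an alarm.

```python
ALARM_FRACTION = 0.9  # alert at 90% of the limit, as the existing alarm intended


def alarm_threshold(limit_usd: float, fraction: float = ALARM_FRACTION) -> float:
    """Spend (USD) at which the alarm should fire for a given monthly limit."""
    return round(limit_usd * fraction, 2)


def alarm_can_fire(threshold_usd: float, actual_limit_usd: float) -> bool:
    """Spend stops at the actual cap, so a threshold at or above it gives no warning."""
    return threshold_usd < actual_limit_usd


if __name__ == "__main__":
    # As configured during the incident: threshold based on the assumed $300 limit.
    stale = alarm_threshold(300)       # 270.0
    print(alarm_can_fire(stale, 100))  # False
    # Threshold derived from the $100 limit actually applied in eu-west-2.
    fixed = alarm_threshold(100)       # 90.0
    print(alarm_can_fire(fixed, 100))  # True
```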
Action Items
Action Items | Owner |
---|---|
Terraform eu-west-2 SMS config and SMS limit alarm | https://github.com/Health-Education-England/TIS-OPS/pull/526 |
Lessons Learned
Terraform first: define SMS configuration, spend limits and alarms in Terraform rather than changing them by hand
Test every change, even if you think it's identical to the previous configuration
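One way to act on this lesson is to compare the settings of the region being switched to against the region being switched from before the switch goes live, rather than assuming they match. A minimal sketch, assuming each region's settings have already been collected into a dictionary; the setting names below are hypothetical and do not reflect the team's actual configuration.

```python
from typing import Any, Dict, Tuple

Settings = Dict[str, Any]


def config_differences(source: Settings, target: Settings) -> Dict[str, Tuple[Any, Any]]:
    """Return every setting whose value differs, or is missing, between two regions."""
    keys = source.keys() | target.keys()
    return {
        key: (source.get(key), target.get(key))
        for key in sorted(keys)
        if source.get(key) != target.get(key)
    }


if __name__ == "__main__":
    # Hypothetical snapshot of both regions before the switch.
    eu_west_1 = {"monthly_spend_limit_usd": 300, "spend_alarm_threshold_usd": 270}
    eu_west_2 = {"monthly_spend_limit_usd": 100}
    diffs = config_differences(eu_west_1, eu_west_2)
    for key, (old, new) in diffs.items():
        print(f"{key}: eu-west-1={old} eu-west-2={new}")
    print("safe to switch" if not diffs else "review before switching")
```

Treating a missing setting as a difference is deliberate: the alarm that existed only in eu-west-1 is exactly the kind of gap this check should surface.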