
Date

Authors

Reuben Roberts

Status

Documenting

Summary

The AWS monthly SMS spend limit was exceeded, so no codes were sent for SMS MFA or phone number verification

Impact

TSS users were unable to sign up or sign in for approximately 24 hours (226 users affected, 866 attempts in total)

Non-technical Description

SMS messages are sent for two actions:

  • Verifying a phone number during SMS MFA setup

  • Signing in with SMS MFA

A monthly SMS spend cap must be set on our AWS account; ours had previously been raised to $300 per month. However, we recently changed our SMS configuration to send from eu-west-2 (London) instead of eu-west-1 (Ireland). The cap is set per region, and the London region was misconfigured with a $100 limit. Once we exceeded that limit, SMS messages could no longer be sent.

As a result, the two actions above were not possible until the limit was raised. Approximate impact:

  • 24 hours of SMS downtime

  • 226 users (counted by phone number)

  • 866 total failed SMS messages


Trigger

  • SMS spend limit exceeded (the change of SMS region left a low default limit in place)


Detection

  • Identified when a TIS team member was unable to sign in to TSS


Resolution

  • Increased the monthly SMS limit for eu-west-2 from $100 to $300


Timeline

GMT unless otherwise stated

  • 12:31 (day 1) - SMS $100 limit exceeded; messages no longer being sent. No alarm was raised because the alarm was configured to trigger only when monthly spend reached 90% of $300, i.e. $270.

  • 11:59 (day 2) - Issue noticed by TSS team

  • 12:08 - Manually switched SMS to the eu-west-1 region while a request was raised with AWS Support to increase the limit on eu-west-2

  • 12:20 - Limit increased to $300 on eu-west-2

  • 12:21 - Manually switched SMS back to the eu-west-2 region
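Why the alarm stayed silent at 12:31 can be checked with a minimal Python sketch. It assumes only what the timeline states: the alarm fires at a fixed 90% of whatever limit it was built from. The function names are hypothetical and are not part of our Terraform or AWS configuration.

```python
ALARM_PERCENT = 90  # alarm at 90% of the limit it is built from, as in this incident


def alarm_threshold(limit_usd: float, percent: int = ALARM_PERCENT) -> float:
    """Monthly spend (USD) at which the alarm fires for a given limit."""
    return limit_usd * percent / 100


def alarm_fires_before_cap(assumed_limit_usd: float, actual_limit_usd: float,
                           percent: int = ALARM_PERCENT) -> bool:
    """True if spend reaches the alarm threshold before SMS sending is blocked.

    assumed_limit_usd is the limit the alarm was built from (hypothetical name);
    actual_limit_usd is the cap that really applies in the sending region.
    """
    return alarm_threshold(assumed_limit_usd, percent) < actual_limit_usd


# Incident values: alarm built from $300, but eu-west-2 was capped at $100.
print(alarm_threshold(300))              # 270.0
print(alarm_fires_before_cap(300, 100))  # False: sending stops at $100, before $270
# Deriving the alarm from the real limit gives a warning at $90, before the $100 cap.
print(alarm_fires_before_cap(100, 100))  # True
```

The point of the sketch is that the threshold has to be computed from the limit actually applied in the sending region, not from a separately remembered figure.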


Root Cause(s)

  • The SMS costs exceeded the configured limits

    • The limit was not set appropriately

      • The eu-west-2 region had not had the same SMS spend limit applied as in eu-west-1

    • The alert for reaching 90% of the limit did not trigger

      • This would only trigger when we reached $270 spend, since it was based on an assumed limit of $300, not the $100 that actually applied

    • The switch from eu-west-1 to eu-west-2 for SMS was not thoroughly checked

      • SMS limit and alarm configuration were not included in the package of work
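A pre-switch check of the kind that was missing can be sketched in Python: compare the SMS settings of the old and new regions and treat any difference as a blocker. This is a minimal sketch that assumes SMS is sent through Amazon SNS, whose account-level SMS attributes (including `MonthlySpendLimit`) are set per region. The function names and the example `DefaultSMSType` value are illustrative. Only the $300 and $100 limits come from this incident. Alarm settings would need a similar comparison.

```python
from typing import Any, Dict, Optional, Tuple

Diff = Dict[str, Tuple[Optional[Any], Optional[Any]]]


def diff_region_settings(old: Dict[str, Any], new: Dict[str, Any]) -> Diff:
    """Return {setting: (old_value, new_value)} for every setting that differs.

    A setting present in only one region is reported with None on the other side.
    """
    diffs: Diff = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            diffs[key] = (before, after)
    return diffs


def read_sns_sms_settings(region: str) -> Dict[str, str]:
    """Fetch account-level SMS attributes for one region from Amazon SNS.

    Assumes SMS is sent via SNS. boto3 is imported here rather than at module
    level so diff_region_settings can be used and tested without AWS access.
    """
    import boto3

    client = boto3.client("sns", region_name=region)
    # With no attribute names given, SNS returns all SMS attributes.
    return client.get_sms_attributes()["attributes"]


# Illustrative values; a live check would instead compare
# read_sns_sms_settings("eu-west-1") with read_sns_sms_settings("eu-west-2").
old = {"MonthlySpendLimit": "300", "DefaultSMSType": "Transactional"}
new = {"MonthlySpendLimit": "100", "DefaultSMSType": "Transactional"}
print(diff_region_settings(old, new))  # {'MonthlySpendLimit': ('300', '100')}
```

Keeping the comparison as a pure function over dictionaries means it can run in CI against exported configuration as well as against live AWS values.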


Action Items

  • Terraform eu-west-2 SMS config and SMS limit alarm: https://github.com/Health-Education-England/TIS-OPS/pull/526


Lessons Learned

  • Terraform first: manage SMS limits and alarms as code rather than configuring them by hand

  • Test every change, even if you think it's identical to the previous configuration
