Date

Authors

Joseph (Pepe) Kelly, Cai Willis

Status

Resolved

Summary

Revalidation production login delay and logout redirecting to stage and localhost

Impact

An admin user was having trouble submitting a recommendation.

...

Our applications live in the cloud. This means that another company (in this case Amazon Web Services) manages the physical machines that run our applications and provides us with features that make our applications more robust and fault tolerant. The relevant feature here is “Availability Zones”: our application runs in multiple data centres simultaneously (and sometimes each Availability Zone is itself a group of data centres!). If one of them suffers even a catastrophic failure (e.g. fire, flooding or power loss), the other instances of our application will still be running fine! AWS then manages how traffic is split between these zones so that this duplication is invisible to the user.

Our applications are also split into “microservices”. Instead of a single program doing everything, we split functionality into lots of smaller applications which talk to each other to fulfil some wider purpose. These are easier to maintain and provide some fault tolerance (e.g. if one stops working, it doesn’t necessarily mean the whole of TIS stops working!).

The root cause of this issue was that a failure in a deployment left one of these Availability Zones missing one of our microservices, so when a user tried to log in and their request was routed to that Availability Zone, the process would fail!

Fixing this was as simple as redeploying the affected application (a zero-downtime operation).
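To illustrate why the failure was intermittent rather than total, here is a minimal sketch in Python. It is a toy model, not our actual infrastructure: the zone names, the service names and the simple round-robin routing are all assumptions made for illustration (AWS load balancing is more sophisticated than this).

```python
from itertools import cycle

# Hypothetical zones and services, for illustration only.
# zone-b stands in for the zone where the deployment failed.
ZONES = {
    "zone-a": {"login-service", "recommendation-service"},
    "zone-b": {"recommendation-service"},  # login-service is missing here
    "zone-c": {"login-service", "recommendation-service"},
}


def route_requests(zones, service, count):
    """Send `count` requests round-robin across zones; record (zone, succeeded)."""
    results = []
    zone_names = cycle(sorted(zones))
    for _ in range(count):
        zone = next(zone_names)
        # A request only succeeds if the zone it lands in runs the service.
        results.append((zone, service in zones[zone]))
    return results


def zones_missing(zones, service):
    """List the zones that do not run the given service."""
    return sorted(name for name, running in zones.items() if service not in running)


# Before the fix: some login requests fail, depending on where they land.
before = route_requests(ZONES, "login-service", 6)
failed_before = [zone for zone, ok in before if not ok]

# After "redeploying": the affected zone runs the service again.
redeployed = {**ZONES, "zone-b": ZONES["zone-b"] | {"login-service"}}
after = route_requests(redeployed, "login-service", 6)
failed_after = [zone for zone, ok in after if not ok]

print("Zones missing login-service:", zones_missing(ZONES, "login-service"))
print("Failed before redeploy:", failed_before)
print("Failed after redeploy:", failed_after)
```

In this toy model only the requests that land in the affected zone fail, which matches the behaviour described above: the same user could succeed on one attempt and fail on the next.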

...