...

  • 09:33am A deployment was triggered. 2 of 3 tasks were reported “unhealthy” in load balancer monitoring. The count went up and down until 10:30am, when it settled at 2 unhealthy tasks.

  • 10:02am Several log entries recorded issues retrieving users’ profile information.

  • 10:30am A user reported getting constant error messages (“oops something went wrong”) in TIS Reval.

  • 10:32am Another user reported that the system was very slow and that their under notice list produced the same error message as above.

  • 11:44am The first responder reported that the issue was being investigated and was likely now resolved. The recommendation service had reached a steady state.

  • 03:26pm A user reported that everything was working fine.

Root Cause

  • The “oops something went wrong” message was being displayed

  • The recommendation service was returning errors

  • Unhealthy tasks were being used

  • Tasks were repeatedly started and failing as part of a deployment
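The pattern in the timeline (the unhealthy count going up and down before settling at a majority of tasks unhealthy) could be caught by a check that only fires once the unhealthy state has persisted. The following is a minimal sketch, assuming health samples are available as `(timestamp, unhealthy_count)` pairs in time order; the function name `sustained_unhealthy`, the threshold, and the window are hypothetical and are not part of the service's existing tooling.

```python
from datetime import datetime, timedelta


def sustained_unhealthy(samples, total_tasks,
                        min_unhealthy_fraction=0.5,
                        window=timedelta(minutes=15)):
    """Return the timestamp at which the unhealthy fraction has stayed at or
    above `min_unhealthy_fraction` for at least `window`, or None.

    `samples` is an iterable of (datetime, unhealthy_count) pairs in time
    order. All names and default values here are illustrative assumptions.
    """
    breach_start = None  # start of the current uninterrupted breach, if any
    for ts, unhealthy in samples:
        if unhealthy / total_tasks >= min_unhealthy_fraction:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= window:
                return ts
        else:
            # A healthy sample resets the breach, so brief flapping is ignored.
            breach_start = None
    return None
```

Resetting on every healthy sample means short flaps during a deployment do not raise an alert, while a state that stays unhealthy does. Whether that trade-off suits this service would depend on how often health samples are taken.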

Action Items

A number of actions from a similar earlier occurrence are still outstanding and have yet to reach the top of the backlog.

Recreate service

Action Items

Owner

Increased service resources (CPU): Tasks now start more quickly

Joseph (Pepe) Kelly

DONE

Additionally, depending on the service provider's support responses, we will recreate the service.

Cards are still outstanding for improving the observability of the services

...

Lessons Learned