Date

Authors

John Simmons (Deactivated)

Status

Documenting

Summary

Reference service failed to start after upgrade: https://hee-tis.atlassian.net/browse/TIS21-2573

Impact

Some users had problems accessing TIS for ~10 minutes

...

  • The offending component in the reference service was reverted to the previous known working version and redeployed

...

Timeline

  • 13:14 - Alert on Slack: AWS Service 10.160.0.137:8088 is down

  • 13:21 - Docker reports reference container is unhealthy and boot looping (syslog: 2022-01-18 13:21:44.810 WARN 1 --- [ main] ConfigServletWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'inMemorySwaggerResourcesProvider' defined in URL [jar:file:/app.jar!/WEB-INF/lib/springfox-swagger-common-3.0.0.jar!/springfox/documentation/swagger/web/InMemorySwaggerResourcesProvider.class]: Bean instantiation via constructor failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [springfox.documentation.swagger.web.InMemorySwaggerResourcesProvider]: Constructor threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'swaggerSpringfoxApiDocket' defined in class path resource [io/github/jhipster/config/apidoc/SwaggerConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [springfox.documentation.spring.web.plugins.Docket]: Factory method 'swaggerSpringfoxApiDocket' threw exception; nested exception is java.lang.NoSuchMethodError: springfox.documentation.builders.PathSelectors.regex(Ljava/lang/String;)Lcom/google/common/base/Predicate;")

  • 13:24 - Andy Dingley creates new PR to revert to known working state

  • ~13:24 - John Simmons (Deactivated) approved PR

  • 13:24 - Jenkins started building repaired version

  • 13:27 - Fixed version starts on stage environment and is checked and approved

  • 13:28 - New version deployed to production

  • 13:29 - Fault is fixed in production and the reference service is working normally again

...

Action Items

...

Lessons Learned

  • Add health check monitoring to the pipeline to stop containers that are stuck in a restart loop from reaching the production environment.
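
One possible shape for such a check is sketched below. It is only an illustration, not the team's pipeline: the function names, the thresholds and the idea of probing an HTTP health URL are all assumptions. The gate passes only after several consecutive successful probes, so a container that comes up briefly and then restarts would be rejected before promotion.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any error.

    The URL is supplied by the caller; which health endpoint the
    reference service exposes is an assumption of this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_stable(
    probe: Callable[[], bool],
    required_successes: int = 3,
    max_attempts: int = 20,
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `probe` succeeds `required_successes` times in a row.

    A failed probe resets the streak, so a container that keeps restarting
    (briefly healthy, then down again) does not pass the gate.
    """
    streak = 0
    for attempt in range(max_attempts):
        if probe():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0
        # Do not wait after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return False
```

A pipeline step on the stage environment could call `wait_until_stable(lambda: http_probe(health_url))`, where `health_url` is a hypothetical address of the service's health endpoint, and fail the build when the result is `False`. Requiring a streak rather than a single success is the design choice that matters here, because a boot-looping container can answer one probe between restarts.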