Date |
|
Authors | Andy Nash, Joseph (Pepe) Kelly, John Simmons, Andy Dingley, Simon Meredith, Paul Hoang |
Status: In progress | Live. Defect resolved. Actions being ticketed up (Andy Nash) |
Summary | On Friday evening we saw Jenkins struggle and then fall over, which caused ETL and data-related issues. Elasticsearch, RabbitMQ and MongoDB also fell over between Friday and Saturday |
Impact | No Stage. No Prod. No data syncing in various places |
...
Everything fell over | Comments following catch up 2020-10-21 | |
---|---|---|
|
| Jenkins could do with some TLC. ES went down too (before Jenkins). However, there is an underlying OS update issue to be investigated and confirmed… |
Initial discussion, along with short- and longer-term actions
What can we do about Dependabot creating and building simultaneously? Dependabot does run sequentially, but much faster than Jenkins can process things, so everything appears concurrent.
ESR preoccupied with launching New World, understandably! Can the perm team keep on top of ESR stuff when they leave? Even when keeping on top of things, will it eventually be too much anyway?
The original Jenkins build was never designed to handle this much load: the underlying architecture isn’t there for the level of automation we now have. It is designed for a single node, not load-balancing.
Is Jenkins the right tool for everything it’s being asked to do? No:
|
...