Background
1. Where we were:
Originally we had no release train, only occasional ad-hoc releases. Devs deployed code that worked locally through a series of manual steps until it eventually reached Stage, where E2E tests were kicked off and POs were invited to approve a release containing multiple code changes.
We moved to a weekly release train: to Stage on a Friday (sometimes), which resulted in a release to Prod on a Tuesday (sometimes).
The approach was very much a transitional one in advance of a move towards full CI/CD.
This necessitated a lot of manual testing, which wasn't foolproof anyway, and a waterfall-style hand-off to Shivani and the POs for testing/approval.
There was a very slow pace of development.
Having many repos causes difficulties for Devs working across multiple services for one ticket/feature.
2. Where we are now:
Small incremental releases that can be done multiple times per day.
There is the potential to fail fast (each incremental release contains a tiny amount of code and therefore represents a tiny amount of risk) and to roll back (with each release being tiny, roll-back is comparatively simple). Roll-back itself is a function that never previously existed; rolling back used to be a fiddly manual Dev task.
Time previously spent on manual testing of a weekly/two-weekly release can now be spent fixing the ~40 failing E2E tests, getting other tests up to scratch, and investing in newer testing to increase confidence in all future releases.
There is more responsibility on developers to write good code which should result in better code.
The pace of development will increase because deployment is no longer held up by, or subject to, so much of a dependency queue.
3. Where we are going:
The team will build up a dependency graph, with a view to then removing dependencies over time.
Moving to a mono-repo, allowing work for one ticket/feature to sit in one branch. This will be easier to PR. Additionally, it will reduce the incidence of back-end and front-end code being deployed in isolation from each other, meaning Jira Stories are released rather than individual Sub-tasks.
Integrating notifications of failure into Slack.
Detecting errors in logs. Once detected, errors can be tracked. With that information the team can decide whether to roll back or to fix and deploy.
All services except Reval, Generic upload, NDW and ESR are set up with the new pipeline. Devs can apply the new pipeline to these remaining services themselves (good practice, especially for Devs who haven't done this sort of thing before).
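As a starting point for the dependency graph, the POM side could be sketched roughly as below. This is only an illustration in Python: the service names in the usage note are hypothetical, only direct `<dependencies>` entries are read (no parent POMs or `dependencyManagement`), and dependencies that exist only as REST calls would have to be added by hand or from another source.

```python
import xml.etree.ElementTree as ET

# Maven POM files declare this XML namespace on the <project> element.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def artifact_and_dependencies(pom_xml: str) -> tuple[str, list[str]]:
    """Return (artifactId, [artifactIds of direct dependencies]) for one POM."""
    root = ET.fromstring(pom_xml)
    artifact = root.findtext("m:artifactId", namespaces=POM_NS)
    deps = [
        dep.findtext("m:artifactId", namespaces=POM_NS)
        for dep in root.findall("m:dependencies/m:dependency", POM_NS)
    ]
    return artifact, deps


def build_graph(poms: list[str]) -> dict[str, set[str]]:
    """Map each of our services to the other services of ours it depends on.

    Third-party libraries are dropped: an edge is kept only when the
    dependency is itself one of the POMs passed in.
    """
    parsed = [artifact_and_dependencies(p) for p in poms]
    ours = {artifact for artifact, _ in parsed}
    return {
        artifact: {d for d in deps if d in ours}
        for artifact, deps in parsed
    }
```

For example, given a POM for a hypothetical `service-a` that depends on `service-b` and on a third-party library, `build_graph` would return an entry for `service-a` pointing only at `service-b`. Services with many incoming edges are the likely candidates when deciding which dependencies to remove first.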
Action items (my suggested actions and owners)
- Panos Paralakis : Lead on building up a dependency graph (using POMs and REST calls)
- Simon Meredith : Lead on coordinating the move to a mono-repo in Git (can you delegate to someone for your week off, next week, please?)
- Chris Mills : Work on sending failure notifications to a Slack channel (presuming this will require a new Slack channel)
- Chris Mills : Work on detecting errors in logs so that they can be tracked and the Dev team / POs can determine whether / when roll-backs need to be initiated
- Oladimeji Onalaja : Lead on coordinating getting the new pipeline applied to the remaining services (Generic upload, ESR, Reval - others? Chris Mills, are you able to confirm, please?)
- Shivani Rana : Lead on addressing the failing E2E tests. The sooner these are fixed, the sooner we can use a correctly failing E2E test as a blocker for the build
- Chris Mills : Can you add links here for: How to roll-back | (team: is there anything else you want Chris to link to here to clarify the new release approach?)
- Whole team : Work out, as soon as possible, an approach to clearing the releases currently awaiting approval, so that they don't interfere with the team using the new approach
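The two items on log errors and Slack notifications could be prototyped along the lines below. This is a rough Python sketch under assumptions, not a description of the real setup: the log layout (`<timestamp> <LEVEL> <service> <message>`), the message wording and the webhook URL are all hypothetical. The only external behaviour relied on is that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json
import re
import urllib.request
from collections import Counter

# Assumed log layout: "<timestamp> <LEVEL> <service> <message>".
LOG_LINE = re.compile(
    r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)


def count_errors(lines):
    """Count ERROR-level lines per service; lines that don't parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("service")] += 1
    return counts


def slack_payload(counts):
    """Build the JSON body for a Slack message, or None if there is nothing to report."""
    if not counts:
        return None
    body = "\n".join(
        f"{service}: {n} error(s)" for service, n in sorted(counts.items())
    )
    return {"text": "Errors detected after release:\n" + body}


def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook (the URL is supplied by the caller)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping the counting and the message building separate from the network call means the first two can be checked without touching Slack. The per-service counts are also the kind of information the team would need when deciding whether to roll back or to fix and deploy.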