Release approach
Status | Decided |
---|---|
Decision leader | @Andy Nash (Unlicensed) |
Contributors | @Dev team, Chris Mills and PO group |
Date | Oct 11, 2018 |
Outcome | Transition from the current weekly release train to a CI/CD approach (many small, individual releases throughout the Sprint) |
Background
1. Where we were:
Originally we had no release train, only occasional releases on an ad-hoc basis. Devs deployed locally functioning code through manual steps until it eventually reached Stage, where E2E tests were kicked off and POs were invited to approve a release containing multiple code changes.
We then moved to a weekly release train: to Stage on a Friday (sometimes), which resulted in a release to Prod on a Tuesday (sometimes).
The approach was very much a transitional one in advance of a move towards full CI/CD.
This necessitated a lot of manual testing, which wasn't foolproof anyway, and a waterfall-style hand-off to Shivani and the POs for testing/approval.
The pace of development was very slow.
The existence of many repos causes difficulties for Devs working across multiple services for one ticket/feature.
2. Where we are now:
Small incremental releases that can be done multiple times per day.
There is the potential to fail fast (each incremental release contains a tiny amount of code, and therefore represents a tiny amount of risk) and to roll back (with each release being tiny, roll-back is comparatively simple). Roll-back itself is functionality that never previously existed; it used to be a fiddly manual Dev task.
Time previously spent on manual testing of a weekly or two-weekly release can now be spent fixing the ~40 failing E2E tests, getting other tests up to scratch, and investing in newer testing to increase confidence in all future releases.
There is more responsibility on developers to write good code, which should result in better code.
The pace of development will increase because deployment is no longer held up by, or subject to, so much of a dependency queue.
3. Where we are going:
The team will build up a dependency graph, with a view to removing dependencies over time.
Move to a mono-repo, allowing work for one ticket/feature to sit in one branch. This will be easier to PR. It will also reduce the incidence of back-end and front-end code being deployed in isolation from each other, meaning Jira Stories are released rather than individual Sub-tasks.
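As a rough illustration of what a dependency graph could look like in practice, the sketch below models services as a mapping and derives a safe deployment order from it. The service names and their dependencies are hypothetical, not taken from the actual system, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical services, for illustration only.
# Each key depends on every service listed in its set.
dependencies = {
    "frontend": {"api"},
    "api": {"auth", "database"},
    "auth": {"database"},
    "database": set(),
}


def deploy_order(graph):
    """Return services ordered so that each one comes after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


def dependents_of(graph, service):
    """Return the services that depend directly on the given service."""
    return sorted(name for name, deps in graph.items() if service in deps)


order = deploy_order(dependencies)
blocked_by_database = dependents_of(dependencies, "database")
```

A service with many direct dependents (here, the hypothetical `database`) is a natural first candidate when deciding which dependencies to remove.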
Integrating notifications of failure into Slack.
Detect errors in logs. Once detected, they can be tracked. With that information the team can decide whether to roll back, or fix and deploy.
All services except Reval, Generic upload, NDW and ESR are set up with the new pipeline. Devs can apply the new pipeline to these services themselves (good practice, especially for those Devs who haven't done this sort of thing before).
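The Slack notification and log error detection items above could be combined along the following lines. This is a minimal sketch only: the log line format, the service names, the threshold and the message wording are all assumptions, and the code only builds the JSON body (a `text` field, as accepted by Slack incoming webhooks) without actually posting it anywhere.

```python
import json
from collections import Counter

# Assumed log format: "<timestamp> <LEVEL> <service> <message>".
# The lines below are made-up sample data, not real logs.
SAMPLE_LOG = """\
2018-10-11T09:00:01 INFO service-a started
2018-10-11T09:00:05 ERROR service-a failed to connect
2018-10-11T09:00:09 ERROR service-b timeout
2018-10-11T09:00:12 ERROR service-a failed to connect
"""


def count_errors(log_text):
    """Count ERROR lines per service in the assumed log format."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) >= 3 and parts[1] == "ERROR":
            counts[parts[2]] += 1
    return counts


def slack_payload(counts, threshold):
    """Build a webhook JSON body for services at or above the threshold.

    Returns None when no service reaches the threshold, so nothing is sent.
    """
    noisy = {service: n for service, n in counts.items() if n >= threshold}
    if not noisy:
        return None
    summary = ", ".join(f"{service}: {n}" for service, n in sorted(noisy.items()))
    return json.dumps({"text": f"Errors since last release: {summary}"})


counts = count_errors(SAMPLE_LOG)
payload = slack_payload(counts, threshold=2)
```

Keeping the counting separate from the message building means the same counts can feed the roll back versus fix-and-deploy decision as well as the Slack notification.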
Action items (my suggested actions and owners)
Related pages
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213