...
Non-technical Description
Trainee Self-Service is made up of a number of distinct software components. These are hosted in the cloud, in containers (AWS ECS). Containers are isolated environments, similar to lightweight virtual machines, each with allocated resources such as memory and CPU. When the developers make changes to a component, it is built and redeployed into a container, which must have enough resources to run the component successfully. We found that some components could no longer be deployed into their normal containers. We suspect that underlying changes to the buildpack used to assemble the components meant they required more memory than before, so the containers were no longer large enough to accommodate them, even though we had made no material changes to the component code.
...
Trigger
Deployment of run-of-the-mill component updates (and subsequent reversion to known-good builds) failed
...
: 14:39 - Failed deployment for tis-trainee-ndw-exporter reported in #notifications-deployments Slack channel, closely followed by failed deployments for tis-trainee-credentials and tis-trainee-user-management.
: 16:13 - Reverts of the updates to these components also failed to deploy.
: 11:25 - Redeployment of tis-trainee-ndw-exporter, with the ECS task configured with 1GB memory instead of 512MB, succeeded.
: 13:00 - Redeployment of tis-trainee-credentials and tis-trainee-user-management with 1GB memory succeeded.
Root Cause(s)
...
Deployments to the preprod environment were failing for some components (though not for others, e.g. tis-trainee-sync).
Logs for the failing components include lines such as:
6/13/2023, 4:51:44 PM GMT+1 ERROR: failed to launch: exec.d: failed to execute exec.d file at path '/layers/paketo-buildpacks_bellsoft-liberica/helper/exec.d/memory-calculator': exit status 1
6/13/2023, 4:51:44 PM GMT+1 Calculating JVM memory based on 616300K available memory
6/13/2023, 4:51:44 PM GMT+1 For more information on this calculation, see https://paketo.io/docs/reference/java-reference/#memory-calculator
6/13/2023, 4:51:44 PM GMT+1 unable to calculate memory configuration
6/13/2023, 4:51:44 PM GMT+1 fixed memory regions require 632382K which is greater than 616300K available for allocation: -XX:MaxDirectMemorySize=10M, -XX:MaxMetaspaceSize=120382K, -XX:ReservedCodeCacheSize=240M, -Xss1M * 250 threads
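The arithmetic behind the last log line can be reproduced directly. The following is a minimal sketch, not the buildpack's own code: the helper `to_kib` and the variable names are hypothetical, and it assumes that `K` means KiB and `M` means 1024 KiB, which is consistent with the totals in the log.

```python
# Reproduce the "fixed memory regions" sum reported by the memory calculator.
# Assumption: K = KiB and M = 1024 KiB, which matches the totals in the log.
KIB_PER_MIB = 1024


def to_kib(size: str) -> int:
    """Convert a JVM-style size such as '10M' or '120382K' to KiB (hypothetical helper)."""
    number, unit = int(size[:-1]), size[-1].upper()
    if unit == "K":
        return number
    if unit == "M":
        return number * KIB_PER_MIB
    raise ValueError(f"unsupported unit in {size!r}")


# Values copied from the log line above.
fixed_regions_kib = {
    "-XX:MaxDirectMemorySize=10M": to_kib("10M"),
    "-XX:MaxMetaspaceSize=120382K": to_kib("120382K"),
    "-XX:ReservedCodeCacheSize=240M": to_kib("240M"),
    "-Xss1M * 250 threads": to_kib("1M") * 250,
}

available_kib = 616300
required_kib = sum(fixed_regions_kib.values())
shortfall_kib = required_kib - available_kib

print(f"required:  {required_kib}K")   # 632382K, as in the log
print(f"available: {available_kib}K")  # 616300K
print(f"shortfall: {shortfall_kib}K")  # 16082K
```

The fixed regions alone exceed the available memory by 16082K (roughly 16MB), before any heap is allocated, which is why the launch fails outright instead of starting with a reduced heap.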
Some buildpack versions have changed:
(ndw exporter)
Pass:
paketo-buildpacks/ca-certificates 3.6.2
paketo-buildpacks/bellsoft-liberica 10.2.3
paketo-buildpacks/syft 1.30.1
paketo-buildpacks/executable-jar 6.7.3
paketo-buildpacks/dist-zip 5.6.2
paketo-buildpacks/spring-boot 5.25.1
Fail:
paketo-buildpacks/ca-certificates 3.6.2
paketo-buildpacks/bellsoft-liberica 10.2.4
paketo-buildpacks/syft 1.31.0
paketo-buildpacks/executable-jar 6.7.3
paketo-buildpacks/dist-zip 5.6.3
paketo-buildpacks/spring-boot 5.25.2
Further investigation is needed to determine whether these version changes affected the memory requirements or the memory calculation.
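To narrow down which buildpacks to investigate, the two version lists for the ndw exporter can be compared mechanically. This is only an illustrative sketch: the dictionary names `passing` and `failing` are hypothetical, and the versions are copied from the lists above.

```python
# Buildpack versions for the ndw exporter, copied from the lists above.
passing = {
    "paketo-buildpacks/ca-certificates": "3.6.2",
    "paketo-buildpacks/bellsoft-liberica": "10.2.3",
    "paketo-buildpacks/syft": "1.30.1",
    "paketo-buildpacks/executable-jar": "6.7.3",
    "paketo-buildpacks/dist-zip": "5.6.2",
    "paketo-buildpacks/spring-boot": "5.25.1",
}
failing = {
    "paketo-buildpacks/ca-certificates": "3.6.2",
    "paketo-buildpacks/bellsoft-liberica": "10.2.4",
    "paketo-buildpacks/syft": "1.31.0",
    "paketo-buildpacks/executable-jar": "6.7.3",
    "paketo-buildpacks/dist-zip": "5.6.3",
    "paketo-buildpacks/spring-boot": "5.25.2",
}

# Keep only the buildpacks whose version differs between the two builds.
changed = {
    name: (old, failing.get(name))
    for name, old in passing.items()
    if failing.get(name) != old
}

for name, (old, new) in changed.items():
    print(f"{name}: {old} -> {new}")
```

Four of the six buildpacks changed (bellsoft-liberica, syft, dist-zip and spring-boot). The comparison does not show which of them, if any, is responsible for the increased memory requirement.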
...
Action Items
Action Items | Owner | Status
---|---|---
Write up investigation ticket for managing buildpack versioning | | TO DO
PRs for increased memory allocation for affected components (at least 3; note that possibly all components with small containers might throw errors when we attempt to deploy updates without increasing the memory allocation) | Reuben Roberts | DONE
...
Lessons Learned
TODO