...
The bulk upload service was continually restarting and the bulk upload webpage was continually refreshing. This meant that, for a large part of Friday, users had difficulty submitting and checking their bulk uploads. Some users were able to submit and have their
On Thursday afternoon, three uploads (each of one or more spreadsheets) contained many rows that were blank apart from a hyphen in the address field. Bulk upload treated these as rows that required processing, so it produced a very large number of errors (see below). Temporarily allocating more resources to the service that processes uploads allowed it to cope with the additional load of reporting these errors to users.
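One way to stop such rows from generating errors is to skip them before validation. The sketch below is a minimal illustration, assuming each spreadsheet row arrives as a dictionary mapping column name to cell text. The column name `address` and the functions `is_effectively_blank` and `rows_to_process` are hypothetical, not taken from the bulk upload service.

```python
from typing import Iterable, Iterator, Mapping, Optional

# Hypothetical row shape: column name -> cell text (None for an empty cell).
Row = Mapping[str, Optional[str]]


def is_effectively_blank(row: Row, placeholder_field: str = "address") -> bool:
    """Return True if a row has no real data.

    A row counts as blank when every cell is empty, except that the
    placeholder field (assumed here to be "address") may hold a lone hyphen.
    """
    for column, value in row.items():
        text = (value or "").strip()
        if text == "":
            continue
        if column == placeholder_field and text == "-":
            continue
        return False
    return True


def rows_to_process(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield only rows that carry real data, skipping effectively blank ones."""
    for row in rows:
        if not is_effectively_blank(row):
            yield row
```

For example, given `[{"name": "A Smith", "address": "1 High Street"}, {"name": "", "address": "-"}]`, only the first row would be passed on for processing. Limiting the hyphen rule to the address field keeps a hyphen in any other column from silently hiding a genuine row.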
...
Trigger
Three very large files were uploaded, and each completed with thousands of errors.
...
Action Item | Comments | Owner |
---|---|---|
Fix the service deployment configuration (volume mappings for logs and heap dumps). Preferred: move the service to ECS. | Unknown whether ECS would make the heap dumps available. | |
Improve memory use: change which columns are retrieved from the database for the | | |
Analyse the uploaded data. | This would inform setting a limit on the number of rows that can be uploaded. | |
Get feedback from Local Office about what happened. | | |
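The row-limit action item could take a shape like the following sketch. It assumes the row counts of past uploads are available as a list of integers. The names `suggest_row_limit`, `check_row_limit` and `UploadTooLargeError`, the 99th-percentile heuristic and the headroom factor are all hypothetical choices for illustration, not an agreed design.

```python
import math
import statistics
from typing import Sequence


class UploadTooLargeError(ValueError):
    """Raised when an upload has more rows than the configured limit."""


def suggest_row_limit(past_row_counts: Sequence[int], headroom: float = 1.5) -> int:
    """Suggest a limit: the 99th percentile of past row counts times a headroom factor.

    This is one possible heuristic; the real limit should come from the analysis.
    """
    if len(past_row_counts) < 2:
        raise ValueError("need at least two past uploads to estimate a percentile")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(past_row_counts, n=100)[98]
    return math.ceil(p99 * headroom)


def check_row_limit(row_count: int, limit: int) -> None:
    """Reject an upload before any row is processed if it exceeds the limit."""
    if row_count > limit:
        raise UploadTooLargeError(
            f"upload has {row_count} rows; the maximum is {limit}"
        )
```

Checking the limit before processing means an oversized upload fails fast with a single clear error rather than thousands of per-row errors. It may also make sense to count rows only after discarding rows that are blank apart from a hyphen, so that such rows do not count toward the limit.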
Lessons Learned
We noticed the three large files early on, but did not initially recognise them as the root cause. This was because the data in the API response does not include the error messages, even though the backend service does load them from the DB.
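Because the API response does not include the error messages, the backend arguably does not need to load them just to report how many errors an upload produced. The sketch below uses Python's built-in `sqlite3` purely for illustration and lets the database compute the counts, so the `error_message` column is never read. The table `upload_rows`, its columns, and the helper functions are hypothetical; the real service's database and schema are not described here.

```python
import sqlite3
from typing import Tuple


def upload_summary(conn: sqlite3.Connection, upload_id: int) -> Tuple[int, int]:
    """Return (total_rows, error_rows) for one upload, aggregated in the database.

    Only two integers come back, so memory use does not grow with the number
    or size of the stored error messages.
    """
    total, errors = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END), 0)
        FROM upload_rows
        WHERE upload_id = ?
        """,
        (upload_id,),
    ).fetchone()
    return total, errors


def create_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a few sample rows for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE upload_rows ("
        "id INTEGER PRIMARY KEY, upload_id INTEGER, status TEXT, error_message TEXT)"
    )
    conn.executemany(
        "INSERT INTO upload_rows (upload_id, status, error_message) VALUES (?, ?, ?)",
        [
            (1, "ok", None),
            (1, "error", "sample error"),
            (1, "error", "sample error"),
        ],
    )
    return conn
```

With the sample data, `upload_summary(create_demo_db(), 1)` returns `(3, 2)`. The error messages themselves would then be fetched only when a user actually asks to see them, ideally a page at a time.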
If something looks unusual in the UI (for example, far too big), it is probably the cause.