Date

Authors

Joseph (Pepe) Kelly Yafang Deng

Status

TIS21-4449

Summary

The bulk upload page was continually refreshing and showing "the server took too long to respond".

Impact

Users could not use bulk upload as usual.

When the bulk upload service was up for a short period, users were sometimes able to upload a file. If the service then restarted, it could miss the responses from other services, leaving the file stalled in progress.

Non-technical Description

The bulk upload service was continually restarting and the bulk upload webpage was continually refreshing.


Trigger

  • There were 2 very large files uploaded, which completed with thousands of errors.


Detection

  • User queries:

Liam Lofthouse: Morning General - is there an issue with the bulk upload page? I seem… 

posted in TIS Support Channel / General at 21 April 2023 09:28:59


Resolution


Timeline

BST unless otherwise stated

  • 14:03-14:36 - Several uploads with significant numbers of errors.

  • 09:28 - User reported that the page keeps refreshing.

  • 09:30 - Found the service was running out of memory (OutOfMemory). Logs didn’t indicate why. Dashboard indicated that there *may* have been an issue since the previous day.

  • 09:30-13:00 - Made hotfix changes to the deployment configuration to capture additional information. Found a valid cause for the additional memory use. We modified the configuration to give the service more resources but missed additional resource constraints.

  • 13:49 - User reported that an uploaded file was stalled. We assumed this was because the service restarted while the file was being processed.

  • 14:40-15:40 - Modified and monitored the service further to check that it was stable, notifying the Teams channel at 15:55.

  • 16:15 - Replicated the same issue on Stage after manually uploading those 3 large files (logId: 1682001412239, 1681999967423, 1681999429355 on Prod).

  • 16:30 - Modified the ApplicationType.errorJson column for the 3 large uploaded files on Stage so that each held a single error, then checked that the issue on Stage was resolved.
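
The 16:30 step, reducing a very large stored error payload to a single error, can be sketched as below. This is only an illustration under stated assumptions: the real schema of ApplicationType.errorJson is not described on this page, so the sketch assumes it holds a JSON array of error entries, and the function name `collapse_errors` and the summary fields are hypothetical.

```python
import json


def collapse_errors(error_json: str) -> str:
    """Collapse a JSON array of error entries into a single summary entry.

    Assumption: the stored value is a JSON array. The real errorJson
    format may differ; anything that is not a list, or that already has
    at most one entry, is returned unchanged.
    """
    errors = json.loads(error_json)
    if not isinstance(errors, list) or len(errors) <= 1:
        return error_json

    # Hypothetical summary shape: keep a count and the first entry only.
    summary = {
        "message": f"{len(errors)} errors collapsed into one entry",
        "first": errors[0],
    }
    return json.dumps([summary])


if __name__ == "__main__":
    # Build a payload with thousands of errors, similar in spirit to the
    # uploads described in the timeline (the row/error fields are made up).
    big = json.dumps([{"row": i, "error": "invalid value"} for i in range(5000)])
    small = collapse_errors(big)
    print(len(big), "characters reduced to", len(small))
```

Keeping a count and the first entry keeps the stored value small while still recording that the upload failed with many errors. Whether that is enough detail for users is a product decision this page does not cover.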


Root Cause(s)


Action Items

Action Items

Comments

Owner

Fix up service deployment configuration

Preferred: Move the

Lessons Learned
