2023-05-04 Bulk Upload not processing jobs

Date

May 4, 2023

Authors

@Joseph (Pepe) Kelly @Yafang Deng

Status

Done

Summary

https://hee-tis.atlassian.net/browse/TIS21-4513

There was an issue processing a Placement Create bulk upload file, and a number of other files were not processed until the service was restarted.

Impact

Users' uploaded files were not processed.

We restarted the service to “skip over” the file that was problematic.

Non-technical Description

 


Trigger

  •  


Detection

  • User queries in the Teams Support channel


Resolution

  • Restarted service

  • Tested the file in the stage environment, where it was processed successfully.

  • Tested the file locally, where it took more than 2 hours to process.


Timeline

BST unless otherwise stated

  • May 4, 2023 13:55 - File uploaded and starts being processed.

  • May 4, 2023 13:55-15:24 - Other users upload files while the large file is still being processed

  • May 4, 2023 15:24 - Error logged

  • May 4, 2023 15:25 - User report on Teams

  • May 4, 2023 15:10-15:37 - One of the copies of TCS, which bulk upload relies on, was restarted and was busier.

  • May 4, 2023 15:24-15:40 - Service monitored for signs that data was still being processed. No indication was found that the file was being processed, so the service was restarted

  • May 4, 2023 15:53 - Service processes queued files

  • May 4, 2023 17:29 - Admin user uploaded the same Placement Create file again and it was processed successfully

 


Root Cause(s)

  • When admin users raised the query, the job had already been running for about 1.5 hours (12:55:51 UTC - 14:25 UTC).

  • We thought the job had stalled, but it had not. It was still processing when the generic upload service was restarted.

  • The image below shows the record for the 1702nd row of the spreadsheet, which has 1788 rows in total.
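
The row counts above allow a rough estimate of how close the job was to finishing. The sketch below is a back-of-the-envelope calculation only. It assumes that row 1702 was the last row handled before the restart at about 15:40 BST and that rows were processed at a steady rate. Neither assumption is confirmed by the logs, and the function name is hypothetical.

```python
from datetime import datetime


def estimate_remaining_seconds(started, observed, rows_done, rows_total):
    """Linearly extrapolate the remaining time from the progress so far."""
    if rows_done <= 0:
        raise ValueError("rows_done must be positive")
    elapsed = (observed - started).total_seconds()
    seconds_per_row = elapsed / rows_done
    return seconds_per_row * (rows_total - rows_done)


# Upload time taken from the timeline (BST).
started = datetime(2023, 5, 4, 13, 55)
# Assumption: progress observed at the restart, roughly 15:40 BST.
observed = datetime(2023, 5, 4, 15, 40)

remaining = estimate_remaining_seconds(started, observed, 1702, 1788)
print(f"about {remaining / 60:.1f} minutes left")
```

Under these assumptions the job had only a few minutes of work left when the service was restarted, which is consistent with the finding that it was slow rather than stalled.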


Action Items

Action Items

Comments

Owner


Lessons Learned

  • Look for logs to check if the job is really stalled… pair up whenever possible.
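
One way to make this check repeatable is to look at the newest log line for the job before deciding to restart. The following is a minimal sketch, not the service's actual tooling. The log format (a `YYYY-MM-DD HH:MM:SS` prefix), the `job-123` marker, the sample lines and the 10-minute threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def last_activity(log_lines, job_marker):
    """Return the timestamp of the newest log line mentioning job_marker, or None."""
    latest = None
    for line in log_lines:
        if job_marker not in line:
            continue
        try:
            # Assumed format: each line starts with "YYYY-MM-DD HH:MM:SS".
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if latest is None or stamp > latest:
            latest = stamp
    return latest


def looks_stalled(log_lines, job_marker, now, quiet_for=timedelta(minutes=10)):
    """Treat the job as stalled only if it has logged nothing for quiet_for."""
    latest = last_activity(log_lines, job_marker)
    return latest is None or now - latest > quiet_for


# Hypothetical sample lines, for illustration only.
sample = [
    "2023-05-04 14:24:58 INFO job-123 processed row 1701",
    "2023-05-04 14:25:10 INFO job-123 processed row 1702",
    "2023-05-04 14:25:12 INFO job-456 queued",
]

print(looks_stalled(sample, "job-123", now=datetime(2023, 5, 4, 14, 26)))
```

A job that is still writing log lines, even slowly, would not be flagged. The right threshold depends on how long a single row can legitimately take.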