2019-01-30 ESR ETL problems (2 of 5 processes)
Date | 2019-01-30 |
Authors | John Simmons |
Status | ETL stabilised |
Summary | 2 of 5 ESR ETL processes failed. |
Impact | Files generated by our ESR ETL had to be copied manually to their `FTP In` folder in Azure. |
Impact
- Information was not being exchanged between ESR and TIS, so no data was being updated.
Root Causes
- The user's Azure CLI (az) login on the N3-Bridge server had timed out.
Trigger
- The new monitoring and alerting system for the ESR ETL detected the failures and sent alerts to the esr-operations Slack group.
Resolution
- Once the root cause had been found, we logged back in to the az CLI and the processes started working again. 24 hours later we found that some of the routines were still not working. Further investigation showed that the az CLI was rejecting the syntax of the command being run.
Detection / Timeline
- .
Action Items
- Change the FTP script so that it authenticates with a service principal instead of a user login.
- Investigate whether the N3 bridge can be avoided completely.
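A minimal sketch of the first action item, assuming the sync script can shell out to the Azure CLI from Python. `az login --service-principal` with `--username`, `--password` and `--tenant` is the CLI's non-interactive login, so it does not depend on a user session that can time out; check the flags against the az version installed on the server. The environment variable names `ESR_SP_APP_ID`, `ESR_SP_SECRET` and `ESR_SP_TENANT_ID` are hypothetical, as are the function names.

```python
import os
import subprocess
from typing import List, Mapping


def build_sp_login_command(app_id: str, secret: str, tenant_id: str) -> List[str]:
    """Build the argv for a non-interactive service principal login."""
    if not (app_id and secret and tenant_id):
        raise ValueError("app_id, secret and tenant_id must all be non-empty")
    return [
        "az", "login", "--service-principal",
        "--username", app_id,      # application (client) ID
        "--password", secret,      # client secret
        "--tenant", tenant_id,
        "--output", "none",        # suppress the account JSON az prints on login
    ]


def login_from_env(env: Mapping[str, str] = os.environ) -> None:
    """Log in using credentials from hypothetical environment variables."""
    cmd = build_sp_login_command(
        env["ESR_SP_APP_ID"],      # hypothetical variable names
        env["ESR_SP_SECRET"],
        env["ESR_SP_TENANT_ID"],
    )
    # check=True raises CalledProcessError if the login fails, so the sync
    # stops loudly instead of carrying on without a valid session.
    subprocess.run(cmd, check=True)
```

Calling `login_from_env()` at the start of each sync run means every run authenticates afresh rather than relying on a session created by hand. Keeping the secret in the environment rather than in the script keeps it out of source control, although it is still visible in the process list while `az login` runs.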
Lessons Learned
- The current implementation of esr-ftp-sync is poor and a better solution needs to be found. This will require further discussions with IT about the underlying networking.
What went well
- Ashley's knowledge of the whole ESR system was invaluable in working out what was happening and when, as was his ability to make the correct amendments to the TIS database so that the missed items were sent to the ESR out folder.
What went wrong
- When the system failed, we did not have enough knowledge of how the ESR system functions from an ops perspective.
Where we got lucky
- .
Supporting information
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213