Bi-Di validation
Overview
During the bi-directional pilot we validated the updates to trainee details compared to what was loaded from ESR to ensure data was updated correctly and we had traceability of the changes. This was done with a predefined SQL query run against TCS before and after the bi-di file was processed. The audit service provided the data which was processed from ESR. These csv files were then loaded into neo 4j and a variety of pattern matching was used to identify data scenarios.
Data load
The SQL query used is stored in metabase (In our anayltics as “peninsular trainee details pilot”) and has two restrictions which should be modified if it is rerun. These are the date, which determines which placements are selected and the trusts which are applicable. This query was run and saved as a csv file before and after the file was loaded. With the before files saved a preload.csv and the after file being saved as postload.csv. The audit system was then accessed through the page (10.160.0.137/esraudit/search [This needs an SSH connection if outside of the office currently]) . We searched for the RMF file name and the outputted csv was saved as combinedAuditMessage.csv. these files were then put in the the import folder in neo4j and the following scripts were run in neo4j’s ui:
Data Query and pattern matching
The following patterns were searched for:
Unsuccessful update:
Here we have a differing data items in the final state (postload) and what we have processed from ESR (Audit) for one doctor (GMC). the query used for this is below:
match (k)-[j]-(x:Postload)-[]-(l:Audit)-[t]-(n) where type(j)=type(t) and type(j) <> "ETHNICITY" and n <> k return k,j,x,l,t,n
2. Wrong Source:
Here we have a doctor associated with one trust, but the update has come from a different trust. This suggests we are recieiving information from a source we should have. the query for this is:
match (k:ODS)-[j]-(x:Postload)-[]-(l:Audit)-[t]-(n:ODS) where type(j)=type(t) and n <> k return k,j,x,l,t,n
3. Conflicting information:
Here we have two of the same data type associated with one trainee from the information we have recieved from ESR. This means they are sending us conflicting information. The query for this is:
match (l:GMC:Audit)-[r]-(x) match (l:GMC:Audit)-[t]-(n) where type(r)=type(t) and n <> x return l,r,t,n,x
4. Unexpected update:
Here we have no connection of the GMC which we recieved from ESR to a trainee in TIS meaning that we have recieved an updated for a trainee which we weren’t expecting. the query for this is:
match (k)-[j]-(x:Postload)-[]-(l:Audit)-[t]-(n) where type(j)=type(t) and type(j) <> "ETHNICITY" and n <> k return k,j,x,l,t,n
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213