...
STILL WIP
...
Title: Initial Refinement informing Design work a.k.a. Connections (... and Reval) ~Syncing~ Cache Design
Problems:
• Hypothesis: It would all be much simpler if we were copying all programme memberships separately
Principles:
Solutions:
Three ways of caching data; the first might not be viable because of connection-specific info:
...
The approach of “pre-sorting” the data was also fine before, as the exact same code was used for CDC and the ES Resync job. However, in order to repeat the massive time saving we achieved in
Summary
Having multiple indexes makes GET requests simpler
Performance has been raised as a potential benefit, but when more complex queries on large data sets take less than a second, it’s questionable how much benefit this would really give.
It’s what we’ve got already 🤷‍♀️
Multiple indexes mean duplicating data
Multiple indexes require multiple updates for a single data change
Because we have separate CDC and Resync processes, and because the Java approach is prohibitively slow for the Resync process, we would have to write and maintain the business logic in separate places in separate languages
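To make the “multiple updates for a single data change” point concrete, here is a minimal in-memory sketch of the current multi-index approach. It assumes three hypothetical indexes (connected, disconnected, exception) keyed by doctor id, with HashMaps standing in for Elasticsearch; it is an illustration of the update pattern, not the actual Connections code.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Sketch of the current multi-index approach, assuming three HYPOTHETICAL
// in-memory "indexes" (connected, disconnected, exception) keyed by doctor id.
// A real implementation would send these requests to Elasticsearch instead.
public class MultiIndexSketch {

    enum Category { CONNECTED, DISCONNECTED, EXCEPTION }

    private final Map<Category, Map<String, String>> indexes = new EnumMap<>(Category.class);

    public MultiIndexSketch() {
        for (Category c : Category.values()) {
            indexes.put(c, new HashMap<>());
        }
    }

    // Applies one data change and returns the number of index requests it needed.
    public int apply(String doctorId, String document, Category newCategory) {
        int requests = 0;
        // Without an extra lookup we don't know which index held the doctor before,
        // so a delete is sent to every other index...
        for (Category c : Category.values()) {
            if (c != newCategory) {
                indexes.get(c).remove(doctorId);
                requests++;
            }
        }
        // ...and the document is written to the index it now belongs in.
        indexes.get(newCategory).put(doctorId, document);
        requests++;
        return requests;
    }

    public boolean contains(Category c, String doctorId) {
        return indexes.get(c).containsKey(doctorId);
    }

    public static void main(String[] args) {
        MultiIndexSketch sketch = new MultiIndexSketch();
        System.out.println(sketch.apply("123", "{}", Category.CONNECTED));    // 3
        System.out.println(sketch.apply("123", "{}", Category.DISCONNECTED)); // 3
        System.out.println(sketch.contains(Category.CONNECTED, "123"));       // false
    }
}
```

Every change costs one request per index, and the pre-sort logic deciding the category has to be duplicated wherever the indexes are written (CDC and Resync).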
Tasks to complete TIS21-3774 with this approach
...
masterdoctorindex Fields | Required by Recommendations | Required by Connections |
---|---|---|
id | ✅ | ✅ |
tcsPersonId | ✅ | ✅ |
gmcReferenceNumber | ✅ | ✅ |
doctorFirstName | ✅ | ✅ |
doctorLastName | ✅ | ✅ |
submissionDate | ✅ | ✅ |
ProgrammeName | ✅ | ✅ |
membershipType | ✅ | ✅ |
designatedBody | ✅ | ✅ |
gmcStatus | ✅ | |
tisStatus | ✅ | |
admin | ✅ | |
lastupdatedDate | ✅ | |
underNotice | ✅ | |
tcsDesignatedBody | ✅ | |
programmeOwner | ✅ | |
curriculumEndDate | ✅ | |
connectionStatus | ✅ | |
membershipStartDate | ✅ | |
membershipEndDate | ✅ | |
existsInGmc | ✅ | |
exceptionReason* | | ✅ |
*This field is currently in the Connections code but doesn’t exist in masterdoctorindex or the Integration service; it’s either not required or appears to have been overlooked.
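As a rough illustration of how small the Connections view is, the sketch below models a document holding only the fields the table marks as required by Connections. The record name and all field types are assumptions (the table gives names only), and “ProgrammeName” is written in camelCase.

```java
import java.time.LocalDate;

// Sketch of a document holding only the masterdoctorindex fields the table marks
// as required by Connections. The record name and types are ASSUMPTIONS (the
// table gives field names only); "ProgrammeName" is written in camelCase here.
public record ConnectionDoctor(
        String id,
        String tcsPersonId,
        String gmcReferenceNumber,
        String doctorFirstName,
        String doctorLastName,
        LocalDate submissionDate,
        String programmeName,
        String membershipType,
        String designatedBody) {

    public static void main(String[] args) {
        // Placeholder values, not real data.
        ConnectionDoctor doctor = new ConnectionDoctor(
                "1", "1001", "0000001", "First", "Last",
                LocalDate.of(2024, 1, 31), "example-programme", "example-type", "example-db");
        System.out.println(doctor.designatedBody()); // example-db
    }
}
```

In a single-index design, the Recommendations-only fields (gmcStatus, tisStatus and so on) would sit alongside these on the same document rather than in a second copy.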
As we can see, both services share a lot of fields, so this could be motivation for either:
...
This should be a fairly straightforward conversion. For example, where we currently pre-sort in Java with
if (<conditions for discrepancy>)
we would instead GET from Elasticsearch with something along the lines of
WHERE <fieldValue> = <condition for discrepancy>
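A minimal sketch of that conversion, using a HYPOTHETICAL discrepancy rule (doctor has a programme membership in TIS but no designated body) in place of the real, currently implemented logic. It shows the same rule once as a Java pre-sort predicate and once as an Elasticsearch bool query built from exists clauses; the record and class names are illustrative only.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Contrasts a Java "pre-sort" predicate with an equivalent Elasticsearch query.
// The discrepancy rule here is HYPOTHETICAL and stands in for the real logic.
public class DiscrepancyQuerySketch {

    record Doctor(String id, String membershipType, String designatedBody) {}

    // Today: evaluated in Java during CDC/Resync to choose the target index.
    static final Predicate<Doctor> IS_DISCREPANCY =
            d -> d.membershipType() != null && d.designatedBody() == null;

    // Proposed: the same rule as an Elasticsearch bool query, run at GET time.
    static String discrepancyQuery() {
        return "{\"query\":{\"bool\":{"
                + "\"must\":[{\"exists\":{\"field\":\"membershipType\"}}],"
                + "\"must_not\":[{\"exists\":{\"field\":\"designatedBody\"}}]"
                + "}}}";
    }

    public static void main(String[] args) {
        List<Doctor> doctors = List.of(
                new Doctor("1", "example-type", "example-db"),
                new Doctor("2", "example-type", null),
                new Doctor("3", null, null));
        List<String> ids = doctors.stream()
                .filter(IS_DISCREPANCY)
                .map(Doctor::id)
                .collect(Collectors.toList());
        System.out.println(ids); // [2]
        System.out.println(discrepancyQuery());
    }
}
```

One caveat when porting rules like this: an exists query treats an empty string as a value, so the Java and Elasticsearch versions only agree if missing values are stored as null rather than "".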
Summary
Generally simplifies the system architecture
This approach means we only need to implement the business logic in one place, in one language
A single index means we’re not duplicating data unnecessarily and simplifies the update process
Removing the “pre-sort” step greatly simplifies the CDC and Resync processes and makes it more consistent with how we do Recommendations
GET requests become more complicated than in the current approach
Although they are no more complicated than what we have in Recommendations, and implementing filters becomes more consistent and straightforward
Is there a business logic case we couldn’t replicate using a query language as opposed to Java?
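One kind of rule that a simple term or exists query can’t express is a comparison between two fields of the same document. Elasticsearch can still handle this with a script query in Painless, although its documentation classes script queries as expensive because they run per document. The sketch below assumes a HYPOTHETICAL rule comparing designatedBody with tcsDesignatedBody, and assumes both fields are mapped as keyword so their doc values are readable from a script.

```java
// Sketch of a cross-field rule expressed as a Painless script query.
// The rule is HYPOTHETICAL; both fields are ASSUMED to be keyword-mapped.
public class CrossFieldQuerySketch {

    // The rule as a Java pre-sort step might evaluate it.
    static boolean differs(String designatedBody, String tcsDesignatedBody) {
        return designatedBody != null && tcsDesignatedBody != null
                && !designatedBody.equals(tcsDesignatedBody);
    }

    // The same rule as a script query; size() > 0 skips documents where either
    // field is missing. In Painless, != on strings compares values, not references.
    static String scriptQuery(String fieldA, String fieldB) {
        String a = "doc['" + fieldA + "']";
        String b = "doc['" + fieldB + "']";
        String source = a + ".size() > 0 && " + b + ".size() > 0 && "
                + a + ".value != " + b + ".value";
        return "{\"query\":{\"bool\":{\"filter\":{\"script\":{\"script\":{"
                + "\"lang\":\"painless\",\"source\":\"" + source + "\"}}}}}}";
    }

    public static void main(String[] args) {
        System.out.println(differs("A", "B")); // true
        System.out.println(scriptQuery("designatedBody", "tcsDesignatedBody"));
    }
}
```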
Tasks to complete TIS21-3774 with this approach
Copy the reindexing implementation used by Recommendations
Replace the Connected, Disconnected and Exception repositories with a single Connection repository
Delete code for pre-sorting the doctors (ElasticsearchIndexUpdateHelper)
Have the CDC listener update the new index directly (see step 3)
Write ES query to GET Connected Doctors (based on currently implemented logic only)
Write ES query to GET Discrepancies (based on currently implemented logic only)
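For contrast with the multi-index pattern, here is a minimal sketch of what “have the CDC listener update the new index directly” could look like: each change becomes one partial upsert keyed by doctor id, with no pre-sort step. The HashMap stands in for a HYPOTHETICAL single connection index; against Elasticsearch, the equivalent would be one _update request with a partial "doc" and "doc_as_upsert": true.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the single-index approach: one partial upsert per CDC change.
// The HashMap stands in for a HYPOTHETICAL single connection index.
public class SingleIndexSketch {

    private final Map<String, Map<String, Object>> index = new HashMap<>();
    private int requests = 0;

    // Merges the changed fields into the existing document, creating it if absent.
    public void onChange(String doctorId, Map<String, ?> changedFields) {
        index.computeIfAbsent(doctorId, id -> new HashMap<>()).putAll(changedFields);
        requests++;
    }

    public Map<String, Object> get(String doctorId) {
        return index.get(doctorId);
    }

    public int requests() {
        return requests;
    }

    public static void main(String[] args) {
        SingleIndexSketch sketch = new SingleIndexSketch();
        sketch.onChange("123", Map.of("designatedBody", "example-db"));
        sketch.onChange("123", Map.of("membershipType", "example-type"));
        System.out.println(sketch.requests());        // 2
        System.out.println(sketch.get("123").size()); // 2
    }
}
```

Which “category” a doctor falls into is no longer stored; it is decided by the GET queries instead.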
Alternative Approach 2 - While we’re at it, single Reval index?
...
Summary
(all the advantages and disadvantages of the single connection index approach, and:)
Massively simplifies the system architecture
A single index means we’re not duplicating data unnecessarily and simplifies the update process
Doesn’t make any significant difference to the speed of the sync process (reindex is really quick!)
Awkward request design: either implementing API call methods for different services in the Integration service, or making extra “back and forth” requests between services (less “separation of concerns”?)
Less flexible if we have different filtering requirements for the same fields in different services (when calling reindex, we can specify field mapping metadata that enables different search behaviour, e.g. wildcard)
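A minimal sketch of how per-service mappings can work with the reindex approach: the destination index is created with its own mapping first, then _reindex copies masterdoctorindex into it. The destination index name and the choice of doctorLastName as a wildcard field are HYPOTHETICAL; the wildcard field type needs Elasticsearch 7.9 or later.

```java
// Builds the two request bodies for copying masterdoctorindex into a
// service-specific index. Index name and field choice are HYPOTHETICAL.
public class ReindexWithMappingSketch {

    // Body for PUT /<destIndex>: the mapping lives on the destination index.
    static String createIndexBody(String wildcardField) {
        return "{\"mappings\":{\"properties\":{\""
                + wildcardField + "\":{\"type\":\"wildcard\"}}}}";
    }

    // Body for POST /_reindex: names only the source and destination indexes.
    static String reindexBody(String sourceIndex, String destIndex) {
        return "{\"source\":{\"index\":\"" + sourceIndex
                + "\"},\"dest\":{\"index\":\"" + destIndex + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(createIndexBody("doctorLastName"));
        System.out.println(reindexBody("masterdoctorindex", "connectionindex"));
    }
}
```

With a single shared index there is only one mapping per field, which is where the loss of flexibility comes from.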
Tasks to complete TIS21-3774 with this approach
Remove reindex steps completely
Replace all ES implementations in Connections and Recommendations with a new implementation ported to Integration
Delete code for the above from the affected services
Write all new ES queries for each requirement of each service