STILL WIP
Problems:
• Hypothesis: It would all be much simpler if we were copying all programme memberships separately
Principles:
Solutions:
Three ways of caching data (the first might not be viable because of connection-specific info):
...
As mentioned previously, each of the tabs on the Connections list page loads its data from a separate index, and each of these indexes is a subset of masterdoctorindex, sorted according to business logic. The original design arose because, at the time, the team had very little knowledge of how to write Elasticsearch queries. To make the GET queries simpler, we implemented the business logic about which tab each doctor should appear in “ahead of time”, i.e. we calculate which doctor should be shown in which tab during the CDC or Resync operations, not when we fetch the data. This has worked fine, but is roughly the equivalent of making 3 tables with identical schemas in a MySQL database in order to avoid writing a WHERE clause.
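To make the “WHERE clause” comparison concrete, here is a minimal sketch of what deciding the tab at query time could look like against a single index. It only builds an Elasticsearch query body (a `bool` query with `filter` clauses). The tab names, the `connectionStatus` field and its values are hypothetical placeholders, since the actual business logic is not described here.

```python
from typing import Any, Dict, List

# Hypothetical mapping from tab name to filter clauses.
# The field name and values are assumptions, not the real business logic.
TAB_FILTERS: Dict[str, List[Dict[str, Any]]] = {
    "current": [{"term": {"connectionStatus": "CURRENT"}}],
    "historic": [{"term": {"connectionStatus": "HISTORIC"}}],
    "discrepancies": [{"term": {"connectionStatus": "DISCREPANCY"}}],
}


def build_tab_query(tab: str, page: int = 0, size: int = 20) -> Dict[str, Any]:
    """Build a search body that selects one tab's doctors at fetch time."""
    if tab not in TAB_FILTERS:
        raise ValueError(f"unknown tab: {tab}")
    return {
        # Filter clauses play the role of the SQL WHERE clause.
        "query": {"bool": {"filter": TAB_FILTERS[tab]}},
        # Simple offset pagination.
        "from": page * size,
        "size": size,
    }
```

With something like this, every tab would read from the same index and differ only in the filter passed with the request, so the tab logic would live in one place on the read path.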
...
The approach of “pre-sorting” the data was also fine before as the exact same code was used for CDC and the ES Resync job. However, in order to repeat the massive time saving we achieved in
[Jira Legacy issue link]
In summary:
• Having multiple indexes makes GET requests simpler
  • Performance has been raised as a potential benefit, but when more complex queries on large data sets take less than a second, it is questionable how much benefit this would really give.
• Multiple indexes mean duplicating data
• Multiple indexes require multiple updates for a single data change
• Because we have separate CDC and Resync processes, and because the Java approach is prohibitively slow for the Resync process, we would have to write and maintain the business logic in separate places in separate languages
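The point about multiple updates can be illustrated with a small sketch that lists the writes needed when one doctor record changes. It assumes that a change can move a doctor from one tab index to another; the tab index names and the single index name `connections` are hypothetical, and only `masterdoctorindex` comes from the text above.

```python
from typing import List, Optional, Tuple

MASTER_INDEX = "masterdoctorindex"  # index name taken from this page

# (operation, index name, document id)
Write = Tuple[str, str, str]


def writes_multi_index(doc_id: str,
                       old_tab_index: Optional[str],
                       new_tab_index: Optional[str]) -> List[Write]:
    """Writes needed when each tab has its own index (assumed behaviour)."""
    writes: List[Write] = [("index", MASTER_INDEX, doc_id)]
    # If the doctor moved tabs, the stale copy has to be removed.
    if old_tab_index and old_tab_index != new_tab_index:
        writes.append(("delete", old_tab_index, doc_id))
    # The copy in the (possibly new) tab index has to be refreshed.
    if new_tab_index:
        writes.append(("index", new_tab_index, doc_id))
    return writes


def writes_single_index(doc_id: str, index: str = "connections") -> List[Write]:
    """Writes needed when all tabs are served from one index (hypothetical name)."""
    return [("index", index, doc_id)]
```

Under these assumptions, a doctor moving between tabs costs three writes in the multi-index layout and one in the single-index layout, and only the first needs the tab logic on the write path.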
Alternative Approach 1 - Single Connection index
...