POC - Pentaho Data Integration
In this POC, we have tried to extract historical revalidation data from CSV and load it to MongoDB with some simple fields mapping and table joining.
Steps:
Download and install Pentaho Data Integration
Create a new transformation
3. In the “Design” tab, choose an input and drag it to the right board. We choose CSV as an example here.
4. Double click the icon to open the setting page for file input.
Update the attributes according to the format or the CSV file, then click “Get Fields”. The fields would be shown in the bottom table.
5. Drag “MongoDB output“ from “Big Data”
6. Click on the “Output connector” icon to create a connection from CSV input to the MongoDB output
7. Double click on the “MongoDB output” icon for the setup page
Update the MongoDB connection and output options
8. Set up Mongo document fields
If it is connected to the input properly, fields can be added automatically by click “Get fields”.
Edit the fields and map the fields with the target field names.
9. Click to run the transformation
10. Records would be loaded to the MongoDB
Joining Tables
The Revalidation table does not include the GMC number that we want. So we need to join it with the gmcdetails table with the tcsId
Steps:
Add one more data input for GmcDetails
Add “Merge join” for the two tables.
Add the key field for joining. We can choose “Inner”, “Left outer”, “Right outer”, or “Full outer” as join type
3. Connect it to the MongoDB output
4. Run it!
5. Records will be loaded to MongoDB of table joined with (revalidation.tisId = gmcDetails.id).
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213