Extract data from MongoDB ReplicaSet
In the ESR BiDirectional pilot, we had to extract some data from the Stage MongoDB ReplicaSet (cluster). At that time (7th April 2020), we didn’t have a web-based MongoDB console for these ReplicaSets, and Mongo Express didn’t seem to be available for MongoDB ReplicaSets. This is how I managed to extract the data.
The MongoDB Query
The MongoDB query I wanted to run was straightforward:
‘give me the full audit messages where the message_id is contained within this list’.
Translated into a mongo query, it looks like this:
db.auditmessage.find({'messageProperties.messageId':{$in:["<ID1>","<ID2>",..."<IDN>"]}});
If there are only a handful of IDs, you could enter the query by hand or paste it into the mongo console. I had to run the query for ~1800 message IDs. The approach I took was to write a shell script that creates a JavaScript file containing the ID list and the query that uses it. The mongo console understands basic JavaScript.
The details of the shell script are not important; they were very specific to what I was doing. The JavaScript file containing the ID list and the query was called 'extractMessagesById.js' and its contents are of the form:
var MESSAGE_IDS=["b7097d15-3a92-4f3a-b32d-be1a0d77fe1d","ecb8da04-953f-4eea-9dc7-50625e0464b3"];
printjson(db.auditmessage.find({'messageProperties.messageId':{$in:MESSAGE_IDS}}).pretty().toArray());
Note: this script contains just 2 message IDs; the one I created via the shell script had ~1800 message IDs.
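The original shell script is not reproduced on this page. As a rough illustration of the idea only, here is a minimal sketch that assumes the message IDs sit in a plain text file, one per line, and that they contain no double quotes (true for UUIDs). The function name 'build_query_js' and the input file name used below are hypothetical.

```shell
# Hypothetical helper, not the original script: turn a file of message IDs
# (one per line) into a JavaScript file that the mongo console can execute.
build_query_js() {
  local ids_file="$1"   # assumed input: one message ID per line
  local out_file="$2"   # JavaScript file to generate

  # Drop blank lines, wrap each ID in double quotes, join with commas.
  local id_list
  id_list=$(grep -v '^[[:space:]]*$' "$ids_file" | sed 's/.*/"&"/' | paste -sd, -)

  {
    # First line: the ID array. Second line: the query that uses it.
    printf 'var MESSAGE_IDS=[%s];\n' "$id_list"
    printf '%s\n' "printjson(db.auditmessage.find({'messageProperties.messageId':{\$in:MESSAGE_IDS}}).pretty().toArray());"
  } > "$out_file"
}
```

With the IDs in a file such as 'message_ids.txt', running 'build_query_js message_ids.txt extractMessagesById.js' would produce a file of the same form as the two-line example above.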
Now we have a Javascript file containing the query we want to run, we have to get it to execute against the Stage MongoDB ReplicaSet.
Running the MongoDB Query
The Stage MongoDB ReplicaSet runs on the box '10.160.0.151'. If you ssh into that box, you should see the prompt change to 'heetis@HEE-TIS-VM-STAGE-MONGO-DB-REPLICASET:~$'.
You are now on the VM that is running the 3 mongo replicaset nodes inside docker containers. They are on the docker network 'mongodb-replicaset_mongo_network'.
heetis@HEE-TIS-VM-STAGE-MONGO-DB-REPLICASET:~$ docker network inspect mongodb-replicaset_mongo_network | grep Name
"Name": "mongodb-replicaset_mongo_network",
"Name": "mongo3",
"Name": "mongo1",
"Name": "mongo2",
I used sftp to transfer the JavaScript query file 'extractMessagesById.js' to '/home/heetis/davidhay/script/extractMessagesById.js' on the 'STAGE MONGO DB REPLICASET' VM.
I then changed my working directory on the VM: 'cd /home/heetis/davidhay'.
The VM does NOT have the mongo console installed, so we’re going to use docker to run the mongo console, connect to the MongoDB replicaset, and run the query. We can then capture the query output in a text file.
The command (with the password omitted) is ….
Note: we mount the query file inside the container at '/home/scripts/extractMessagesById.js' so the mongo console (running in the container) can read it.
You are now free to transfer OUTPUT.TXT back to your development box for analysis!
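The exact command is not shown on this page, so the following is only a sketch of what a command of this shape could look like, not the original. The image tag, the replica set name and ports, the user name, the database name and the read-only mount of the whole script directory are all assumptions; only the network name, the host path and the in-container script path come from the text above. 'run_extract' is a hypothetical wrapper name.

```shell
# Sketch only, not the original command. Values marked "assumed" are guesses.
NETWORK="mongodb-replicaset_mongo_network"   # docker network named on this page
SCRIPT_DIR="/home/heetis/davidhay/script"    # directory the query file was uploaded to
MONGO_IMAGE="mongo:4.2"                      # assumed: an image that ships the legacy mongo shell
RS_HOSTS="rs0/mongo1:27017,mongo2:27017,mongo3:27017"  # assumed replica set name and ports
MONGO_USER="someUser"                        # assumed placeholder
MONGO_DB="someDatabase"                      # assumed placeholder

run_extract() {
  # Set DOCKER=echo to preview the command instead of running it.
  "${DOCKER:-docker}" run --rm \
    --network "$NETWORK" \
    -v "$SCRIPT_DIR:/home/scripts:ro" \
    "$MONGO_IMAGE" \
    mongo --quiet \
      --host "$RS_HOSTS" \
      -u "$MONGO_USER" -p "${MONGO_PASSWORD:?set MONGO_PASSWORD first}" \
      --authenticationDatabase admin \
      "$MONGO_DB" /home/scripts/extractMessagesById.js
}
```

After exporting MONGO_PASSWORD, 'run_extract > OUTPUT.TXT' would capture the query output. Passing the password as an argument makes it visible in the process list, so this is only reasonable on a box you trust.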
Summary
I’ve shown one (complex?) way to extract data from a mongo replicaset cluster. There must be better ways to do it than this, but at least it works and should work for any extract.
Mongo Console
If you are logged into the 'STAGE MONGO DB REPLICASET' VM, and you just want to enter the mongo console, to run ad hoc queries, the command with the password omitted is:
If you see the mongo console prompt change to 'rs0:PRIMARY>', you have logged into the mongo console successfully.
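The console command itself is not shown on this page. As a hedged sketch only (not the original), an interactive session could be opened along these lines. The image tag, host list, user name and authentication database are assumptions; the replica set name 'rs0' is inferred from the prompt above, and 'open_console' is a hypothetical wrapper name.

```shell
# Sketch only, not the original command. Values marked "assumed" are guesses.
NETWORK="mongodb-replicaset_mongo_network"   # docker network named on this page
MONGO_IMAGE="mongo:4.2"                      # assumed: an image that ships the legacy mongo shell
RS_HOSTS="rs0/mongo1:27017,mongo2:27017,mongo3:27017"  # assumed ports; set name taken from the prompt
MONGO_USER="someUser"                        # assumed placeholder

open_console() {
  # -it attaches a terminal; a trailing -p with no value makes mongo prompt for the password.
  "${DOCKER:-docker}" run -it --rm \
    --network "$NETWORK" \
    "$MONGO_IMAGE" \
    mongo --host "$RS_HOSTS" \
      -u "$MONGO_USER" --authenticationDatabase admin -p
}
```

Calling 'open_console' and entering the password at the prompt should end at the replica set prompt if the assumed values match the real setup.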