
About

This page explains the process for identifying and deleting placement or programme membership records which still exist in TIS Self-Service (TSS) after having been deleted from TIS. The process is unoptimised, requires several manual steps and includes workarounds that may have better alternatives.

But, it does the job in a pinch.

Steps

1. Get a list of IDs from TIS

A simple SQL query can be used, with the results exported to a CSV file. This must be done against the database directly, as Metabase limits exports to 1,048,576 rows.

Placement

SELECT traineeId, id AS tisId
FROM tcs.Placement
ORDER BY traineeId ASC, tisId ASC;

Programme Membership

SELECT personId AS traineeId, uuid AS tisId
FROM tcs.ProgrammeMembership
ORDER BY traineeId ASC, tisId ASC;
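One way to export the result directly from the command line is to pipe `mysql --batch` output (tab-separated) through a tab-to-comma conversion. The host, user and output filename below are placeholders, not values from this runbook; sample rows stand in for the real query output so the conversion itself can be seen:

```shell
# Sketch: mysql --batch emits tab-separated rows, so convert tabs to commas.
# The real invocation would look something like (placeholders, adjust to your setup):
#   mysql --batch -h <tcs-db-host> -u <user> -p tcs \
#     -e "SELECT traineeId, id AS tisId FROM tcs.Placement ORDER BY traineeId, tisId" \
#     | tr '\t' ',' > tcs_ids.csv
# Sample rows stand in for the query output here:
csv=$(printf 'traineeId\ttisId\n315594\t12345\n' | tr '\t' ',')
echo "$csv"
```

Note that `--batch` output does not quote fields; that is fine here since IDs contain no commas.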

2. Get a list of IDs from TSS

This is a little trickier to get. A Mongo shell script could be used to simplify it, but this approach uses Metabase instead. Use the following aggregation and export the result to a CSV named metabase_tss_ids.csv.

Placement

[{"$project":{"_id": 0, "traineeId":"$traineeTisId","tisId":"$placements.tisId"}}]

Programme Membership

[{"$project":{"_id": 0, "traineeId":"$traineeTisId","tisId":"$programmeMemberships.tisId"}}]

You’ll get a CSV with the IDs in an array.

traineeId,tisId
315594,"[""ae5350af-2a3c-4439-b118-f57ea1d8bc94""]"
140114,"[""e92a313c-d83d-11ec-9eb2-0638a616fc76"" ""e92a3259-d83d-11ec-9eb2-0638a616fc76""]"
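If Metabase permits an extra stage, adding an $unwind before the $project would emit one row per ID and make the expansion script below unnecessary. This is an untested sketch for the Placement case (the programme membership version would unwind "$programmeMemberships" instead):

```json
[{"$unwind": "$placements"}, {"$project": {"_id": 0, "traineeId": "$traineeTisId", "tisId": "$placements.tisId"}}]
```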

The following script expands the array into individual rows. Change the input filename as needed and send the output to a new CSV file: ./my_script_file.sh > expanded_tss_ids.csv

#!/bin/bash
# Expands the array of IDs exported from Metabase into one CSV row per ID.
echo "traineeId,tisId"
while IFS="," read -r traineeId tisIds
do

  tisIds=${tisIds//[\[\]]/}  # drop the surrounding brackets
  tisIds=${tisIds//[\"]/}    # drop the doubled quotes
  tisIds=${tisIds//[ ]/,}    # space-separated elements -> comma-separated

  IFS="," read -ra array <<< "$tisIds"

  # Emit one row per ID.
  for tisId in "${array[@]}"
  do
    echo "$traineeId,$tisId"
  done
done < <(tail -n +2 metabase_tss_ids.csv)  # skip the header row
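As a sanity check, the expansion logic can be run against the sample rows shown above (inlined here so it is self-contained):

```shell
# Sanity check: run the expansion logic against the sample Metabase export rows.
input='traineeId,tisId
315594,"[""ae5350af-2a3c-4439-b118-f57ea1d8bc94""]"
140114,"[""e92a313c-d83d-11ec-9eb2-0638a616fc76"" ""e92a3259-d83d-11ec-9eb2-0638a616fc76""]"'

expanded=$(
  echo "traineeId,tisId"
  while IFS="," read -r traineeId tisIds
  do
    tisIds=${tisIds//[\[\]]/}  # drop the surrounding brackets
    tisIds=${tisIds//[\"]/}    # drop the doubled quotes
    tisIds=${tisIds//[ ]/,}    # space-separated elements -> comma-separated
    IFS="," read -ra array <<< "$tisIds"
    for tisId in "${array[@]}"
    do
      echo "$traineeId,$tisId"
    done
  done < <(tail -n +2 <<< "$input")
)
echo "$expanded"
```

The second sample row expands to two output rows, one per programme membership ID.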

3. Import to SQL database

Spin up a local MySQL instance (or use the TSS local DB from the handbook) and create two tables, one for the IDs from TIS and the other for the IDs from TSS.

Placement

CREATE TABLE `tcs_placement_ids` (
  `traineeId` int NOT NULL,
  `tisId` int NOT NULL
);

CREATE TABLE `tss_placement_ids` (
  `traineeId` int NOT NULL,
  `tisId` int NOT NULL
);

Programme Membership

CREATE TABLE `tcs_pm_ids` (
  `traineeId` int NOT NULL,
  `tisId` varchar(36) NOT NULL
);

CREATE TABLE `tss_pm_ids` (
  `traineeId` int NOT NULL,
  `tisId` varchar(36) NOT NULL
);

Then import the previously exported CSVs into the appropriate tables.
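One way to do the import (assuming local_infile is enabled on both server and client; the TIS-side filename is a placeholder, name it to match your step 1 export) is LOAD DATA, shown here for the Placement tables:

```sql
-- Sketch: import the CSVs, skipping the header row.
-- 'tcs_ids.csv' is a placeholder name for the step 1 export.
LOAD DATA LOCAL INFILE 'tcs_ids.csv'
INTO TABLE tcs_placement_ids
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(traineeId, tisId);

LOAD DATA LOCAL INFILE 'expanded_tss_ids.csv'
INTO TABLE tss_placement_ids
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(traineeId, tisId);
```

The programme membership import is identical apart from the table names.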

4. Query for Orphans

Run the appropriate query to identify orphaned placements or programme memberships.

Placement

SELECT traineeId, tisId FROM (
  SELECT tss.traineeId, tss.tisId
  FROM tss_placement_ids tss
  WHERE tss.traineeId > 0 AND tss.tisId NOT IN (SELECT tisId FROM tcs_placement_ids tcs)
  GROUP BY tss.traineeId, tss.tisId
)
AS distinct_results ORDER BY traineeId ASC, tisId ASC;

Programme Membership

SELECT traineeId, tisId FROM (
  SELECT tss.traineeId, tss.tisId
  FROM tss_pm_ids tss
  WHERE tss.traineeId > 0 AND tss.tisId NOT IN (SELECT tisId FROM tcs_pm_ids tcs)
  GROUP BY tss.traineeId, tss.tisId
)
AS distinct_results ORDER BY traineeId ASC, tisId ASC;

Export the result to a new CSV named orphaned_ids.csv.

5. (Risky Approach) Publish Messages

At the time of writing, the scripts below can be used to publish suitable messages for Placement and Programme Membership deletions. Check that the message structure is still valid and the queue name is correct before use.

An optional, but recommended, buffer queue can be used if publishing directly to the live queue is undesirable.

Placement

#! /bin/bash

while IFS="," read -r traineeId tisId
do
  trimmedTisId=$(echo $tisId | xargs)
  trimmedTraineeId=$(echo $traineeId | xargs)
  aws sqs send-message --region eu-west-2 --queue-url https://sqs.eu-west-2.amazonaws.com/430723991443/tis-trainee-sync-prod-record.fifo --message-group-id corrective_delete --message-body '{"data":{"id":"'"$trimmedTisId"'","traineeId":"'"$trimmedTraineeId"'"},"metadata":{"record-type":"data","partition-key-type":"schema-table","schema-name":"tcs","table-name":"Placement","operation":"delete"}}'
done < <(tail -n +2 orphaned_ids.csv)

Programme Membership

#! /bin/bash

while IFS="," read -r traineeId tisId
do
  trimmedTisId=$(echo $tisId | xargs)
  trimmedTraineeId=$(echo $traineeId | xargs)
  aws sqs send-message --region eu-west-2 --queue-url https://sqs.eu-west-2.amazonaws.com/430723991443/tis-trainee-sync-prod-record.fifo --message-group-id corrective_delete --message-body '{"data":{"uuid":"'"$trimmedTisId"'","personId":"'"$trimmedTraineeId"'"},"metadata":{"record-type":"data","partition-key-type":"schema-table","schema-name":"tcs","table-name":"ProgrammeMembership","operation":"delete"}}'
done < <(tail -n +2 orphaned_ids.csv)
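Before publishing, it can help to dry-run the loop: build each message body, validate it as JSON, and print it instead of sending it. This is a sketch for the Placement case; the sample row stands in for orphaned_ids.csv, and once the output looks right you would swap `echo "$body"` for the aws sqs send-message call:

```shell
# Dry run of the Placement publish loop: build each message body and
# validate it as JSON instead of sending it. The printf-ed sample row
# stands in for orphaned_ids.csv.
while IFS="," read -r traineeId tisId
do
  trimmedTisId=$(echo "$tisId" | xargs)        # xargs trims surrounding whitespace
  trimmedTraineeId=$(echo "$traineeId" | xargs)
  body='{"data":{"id":"'"$trimmedTisId"'","traineeId":"'"$trimmedTraineeId"'"},"metadata":{"record-type":"data","partition-key-type":"schema-table","schema-name":"tcs","table-name":"Placement","operation":"delete"}}'
  # Print the body only if it parses as valid JSON.
  echo "$body" | python3 -m json.tool > /dev/null && echo "$body"
done < <(printf 'traineeId,tisId\n315594,12345\n' | tail -n +2)
```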

5. (Safer Approach) Publish Messages Using “Buffer” Queue

A temporary queue can be created as a target for the publish, this allows all messages to be collected and verified before they are sent to the live system. Publishing from a local machine is a slow process, so it’s recommended to use the buffer queue to avoid any issues if the process is interrupted and has to be restarted.

Create a queue in SQS (since this is temporary, creating it in the console is fine; just clean up when you're done!) and use the same script with your temporary queue's URL instead.

Placement

#! /bin/bash

while IFS="," read -r traineeId tisId
do
  trimmedTisId=$(echo $tisId | xargs)
  trimmedTraineeId=$(echo $traineeId | xargs)
  aws sqs send-message --region eu-west-2 --queue-url https://sqs.eu-west-2.amazonaws.com/430723991443/temp_my-buffer-queue.fifo --message-group-id corrective_delete --message-body '{"data":{"id":"'"$trimmedTisId"'","traineeId":"'"$trimmedTraineeId"'"},"metadata":{"record-type":"data","partition-key-type":"schema-table","schema-name":"tcs","table-name":"Placement","operation":"delete"}}'
done < <(tail -n +2 orphaned_ids.csv)

Programme Membership

#! /bin/bash

while IFS="," read -r traineeId tisId
do
  trimmedTisId=$(echo $tisId | xargs)
  trimmedTraineeId=$(echo $traineeId | xargs)
  aws sqs send-message --region eu-west-2 --queue-url https://sqs.eu-west-2.amazonaws.com/430723991443/temp_my-buffer-queue.fifo --message-group-id corrective_delete --message-body '{"data":{"uuid":"'"$trimmedTisId"'","personId":"'"$trimmedTraineeId"'"},"metadata":{"record-type":"data","partition-key-type":"schema-table","schema-name":"tcs","table-name":"ProgrammeMembership","operation":"delete"}}'
done < <(tail -n +2 orphaned_ids.csv)

Once messages have been published you can use the console to receive messages to verify them.

To send them to the live queue we'll use some DLQ trickery: set the "target" live queue as the DLQ for our temporary queue, then use receives to trigger the messages to be sent to the "DLQ".

When you are ready to send the messages to the real queue, set the following in the buffer queue's configuration:

  • Visibility timeout: 1 second

  • DLQ:

    • Max receives: 1

    • Queue ARN: your target queue

In the "Send and receive messages" section of the queue in the AWS console, set the polling settings:

  • Poll duration: 90 seconds

  • Maximum message count: 999

This combination of settings ensures that we receive as many messages as quickly as possible; after each receive, the message is immediately sent to the "DLQ".

Now just click "Poll for messages" and watch as up to 999 messages are sent to your target queue; repeat as needed until the buffer queue is empty.
