If you see this, panic (a little) and then do something:
...
If the job is not prod-cdc (the last part of the “Description”), there is no need to panic, but do sort it out, as stage/dev ESR and TIS will be drifting out of sync.
It helps to understand how CDC works: https://hee-tis.atlassian.net/wiki/spaces/NTCS/pages/1371275283/Change+Data+Capture+CDC#How . The first step is probably always going to be to check the Docker container on the machine listed.
If the logs contain something like the following, there is a problem processing a transaction:
    ...
    2020-06-16 11:14:39 ERROR Error on bin log position Position[BinlogPosition[mysql-bin.003382:68853804], lastHeartbeat=1592034510892]
    2020-06-16 11:14:39 INFO Binlog disconnected.
    2020-06-16 11:14:44 WARN Timed out waiting for heartbeat 1592306079319
    2020-06-16 11:14:44 INFO Stopping 4 tasks
    ...
Non-production resolution:
The error line contains: BinlogPosition[mysql-bin.003382:68853804]
or, in a generic format, BinlogPosition[{binlog_file}:{binlog_position}]
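As a small illustration of that format, the file name and position can be pulled out of the error line with `sed`. This is only a sketch: the function name `parse_binlog_position` is made up for this example, and it assumes the log line looks like the one quoted above.

```shell
# Hypothetical helper: reads log lines on stdin and prints
# "<binlog_file> <binlog_position>" for each line that contains
# BinlogPosition[{binlog_file}:{binlog_position}].
parse_binlog_position() {
  sed -n 's/.*BinlogPosition\[\([^]:]*\):\([0-9][0-9]*\)].*/\1 \2/p'
}

# Example using the error line from the log excerpt above.
line='2020-06-16 11:14:39 ERROR Error on bin log position Position[BinlogPosition[mysql-bin.003382:68853804], lastHeartbeat=1592034510892]'
printf '%s\n' "$line" | parse_binlog_position
# prints: mysql-bin.003382 68853804
```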
To find (and skip over) a problem statement:
1. Decode the logfile (binlogs are only readable by the mysql user & group):
    sudo mysqlbinlog --base64-output=decode-rows -vv /var/log/mysql/{mysql-bin_file_from_error} > /tmp/someTempFile.sql
2. Search for the position by the number in the error. From that point in the file you can find the next position by searching forward for end_log_pos. You may need to search past a few to find a suitable point, such as the end of the transaction.
3. Update the binlog_position field in the maxwell.positions MySQL table to move the point where CDC will try to resume. I initially tried the first end_log_pos after the error, but this was also a problem. It may be that anything in the same transaction will also be an issue.
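The search and update steps could be sketched roughly as below. This is not a tested procedure. The function names are hypothetical. It assumes the decoded file marks each event with a `# at {position}` line followed by a header line containing `end_log_pos`, and that a header carrying `Xid =` marks the end of a transaction. The `WHERE binlog_file = ...` guard is also an assumption about what is safe for the `maxwell.positions` table, so check the generated statement before running it.

```shell
# Hypothetical helper: list every end_log_pos that follows the failing
# position in the decoded binlog, flagging likely transaction ends.
# $1 = decoded file (e.g. /tmp/someTempFile.sql), $2 = position from the error
list_following_end_positions() {
  awk -v pos="$2" '
    $0 == ("# at " pos) { found = 1 }
    found && match($0, /end_log_pos [0-9]+/) {
      # "end_log_pos " is 12 characters; keep only the digits after it
      endpos = substr($0, RSTART + 12, RLENGTH - 12)
      if ($0 ~ /Xid = /) print endpos " (end of transaction)"
      else print endpos
    }
  ' "$1"
}

# Hypothetical helper: print (not run) the UPDATE that moves the resume point.
# $1 = binlog file from the error, $2 = chosen new position
build_position_update() {
  printf "UPDATE maxwell.positions SET binlog_position = %s WHERE binlog_file = '%s';\n" "$2" "$1"
}

# Usage (non-production only), with {new_position} chosen from the list:
#   list_following_end_positions /tmp/someTempFile.sql 68853804
#   build_position_update mysql-bin.003382 {new_position}
```

Printing the statement instead of executing it leaves a chance to review the chosen position before anything in the table changes.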
Misc thoughts:
It’s not appropriate to skip transactions in production without taking lots of mitigating actions (i.e. what to do about the transactions that haven’t been forwarded).
The recurring issue on STAGE may be caused by the weekly prod->stage synchronisation but the cdc logs (STAGE) showed the following error.