If you see this, panic (a little) and then do something:
If it’s not the job prod-cdc (last part of the “Description”), there is no need to panic, but do sort it out, as stage/dev ESR and TIS will be drifting out of sync.
It helps to understand how CDC works: https://hee-tis.atlassian.net/wiki/spaces/NTCS/pages/1371275283/Change+Data+Capture+CDC#How
The first step is probably always going to be to check the docker container on the machine listed.
If the logs contain something like the following, there is a problem processing a transaction:
...
2020-06-16 11:14:39 ERROR Error on bin log position Position[BinlogPosition[mysql-bin.003382:68853804], lastHeartbeat=1592034510892]
2020-06-16 11:14:39 INFO Binlog disconnected.
2020-06-16 11:14:44 WARN Timed out waiting for heartbeat 1592306079319
2020-06-16 11:14:44 INFO Stopping 4 tasks
...
The error line contains: BinlogPosition[mysql-bin.003382:68853804]
or in a generic format BinlogPosition[{binlog_file}:{binlog_position}]
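The file name and position can also be pulled out of the logs rather than read off by eye. The following is a minimal sketch, assuming the error line has exactly the shape shown above; the function name extract_binlog_position is made up for illustration, and the container name is left as a placeholder because it is not given here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Maxwell or the CDC tooling).
# Reads log text on stdin and prints "<binlog_file> <binlog_position>"
# for the last "Error on bin log position" line, or nothing if there is none.
extract_binlog_position() {
  sed -n 's/.*Error on bin log position Position\[BinlogPosition\[\([^]:]*\):\([0-9][0-9]*\)\].*/\1 \2/p' |
    tail -n 1
}

# Example use on the machine running the container (container name is a placeholder):
#   docker logs {container_name} 2>&1 | extract_binlog_position
```

For the log excerpt above this should print the file mysql-bin.003382 and the position 68853804, separated by a space.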
To find (and skip over) a problem statement:
Decode the logfile:
sudo mysqlbinlog --base64-output=decode -vv /var/log/mysql/{mysql-bin_file_from_error} > /tmp/someTempFile.sql
Search the decoded file for the position number from the error. From that point in the file you can find the next position by searching forward for end_log_pos. You may need to search past a few to find a suitable point, such as the end of the transaction.
Update the binlog_position field in maxwell.positions to move where the CDC will try to resume. I initially tried the first end_log_pos after the error, but that position was also a problem. It may be that anything in the same transaction will also be an issue.
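The search and the update can be sketched as two small shell helpers. This is only an illustration under stated assumptions: the function names are hypothetical, the decoded file is assumed to contain the usual "# at {position}" marker lines followed by event lines carrying end_log_pos, and the WHERE clause assumes the row in maxwell.positions can be identified by its binlog_file column. The second helper only prints the statement so it can be reviewed before anyone runs it.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; neither ships with MySQL or Maxwell.

# next_positions DECODED_FILE POSITION [COUNT]
# Prints up to COUNT (default 5) end_log_pos values found at or after
# the "# at POSITION" marker in the decoded binlog.
next_positions() {
  awk -v start="$2" -v limit="${3:-5}" '
    $1 == "#" && $2 == "at" && $3 == start { found = 1 }
    found && match($0, /end_log_pos [0-9]+/) {
      # "end_log_pos " is 12 characters long; keep only the digits after it
      print substr($0, RSTART + 12, RLENGTH - 12)
      if (++n >= limit) exit
    }
  ' "$1"
}

# print_position_update BINLOG_FILE NEW_POSITION
# Prints (does not execute) an UPDATE for maxwell.positions.
# Assumption: matching on binlog_file selects the right row.
print_position_update() {
  case "$2" in
    '' | *[!0-9]*) echo "position must be a number" >&2; return 1 ;;
  esac
  printf "UPDATE maxwell.positions SET binlog_position = %s WHERE binlog_file = '%s';\n" "$2" "$1"
}

# Example:
#   next_positions /tmp/someTempFile.sql 68853804
#   print_position_update mysql-bin.003382 {chosen_end_log_pos}
```

In the decoded output, an event line containing "Xid =" normally marks the commit of a transaction, so the end_log_pos on that line is a reasonable candidate for the end of the transaction mentioned above.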
Misc thoughts:
Skipping statements this way is not appropriate for production without taking lots of mitigating actions (i.e. deciding what to do about the transactions that haven’t been forwarded).
The issue on STAGE may be caused by the weekly prod->stage synchronisation but the cdc logs (STAGE) showed the following error.