If you see this, panic (a little), and then do something:

...

If it’s not the job prod-cdc (last part of the “Description”), no need to panic but do sort it out as stage/dev ESR and TIS will be drifting out of sync.

It helps to understand https://hee-tis.atlassian.net/wiki/spaces/NTCS/pages/1371275283/Change+Data+Capture+CDC#How. The first step is probably always going to be to check the docker container on the machine listed.

If the logs contain something like the following, there is a problem processing a transaction:

Code Block
...
2020-06-16 11:14:39 ERROR Error on bin log position Position[BinlogPosition[mysql-bin.003382:68853804], lastHeartbeat=1592034510892]
2020-06-16 11:14:39 INFO  Binlog disconnected.
2020-06-16 11:14:44 WARN  Timed out waiting for heartbeat 1592306079319
2020-06-16 11:14:44 INFO  Stopping 4 tasks
...

Non-production resolution:

The error line contains: BinlogPosition[mysql-bin.003382:68853804]

or in a generic format BinlogPosition[{binlog_file}:{binlog_position}]
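If you need to pull the file and position out of a log line programmatically, the pattern above can be matched with a regular expression. This is only a minimal sketch: the function name parse_binlog_position is hypothetical, and it assumes the token always appears exactly as in the error line quoted above.

```python
import re

# Matches the BinlogPosition[{binlog_file}:{binlog_position}] token.
BINLOG_POSITION = re.compile(
    r"BinlogPosition\[(?P<file>[^:\]]+):(?P<position>\d+)\]"
)


def parse_binlog_position(log_line):
    """Return (binlog_file, binlog_position), or None if the token is absent."""
    match = BINLOG_POSITION.search(log_line)
    if match is None:
        return None
    return match.group("file"), int(match.group("position"))


# The error line from the logs above.
line = (
    "2020-06-16 11:14:39 ERROR Error on bin log position "
    "Position[BinlogPosition[mysql-bin.003382:68853804], "
    "lastHeartbeat=1592034510892]"
)
print(parse_binlog_position(line))  # ('mysql-bin.003382', 68853804)
```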

To find (and skip over) a problem statement:

  1. Decode the logfile: sudo mysqlbinlog --base64-output=decode-rows -vv /var/log/mysql/{mysql-bin_file_from_error} > /tmp/someTempFile.sql (sudo is needed because binlogs are only readable by the mysql user & group).

  2. Search for the position by the number in the error. From that point in the file you can find the next position by searching forward for end_log_pos. You may need to search past a few to find a suitable point, such as the end of the transaction.

  3. Update the binlog_position field in the maxwell.positions MySQL table to move the point where CDC will try to resume. I initially tried the first end_log_pos after the error, but this was also a problem. It may be that anything in the same transaction will also be an issue.
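The search in step 2 can be sketched as a small script that lists the end_log_pos values after the failing position and flags the ones that look like the end of a transaction. This is only a rough sketch under a few assumptions: candidate_positions is a hypothetical helper, the sample text is made up to resemble decoded mysqlbinlog output rather than taken from the real binlog, and treating a line containing "Xid =" as a commit is a heuristic that should be checked against the actual decoded file.

```python
import re

END_LOG_POS = re.compile(r"end_log_pos (\d+)")


def candidate_positions(decoded_lines, error_position, limit=10):
    """List (end_log_pos, looks_like_commit) pairs beyond error_position.

    Positions grow through a binlog file, so anything larger than the
    failing position lies after it.
    """
    found = []
    for line in decoded_lines:
        match = END_LOG_POS.search(line)
        if match is None:
            continue
        position = int(match.group(1))
        if position > error_position:
            # Assumption: a commit event is printed with "Xid = <n>".
            found.append((position, "Xid =" in line))
            if len(found) >= limit:
                break
    return found


# Made-up sample resembling the decoded file from step 1.
sample = """\
# at 68853804
#200616 11:14:39 server id 1  end_log_pos 68853880  Query   thread_id=7
BEGIN
# at 68853880
#200616 11:14:39 server id 1  end_log_pos 68853950  Table_map: `db`.`t`
# at 68853950
#200616 11:14:39 server id 1  end_log_pos 68854010  Write_rows: table id 108
# at 68854010
#200616 11:14:39 server id 1  end_log_pos 68854041  Xid = 4242
COMMIT/*!*/;
""".splitlines()

for position, is_commit in candidate_positions(sample, 68853804):
    print(position, "<- end of transaction" if is_commit else "")
```

For the real file, pass open("/tmp/someTempFile.sql") as decoded_lines. The first position flagged as an end of transaction is the natural candidate to write into binlog_position in step 3.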

Misc thoughts:

It’s not appropriate to skip transactions in production without taking lots of mitigating actions (i.e. what to do about the transactions that haven’t been forwarded).

The recurring issue on STAGE may be caused by the weekly prod->stage synchronisation, but the cdc logs (STAGE) showed the error quoted above.