...
Users sometimes could not see the grade name or site name on the Person-Placement list.
...
Trigger
TCS to MySQL connection was broken.
Task did not have memory available.
...
Detection
User alerted via Teams.
...
Why did we get a Connection pool shut down error?
There are 1.7K similar errors ("Connection pool shut down") in TCS prod, and they started on Nov 4, 2024 3:39:51 PM UTC.
Looking at the logs around that time, at 2024-11-04 15:36:19.736 we got:
    2024-11-04 15:36:19.634 WARN 1 --- [ XNIO-2 task-16] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: S1000
    2024-11-04 15:36:19.639 ERROR 1 --- [ XNIO-2 task-16] o.h.engine.jdbc.spi.SqlExceptionHelper : 0
    2024-11-04 15:36:19.736 ERROR 1 --- [ XNIO-2 task-16] o.s.orm.jpa.EntityManagerFactoryUtils : Failed to release JPA EntityManager
    org.hibernate.exception.GenericJDBCException: Unable to release JDBC Connection
    ...
    2024-11-04 15:36:19.749 ERROR 1 --- [ XNIO-2 task-16] o.s.t.i.TransactionInterceptor : Application exception overridden by rollback exception
    java.lang.OutOfMemoryError: Java heap space
    2024-11-04 15:36:19.837 ERROR 1 --- [ XNIO-2 task-16] c.t.h.t.t.s.e.ExceptionTranslator : Could not roll back JPA transaction; nested exception is org.hibernate.TransactionException: Unable to rollback against JDBC Connection
    org.springframework.transaction.TransactionSystemException: Could not roll back JPA transaction; nested exception is org.hibernate.TransactionException: Unable to rollback against JDBC Connection
We then got a lot of "com.netflix.hystrix.exception.HystrixRuntimeException: GET_USER_PROFILE timed-out and no fallback available." errors, which followed the errors below:
    2024-11-04 15:39:51.760 ERROR 1 --- [strix-PROFILE-1] com.netflix.hystrix.AbstractCommand : Unrecoverable Error for HystrixCommand so will throw HystrixRuntimeException and not apply fallback.
    java.lang.Exception: Throwable caught while executing.
    Caused by: java.lang.OutOfMemoryError: Java heap space
    2024-11-04 15:39:51.810 ERROR 1 --- [ XNIO-2 task-11] c.t.h.t.t.s.e.ExceptionTranslator : Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
    org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
    2024-11-04 15:39:51.830 WARN 1 --- [ XNIO-2 task-3] com.zaxxer.hikari.pool.ProxyConnection : HikariPool-1 - Connection com.mysql.cj.jdbc.ConnectionImpl@31a4a7c1 marked as broken because of SQLSTATE(08S01), ErrorCode(0)
    com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
    Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
    Caused by: java.net.SocketException: Connection reset
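The ordering seen in the two excerpts above (heap exhaustion first, Hystrix and MySQL connection failures a few minutes later) could be checked across a full log export with a small script. The following is only a sketch: the function name, the signature table, and the line breaks in the sample are assumptions made for illustration, and it assumes each log entry starts with a timestamp in the format shown in the excerpts, with stack-trace lines following their entry.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the log excerpts (assumed to start each entry).
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")

# Error signatures copied from the excerpts; the short names are hypothetical labels.
SIGNATURES = {
    "oom": "java.lang.OutOfMemoryError",
    "hystrix": "HystrixRuntimeException",
    "link_failure": "Communications link failure",
}

def first_occurrences(lines):
    """Return {label: timestamp of the first log entry containing that signature}."""
    first = {}
    current = None  # timestamp of the entry the current line belongs to
    for line in lines:
        match = TS.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if current is None:
            continue  # continuation line before any timestamped entry
        for label, needle in SIGNATURES.items():
            if needle in line and label not in first:
                first[label] = current
    return first

# A few lines taken from the excerpts; the line breaks are an assumption.
sample = [
    "2024-11-04 15:36:19.749 ERROR 1 --- [ XNIO-2 task-16] o.s.t.i.TransactionInterceptor : Application exception overridden by rollback exception",
    "java.lang.OutOfMemoryError: Java heap space",
    "2024-11-04 15:39:51.760 ERROR 1 --- [strix-PROFILE-1] com.netflix.hystrix.AbstractCommand : Unrecoverable Error for HystrixCommand so will throw HystrixRuntimeException and not apply fallback.",
    "2024-11-04 15:39:51.830 WARN 1 --- [ XNIO-2 task-3] com.zaxxer.hikari.pool.ProxyConnection : HikariPool-1 - Connection com.mysql.cj.jdbc.ConnectionImpl@31a4a7c1 marked as broken because of SQLSTATE(08S01), ErrorCode(0)",
    "com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure",
]

result = first_occurrences(sample)
for label, ts in sorted(result.items(), key=lambda item: item[1]):
    print(label, ts.isoformat(timespec="milliseconds"))
```

Run against a complete export, a result where the first "oom" timestamp precedes the first connection errors would be consistent with memory exhaustion being the trigger, though it would not prove it on its own.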
It’s possible this was caused by the changes we made on Sep 19 to how TCS talks to the Reference service.
We searched the earlier logs and found that, between Aug 01 and Sep 30, we had an OutOfMemory error at 2024-08-27T12:30:52.496Z. We then searched the logs from Feb to Jul and found that the first OutOfMemory error in that period was at 08/05/2024 14:40:40.070 UTC.
This live defect is very likely linked to another one: 2024-10-08 Users experiencing slow loading and crashing. We would like to monitor it and see whether it still happens after the auto-scaling.
...