Original title:
Replication hangs forever (sync connect), causing the entire application to freeze.
The customer has an ID-generation mechanism; when replication of a transaction hangs, it has severe consequences for the application.
An SSL filter is used, which causes the sync connect path to be chosen instead of the async one.
TCP keep-alive settings are as recommended in our docs.
The problem was resolved only after a full restart.
Example stack trace of the hung thread:
"DefaultTimeout5" #431 prio=5 os_prio=0 tid=0x00007f5294008000 nid=0x25d5 runnable [0x00007f50e5f6f000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
- locked <0x0000000265b7d5d0> (a java.lang.Object)
at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelBlocking(Reader.java:239)
at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.handleBlokingNeedUnwrap(IOBlockFilterContainer.java:404)
at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.handleNeedUnwrap(IOBlockFilterContainer.java:387)
at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.handshakeOneStep(IOBlockFilterContainer.java:339)
at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.processHandshake(IOBlockFilterContainer.java:303)
at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.beginHandshake(IOBlockFilterContainer.java:296)
at com.gigaspaces.lrmi.nio.filters.IOBlockFilterManager.beginHandshake(IOBlockFilterManager.java:184)
at com.gigaspaces.lrmi.nio.CPeer.connectSync(CPeer.java:342)
- locked <0x00000006f0d05b10> (a com.gigaspaces.lrmi.nio.CPeer)
at com.gigaspaces.lrmi.nio.CPeer.connect(CPeer.java:196)
- locked <0x00000006f0d05b10> (a com.gigaspaces.lrmi.nio.CPeer)
at com.gigaspaces.lrmi.ConnectionPool.getConnection(ConnectionPool.java:110)
at com.gigaspaces.lrmi.ConnPoolInvocationHandler.invoke(ConnPoolInvocationHandler.java:58)
at com.gigaspaces.lrmi.MethodCachedInvocationHandler.invoke(MethodCachedInvocationHandler.java:76)
at com.gigaspaces.lrmi.DynamicSmartStub.invokeRemote(DynamicSmartStub.java:456)
at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:436)
at com.gigaspaces.reflect.$GSProxy12.dispatch(Unknown Source)
at com.gigaspaces.internal.cluster.node.impl.router.AbstractProxyBasedReplicationMonitoredConnection.dispatch(AbstractProxyBasedReplicationMonitoredConnection.java:135)
at com.gigaspaces.internal.cluster.node.impl.router.spacefinder.ConnectionReference.dispatch(ConnectionReference.java:59)
at com.gigaspaces.internal.cluster.node.impl.groups.AbstractReplicationSourceChannel.dispatchReplicationPacket(AbstractReplicationSourceChannel.java:693)
at com.gigaspaces.internal.cluster.node.impl.groups.AbstractReplicationSourceChannel.replicateAfterChannelFilter(AbstractReplicationSourceChannel.java:677)
at com.gigaspaces.internal.cluster.node.impl.groups.AbstractReplicationSourceChannel.replicate(AbstractReplicationSourceChannel.java:602)
at com.gigaspaces.internal.cluster.node.impl.groups.sync.SyncReplicationSourceChannel.execute(SyncReplicationSourceChannel.java:210)
at com.gigaspaces.internal.cluster.node.impl.groups.reliableasync.ReliableAsyncReplicationSourceGroup.executeImpl(ReliableAsyncReplicationSourceGroup.java:297)
at com.gigaspaces.internal.cluster.node.impl.groups.AbstractReplicationSourceGroup.execute(AbstractReplicationSourceGroup.java:252)
at com.gigaspaces.internal.cluster.node.impl.ReplicationNode.execute(ReplicationNode.java:253)
at com.gigaspaces.internal.server.space.SpaceEngine.performReplication(SpaceEngine.java:6601)
at com.gigaspaces.internal.server.space.SpaceEngine.replicateAndfreeCache(SpaceEngine.java:6659)
at com.gigaspaces.internal.server.space.SpaceEngine.replicateAndfreeCacheContextTxn(SpaceEngine.java:6642)
at com.gigaspaces.internal.server.space.SpaceEngine.prepare(SpaceEngine.java:3139)
at com.gigaspaces.internal.server.space.SpaceEngine.prepareAndCommit(SpaceEngine.java:3188)
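The innermost frames show the SSL handshake (`IOBlockFilterContainer.handleBlokingNeedUnwrap`) waiting in a blocking `SocketChannel.read()` with no deadline, while `CPeer.connectSync` holds the `CPeer` monitor. TCP keep-alive does not bound such a read: keep-alive probes are answered by the peer's TCP stack, so a peer that is reachable but never sends handshake data keeps the read blocked. The following is a minimal sketch in plain JDK NIO, not GigaSpaces LRMI code; `BoundedChannelRead` and `readWithTimeout` are hypothetical names. It shows one way a handshake read could be given a deadline: switch the channel to non-blocking mode and wait on a `Selector` with a timeout, because `Socket.setSoTimeout` applies to reads through the socket's `InputStream`, not to `SocketChannel.read()`.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BoundedChannelRead {

    // Hypothetical helper (not LRMI API): read into buf, but give up with a
    // SocketTimeoutException if no bytes arrive within timeoutMillis.
    static int readWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMillis)
            throws IOException {
        if (!buf.hasRemaining()) {
            throw new IllegalArgumentException("buffer is full");
        }
        // SO_TIMEOUT does not apply to SocketChannel.read(), so wait on a Selector instead.
        ch.configureBlocking(false);
        try (Selector selector = Selector.open()) {
            ch.register(selector, SelectionKey.OP_READ);
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (true) {
                int n = ch.read(buf);
                if (n != 0) {
                    return n; // bytes read, or -1 if the peer closed the connection
                }
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new SocketTimeoutException("no data within " + timeoutMillis + " ms");
                }
                selector.select(remainingMillis); // wakes on readability or timeout
                selector.selectedKeys().clear();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A local peer that accepts the connection but never writes, standing in
        // for a handshake partner that stops responding.
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                readWithTimeout(client, ByteBuffer.allocate(64), 200);
                System.out.println("unexpected: data received");
            } catch (SocketTimeoutException e) {
                System.out.println("read timed out instead of hanging: " + e.getMessage());
            }
        }
    }
}
```

With a bound like this, the replication thread would get an exception it could handle (for example by retrying or treating the target as disconnected) instead of blocking indefinitely; whether LRMI offers an equivalent timeout for the SSL handshake would need to be confirmed against its configuration.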
The full thread dump and logs are attached.
The thread dump was taken at 15:43:31. The customer reports that the "Space based id generator" hung for 10 minutes until the full restart.
This thread executes a transaction every 10-15 seconds on average. According to the database, the last transaction on the relevant table before the freeze was executed at 15:31:55. Direct persistence is used.
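The two timestamps can be cross-checked with a small calculation. This sketch assumes the database clock and the clock of the host where the thread dump was taken are aligned and in the same time zone; `StallWindow` and `stallSeconds` are illustrative names only. The gap from the last commit to the dump is 11 min 36 s, which suggests the thread may already have been stuck for somewhat longer than the reported 10 minutes, and corresponds to roughly 46 to 69 transactions that would normally have run.

```java
import java.time.Duration;
import java.time.LocalTime;

public class StallWindow {

    // Seconds between two wall-clock times on the same day (HH:mm:ss).
    static long stallSeconds(String lastCommit, String threadDump) {
        return Duration.between(LocalTime.parse(lastCommit), LocalTime.parse(threadDump)).getSeconds();
    }

    public static void main(String[] args) {
        // Assumes the database clock and the space host clock are aligned.
        long s = stallSeconds("15:31:55", "15:43:31");
        // The thread normally commits once every 10-15 s.
        long fewest = s / 15;
        long most = s / 10;
        System.out.println("gap: " + s / 60 + " min " + s % 60 + " s (" + s + " s)"); // gap: 11 min 36 s (696 s)
        System.out.println("expected commits in gap: ~" + fewest + " to ~" + most);  // expected commits in gap: ~46 to ~69
    }
}
```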