Configure the amount of time to block on a read call from a socket

Description

original title:
Replication hangs forever (sync connect), causing the entire application to freeze.


The customer has an ID-generation mechanism; when replication of a transaction hangs, it has severe consequences for the application.
An SSL filter is used, which causes the synchronous (rather than asynchronous) connect path to be chosen.
TCP keep-alive settings are as recommended in our docs.
The problem was resolved only after a full restart.

Example stack trace of the hung thread:
"DefaultTimeout5" #431 prio=5 os_prio=0 tid=0x00007f5294008000 nid=0x25d5 runnable [0x00007f50e5f6f000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    - locked <0x0000000265b7d5d0> (a java.lang.Object)
    at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelBlocking(Reader.java:239)
    at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.handleBlokingNeedUnwrap(IOBlockFilterContainer.java:404)
    at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.handleNeedUnwrap(IOBlockFilterContainer.java:387)
    at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.handshakeOneStep(IOBlockFilterContainer.java:339)
    at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.processHandshake(IOBlockFilterContainer.java:303)
    at com.gigaspaces.lrmi.nio.filters.IOBlockFilterContainer.beginHandshake(IOBlockFilterContainer.java:296)
    at com.gigaspaces.lrmi.nio.filters.IOBlockFilterManager.beginHandshake(IOBlockFilterManager.java:184)
    at com.gigaspaces.lrmi.nio.CPeer.connectSync(CPeer.java:342)
    - locked <0x00000006f0d05b10> (a com.gigaspaces.lrmi.nio.CPeer)
    at com.gigaspaces.lrmi.nio.CPeer.connect(CPeer.java:196)
    - locked <0x00000006f0d05b10> (a com.gigaspaces.lrmi.nio.CPeer)
    at com.gigaspaces.lrmi.ConnectionPool.getConnection(ConnectionPool.java:110)
    at com.gigaspaces.lrmi.ConnPoolInvocationHandler.invoke(ConnPoolInvocationHandler.java:58)
    at com.gigaspaces.lrmi.MethodCachedInvocationHandler.invoke(MethodCachedInvocationHandler.java:76)
    at com.gigaspaces.lrmi.DynamicSmartStub.invokeRemote(DynamicSmartStub.java:456)
    at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:436)
    at com.gigaspaces.reflect.$GSProxy12.dispatch(Unknown Source)
    at com.gigaspaces.internal.cluster.node.impl.router.AbstractProxyBasedReplicationMonitoredConnection.dispatch(AbstractProxyBasedReplicationMonitoredConnection.java:135)
    at com.gigaspaces.internal.cluster.node.impl.router.spacefinder.ConnectionReference.dispatch(ConnectionReference.java:59)
    at com.gigaspaces.internal.cluster.node.impl.groups.AbstractReplicationSourceChannel.dispatchReplicationPacket(AbstractReplicationSourceChannel.java:693)
    at com.gigaspaces.internal.cluster.node.impl.groups.AbstractReplicationSourceChannel.replicateAfterChannelFilter(AbstractReplicationSourceChannel.java:677)
    at com.gigaspaces.internal.cluster.node.impl.groups.AbstractReplicationSourceChannel.replicate(AbstractReplicationSourceChannel.java:602)
    at com.gigaspaces.internal.cluster.node.impl.groups.sync.SyncReplicationSourceChannel.execute(SyncReplicationSourceChannel.java:210)
    at com.gigaspaces.internal.cluster.node.impl.groups.reliableasync.ReliableAsyncReplicationSourceGroup.executeImpl(ReliableAsyncReplicationSourceGroup.java:297)
    at com.gigaspaces.internal.cluster.node.impl.groups.AbstractReplicationSourceGroup.execute(AbstractReplicationSourceGroup.java:252)
    at com.gigaspaces.internal.cluster.node.impl.ReplicationNode.execute(ReplicationNode.java:253)
    at com.gigaspaces.internal.server.space.SpaceEngine.performReplication(SpaceEngine.java:6601)
    at com.gigaspaces.internal.server.space.SpaceEngine.replicateAndfreeCache(SpaceEngine.java:6659)
    at com.gigaspaces.internal.server.space.SpaceEngine.replicateAndfreeCacheContextTxn(SpaceEngine.java:6642)
    at com.gigaspaces.internal.server.space.SpaceEngine.prepare(SpaceEngine.java:3139)
    at com.gigaspaces.internal.server.space.SpaceEngine.prepareAndCommit(SpaceEngine.java:3188)

Full thread dump and logs attached.
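A minimal sketch of a bounded read, assuming the fix takes the form of a timeout on the channel read that blocks in the trace (Reader.readBytesFromChannelBlocking calling SocketChannelImpl.read). Socket.setSoTimeout applies to reads through the socket's InputStream, not to SocketChannel.read(), so this sketch instead waits on a Selector for at most the remaining time. The class TimedChannelReader and its method readFully are hypothetical names for illustration, not XAP/LRMI APIs; the ticket does not name a configuration property, so the timeout is simply passed in as a parameter.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of XAP/LRMI): reads from a SocketChannel
 * but gives up after a timeout instead of blocking forever.
 */
final class TimedChannelReader {

    private TimedChannelReader() {
    }

    /**
     * Reads until dst is full or timeoutMillis elapses.
     *
     * @return the number of bytes read
     * @throws SocketTimeoutException if the deadline passes before dst is full
     * @throws EOFException if the peer closes the connection first
     */
    static int readFully(SocketChannel channel, ByteBuffer dst, long timeoutMillis)
            throws IOException {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean wasBlocking = channel.isBlocking();
        // Selectors only accept non-blocking channels.
        channel.configureBlocking(false);
        try {
            // Closing the selector deregisters the channel, which must happen
            // before blocking mode can be restored in the finally block.
            try (Selector selector = Selector.open()) {
                channel.register(selector, SelectionKey.OP_READ);
                int total = 0;
                while (dst.hasRemaining()) {
                    int n = channel.read(dst);
                    if (n < 0) {
                        throw new EOFException("peer closed connection after " + total + " bytes");
                    }
                    if (n > 0) {
                        total += n;
                        continue;
                    }
                    // Nothing available yet: wait for readability, never past the deadline.
                    long remainingMillis =
                            TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMillis <= 0) {
                        throw new SocketTimeoutException(
                                "read timed out after " + timeoutMillis + " ms");
                    }
                    selector.select(remainingMillis);
                    selector.selectedKeys().clear();
                }
                return total;
            }
        } finally {
            // Restore the caller's blocking mode (only possible while the channel is open).
            if (wasBlocking && channel.isOpen()) {
                channel.configureBlocking(true);
            }
        }
    }
}
```

Because blocking mode is a property of the channel, switching it affects any other thread using the same channel at the same time; keeping the channel permanently non-blocking avoids that. If the handshake reads in the trace went through a bounded read like this, the indefinite wait would instead surface as a SocketTimeoutException that the caller could treat as a failed connection.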

Workaround

None

Acceptance Test

xap-core

Assignee

Meron Avigdor

Reporter

Ester Atzmon

Labels

None

Priority

Critical

SalesForce Case ID

12987

Fix versions

Commitment Version/s

None

Due date

None

Product

None

Edition

Open Source

Platform

All