Fixed
Details
Details
Assignee
EvgenyF
EvgenyF(Deactivated)Reporter
Yuval Dori
Yuval Dori(Deactivated)Participants of an issue
EvgenyF
Yuval Dori
Priority
SalesForce Case ID
00011037
Fix versions
Edition
Open Source
Platform
All
Acceptance Test
added test: RelocateProcessingUnitInstanceTest
Freshdesk Support
Open Freshdesk Support
Created March 23, 2017 at 9:38 AM
Updated February 8, 2018 at 11:34 AM
Resolved November 16, 2017 at 9:14 AM
The following reproduces at the customer site once in ~100 retries.
Client:
After the full deployment, the customer started a rebalancing script. This script uses the Admin and invokes public ProcessingUnitInstance relocateAndWait(GridServiceContainer gridServiceContainerToRelocateTo, long timeout, TimeUnit timeUnit) method. The result of this invocation was the client blocks forever.
Thread dumps are taken 18h after the method call, it is still in a deadlock.
Thread which calls relocationAndWait
"pool-20-thread-1" #547 prio=5 os_prio=0 tid=0x00000000200f2000 nid=0x20a18 runnable [0x0000000026bdd000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(Redefined)
locked <0x00000000ca17ad90> (a java.lang.Object)
at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelBlocking(Reader.java:239)
at com.gigaspaces.lrmi.nio.Reader.readBytesBlocking(Reader.java:628)
at com.gigaspaces.lrmi.nio.Reader.bytesToPacket(Reader.java:566)
at com.gigaspaces.lrmi.nio.Reader.readReply(Reader.java:158)
at com.gigaspaces.lrmi.nio.CPeer.invoke(CPeer.java:711)
at com.gigaspaces.lrmi.ConnPoolInvocationHandler.invoke(ConnPoolInvocationHandler.java:79)
at com.gigaspaces.lrmi.MethodCachedInvocationHandler.invoke(MethodCachedInvocationHandler.java:76)
at com.gigaspaces.lrmi.DynamicSmartStub.invokeRemote(DynamicSmartStub.java:449)
at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:429)
at com.gigaspaces.reflect.$GSProxy7.relocate(Unknown Source)
at com.gigaspaces.grid.gsm.GSMProxy.relocate(GSMProxy.java:247)
at org.openspaces.admin.internal.gsm.DefaultGridServiceManager.relocate(DefaultGridServiceManager.java:366)
at org.openspaces.admin.internal.pu.DefaultProcessingUnitInstance.relocate(DefaultProcessingUnitInstance.java:435)
at org.openspaces.admin.internal.pu.DefaultProcessingUnitInstance.relocateAndWait(DefaultProcessingUnitInstance.java:462)
at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation.lambda$null$3(RelocationOperation.java:85)
at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation$$Lambda$589/791011469.apply(Unknown Source)
at java.util.Optional.map(Optional.java:215)
at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation.lambda$relocationAction$4(RelocationOperation.java:85)
at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation$$Lambda$588/1007713063.apply(Unknown Source)
at java.util.Optional.flatMap(Optional.java:241)
at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation.relocationAction(RelocationOperation.java:83)
at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation.lambda$executeAction$2(RelocationOperation.java:75)
at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation$$Lambda$580/793351903.accept(Unknown Source)
at com.frequentis.ps.maintenance.service.simulation.operation.RebalanceOperation.rebalanceAction(RebalanceOperation.java:123)
at com.frequentis.ps.maintenance.service.simulation.operation.RebalanceOperation.lambda$executeAction$8(RebalanceOperation.java:109)
at com.frequentis.ps.maintenance.service.simulation.operation.RebalanceOperation$$Lambda$577/508898348.accept(Unknown Source)
at com.frequentis.ps.maintenance.service.rebalance.Rebalancing.performRebalancing(Rebalancing.java:194)
at com.frequentis.ps.maintenance.service.rebalance.Rebalancing.rebalance(Rebalancing.java:156)
at com.frequentis.ps.maintenance.service.rebalance.RebalanceService.lambda$rebalance$0(RebalanceService.java:117)
at com.frequentis.ps.maintenance.service.rebalance.RebalanceService$$Lambda$264/1768124127.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
GSM Thread which handles the relocation call
"GS-LRMI Connection-pool-1-thread-15" #124 daemon prio=5 os_prio=0 tid=0x0000000016426000 nid=0x1b46c runnable [0x0000000022dfc000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(Redefined)
locked <0x00000000ec070680> (a java.lang.Object)
at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelBlocking(Reader.java:239)
at com.gigaspaces.lrmi.nio.Reader.readBytesBlocking(Reader.java:628)
at com.gigaspaces.lrmi.nio.Reader.bytesToPacket(Reader.java:566)
at com.gigaspaces.lrmi.nio.Reader.readReply(Reader.java:158)
at com.gigaspaces.lrmi.nio.CPeer.invoke(CPeer.java:711)
at com.gigaspaces.lrmi.ConnPoolInvocationHandler.invoke(ConnPoolInvocationHandler.java:79)
at com.gigaspaces.lrmi.MethodCachedInvocationHandler.invoke(MethodCachedInvocationHandler.java:76)
at com.gigaspaces.lrmi.DynamicSmartStub.invokeRemote(DynamicSmartStub.java:449)
at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:429)
at com.gigaspaces.reflect.$GSProxy12.destroy(Unknown Source)
at org.jini.rio.resources.servicecore.ServiceAdminProxy.destroy(ServiceAdminProxy.java:229)
at org.jini.rio.monitor.ServiceElementManager.doDestroy(ServiceElementManager.java:1034)
at org.jini.rio.monitor.ServiceElementManager.destroyService(ServiceElementManager.java:996)
at org.jini.rio.monitor.ServiceElementManager.decrement(ServiceElementManager.java:1521)
locked <0x00000000eabca480> (a java.lang.Object)
at org.jini.rio.monitor.ProvisionMonitorImpl$OpStringManager.decrement(ProvisionMonitorImpl.java:2648)
at org.jini.rio.core.OperationalStringManagerGigaspacesMethodinternalInvoke16.internalInvoke(Unknown Source)
at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:45)
at com.gigaspaces.lrmi.DynamicSmartStub._invoke(DynamicSmartStub.java:276)
at com.gigaspaces.lrmi.DynamicSmartStub.invokeDirect(DynamicSmartStub.java:347)
at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:432)
at com.gigaspaces.reflect.$GSProxy10.decrement(Unknown Source)
at org.jini.rio.monitor.ServiceElementManager.relocate(ServiceElementManager.java:1121)
at org.jini.rio.monitor.ProvisionMonitorImpl$OpStringManager.relocate(ProvisionMonitorImpl.java:2477)
at org.jini.rio.core.OperationalStringManagerGigaspacesMethodinternalInvoke14.internalInvoke(Unknown Source)
at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:45)
at com.gigaspaces.lrmi.DynamicSmartStub._invoke(DynamicSmartStub.java:276)
at com.gigaspaces.lrmi.DynamicSmartStub.invokeDirect(DynamicSmartStub.java:347)
at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:432)
at com.gigaspaces.reflect.$GSProxy10.relocate(Unknown Source)
at com.gigaspaces.grid.gsm.GSMImpl.relocate(GSMImpl.java:384)
at com.gigaspaces.grid.gsm.GSMGigaspacesMethodinternalInvoke10.internalInvoke(Unknown Source)
at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:45)
at com.gigaspaces.lrmi.LRMIRuntime.invoked(LRMIRuntime.java:441)
at com.gigaspaces.lrmi.nio.Pivot.consumeAndHandleRequest(Pivot.java:494)
at com.gigaspaces.lrmi.nio.Pivot.handleRequest(Pivot.java:572)
at com.gigaspaces.lrmi.nio.Pivot$ChannelEntryTask.run(Pivot.java:173)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Expectation is that after 60 seconds a get a timeout exception or just return with null, but not a block forever.
Server:
We can see that the timeout that the client pass is being used just after the relocate (DefaultProcessingUnitInstance line 471). This seems like a second problem. It is okay that the server will try to relocate even after the client timeout but is seems that for the server there is no timeout at all.