Uploaded image for project: 'InsightEdge Platform'
  1. GS-13204

The ProcessingUnitInstance.relocationAndWait() may not abort after timeout, causing a permanent blocking state

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 12.3
    • Labels:
      None
    • Platform:
      All
    • SalesForce Case ID:
      00011037
    • Acceptance Test:
      added test: RelocateProcessingUnitInstanceTest
    • Sprint:
    • Edition:
      Open Source

      Description

      The following reproduces at the customer site once in ~100 retries.

      Client:
      After the full deployment, the customer started a rebalancing script. This script uses the Admin and invokes public ProcessingUnitInstance relocateAndWait(GridServiceContainer gridServiceContainerToRelocateTo, long timeout, TimeUnit timeUnit) method. The result of this invocation was the client blocks forever.

      Thread dumps are taken 18h after the method call, it is still in a deadlock.

      Thread which calls relocationAndWait
      "pool-20-thread-1" #547 prio=5 os_prio=0 tid=0x00000000200f2000 nid=0x20a18 runnable [0x0000000026bdd000]
      java.lang.Thread.State: RUNNABLE
      at sun.nio.ch.SocketDispatcher.read0(Native Method)
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
      at sun.nio.ch.IOUtil.read(IOUtil.java:192)
      at sun.nio.ch.SocketChannelImpl.read(Redefined)
      - locked <0x00000000ca17ad90> (a java.lang.Object)
      at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelBlocking(Reader.java:239)
      at com.gigaspaces.lrmi.nio.Reader.readBytesBlocking(Reader.java:628)
      at com.gigaspaces.lrmi.nio.Reader.bytesToPacket(Reader.java:566)
      at com.gigaspaces.lrmi.nio.Reader.readReply(Reader.java:158)
      at com.gigaspaces.lrmi.nio.CPeer.invoke(CPeer.java:711)
      at com.gigaspaces.lrmi.ConnPoolInvocationHandler.invoke(ConnPoolInvocationHandler.java:79)
      at com.gigaspaces.lrmi.MethodCachedInvocationHandler.invoke(MethodCachedInvocationHandler.java:76)
      at com.gigaspaces.lrmi.DynamicSmartStub.invokeRemote(DynamicSmartStub.java:449)
      at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:429)
      at com.gigaspaces.reflect.$GSProxy7.relocate(Unknown Source)
      at com.gigaspaces.grid.gsm.GSMProxy.relocate(GSMProxy.java:247)
      at org.openspaces.admin.internal.gsm.DefaultGridServiceManager.relocate(DefaultGridServiceManager.java:366)
      at org.openspaces.admin.internal.pu.DefaultProcessingUnitInstance.relocate(DefaultProcessingUnitInstance.java:435)
      at org.openspaces.admin.internal.pu.DefaultProcessingUnitInstance.relocateAndWait(DefaultProcessingUnitInstance.java:462)
      at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation.lambda$null$3(RelocationOperation.java:85)
      at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation$$Lambda$589/791011469.apply(Unknown Source)
      at java.util.Optional.map(Optional.java:215)
      at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation.lambda$relocationAction$4(RelocationOperation.java:85)
      at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation$$Lambda$588/1007713063.apply(Unknown Source)
      at java.util.Optional.flatMap(Optional.java:241)
      at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation.relocationAction(RelocationOperation.java:83)
      at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation.lambda$executeAction$2(RelocationOperation.java:75)
      at com.frequentis.ps.maintenance.service.simulation.operation.RelocationOperation$$Lambda$580/793351903.accept(Unknown Source)
      at com.frequentis.ps.maintenance.service.simulation.operation.RebalanceOperation.rebalanceAction(RebalanceOperation.java:123)
      at com.frequentis.ps.maintenance.service.simulation.operation.RebalanceOperation.lambda$executeAction$8(RebalanceOperation.java:109)
      at com.frequentis.ps.maintenance.service.simulation.operation.RebalanceOperation$$Lambda$577/508898348.accept(Unknown Source)
      at com.frequentis.ps.maintenance.service.rebalance.Rebalancing.performRebalancing(Rebalancing.java:194)
      at com.frequentis.ps.maintenance.service.rebalance.Rebalancing.rebalance(Rebalancing.java:156)
      at com.frequentis.ps.maintenance.service.rebalance.RebalanceService.lambda$rebalance$0(RebalanceService.java:117)
      at com.frequentis.ps.maintenance.service.rebalance.RebalanceService$$Lambda$264/1768124127.run(Unknown Source)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      GSM Thread which handles the relocation call
      "GS-LRMI Connection-pool-1-thread-15" #124 daemon prio=5 os_prio=0 tid=0x0000000016426000 nid=0x1b46c runnable [0x0000000022dfc000]
      java.lang.Thread.State: RUNNABLE
      at sun.nio.ch.SocketDispatcher.read0(Native Method)
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
      at sun.nio.ch.IOUtil.read(IOUtil.java:192)
      at sun.nio.ch.SocketChannelImpl.read(Redefined)
      - locked <0x00000000ec070680> (a java.lang.Object)
      at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelBlocking(Reader.java:239)
      at com.gigaspaces.lrmi.nio.Reader.readBytesBlocking(Reader.java:628)
      at com.gigaspaces.lrmi.nio.Reader.bytesToPacket(Reader.java:566)
      at com.gigaspaces.lrmi.nio.Reader.readReply(Reader.java:158)
      at com.gigaspaces.lrmi.nio.CPeer.invoke(CPeer.java:711)
      at com.gigaspaces.lrmi.ConnPoolInvocationHandler.invoke(ConnPoolInvocationHandler.java:79)
      at com.gigaspaces.lrmi.MethodCachedInvocationHandler.invoke(MethodCachedInvocationHandler.java:76)
      at com.gigaspaces.lrmi.DynamicSmartStub.invokeRemote(DynamicSmartStub.java:449)
      at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:429)
      at com.gigaspaces.reflect.$GSProxy12.destroy(Unknown Source)
      at org.jini.rio.resources.servicecore.ServiceAdminProxy.destroy(ServiceAdminProxy.java:229)
      at org.jini.rio.monitor.ServiceElementManager.doDestroy(ServiceElementManager.java:1034)
      at org.jini.rio.monitor.ServiceElementManager.destroyService(ServiceElementManager.java:996)
      at org.jini.rio.monitor.ServiceElementManager.decrement(ServiceElementManager.java:1521)
      - locked <0x00000000eabca480> (a java.lang.Object)
      at org.jini.rio.monitor.ProvisionMonitorImpl$OpStringManager.decrement(ProvisionMonitorImpl.java:2648)
      at org.jini.rio.core.OperationalStringManagerGigaspacesMethodinternalInvoke16.internalInvoke(Unknown Source)
      at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:45)
      at com.gigaspaces.lrmi.DynamicSmartStub._invoke(DynamicSmartStub.java:276)
      at com.gigaspaces.lrmi.DynamicSmartStub.invokeDirect(DynamicSmartStub.java:347)
      at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:432)
      at com.gigaspaces.reflect.$GSProxy10.decrement(Unknown Source)
      at org.jini.rio.monitor.ServiceElementManager.relocate(ServiceElementManager.java:1121)
      at org.jini.rio.monitor.ProvisionMonitorImpl$OpStringManager.relocate(ProvisionMonitorImpl.java:2477)
      at org.jini.rio.core.OperationalStringManagerGigaspacesMethodinternalInvoke14.internalInvoke(Unknown Source)
      at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:45)
      at com.gigaspaces.lrmi.DynamicSmartStub._invoke(DynamicSmartStub.java:276)
      at com.gigaspaces.lrmi.DynamicSmartStub.invokeDirect(DynamicSmartStub.java:347)
      at com.gigaspaces.lrmi.DynamicSmartStub.invoke(DynamicSmartStub.java:432)
      at com.gigaspaces.reflect.$GSProxy10.relocate(Unknown Source)
      at com.gigaspaces.grid.gsm.GSMImpl.relocate(GSMImpl.java:384)
      at com.gigaspaces.grid.gsm.GSMGigaspacesMethodinternalInvoke10.internalInvoke(Unknown Source)
      at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:45)
      at com.gigaspaces.lrmi.LRMIRuntime.invoked(LRMIRuntime.java:441)
      at com.gigaspaces.lrmi.nio.Pivot.consumeAndHandleRequest(Pivot.java:494)
      at com.gigaspaces.lrmi.nio.Pivot.handleRequest(Pivot.java:572)
      at com.gigaspaces.lrmi.nio.Pivot$ChannelEntryTask.run(Pivot.java:173)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      Expectation is that after 60 seconds a get a timeout exception or just return with null, but not a block forever.

      Server:
      We can see that the timeout that the client pass is being used just after the relocate (DefaultProcessingUnitInstance line 471). This seems like a second problem. It is okay that the server will try to relocate even after the client timeout but is seems that for the server there is no timeout at all.
       

        Attachments

          Issue links

            Activity

              People

              • Assignee:
                evgenyf EvgenyF
                Reporter:
                Yuvald Yuval Dori (Inactive)
                Participants of an issue:
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: