Full recovery failed when using FIFO Group

Description

After adding the fix for the full backup, recovery fails about 50% of the time.
In the attached log, I see the following:

2019-06-18 11:42:37,989 sync-pu-1.0.0-SNAPSHOT.1 [2] FINE [com.gigaspaces.replication.replica] - Replication [sync-space_container1_1:sync-space]: starting space synchronization replica process using Url jini://*/sync-space_container1/sync-space?groups=yuval-pc&ignoreValidation=true&total_members=1,1&cluster_schema=partitioned-sync2backup&backup_id=1&id=1&schema=default&locators=yuval-pc&state=started&timeout=5000

Eventually, this message appears:

2019-06-18 11:44:38,100 sync-pu-1.0.0-SNAPSHOT.1 [2] WARNING [com.gigaspaces.space.engine.sync-space.1_1] - Recovery operation failed: com.gigaspaces.internal.cluster.node.impl.replica.ReplicaNoProgressException: No progress in replica stage for the past 60000 milliseconds; Caused by: com.gigaspaces.internal.cluster.node.impl.replica.ReplicaNoProgressException: No progress in replica stage for the past 60000 milliseconds

Then it seems to try again:

2019-06-18 11:46:39,466 sync-pu-1.0.0-SNAPSHOT.1 [2] INFO [com.gigaspaces.space.active-election.sync-space.1_1] - Space instance [sync-space_container1_1:sync-space] has been elected as Backup
2019-06-18 11:46:39,472 sync-pu-1.0.0-SNAPSHOT.1 [2] INFO [com.gigaspaces.core.common] - Space [sync-space_container1_1:sync-space] trying to perform recovery from [jini://*/sync-space_container1/sync-space?groups=yuval-pc&ignoreValidation=true&total_members=1,1&cluster_schema=partitioned-sync2backup&backup_id=1&id=1&schema=default&locators=yuval-pc&state=started]. RecoveryChunkSize=200

But the retry fails with:

2019-06-18 11:46:39,695 sync-pu-1.0.0-SNAPSHOT.1 [2] WARNING [com.gigaspaces.space.engine.sync-space.1_1] - Recovery operation failed: java.util.ConcurrentModificationException; Caused by: java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1207)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1261)
at java.util.AbstractCollection.toString(AbstractCollection.java:461)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)

The full log is attached.
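In the trace, `AbstractCollection.toString` iterates a `TreeMap` key iterator and is called from `StringBuilder.append`. That is the frame pattern you get when a `TreeSet`, or a `TreeMap` key set, is concatenated into a string (possibly a log message) while another thread structurally modifies the underlying map. The trace is truncated, so this report does not show which collection in the recovery path is involved. The minimal sketch below uses a hypothetical class `CmeSketch` that is not taken from the GigaSpaces code base. It reproduces the fail-fast behavior deterministically in one thread and shows two common mitigations:

- a weakly consistent `ConcurrentSkipListMap`, or
- copying the keys under the writers' lock before formatting them.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration; not taken from the GigaSpaces code base.
public class CmeSketch {

    // Fills the map, then removes an entry while iterating over its key set.
    // A concurrent writer does the same thing to a thread that is calling
    // toString() on the key set (toString() iterates it internally).
    public static boolean throwsOnModifyDuringIteration(NavigableMap<Long, String> map) {
        for (long i = 0; i < 5; i++) {
            map.put(i, "v" + i);
        }
        try {
            Iterator<Long> it = map.keySet().iterator();
            while (it.hasNext()) {
                if (it.next() == 1L) {
                    map.remove(3L); // structural change that bypasses the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the fail-fast iterator detected the change
        }
    }

    // Mitigation for a plain TreeMap: copy the keys while holding the lock that
    // every writer also holds, then format the copy outside the lock.
    public static String keysForLog(Object lock, NavigableMap<Long, String> map) {
        List<Long> snapshot;
        synchronized (lock) {
            snapshot = new ArrayList<>(map.keySet());
        }
        return snapshot.toString();
    }

    public static void main(String[] args) {
        System.out.println(throwsOnModifyDuringIteration(new TreeMap<>()));              // true
        System.out.println(throwsOnModifyDuringIteration(new ConcurrentSkipListMap<>())); // false
    }
}
```

The two mitigations have different costs. The skip-list variant gives up fail-fast detection in exchange for weakly consistent iteration, so a logged key set may miss concurrent updates. The snapshot variant keeps the `TreeMap`, but it only helps if every writer synchronizes on the same lock.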

**Steps to reproduce using the attached test**
1. Build the sync PU by running `mvn clean install`.
2. Start an XAP cluster with one GSC, setting `-Dcom.gigaspaces.grid.gsc.serviceLimit=1` for the GSC.
3. Deploy sync-pu-1.0.0-SNAPSHOT.jar and wait until `PLEASE START BACKUP INSTANCE` is logged.
4. Start another GSC in the cluster. The backup PU instance should start immediately and perform a full resync.
5. Check the logs for `Instantiated sync-pu-1.0.0-SNAPSHOT.1 [2] in ...` to get the timing of the full resync.
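The last reproduction step depends on reading timings out of the log. The log lines quoted earlier all begin with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp. Assuming every line follows that format, a small helper can compute the elapsed time between any two lines, for example between a recovery attempt and its failure. The class `LogTiming` and its methods are hypothetical and are not part of XAP.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; the class and method names are illustrative only.
public class LogTiming {

    // Matches the timestamp prefix of the log lines in this issue,
    // e.g. "2019-06-18 11:46:39,466" (comma before the milliseconds).
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Assumes the timestamp occupies the first 23 characters of the line.
    public static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), TS);
    }

    // Elapsed wall-clock time between two log lines, in milliseconds.
    public static long elapsedMillis(String startLine, String endLine) {
        return Duration.between(timestampOf(startLine), timestampOf(endLine)).toMillis();
    }

    public static void main(String[] args) {
        String start = "2019-06-18 11:46:39,472 sync-pu-1.0.0-SNAPSHOT.1 [2] INFO [com.gigaspaces.core.common] - Space [sync-space_container1_1:sync-space] trying to perform recovery";
        String end = "2019-06-18 11:46:39,695 sync-pu-1.0.0-SNAPSHOT.1 [2] WARNING [com.gigaspaces.space.engine.sync-space.1_1] - Recovery operation failed";
        System.out.println(elapsedMillis(start, end)); // 223
    }
}
```

The same call also works on the `Instantiated ...` line paired with the line that marks the start of the resync, since only the timestamp prefix is parsed.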

This bug should be fixed in 12.3.1-patch2 and 14.5.

Workaround

None

Acceptance Test

None

Status

Assignee

Yael Nahon

Reporter

Yuval Dori

Labels

None

Priority

Critical

SalesForce Case ID

None

Commitment Version/s

None

Due date

None

Product

None

Edition

Open Source

Platform

All