After a network reconnect, a pending instance might be discarded and left unmonitored for liveness

Description

In the observed scenario, fault detection flagged a service as unhealthy upon network reconnect, and the service was placed in the pending queue.

If the instance was not destroyed properly (e.g. because it was unreachable, or because of a connection exception or connection timeout), the GSM might receive a service-added event from LUSs reconnecting to the network.

If the service-added event is received before the GSM has completely removed the serviceBeanInstance, the addition is ignored and no fault-detection handler (FDH) is set.

This can lead to a situation where a service is no longer pending and its instance is still alive but in a stopped/unhealthy state, while the manager is not monitoring it via the FDH. The result is a missing active instance.
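
The ordering problem described above can be illustrated with a minimal sketch. This is not the GSM code: the class ManagerSketch, its methods, and the state strings are all hypothetical, and pending-queue handling is left out. It only models an instance record that outlives the failure notification, and a service-added event that is dropped while that record still exists.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the manager's bookkeeping; not the XAP implementation.
class ManagerSketch {
    // Instance records the manager still holds, keyed by service ID.
    private final Map<String, String> instances = new HashMap<>();
    // Service IDs that currently have a fault-detection handler attached.
    private final Set<String> monitored = new HashSet<>();

    // Normal registration: record the instance and start monitoring it.
    void register(String serviceId) {
        instances.put(serviceId, "ACTIVE");
        monitored.add(serviceId);
    }

    // Fault detection reports a failure: monitoring stops, but the record
    // stays until the destroy attempt completes.
    void onServiceFailure(String serviceId) {
        monitored.remove(serviceId);
        instances.put(serviceId, "REMOVING");
    }

    // The destroy attempt finished and the record is finally dropped.
    void onDestroyCompleted(String serviceId) {
        instances.remove(serviceId);
    }

    // A lookup service announces the service again after the reconnect.
    // Returns true if the event was handled, false if it was ignored.
    boolean onServiceAdded(String serviceId) {
        if (instances.containsKey(serviceId)) {
            // Record not yet removed: the event is ignored and no
            // fault-detection handler is attached.
            return false;
        }
        register(serviceId);
        return true;
    }

    boolean isMonitored(String serviceId) {
        return monitored.contains(serviceId);
    }

    public static void main(String[] args) {
        ManagerSketch manager = new ManagerSketch();
        manager.register("svc-1");
        manager.onServiceFailure("svc-1");
        // The added event arrives before the record is removed.
        System.out.println("added event handled: " + manager.onServiceAdded("svc-1"));
        System.out.println("monitored: " + manager.isMonitored("svc-1"));
    }
}
```

In this sketch, the same service-added event is handled normally if it arrives after onDestroyCompleted, so the outcome depends only on the order of the two events.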

Workaround

None

Acceptance Test

manager, service-grid, disconnect suites

Assignee

Meron Avigdor

Reporter

Meron Avigdor

Labels

None

Priority

Medium

SalesForce Case ID

None

Fix versions

Commitment Version/s

None

Due date

None

Product

XAP

Edition

Enterprise

Platform

All