
To add new gw cli namespace-location #1744

@leonidc


Problem Description

In a stretched cluster configuration, both Namespaces and Gateways (GWs) are created with an associated location tag (for example, SiteA, SiteB).

When the location of a Gateway or a Namespace is modified, an automatic rebalancing process is triggered. This process is responsible for redistributing namespaces across ANA groups to maintain location-aware load balancing.

Rebalancing Limitation

A corner case arises when the last Gateway representing a specific location (e.g., SiteA) changes its location. In this scenario, the rebalancing process cannot select a suitable load-balancing (ANA) group for namespaces that are still tagged with the original location (SiteA). As a result:

These namespaces remain associated with their existing ANA group.
They effectively become homeless from a location-aware perspective.
The system state becomes non-obvious to the user.
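To make the corner case concrete, here is a minimal sketch of how such namespaces could be detected. The function name and the dictionary shapes are hypothetical and are not taken from the gateway code; the sketch only assumes that each Gateway and each Namespace carries a single location tag.

```python
def find_unserved_locations(gateway_locations, namespace_locations):
    """Return location tags carried by namespaces but by no gateway.

    gateway_locations:   dict of gateway id -> location tag (hypothetical shape)
    namespace_locations: dict of namespace id -> location tag (hypothetical shape)
    """
    served = set(gateway_locations.values())
    return sorted({loc for loc in namespace_locations.values() if loc not in served})


# The last SiteA gateway has just been moved to SiteB.
gateways = {"gw1": "SiteB", "gw2": "SiteB"}
namespaces = {1: "SiteA", 2: "SiteA", 3: "SiteB"}
unserved = find_unserved_locations(gateways, namespaces)
print(unserved)
```

Any location returned here is one for which the rebalancing process has no candidate ANA group.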

Current CLI Limitation

The existing nvme-gw show CLI provides only aggregate namespace counts per Gateway, for example:

{
"gw-id": "654abc50dd67",
"anagrp-id": 3,
"location": "SiteB",
"admin-state": "ENABLED",
"num-namespaces": 18,
"performed-full-startup": 1,
"availability": "AVAILABLE",
"num-listeners": 3,
"ana-states": "1: WAIT_BLOCKLIST_CMPL, 2: STANDBY, 3: ACTIVE"
}

This output does not expose how namespaces are distributed by location, which makes it difficult to:

Diagnose rebalance failures

Understand why namespaces remain attached to a given Gateway

Explain why certain administrative operations fail

Proposed Enhancement

Each Gateway has sufficient internal knowledge of its namespaces and their location tags. Using this information, the Gateway should expose a location-to-namespace count map.
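A minimal sketch of such a map, assuming the Gateway can enumerate its namespaces as (ANA group id, location tag) pairs. The function name and the input shape are illustrative assumptions, not the existing gateway API.

```python
from collections import Counter, defaultdict


def count_namespaces_by_location(namespaces):
    """Build {ana_group_id: {location: namespace_count}}.

    namespaces: iterable of (ana_group_id, location) pairs (hypothetical shape).
    """
    counts = defaultdict(Counter)
    for ana_group, location in namespaces:
        counts[ana_group][location] += 1
    # Convert to plain dicts so the result serializes cleanly (e.g. to JSON).
    return {group: dict(per_location) for group, per_location in counts.items()}


# 18 namespaces in ANA group 3: 13 tagged SiteB and 5 tagged SiteA.
sample = [(3, "SiteB")] * 13 + [(3, "SiteA")] * 5
location_map = count_namespaces_by_location(sample)
print(location_map)
```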

Example

If a Gateway with LB group 3 hosts 18 namespaces in total, distributed across two locations:

SiteA: 5 namespaces
SiteB: 13 namespaces

the new CLI will print the following for all LB groups:

LBGroup 1:
Native Location: SiteC
Namespaces:
Location number-namespaces
SiteC 15

LBGroup 2:
Native Location: SiteD
Namespaces:
Location number-namespaces
SiteD 15

LBGroup 3:
Native Location: SiteB (the location of the Gateway that owns this LB group)
Namespaces:
Location number-namespaces
SiteB 13
SiteA 5
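The layout above could be rendered along these lines. This is only a sketch: the function name, the two input dictionaries, and the choice to list locations by descending namespace count are assumptions made for illustration.

```python
def format_lb_group_report(native_locations, location_counts):
    """Render per-LB-group namespace counts in the layout proposed above.

    native_locations: dict of LB group id -> native location (hypothetical shape)
    location_counts:  dict of LB group id -> {location: namespace count}
    """
    lines = []
    for group in sorted(location_counts):
        lines.append(f"LBGroup {group}:")
        lines.append(f"Native Location: {native_locations.get(group, 'unknown')}")
        lines.append("Namespaces:")
        lines.append("Location number-namespaces")
        # Largest count first; ties are broken alphabetically by location.
        ordered = sorted(location_counts[group].items(),
                         key=lambda item: (-item[1], item[0]))
        for location, count in ordered:
            lines.append(f"{location} {count}")
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip("\n")


report = format_lb_group_report({3: "SiteB"}, {3: {"SiteA": 5, "SiteB": 13}})
print(report)
```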

The native location can be cached from the nvme-gw show output, as is already done for other GW commands.
See the helper function get_ana_grp_location(self) defined in cephutils.
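As an illustration of that caching step, the sketch below maps each ANA group id to its owner's location, using the fields of the per-Gateway record shown earlier ("anagrp-id" and "location"). It is not the cephutils helper itself; the function name and the assumption that the records arrive as a JSON list are hypothetical.

```python
import json


def build_group_location_cache(gateways):
    """Map ANA group id -> native location from per-gateway show records.

    gateways: list of dicts with "anagrp-id" and "location" keys, as in the
    nvme-gw show record above (the surrounding list structure is assumed).
    """
    return {gw["anagrp-id"]: gw["location"] for gw in gateways}


show_output = json.loads("""
[
  {"gw-id": "654abc50dd67", "anagrp-id": 3, "location": "SiteB",
   "admin-state": "ENABLED", "num-namespaces": 18}
]
""")
cache = build_group_location_cache(show_output)
print(cache)
```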

Benefits

Makes rebalance anomalies immediately visible

Clearly exposes homeless namespaces

Additional Use Case: Gateway Deletion

Another problematic scenario occurs when a Gateway that owns an ANA group containing namespaces from multiple locations receives a Delete Gateway command.

Current Behavior

The Gateway cannot be deleted while it hosts namespaces.
The user receives a generic failure with limited diagnostic information.
The reason for the failure is not obvious.

Proposed Behavior

The new CLI output allows the system to explain clearly why deletion is blocked:
the Gateway still hosts namespaces associated with locations that have no alternative Gateways.

These namespaces must either change location or be deleted.
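A sketch of how the location-to-count map could feed such an explanation. The function name, arguments, and message wording are hypothetical; the only rule taken from the description above is that a location blocks deletion when no other Gateway represents it.

```python
def explain_blocked_delete(gw_id, location_counts, other_gateway_locations):
    """Return a message naming the locations that block deletion, or None.

    location_counts:         {location: namespace count} for the gateway's ANA group
    other_gateway_locations: locations of all remaining gateways (hypothetical input)
    """
    served_elsewhere = set(other_gateway_locations)
    blockers = {loc: n for loc, n in location_counts.items()
                if n > 0 and loc not in served_elsewhere}
    if not blockers:
        return None
    details = ", ".join(f"{loc} ({n} namespaces)" for loc, n in sorted(blockers.items()))
    return (f"Cannot delete gateway {gw_id}: it still hosts namespaces for "
            f"locations with no alternative gateway: {details}. "
            f"Change their location or delete them.")


message = explain_blocked_delete("654abc50dd67",
                                 {"SiteB": 13, "SiteA": 5},
                                 ["SiteB", "SiteC"])
print(message)
```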
