Failed to remove Org VDC

Submitted by Bart Oevering on Tue, 02/06/2024 - 10:47
 
 
Since last year I have been working with VMware Cloud Director (VCD), and the new versions integrate nicely with VMware NSX; more about that in future blogs. But I think I may have discovered a bug in the latest version, so let's get into it!

VMware Cloud Director

This is my first blog about VMware Cloud Director. I had intended to start with an introduction to VCD, until I discovered this bug and wanted to share it right away. Basic knowledge of VCD is therefore assumed in this blog.

Some background

On November 30th, 2023, version 10.5.1 of VCD was released (VMware Cloud Director 10.5.1 is now GA) and I wasted no time upgrading my lab VCD to this new version. Since I only use my VCD in test situations, without a real Org VDC, the easiest way was to just scrap the old VM and deploy a new one. That's exactly what I did.

All worked well and I started to play around with the "NSX Tenancy" setting. This setting maps VCD organizations to VMware NSX Projects, which makes it easy to separate tenants into their own 'project' space. The project limits the scope of the items shown and allows for a more cloud provider-like setup. A dedicated 'Short log identifier' is also set for all syslog messages from the named organization, so that tenant's logs can be split off to a dedicated syslog collector for that tenant/project.

The error

Since I'm only using this VCD for testing, I wanted to reproduce what I had just built and try it again. My experience is that if something works straight out of the box, it either works very well or it works by chance and often breaks the second time 😁. So I tried to delete the Org VDC I had created. VCD was able to remove most network components, but not the Virtual Datacenter.


Operation: Deleted Virtual Datacenter 020_Rattmann(0d790692-d362-42db-8223-f005a717df53)
Type: vdc
Status: Failed
Organization: System
Service Namespace: com.vmware.vcloud
Details: [ 7faf1e0f-775f-4576-bd0b-a878b49c7c90 ] Internal Server Error - Bad Request: The object path=[/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3] cannot be deleted as either it has children or it is being referenced by other objects path=[/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3/infra/domains/default/gateway-policies/79790f77-e28c-4305-8746-f366b29465e6], error code 500030
Debug Information: com.vmware.vcloud.api.presentation.service.InternalServerErrorException: Internal Server Error at com.vmware.vcloud.api.presentation.service.impl.VdcServiceAdapterImpl.waitForFuture(VdcServiceAdapterImpl.java:2621)

[Image: VCD delete failed]

Start the investigation

The error shows us that the item cannot be deleted because it is still in use. Well, that's not so nice! First I checked what else was configured for that Org VDC. Strangely, I couldn't find any configuration items still in use. I looked at the error again and saw that it comes directly from NSX, as I recognized the API path (/orgs/default/projects//infra/domains/....). So next I looked in NSX itself, but I still couldn't find anything in use there either.
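To make the error easier to read, the NSX object paths can be pulled out of the Details text. The following is only a minimal sketch: the `extract_paths` helper is my own illustration (not part of VCD or NSX), and it assumes that paths always appear in the `path=[...]` form seen in this one error message.

```python
import re

# Details text copied from the failed VCD task shown above.
ERROR_DETAILS = (
    "Internal Server Error - Bad Request: The object "
    "path=[/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3] "
    "cannot be deleted as either it has children or it is being referenced "
    "by other objects "
    "path=[/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3"
    "/infra/domains/default/gateway-policies/"
    "79790f77-e28c-4305-8746-f366b29465e6], error code 500030"
)


def extract_paths(details: str) -> list[str]:
    """Return every NSX path written as path=[...] in the error text."""
    return re.findall(r"path=\[([^\]]+)\]", details)


# The first path is the object NSX refuses to delete (the project),
# the remaining paths are the objects that still exist below it.
blocked, *blockers = extract_paths(ERROR_DETAILS)
print("Cannot delete:", blocked)
for path in blockers:
    print("Still present:", path)
```

In this case the only remaining object is a single entry under gateway-policies, which is what the rest of the investigation focuses on.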

NSX shows that there are some default rules in the Distributed Firewall, but those simply cannot be removed. The path in the error suggests that the project still has something under 'gateway-policies', but there is no T1 Gateway linked to the project and the gateway firewall is empty. So why is this error displayed?

[Image: NSX Networking]
[Image: NSX DFW]
[Image: NSX GW FW]

On to the API then! I configured my Postman* as Rutger Blom shows in his blog, except I didn't load the OpenAPI specs, so keep in mind that when you see {{baseUrl}} it means the FQDN of my NSX manager.
*I use Postman because I don't know of any other API client that is a bit more sophisticated than curl or wget.

The first request I made was a GET of that API path:


https://{{baseUrl}}/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3/infra/domains/default/gateway-policies/79790f77-e28c-4305-8746-f366b29465e6


This produces a strange result: the response is a page titled “VMware NSX | Login”.

[Image: Postman get gateway policies]

I then dug a little deeper and discovered that the path in the error is not the full path of the request (you need to prefix it with https://{{baseUrl}}/policy/api/v1 yourself). So, a second attempt with the following API call:


GET https://{{baseUrl}}/policy/api/v1/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3/infra/domains/default/gateway-policies/79790f77-e28c-4305-8746-f366b29465e6


This does return a result: a Gateway Policy that VCD apparently could not, or forgot to, remove.


{
  "rules": [],
  "resource_type": "GatewayPolicy",
  "id": "79790f77-e28c-4305-8746-f366b29465e6",
  "display_name": "79790f77-e28c-4305-8746-f366b29465e6",
  "tags": [
    { "scope": "SYSTEM", "tag": "urn:vcloud:org:06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3" },
    { "scope": "SYSTEM", "tag": "urn:vcloud:vdc:0d790692-d362-42db-8223-f005a717df53" },
    { "scope": "SYSTEM", "tag": "urn:vcloud:gateway:79790f77-e28c-4305-8746-f366b29465e6" }
  ],
  "path": "/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3/infra/domains/default/gateway-policies/79790f77-e28c-4305-8746-f366b29465e6",
  "relative_path": "79790f77-e28c-4305-8746-f366b29465e6",
  "parent_path": "/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3/infra/domains/default",
  "remote_path": "",
  "unique_id": "733b34c7-7dbc-4f35-aa68-e571d4d43e2b",
  "realization_id": "733b34c7-7dbc-4f35-aa68-e571d4d43e2b",
  "owner_id": "a3a5e538-1a5d-451c-8725-b3cf3b12eb8c",
  "marked_for_delete": false,
  "overridden": false,
  "sequence_number": 0,
  "internal_sequence_number": 54000000,
  "category": "LocalGatewayRules",
  "stateful": true,
  "tcp_strict": true,
  "locked": false,
  "lock_modified_time": 0,
  "rule_count": 0,
  "is_default": false,
  "_create_time": 1702465593549,
  "_create_user": "sa_vcd_admin",
  "_last_modified_time": 1702465593549,
  "_last_modified_user": "sa_vcd_admin",
  "_system_owned": false,
  "_protection": "NOT_PROTECTED",
  "_revision": 0
}

[Image: Postman GET local gateway rules]
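The tags in this response tie the policy back to VCD: the org tag carries the same ID as the NSX project, and the vdc tag carries the ID of the Org VDC from the failed task. The sketch below reads those tags from a trimmed copy of the response. The helper names `vcd_owners` and `looks_like_empty_leftover` are my own, and the "empty leftover" check is just my heuristic for this lab case, not an official NSX or VCD rule.

```python
import json

# Trimmed copy of the GET response shown above (only the fields used here).
POLICY = json.loads("""
{
  "rules": [],
  "resource_type": "GatewayPolicy",
  "id": "79790f77-e28c-4305-8746-f366b29465e6",
  "tags": [
    { "scope": "SYSTEM", "tag": "urn:vcloud:org:06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3" },
    { "scope": "SYSTEM", "tag": "urn:vcloud:vdc:0d790692-d362-42db-8223-f005a717df53" },
    { "scope": "SYSTEM", "tag": "urn:vcloud:gateway:79790f77-e28c-4305-8746-f366b29465e6" }
  ],
  "rule_count": 0,
  "_create_user": "sa_vcd_admin",
  "_system_owned": false
}
""")


def vcd_owners(policy: dict) -> dict:
    """Map each urn:vcloud:<kind>:<uuid> tag to {kind: uuid}."""
    owners = {}
    for tag in policy.get("tags", []):
        value = tag.get("tag", "")
        if value.startswith("urn:vcloud:"):
            _, _, kind, uuid = value.split(":", 3)
            owners[kind] = uuid
    return owners


def looks_like_empty_leftover(policy: dict) -> bool:
    """Heuristic (my assumption): no rules and not owned by the NSX system."""
    return (
        policy.get("rule_count") == 0
        and not policy.get("rules")
        and not policy.get("_system_owned", False)
    )


print(vcd_owners(POLICY))
print("Empty leftover:", looks_like_empty_leftover(POLICY))
```

The vdc value matches the ID of the Virtual Datacenter in the failed delete task, which supports the idea that this policy was created by VCD for that Org VDC and then left behind.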

The fix

The solution is actually very simple. Since we are already using the API to check whether a leftover object is preventing the NSX project from being deleted, we can try using DELETE instead of GET on that exact gateway policy.


DELETE https://{{baseUrl}}/policy/api/v1/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3/infra/domains/default/gateway-policies/79790f77-e28c-4305-8746-f366b29465e6

[Image: Postman DELETE gateway policy]
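For anyone who would rather script this than use Postman, here is a rough Python equivalent using only the standard library. It is a sketch under assumptions: the NSX manager name `nsx.lab.example` and the credentials are placeholders, the `build_policy_request` helper is my own, and it assumes HTTP basic authentication against the NSX manager. The request is only prepared here; the lines that would actually send it are left commented out.

```python
import base64
import urllib.request

# Prefix that is missing from the path shown in the VCD error.
POLICY_API_PREFIX = "/policy/api/v1"

GATEWAY_POLICY_PATH = (
    "/orgs/default/projects/06035aea-d0e9-4f1c-b5aa-ec8df0ada9c3"
    "/infra/domains/default/gateway-policies/"
    "79790f77-e28c-4305-8746-f366b29465e6"
)


def build_policy_request(nsx_fqdn, object_path, method, username, password):
    """Prepare (but do not send) a request for an NSX policy object."""
    url = f"https://{nsx_fqdn}{POLICY_API_PREFIX}{object_path}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method=method,
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
    )


# Placeholder host and credentials: replace these with your own.
request = build_policy_request(
    "nsx.lab.example", GATEWAY_POLICY_PATH, "DELETE", "admin", "changeme"
)
print(request.get_method(), request.full_url)

# To actually send the request (not executed in this sketch):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Passing "GET" instead of "DELETE" gives the read-only call used during the investigation, so it is easy to check what will be removed before removing it. A lab NSX manager with a self-signed certificate would also need its certificate trusted before urlopen succeeds.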

Would it now be possible to remove the Org VDC? Yes! The task completed successfully.

[Image: VCD DELETE succeeded]

I have also tested removing an Org VDC that is not using NSX Tenancy, and that works every time. So the issue only affects organizations and Org VDCs that have NSX Tenancy enabled.

Closing thoughts

Thank you for reading! Hopefully you found this interesting and maybe even learned something new!

Do you have any questions or just want to leave a comment? Feel free to do so, I'm always curious.
