Recently, I was investigating an interesting issue for one of our customers. They were experiencing random packet drops: users accessing applications running in their NSX environment suffered intermittent disconnections.
They are running NSX-V 6.4. Their applications are connected to logical switches behind the DLR, and two perimeter Edges running in Equal Cost Multi-Pathing (ECMP) mode provide North-South connectivity to the workloads. The topology is similar to the one below:
In this post, I will focus on two features that need your attention when configuring your Edges in ECMP mode, so that you can avoid this kind of packet drop.
Disable the Edge Firewall on ECMP Edges
The Edge firewall is a stateful service: it performs stateful packet inspection and tracks the state of network connections. It may therefore drop asymmetric traffic caused by the multiple data paths available through the ECMP Edges, for example when a connection's outbound packets leave through one Edge and the replies come back through another Edge that holds no state for that connection. The firewall needs to be disabled for ECMP to operate correctly.
So, the first rule of thumb is to disable the Edge firewall on ECMP Edges.
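To make the asymmetric-path problem concrete, here is a toy Python model, not NSX code: each hypothetical `Edge` object keeps its own connection table, the SYN of a TCP session enters through one Edge, and the SYN-ACK returns through the other. The class, the addresses, and the way paths are chosen are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """Toy model of one ECMP Edge with an optional stateful firewall (hypothetical)."""
    name: str
    stateful_fw: bool = True
    sessions: set = field(default_factory=set)

    def forward(self, src: str, dst: str, syn: bool = False) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        if not self.stateful_fw:
            return True  # no connection tracking: every packet is routed
        flow = frozenset((src, dst))  # same key for both directions of the flow
        if syn:
            self.sessions.add(flow)  # first packet of a TCP session creates state
            return True
        return flow in self.sessions  # later packets need state on THIS Edge


def handshake(fw_enabled: bool) -> bool:
    edge1 = Edge("edge-1", fw_enabled)
    edge2 = Edge("edge-2", fw_enabled)
    client, server = "203.0.113.10", "10.0.0.20"
    # SYN from the client enters via edge-1 (picked by the upstream router's ECMP hash)
    syn_ok = edge1.forward(client, server, syn=True)
    # SYN-ACK from the server leaves via edge-2 (picked by the DLR's ECMP hash)
    synack_ok = edge2.forward(server, client)
    return syn_ok and synack_ok


print(handshake(fw_enabled=True))   # False: edge-2 has no state for this flow
print(handshake(fw_enabled=False))  # True
```

With the firewall enabled, edge-2 sees a SYN-ACK for a session it never saw start and drops it, which is exactly the kind of intermittent failure users notice.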
Disable Reverse Path Filtering
This was the cause of our issue.
In NSX Edge, Reverse Path Filtering (RPF) is enabled by default.
When RPF is enabled, the Edge only forwards a packet if it was received on the same interface that the Edge would use to send traffic back to the packet's source. If the route to the packet's source address goes through a different interface than the one the packet arrived on, the packet is dropped.
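The strict reverse-path check described above can be sketched in a few lines of Python. This is an illustrative model, not the Edge's actual implementation: the routing table, prefixes, and vNic assignments are hypothetical, and the lookup is a plain longest-prefix match.

```python
import ipaddress

# Hypothetical routing table for one Edge: (prefix, egress interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "vNic_0"),     # default route via uplink 1
    (ipaddress.ip_network("192.0.2.0/24"), "vNic_2"),  # prefix learned via uplink 2
    (ipaddress.ip_network("10.0.0.0/16"), "vNic_1"),   # internal networks via the DLR
]


def lookup(table, addr):
    """Longest-prefix match: return the egress interface for addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, nic) for net, nic in table if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_accept(table, src, in_iface):
    """Strict RPF: accept only if the route back to src uses the ingress interface."""
    return lookup(table, src) == in_iface


print(rpf_accept(ROUTES, "198.51.100.7", "vNic_0"))  # True: symmetric path
print(rpf_accept(ROUTES, "198.51.100.7", "vNic_2"))  # False: route back is via vNic_0
print(rpf_accept(ROUTES, "10.0.5.5", "vNic_1"))      # True: /16 beats the default route
```

The second call is the ECMP case: a perfectly legitimate packet arrives on an interface other than the one the routing table would use to reach its source, so strict RPF discards it.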
For more information, see the following VMware KB article:
https://kb.vmware.com/s/article/2127073
So, the second rule of thumb is to disable RPF on all Edges participating in an asymmetric routing environment.
To disable RPF via the GUI:
To disable RPF via the REST API, make the following API call to NSX Manager:
PUT https://NSX_mgr_IP/api/4.0/edges/<edge-ID>/systemcontrol/config
<systemControl>
<property>sysctl.net.ipv4.conf.all.rp_filter=0</property>
<property>sysctl.net.ipv4.conf.vNic_0.rp_filter=0</property>
<property>sysctl.net.ipv4.conf.vNic_1.rp_filter=0</property>
<property>sysctl.net.ipv4.conf.vNic_2.rp_filter=0</property>
<property>sysctl.net.ipv4.conf.vNic_3.rp_filter=0</property>
</systemControl>
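If you script this change, a minimal Python sketch that builds the same PUT request with the standard library could look like the following. The helper name, the NSX Manager hostname, the credentials, and the use of HTTP Basic authentication with an `application/xml` content type are assumptions for illustration; check them against the NSX-V API guide for your version before sending anything.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def build_rpf_disable_request(nsx_mgr, edge_id, vnics, user, password):
    """Hypothetical helper: build (not send) the PUT that sets rp_filter=0."""
    root = ET.Element("systemControl")
    # "all" plus one entry per vNic, mirroring the request body above
    for key in ["all"] + [f"vNic_{i}" for i in vnics]:
        prop = ET.SubElement(root, "property")
        prop.text = f"sysctl.net.ipv4.conf.{key}.rp_filter=0"
    body = ET.tostring(root)  # bytes, no XML declaration
    url = f"https://{nsx_mgr}/api/4.0/edges/{edge_id}/systemcontrol/config"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/xml",  # assumed content type
                 "Authorization": f"Basic {token}"},  # assumed Basic auth
    )


# Placeholder values; replace with your own NSX Manager, Edge ID and credentials.
req = build_rpf_disable_request("nsx-mgr.example.local", "edge-1", range(4),
                                "admin", "secret")
print(req.get_method(), req.full_url)
print(req.data.decode())
# To actually send it: urllib.request.urlopen(req)
```

Building the body from a list of vNic indexes keeps the request in sync with however many interfaces a given Edge actually uses.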
In the command output, 0 means disabled and 1 means enabled.
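On a standard Linux kernel, the value the kernel actually applies to an interface is the maximum of `conf.all.rp_filter` and `conf.<interface>.rp_filter` (0 = no source validation, 1 = strict, 2 = loose), which is why the API call above sets both `all` and every vNic to 0. The sketch below assumes the output uses the usual sysctl `key = value` format; the Edge's real output may differ, and the sample values are made up.

```python
RP_MODES = {0: "disabled", 1: "strict", 2: "loose"}


def parse_sysctl(text):
    """Parse 'key = value' lines (sysctl style) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        values[key.strip()] = int(val.strip())
    return values


def effective_rp_filter(values, iface):
    """Linux applies max(conf.all, conf.<iface>) when validating sources on iface."""
    all_val = values.get("net.ipv4.conf.all.rp_filter", 0)  # 0 if missing (assumption)
    if_val = values.get(f"net.ipv4.conf.{iface}.rp_filter", 0)
    return max(all_val, if_val)


# Hypothetical sample output: vNic_1 was missed when RPF was disabled.
sample = """net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.vNic_0.rp_filter = 0
net.ipv4.conf.vNic_1.rp_filter = 1
"""
vals = parse_sysctl(sample)
for nic in ("vNic_0", "vNic_1"):
    print(nic, RP_MODES.get(effective_rp_filter(vals, nic), "unknown"))
```

Because of the max rule, setting only the `all` key to 0 is not enough: any interface still set to 1 keeps filtering.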
To check the RPF drop count:
This command shows the number of packets dropped by RPF, so you can confirm whether RPF is behind the drops you are seeing.
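The Edge command itself is not reproduced here. For comparison, on a generic Linux system the kernel counts RPF drops in the `IPReversePathFilter` counter of the `TcpExt` section of `/proc/net/netstat` (also shown by `netstat -s`); whether the Edge command reads this same counter is an assumption. The sketch below parses that file format, where each section is a header line of names followed by a line of values; the sample numbers are invented.

```python
def netstat_counters(text):
    """Parse /proc/net/netstat: alternating header-name and value lines per section."""
    counters = {}
    lines = [line.split() for line in text.splitlines() if line.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        section = header[0].rstrip(":")
        for name, val in zip(header[1:], values[1:]):
            counters[f"{section}.{name}"] = int(val)
    return counters


# Hypothetical, heavily trimmed sample of /proc/net/netstat
sample = """TcpExt: SyncookiesSent IPReversePathFilter
TcpExt: 0 1532
IpExt: InNoRoutes InTruncatedPkts
IpExt: 0 0
"""
print(netstat_counters(sample)["TcpExt.IPReversePathFilter"])  # 1532
```

A single reading only shows a lifetime total; taking two readings a few seconds apart while users report disconnections tells you whether RPF drops are still increasing.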
Conclusion
To avoid packet drops when running ECMP with asymmetric routing, always disable reverse path filtering (RPF) and the firewall on your NSX Edges.
Hope this post is informative,
Thank you for reading,
Mohamad Alhussein
Hello Mohamad,
Thank you for the great post. Do you know what the impact of these changes is? How does the Edge behave after the change? I have the same issue and need to quantify the impact before making any changes.
Thank you for your help.
Best regards,
Hoa HUYNH
Hello Hoa,
Glad you liked the post and found it informative. Reverse Path Filtering helps fight malicious traffic within the DC fabric, so disabling RPF to solve packet loss obviously comes with a drawback from a security perspective.
Regards,
Mohamad
Hello Mohamad,
What Hoa means is: can disabling the firewall or RPF in a production environment cause packet loss or other issues?
Regards
KARIM
Hi Karim,
RPF is a security feature. When it is enabled, the Edge discards a packet instead of routing it if it finds that the packet's source is reachable via interface X but the packet arrived on interface Y. This limits the appearance of spoofed addresses on a network. Disabling the firewall is a prerequisite for ECMP-enabled NSX Edges (https://docs.vmware.com/en/VMware-Validated-Design/4.1/com.vmware.vvd-sddc-consolidated-deploy.doc/GUID-267FEDCD-4D16-4BAE-9602-031947F0A9A6.html), and it will not cause any issue because it is both a requirement and a recommendation from VMware. RPF is enabled by default to provide its security benefits. However, if RPF is causing packet loss, it can be disabled without causing any packet loss or other side effect besides losing that security benefit.
Best Regards,
Mohamad