I have been working on a VSAN design and deploy engagement for one of our customers, and we ran into some limitations and decisions that have to be taken into consideration, especially around the device selection for ESXi boot, the storage controller, and the operating mode of that controller (RAID vs. pass-through).
I thought writing this post might help others in the same situation.
To start, I will list the available hardware that we have:
– DELL PowerEdge R730 Servers
– PERC H730 RAID Controllers
– Dual 16GB SD card modules for IDSDM
– 8 flash disks per server participating in an all-flash VSAN disk group
Consideration 01 – ESXi Boot Device
You can configure IDSDM in mirrored mode from the BIOS and install ESXi on the SD cards. For this option, see the log and coredump redirection requirement in “Consideration 02”.
You can install ESXi on RAID1 disks. However, you have to take into consideration the restrictions on mixing VSAN disks with other VMFS volumes described in this KB article: https://kb.vmware.com/s/article/2136374
You can also consider a second controller per server running in RAID mode for the RAID1 ESXi disks, in addition to the PERC H730 controller running in HBA mode for the VSAN disks.
Consideration 02 – Logs and Coredumps
If you install ESXi on SD cards, make sure you redirect logs to a syslog server (such as vRealize Log Insight) that sits outside the VSAN environment, and coredumps to a coredump collector that is outside VSAN as well.
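As a rough sketch of what that redirection involves, the Python helper below only assembles the esxcli command lines you would run on each host. It does not execute anything. The host name, IP address, vmkernel interface and ports are placeholders I made up for illustration, and you should check the options against the documentation for your ESXi version before using them.

```python
def redirect_commands(syslog_host, dump_server_ip, vmk="vmk0",
                      syslog_port=514, dump_port=6500):
    """Build the esxcli commands that point logs and coredumps off-host.

    All arguments are placeholders: replace them with your own syslog
    server, coredump collector address and vmkernel interface.
    """
    return [
        # Send syslog to a remote collector outside the VSAN cluster.
        f"esxcli system syslog config set --loghost=udp://{syslog_host}:{syslog_port}",
        "esxcli system syslog reload",
        # Point the network coredump client at an external collector.
        f"esxcli system coredump network set --interface-name {vmk} "
        f"--server-ipv4 {dump_server_ip} --server-port {dump_port}",
        "esxcli system coredump network set --enable true",
    ]


if __name__ == "__main__":
    for cmd in redirect_commands("loginsight.example.local", "192.0.2.10"):
        print(cmd)
```

Generating the commands from one function makes it easier to apply the same settings to every host in the cluster, for example by pasting the output into an SSH session or a host profile script.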
If you plan to use locally attached disks to store logs or coredumps, then consider these limitations:
– First, you use up disk slots on each server, which leaves fewer slots for additional VSAN disks and disk groups if you plan to scale up storage later.
– Second, it is not supported to have VSAN disks co-exist with non-VSAN VMFS volumes on this controller, as per the KB article below:
https://kb.vmware.com/s/article/2136374
If you are planning to install ESXi on locally attached RAID1 disks and store logs and coredumps there, the same limitations and restrictions from the KB article apply: https://kb.vmware.com/s/article/2136374
Consideration 03 – Storage Controller Mode
The PERC H730 RAID Controller can operate in one of two modes:
– RAID mode
– HBA mode
You can’t have two disks mirrored in RAID1 for the ESXi partition and other disks participating in VSAN in pass-through mode on this controller at the same time.
You can run the PERC H730 controller in HBA (pass-through) mode for the VSAN disks and leave the ESXi installation on the SD card modules.
If you configure the PERC H730 controller in RAID mode in order to install ESXi on locally attached RAID1 disks, check the KB article https://kb.vmware.com/s/article/2136374 for the restrictions on having VSAN disks alongside non-VSAN VMFS volumes on this controller.
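To make the combinations above easier to reason about, here is a small Python sketch that encodes them as a checklist. The class, the option names ("sd_card", "raid1_shared", "raid1_dedicated") and the wording of the notes are my own invention for illustration; the rules themselves are only the ones described in this post, not an official VMware or Dell validation tool.

```python
from dataclasses import dataclass

KB_URL = "https://kb.vmware.com/s/article/2136374"


@dataclass
class HostDesign:
    # Hypothetical option names, chosen for this example only:
    #   "sd_card"         - ESXi on mirrored IDSDM SD cards
    #   "raid1_shared"    - ESXi on RAID1 disks behind the VSAN controller
    #   "raid1_dedicated" - ESXi on RAID1 disks behind a second controller
    boot_device: str
    vsan_controller_mode: str  # "hba" (pass-through) or "raid"
    remote_logging: bool = False


def review(design):
    """Return the considerations from this post that apply to a design."""
    notes = []
    if design.boot_device == "sd_card" and not design.remote_logging:
        notes.append("Redirect logs and coredumps to targets outside VSAN.")
    if design.boot_device == "raid1_shared":
        if design.vsan_controller_mode == "hba":
            notes.append("Invalid: RAID1 boot disks and pass-through VSAN "
                         "disks cannot share this controller.")
        else:
            notes.append(f"Check the restrictions in {KB_URL}.")
    if design.vsan_controller_mode == "raid":
        notes.append("VSAN disks must be presented as single-drive RAID0 sets.")
    return notes


if __name__ == "__main__":
    for note in review(HostDesign("raid1_shared", "raid")):
        print("-", note)
```

An empty list means none of the considerations in this post were triggered, for example SD card boot with remote logging and the controller in HBA mode.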
RAID0 vs. Passthrough
Virtual SAN supports storage controllers in two modes, either pass-through or RAID0 mode. One of the big considerations when choosing a storage controller for Virtual SAN is whether it supports pass-through mode, RAID0 mode, or both.
RAID0
– RAID0 mode is utilized by creating a single drive RAID0 set via the storage controller software, and then presenting this to Virtual SAN.
– When utilizing RAID0 mode, the storage controller cache should be disabled (this is configurable on some, but not all storage controllers) to ensure the storage controller cache does not conflict with the cache of the SSD drives which is controlled by Virtual SAN.
– RAID0 mode will require interaction with the storage controller software to manage the addition and removal of drives. If you want to add a disk or replace a faulty one, you have to reboot the host and add the new disk to the RAID0 volume to be able to add it to the disk group. This means no hot-replace capability.
– If you use RAID0, a custom claim rule may be required to mark an SSD drive with the SSD flag.
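For the claim rule mentioned in the last point, the sketch below builds the esxcli commands commonly used to tag a device as SSD and then reclaim it. It is only a command generator in Python. The device identifier is a placeholder, and VMW_SATP_LOCAL is an assumption on my side: confirm the SATP that actually claims your device (for example with `esxcli storage nmp device list`) before adding a rule.

```python
def ssd_tag_commands(device_id, satp="VMW_SATP_LOCAL"):
    """Build the esxcli commands that flag a RAID0-presented disk as SSD.

    device_id is a placeholder such as "naa.EXAMPLE"; satp is assumed to be
    VMW_SATP_LOCAL and must be checked against the host.
    """
    return [
        # Add a claim rule that sets the enable_ssd option for this device.
        f'esxcli storage nmp satp rule add --satp={satp} '
        f'--device={device_id} --option="enable_ssd"',
        # Reclaim the device so the new rule takes effect.
        f"esxcli storage core claiming reclaim --device={device_id}",
        # List the device again; the output should report "Is SSD: true".
        f"esxcli storage core device list --device={device_id}",
    ]


if __name__ == "__main__":
    for cmd in ssd_tag_commands("naa.EXAMPLE"):
        print(cmd)
```

In pass-through mode this step is normally not needed, because the host sees the drive directly and can detect the media type itself.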
Pass-Through
– Individual disks can be hot-replaced without the need to enter them into some RAID configuration. One thing to keep in mind: the disk cannot contain any partition information.
– Pass-through mode allows the hypervisor to have greater control over the drives in the capacity tier.
– Virtual SAN has complete control of the local SSDs and HDDs attached to the storage controller.
The VSAN recommendation is to use pass-through mode whenever possible.
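Since a disk handed to VSAN in pass-through mode must not carry partition information, it can help to check for leftovers before claiming it. The Python sketch below parses the text printed by `partedUtil getptbl <device>` on an ESXi host and generates the matching `partedUtil delete` commands. The device path and the sample output are illustrative placeholders, not captured from a real host, and the parser assumes the usual layout of a label line, a geometry line, then one line per partition.

```python
def partition_numbers(getptbl_output):
    """Return the partition numbers found in `partedUtil getptbl` output.

    Assumed layout: line 1 is the label type (e.g. "gpt"), line 2 is the
    disk geometry, and every following line starts with a partition number.
    """
    lines = [ln.strip() for ln in getptbl_output.splitlines() if ln.strip()]
    return [int(ln.split()[0]) for ln in lines[2:]]


def wipe_commands(device_path, getptbl_output):
    """Build one `partedUtil delete` command per existing partition."""
    return [f"partedUtil delete {device_path} {n}"
            for n in partition_numbers(getptbl_output)]


if __name__ == "__main__":
    # Illustrative sample only; real output will differ per disk.
    sample = """gpt
    24321 255 63 390721968
    1 2048 6143 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0
    2 6144 390721934 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
    """
    for cmd in wipe_commands("/vmfs/devices/disks/naa.EXAMPLE", sample):
        print(cmd)
```

Deleting partitions is destructive, so double-check the device path against the disk you actually intend to hand to VSAN.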
H730 vs. HBA330 Storage Controllers
What I would advise is configuring hosts with HBA330 controllers, which are better suited to vSAN (the H730 was designed for RAID, not pass-through) and thus more reliable, to the point that Dell's official recommendation is to not use the H730P for vSAN and to use the HBA330 instead:
https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=vsanio&productid=34853
“While this controller will continue to be supported on vSAN for the full service life, DellEMC strongly recommends using the HBA330 pass-through controller for vSAN deployments. The HBA330 is specifically designed for Software Defined Storage applications like vSAN, and does not introduce the unnecessary RAID processing overhead involved with the H730 PERC RAID family of controllers.”
Mohamad Alhussein
Good read, thanks. One thing about the lack of hot-swap functionality on the H730 in RAID-mode. The H730 allows realtime RAID configuration via RACADM, iDRAC webGUI, and (I’d guess?) OMSA. Now, I’m not too familiar with vSAN… Is there some other reason that you still have to reboot the host?
Thank you Rob. Yes, if you manage to configure it online, then you can add the disk directly to the VSAN disk group without a host reboot.
The issue here is the manual interaction needed from the administrator to get this to work compared to pass-through mode which allows easy hot-swap functionality in case of disk failure without any extra intervention.
This was useful thanks. Consideration number 3 above is what was impacting me as well as one other super strange problem that I worked through (won’t mention here). After switching H730 Mini controller to HBA-mode I had fun with ESXi installer complaining about duplicate UUID because I had split a boot mirror. I used Dell’s feature of erasing the SSDs via BIOS setup menu and then once I went into VC and removed all partitions on all SSDs, then after re-installing ESXi, vSAN configuration worked ok. vSAN configuration wizard wasn’t smart enough to erase partitions so I had to go do that beforehand. As a person with a background in storage, I find the vSAN configuration wizard odd & clunky. At least I have it setup now for testing another product’s interaction with vSAN. Even though odd & clunky, it’s definitely easier than setting up a real SAN, which I have done hundreds of times as well.
Why is vSAN asking me for cache vs capacity tier when all I have is SSDs in the hosts? Oh well, question I don’t actually need the answer to since I’m done with what I set out to accomplish.
Thanks Michael for writing your troubleshooting experience :). Glad that this post helped you mate.
Hello Mohamad,
Good article!
Unfortunately, in the end it does not work for me with VMware 7.0.
Hardware: Dell R730, removed the H730 RAID controller and am using an HBA330 with the latest firmware.
Boot without disks: HBA 330 can be set to passthrough.
BUT:
When I insert one disk (bootable disk or single data disk), the HBA330 cannot be set to passthrough in vSphere Client -> Host -> Configure -> Hardware -> PCI Devices. Always the same error: “An error occurred during host configuration. Failed to unbind devices: 0000:02:00.0. Configuration will be applied after reboot.”
But even after a reboot the controller is not set to passthrough. I have to remove all disks: then it works immediately. But that makes no sense, as there is then only a controller and no disk…
Any idea what I could do, or is there an issue with VMware 7.0 and PCI passthrough?