In one of my vSAN design engagements, we have vSAN 6.6 ESXi nodes with 512GB RAM each, and ESXi is installed on internal SD cards. The customer is planning to scale up to 1TB of RAM per host, and to scale out with 3 additional hosts in the future.
What is a coredump?
When a host fails with a purple diagnostic screen (PSOD), VMware ESXi attempts to save diagnostic information to one or more pre-configured locations. Coredumps help VMware support debug and analyze the issue to find the root cause of the PSOD.
According to the documentation page below, if ESXi is installed on an SD card and the host has more than 512GB of RAM, you have to consider the following:
https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.virtualsan.doc/GUID-4B738A10-4506-4D70-8339-28D8C8331A15.html
– Redirect syslog messages to a syslog server.
– Redirect core dumps to an external coredump collector.
The question is: how do you size the coredump partition per vSAN-enabled ESXi host, so that you know the storage resources needed for the dump collector?
The KB article below explains how to size your coredumps:
https://kb.vmware.com/s/article/2147881
Core dump sizing is based on the below factors:
– Number of disk groups
– Amount of DRAM per host
– Size of cache-tier SSD disk
General Guidelines
Without vSAN enabled:
For every 1 TB of DRAM, there should be a coredump partition of about 2.5 GB (the formula below uses 2.56 GB per TB)
With vSAN enabled:
In addition to the DRAM-based core dump size, the physical size of the caching-tier SSD(s) in GB is used as the basis for calculating the additional core dump size requirements.
Base requirement for vSAN is 3.981GB
For every 100GB cache tier, 0.181GB of space is required
Every disk group needs a base requirement of 1.32 GB
Data will be compressed by 75%
General Formula
requirementOnSSDSize = (((size of SSD in GB) / 100 GB) * 0.181) + 1.32
requirement = base + (requirementOnSSDSize1 + requirementOnSSDSize2 + requirementOnSSDSize3 …)
sizeOfCoredumpBasedOnDG = requirement * 0.25
sizeOfCoredumpBasedOnDRAM = 2.56 GB * size of DRAM in TB
coredump = sizeOfCoredumpBasedOnDG + sizeOfCoredumpBasedOnDRAM
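The formula above can be sketched as a small Python function. This is only an illustration of the arithmetic: the function name, the parameter names and the constant names are my own, and the constants are simply the values listed in the guidelines above.

```python
# Constants taken from the guidelines above (all sizes in GB).
VSAN_BASE_GB = 3.981         # base requirement for vSAN
PER_100GB_CACHE_GB = 0.181   # space required per 100 GB of cache-tier SSD
PER_DISK_GROUP_GB = 1.32     # base requirement per disk group
COMPRESSION_FACTOR = 0.25    # data is compressed by 75%
DRAM_GB_PER_TB = 2.56        # per TB of DRAM, as used in the formula


def coredump_size_gb(dram_tb, cache_ssd_sizes_gb):
    """Estimate the coredump size in GB for one vSAN-enabled host.

    cache_ssd_sizes_gb holds one cache-tier SSD size (in GB) per disk group.
    """
    # requirementOnSSDSize, computed once per disk group
    per_disk_group = [
        (size_gb / 100) * PER_100GB_CACHE_GB + PER_DISK_GROUP_GB
        for size_gb in cache_ssd_sizes_gb
    ]
    requirement = VSAN_BASE_GB + sum(per_disk_group)
    size_based_on_dg = requirement * COMPRESSION_FACTOR
    size_based_on_dram = DRAM_GB_PER_TB * dram_tb
    return size_based_on_dg + size_based_on_dram
```

For a host with 0.5 TB of DRAM and a single disk group with an 800 GB cache SSD, `coredump_size_gb(0.5, [800])` gives roughly 2.967 GB.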
Example
I will take my design example to illustrate how the sizing can be done.
We have the below in our environment:
– 9 ESXi hosts, with a plan to scale out to 12 hosts in the future.
– 512GB RAM per host, with a plan to scale up to 1TB RAM in the future.
– 1 disk group per host.
– 800GB SSD disk acting as cache tier per host.
Below is my calculation:
– Calculate the cache-tier requirement (requirementOnSSDSize = (((size of SSD in GB) / 100 GB) * 0.181) + 1.32):
– requirementOnSSDSize = ((800 / 100) * 0.181) + 1.32 = 2.768GB
– Add the base overhead (requirement = base + requirementOnSSDSize): requirement = 3.981 + 2.768 = 6.749GB
– Apply compression (sizeOfCoredumpBasedOnDG = requirement * 0.25): sizeOfCoredumpBasedOnDG = 6.749 * 0.25 = 1.687GB
– Add the DRAM overhead:
– sizeOfCoredumpBasedOnDRAM = 2.56GB * 0.5 (DRAM in TB) = 1.28GB
– coredump = sizeOfCoredumpBasedOnDG + sizeOfCoredumpBasedOnDRAM
– coredump = 1.687GB + 1.28GB
– coredump = 2.967GB ~ 3GB
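If you want to double-check the steps above, the same arithmetic can be replayed in a few lines of Python. This is just a sanity check of the numbers in this post; the variable names are my own and are not taken from the KB article.

```python
# Inputs from the example above.
cache_ssd_gb = 800   # cache-tier SSD size per host, in GB
dram_tb = 0.5        # 512GB of RAM expressed in TB

# Same steps as in the calculation above.
requirement_on_ssd = (cache_ssd_gb / 100) * 0.181 + 1.32
requirement = 3.981 + requirement_on_ssd
size_based_on_dg = requirement * 0.25        # 75% compression
size_based_on_dram = 2.56 * dram_tb
coredump_gb = size_based_on_dg + size_based_on_dram

# Print every intermediate value so each step can be compared by eye.
for label, value in [
    ("requirementOnSSDSize", requirement_on_ssd),
    ("requirement", requirement),
    ("sizeOfCoredumpBasedOnDG", size_based_on_dg),
    ("sizeOfCoredumpBasedOnDRAM", size_based_on_dram),
    ("coredump", coredump_gb),
]:
    print(f"{label:<28}{value:.3f} GB")
```

Changing `dram_tb` to 1.0 replays the scale-up case.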
Let’s plan for scaling up to 1TB of DRAM per host as future growth: you end up with coredump = (1.687GB + 2.56GB) = 4.247GB, rounded up to 5GB per ESXi host. Total coredumps = 9 * 5GB = 45GB.
Plan again for scaling out to 12 hosts as future growth: you end up with total coredumps = 12 * 5GB = 60GB.
So now you know how to size coredumps for a vSAN environment.
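The cluster-level totals can be sketched the same way. The helper below is hypothetical (the name is mine), and it assumes the collector keeps one coredump per host and that the per-host size is rounded up to the next whole GB, as I did above.

```python
import math


def collector_capacity_gb(per_host_coredump_gb, host_count):
    """Storage needed on the dump collector, assuming one coredump per host."""
    # Round the per-host size up to a whole GB to leave a little headroom.
    per_host_rounded_gb = math.ceil(per_host_coredump_gb)
    return per_host_rounded_gb * host_count


# 1TB DRAM per host (4.247GB per coredump), today and after scaling out.
print(collector_capacity_gb(4.247, 9))    # 45
print(collector_capacity_gb(4.247, 12))   # 60
```

If you expect to keep more than one dump per host on the collector, multiply the result accordingly.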
Hope this post is informative,
Mohamad Alhussein