Best practices and pitfalls to avoid for vCenter SFTP file-based backups
- Mark
- 2 days ago
We all know that backing up your vCenters using the built-in file-based backup is critical for recovery. I wanted to share a couple of important pitfalls I came across recently, in case it helps anyone else who runs into them.
Do not back up multiple vCenters at the same start time to the same directory
Following the recent addition of some new vCenters, we started to see an unusual increase in failures of the file-based backups. In this particular environment there were 4 vCenters and an SDDC Manager backing up to the same SFTP server. All 5 appliances were configured to back up to the same directory at exactly the same start time.
My first thought was that the SFTP server was running low on space and that the intermittent failures were due to fluctuating free space, but this was quickly ruled out. My second thought was that the number of concurrent backups being written to the SFTP server could be the issue, but that was ruled out too. We then enabled debug logging on the Linux-based SFTP server and discovered something very interesting.
When multiple vCenters are configured to back up to the same directory, they create their own subdirectories for separation but share the same top-level directory for the temp files used during the backup job. For example, if vCenter1 starts its backup job at 23:59, it creates a timestamped directory and cfg file in the top-level directory (e.g. backup_20260325-2359/fbbr_write235912.cfg). If vCenter2 then starts backing up to the same path at the same time (and, most importantly, in the same second), it begins referencing the same file vCenter1 is using. vCenter1 then deletes the file, causing vCenter2's backup job to fail.
The more vCenters that are present, the more likely this file collision becomes. Before the increase in vCenters we would see the odd failure, but not enough to be a concern. It was only when the number of vCenters increased from 2 to 4 that we noticed the obvious increase in failures, which led us to dig deeper.
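To make the collision concrete, here is a minimal Python sketch. It assumes the temp file name is derived only from the backup start time at one-second resolution, which is inferred from the file name in the log rather than taken from any VMware documentation. The function temp_cfg_path and the vcenter1/vcenter2 subdirectory names are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath


def temp_cfg_path(base_dir: str, started: datetime) -> PurePosixPath:
    """Hypothetical reconstruction of the temp file path seen in the log.

    Assumption: the name depends only on the start time, to the second.
    """
    folder = started.strftime("backup_%Y%m%d-%H%M")
    name = started.strftime("fbbr_write%H%M%S.cfg")
    return PurePosixPath(base_dir) / folder / name


shared = "/home/vmware_backup/vCenter"

# Two appliances starting within the same second (different microseconds).
t1 = datetime(2026, 3, 25, 23, 59, 12, 100000)
t2 = datetime(2026, 3, 25, 23, 59, 12, 900000)

# Same target directory: both appliances resolve to the same temp file.
same_dir_collision = temp_cfg_path(shared, t1) == temp_cfg_path(shared, t2)

# Dedicated subdirectory per appliance: the paths can no longer clash.
separate_dirs_collision = (
    temp_cfg_path(shared + "/vcenter1", t1)
    == temp_cfg_path(shared + "/vcenter2", t2)
)

print(same_dir_collision)       # True
print(separate_dirs_collision)  # False
```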
Extract from SFTP Server debug log
Here's an extract from the actual debug log (anonymised). At the time of a failed backup, we can see two different vCenters opening a session in the same second. One vCenter then deletes a temp file that the other vCenter is still referencing. The second vCenter then also tries to delete the file but fails because it no longer exists, resulting in the "No such file" error and, ultimately, a failed backup.
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88136]: session opened for local user vmware_backup from [10.0.0.80]
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88136]: remove name "/home/vmware_backup/vCenter/backup_20260325-2359/fbbr_write235912.cfg"
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88139]: session opened for local user vmware_backup from [10.0.0.7]
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88139]: remove name "/home/vmware_backup/vCenter/backup_20260325-2359/fbbr_write235912.cfg"
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88139]: sent status No such file
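If you want to check your own SFTP debug log for the same pattern, a rough Python sketch along these lines may help. It only assumes the log line format shown in the extract above. The function name find_contended_removes is my own, and a real log would need extra care because sftp-server process IDs get reused over time.

```python
import re
from collections import defaultdict

# Log lines copied from the extract above.
LOG = '''Mar 25 23:59:14 sftp-server.acme.com sftp-server[88136]: session opened for local user vmware_backup from [10.0.0.80]
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88136]: remove name "/home/vmware_backup/vCenter/backup_20260325-2359/fbbr_write235912.cfg"
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88139]: session opened for local user vmware_backup from [10.0.0.7]
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88139]: remove name "/home/vmware_backup/vCenter/backup_20260325-2359/fbbr_write235912.cfg"
Mar 25 23:59:14 sftp-server.acme.com sftp-server[88139]: sent status No such file
'''

SESSION_RE = re.compile(
    r"sftp-server\[(\d+)\]: session opened for local user \S+ from \[([^\]]+)\]"
)
REMOVE_RE = re.compile(r'sftp-server\[(\d+)\]: remove name "([^"]+)"')


def find_contended_removes(log_text):
    """Return {path: [clients]} for files removed by more than one session."""
    client_by_pid = {}
    pids_by_path = defaultdict(set)
    for line in log_text.splitlines():
        session = SESSION_RE.search(line)
        if session:
            client_by_pid[session.group(1)] = session.group(2)
            continue
        remove = REMOVE_RE.search(line)
        if remove:
            pids_by_path[remove.group(2)].add(remove.group(1))
    # Keep only paths touched by two or more sessions; show the client IP
    # where known, falling back to the process ID.
    return {
        path: sorted(client_by_pid.get(pid, pid) for pid in pids)
        for path, pids in pids_by_path.items()
        if len(pids) > 1
    }


contended = find_contended_removes(LOG)
for path, clients in contended.items():
    print(path, clients)
```

Any path reported here was removed by more than one client session, which is the signature of the collision described above.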
Remediation: I couldn't find this particular issue referenced in the official VMware documentation, but to prevent it the guidance here is to use a dedicated subdirectory path for each vCenter backup to enforce separation. This allows multiple backups to run at the same time without error. Alternatively, separate the start times by 1-minute intervals, whilst ensuring the 5-minute window guidance below is also adhered to.
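As an illustration of the dedicated-subdirectory approach, the Python sketch below builds one backup location per appliance. The appliance hostnames are made up, the function dedicated_locations is hypothetical, and the sftp://server/path form is my assumption of how the location is entered in the appliance's backup settings, so check it against your own configuration.

```python
from pathlib import PurePosixPath


def dedicated_locations(server, base_path, appliances):
    """Map each appliance to its own backup folder under base_path."""
    locations = {}
    for fqdn in appliances:
        # Use the lowercased short hostname as the folder name.
        folder = fqdn.split(".")[0].lower()
        locations[fqdn] = f"sftp://{server}{PurePosixPath(base_path) / folder}"
    # Guard against two appliances ending up in the same folder.
    if len(set(locations.values())) != len(locations):
        raise ValueError("two appliances share the same short hostname")
    return locations


# Hypothetical appliance names, for illustration only.
appliances = [
    "vcenter1.acme.com",
    "vcenter2.acme.com",
    "vcenter3.acme.com",
    "vcenter4.acme.com",
    "sddc-manager.acme.com",
]

locations = dedicated_locations(
    "sftp-server.acme.com", "/home/vmware_backup/vCenter", appliances
)
for fqdn, location in locations.items():
    print(f"{fqdn}: {location}")
```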
All vCenters (and SDDC Manager if VCF) in a linked mode environment must be scheduled to start their backup within a 5-minute window
Whilst troubleshooting the issue above, I came across a few customer environments where this guidance wasn't being followed. It matters in recovery situations, because not following it increases the risk of failing to recover a working environment.
Broadcom guidance states “You must configure the backup jobs for SDDC Manager and all vCenter instances to start within the same 5-minute window”
Again, I couldn't find this explicitly referenced in the official vSphere docs, but it is referenced in the official VCF docs (VCF 5.2 Backup Docs and VCF 9.0 Backup Docs).
Risks: In a DR scenario, it is possible that SDDC Manager and all vCenters in a Linked Mode configuration will need to be restored. If you deviate from this guidance and the recovery points of these components differ by more than the 5-minute window, the risk of failing to recover from a DR situation increases.
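If you stagger start times rather than (or as well as) separating directories, a small Python sketch like this can generate a schedule and confirm it still fits the window. The function staggered_starts and the appliance names are hypothetical, and it takes a conservative reading of the guidance by requiring the first and last start to be less than 5 minutes apart.

```python
from datetime import datetime, timedelta


def staggered_starts(first_start, appliances, step_minutes=1, window_minutes=5):
    """Assign start times step_minutes apart, beginning at first_start (HH:MM)."""
    spread = step_minutes * (len(appliances) - 1)
    # Conservative assumption: first and last start must be < window apart.
    if spread >= window_minutes:
        raise ValueError(
            f"{len(appliances)} appliances at {step_minutes}-minute steps "
            f"do not fit in a {window_minutes}-minute window"
        )
    base = datetime.strptime(first_start, "%H:%M")
    return {
        name: (base + timedelta(minutes=i * step_minutes)).strftime("%H:%M")
        for i, name in enumerate(appliances)
    }


# Hypothetical appliance names, for illustration only.
names = ["vcenter1", "vcenter2", "vcenter3", "vcenter4", "sddc-manager"]
schedule = staggered_starts("23:57", names)
for name, start in schedule.items():
    print(name, start)
```

With 1-minute steps this tops out at five appliances under the strict reading, so larger environments would need dedicated subdirectories instead of staggering.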
Conclusion
vCenter backups are typically fairly straightforward, so most of us set them up and forget about them. It is only when you find yourself in a DR situation that you may discover you have incomplete backups or, worse still, a set of backups from different points in time that are not properly synchronized.
Hopefully this guidance helps others who may be experiencing similar issues!