A recurring theme in my documentation and posts is to avoid using saved state or snapshots with domain controllers, although I’ve never explained why. There is also the minor special consideration of time synchronization.
Don’t Use Snapshots (Checkpoints) or Saved State
In a single domain controller environment, these tools will not cause issues. The negative effects are limited to Active Directory replication. Every change a domain controller makes to an Active Directory object is stamped with an update sequence number (USN). When a domain controller wakes from Saved State or is rolled back to a snapshot, its USNs are out of sync with the rest of the domain. The other DCs not only know what the current update number of an object is, they also know what version other DCs are supposed to be holding. When these numbers don’t match up, one of two things can happen. If you’re lucky, a “USN Rollback” condition will be detected and the affected domain controller will stop participating in replication. That is not good, but it’s better than the alternative. If the restored domain controller goes on to modify an object until its USN is higher than what the other DCs have recorded, but the object had also been modified elsewhere, then there will be a set of USNs that have the same number but represent different changes. This leaves Active Directory in an inconsistent state.
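To make the two outcomes concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: one counter per DC, a single replication partner, and a "highest USN seen" watermark standing in for the partner's bookkeeping. The class and function names are invented for illustration and are not how Active Directory is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Toy DC: a local USN counter plus the changes stamped with it."""
    name: str
    usn: int = 0                                    # this DC's own counter
    changes: dict = field(default_factory=dict)     # usn -> description
    seen_from: dict = field(default_factory=dict)   # partner name -> highest USN received

    def write(self, description):
        """Record a local change and stamp it with the next USN."""
        self.usn += 1
        self.changes[self.usn] = description
        return self.usn

    def snapshot(self):
        """Capture the counter and the change log, as a VM snapshot would."""
        return self.usn, dict(self.changes)

    def restore(self, snap):
        """Roll the counter and the change log back to the snapshot."""
        self.usn, self.changes = snap[0], dict(snap[1])


def pull(dest, source):
    """dest asks source for everything above the watermark it holds for source."""
    watermark = dest.seen_from.get(source.name, 0)
    if source.usn < watermark:
        # The source claims a lower USN than dest already received: the lucky case.
        return "usn-rollback-detected", []
    new = [source.changes[u] for u in sorted(source.changes) if u > watermark]
    dest.seen_from[source.name] = source.usn
    return "ok", new


if __name__ == "__main__":
    dc1, dc2 = DomainController("DC1"), DomainController("DC2")
    dc1.write("create user A")
    snap = dc1.snapshot()
    dc1.write("create user B")
    dc1.write("create user C")
    print(pull(dc2, dc1))      # normal replication
    dc1.restore(snap)
    print(pull(dc2, dc1))      # rollback noticed, replication refused
    for change in ("create user X", "create user Y", "create user Z"):
        dc1.write(change)
    print(pull(dc2, dc1))      # only the change above the old watermark arrives
```

In the last call the restored DC has reused USNs 2 and 3 for users X and Y, but the partner believes it already holds those numbers and never asks for them. That is the silent inconsistency described above.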
Recovering from a Saved/Snapshotted Domain Controller
There is a Microsoft article related to correcting a USN Rollback state. Essentially, you have to remove any affected DCs. If all of your DCs are affected, you will be running an Active Directory restore.
Steps to Take for Virtualized Domain Controllers
- Disable automatic Save State for the virtual machine in both Hyper-V Manager and System Center Virtual Machine Manager. If your DCs are in High Availability mode on a Hyper-V cluster, you must also adjust their properties in Failover Cluster Manager.
Hyper-V Manager: Right-click the VM, go to Settings, and select the “Automatic Stop Action” tab. Set it to “Shut down the guest operating system”.
SCVMM: Right-click the VM, go to Properties, and select the “Actions” tab. Set it to “Shut down guest OS”.
Failover Cluster Manager: Expand the Services and Applications node and highlight the VM. In the center pane, right-click the item under the “Virtual Machine” block, go to Properties, and select the Settings tab. Set the cluster-controlled offline action to “Shut Down”.
- Set the automatic start action accordingly in Hyper-V Manager and SCVMM. The steps are very similar to those for setting the automatic stop action, so you should have no trouble finding them. See the Deployment Strategy section for guidance on what to set.
- Disable Hyper-V Time Synchronization for all virtualized domain controllers in either Hyper-V Manager or SCVMM. This is not a major issue, but with it enabled the DCs can run into time problems, and that can affect the other machines in the domain.
Hyper-V Manager: In the VM’s Settings window, look on the “Integration Services” tab.
SCVMM: In the VM’s Properties window, look on the “Integration Services” sub-tab on the “Hardware Configuration” tab.
Any time you see a domain controller with a status of “Saved State”, discard the saved state prior to starting it. This is the virtual equivalent of pulling the plug on a live physical server and causing a dirty shutdown. Active Directory is designed to handle this condition and no USN Rollback status will occur.
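If you would rather script these settings than click through the GUI, the sketch below generates the equivalent PowerShell commands. It assumes the Hyper-V PowerShell module that ships with Windows Server 2012 and later, which the GUI steps above do not depend on, and it does not cover the Failover Cluster Manager setting. The function names and the VM name are hypothetical; review the generated commands before running them on a host.

```python
def _quote(vm_name):
    """Single-quote a VM name for PowerShell, doubling embedded quotes."""
    return "'" + vm_name.replace("'", "''") + "'"


def dc_hardening_commands(vm_name, start_action="Start"):
    """Build the commands for the stop action, start action, and time sync steps.

    start_action is one of the values Set-VM accepts for -AutomaticStartAction.
    """
    if start_action not in ("Start", "StartIfRunning", "Nothing"):
        raise ValueError(f"unsupported start action: {start_action}")
    vm = _quote(vm_name)
    return [
        # Shut the guest down cleanly instead of saving its state.
        f"Set-VM -Name {vm} -AutomaticStopAction ShutDown",
        # Start behavior depends on the DC's role in the deployment.
        f"Set-VM -Name {vm} -AutomaticStartAction {start_action}",
        # Turn off the Hyper-V time synchronization integration service.
        f"Disable-VMIntegrationService -VMName {vm} -Name 'Time Synchronization'",
    ]


def discard_saved_state_command(vm_name, state):
    """Return the command that discards a saved state, or None if none is needed."""
    if state == "Saved":
        return f"Remove-VMSavedState -VMName {_quote(vm_name)}"
    return None


if __name__ == "__main__":
    for line in dc_hardening_commands("DC01"):   # "DC01" is a made-up VM name
        print(line)
    print(discard_saved_state_command("DC01", "Saved"))
```

Generating text rather than calling PowerShell directly keeps the sketch runnable anywhere and leaves the final decision to run each command with you.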
Deployment Strategy for Virtualized Domain Controllers
There is no “right way” to deploy domain controllers. I prefer to avoid using physical domain controllers because they aren’t very resource-intensive in most installations and, as such, can be a waste of hardware. Loading physical domain controllers down with non-DC roles to address under-utilization is generally a Bad Thing. I prefer to avoid using the Hyper-V hosts as DCs primarily because I want to install Hyper-V directly to the hardware, and it won’t run the DC role. Using a full installation of Windows Server as the host just to run the DC role is not the most efficient usage and, by holding two vital roles on one machine, goes right back to the Bad Thing problem.
If using all virtualized DCs, there is talk of the potential for a chicken and egg problem. If the domain controllers aren’t started, the Hyper-V hosts can’t log in. If the Hyper-V hosts aren’t logged in… so what? By default, the Hyper-V services run in the context of LocalSystem; they shouldn’t NEED to log in to start the DCs. It is true that the computer account and any user accounts won’t log in until the DCs start, but you’d have to work pretty hard to cause a real chicken and egg issue. Cached credentials should take care of most of the problem, and hopefully you have no need to change any settings in a way that would let an inability to contact the domain keep the Hyper-V services from starting.
The following setup will avoid virtually any problem you are likely to encounter and allow you to fully virtualize your domain controllers. It requires a total of (number of hosts + 1) Windows Server licenses for domain controllers.
- Create a virtual machine on each Hyper-V host and place it on internal storage, not in High Availability mode. This way, if the host is ever unable to access the shared storage, presumably because that device has failed, you’ll still have at least one active domain controller. Set this domain controller’s Automatic Start action to always start when the host is started.
- Create a highly available VM as a domain controller. Assign all 5 FSMO roles to it. Set its automatic start action to never turn on.
- Ensure you are regularly backing up the System State of at least the DC from step 2.
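The layout in these steps can be written down as data, which makes the license count easy to check. This is a minimal sketch: the `DcPlan` fields, the VM naming scheme, and the host names are all invented for illustration, and placing the highly available VM on shared storage is my reading of the setup rather than something the steps state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DcPlan:
    """One planned domain controller VM (all field names are illustrative)."""
    vm_name: str
    placement: str          # a host name, or "cluster" for the HA VM
    storage: str            # "internal" or "shared"
    highly_available: bool
    auto_start: str         # "always start" or "never start"
    holds_fsmo_roles: bool
    back_up_system_state: bool


def plan_domain_controllers(hosts):
    """One host-bound DC per Hyper-V host, plus one highly available DC."""
    plans = [
        DcPlan(f"dc-{host}", host, "internal", False, "always start", False, False)
        for host in hosts
    ]
    # Step 2: the HA DC holds all 5 FSMO roles and never starts automatically.
    # Step 3: at a minimum, this is the DC whose System State gets backed up.
    plans.append(DcPlan("dc-ha", "cluster", "shared", True, "never start", True, True))
    return plans


def licenses_needed(hosts):
    """Windows Server licenses for DCs: number of hosts + 1."""
    return len(plan_domain_controllers(hosts))


if __name__ == "__main__":
    for plan in plan_domain_controllers(["hv1", "hv2"]):   # made-up host names
        print(plan)
    print(licenses_needed(["hv1", "hv2"]))
```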
This setup will all but guarantee that as long as one of your hosts survives, you will have basic connectivity. The only thing it cannot cover for is a human being or an improperly configured automatic process making the error of saving the state of the DC in step 2 and then turning it on. Without human intervention, it is possible that the machine might go through a Save State in the event of something like a Maintenance Mode command, but because it is not allowed to turn on automatically, a human can catch this and discard the saved state. For other failures, there is a chance that a host-bound domain controller may be in a USN Rollback state, but it will still be able to operate as a read-only DC for basic connectivity purposes. If any other DC survives in a non-USN Rollback condition, the affected DC can be safely removed from the domain and recreated. By keeping the FSMO roles on the DC that should realistically never have a restored Save State, you avoid the problems of needing to seize them. If that DC is ever lost for some reason, you should still have backups available from step 3 to prevent it from being a catastrophic loss.