Eric's Technical Outlet

Learning the hard way so you don't have to

Review of Altaro Backup for Hyper-V 3.0

Update: This review is for version 3. I have posted a review of version 5.

Older article from 2012:

Altaro Software has released the third major edition of their backup software for Hyper-V. I previously reviewed version 2 and will be making some references to that.

Short version: much improved over the previous edition and is maturing nicely, but still has a ways to go. At this time, I would say that it could be used in all small environments and most medium environments.

Pros

  • Inexpensive
  • ReverseDelta technology
  • Commits MS SQL and Exchange databases within guest VMs
  • Easy to use
  • Restore VMs as a clone or to a different host
  • Free version available; allows backup of two VMs
  • Agentless

Cons

  • Installs only to a Hyper-V host; no remote console
  • Disk targets only
  • Agentless
  • Too optimistic at times; too pessimistic at others
  • Reporting needs a lot of work

Installation

As before, the installer doesn’t check whether the Hyper-V role is active, so if you don’t already know that it must be installed on a Hyper-V host, it still won’t tell you. When I installed it, it told me that .NET wasn’t installed; I know this is false, but I’m pretty sure that this determination is made by a Microsoft routine, so I can’t blame Altaro for that. What was weird is that, at the end of installing .NET, it wanted to reboot. I clicked the button to allow the reboot, and Windows reported that “Altaro Setup” was blocking the reboot. I’ve done some work with WiX before, so I know that the .NET installer runs in a separate process, but it seems like this part could be better coordinated. I had to cancel the Altaro installation so I could get the server to reboot, then manually restart the installation after the server came back up. It completed from that point without incident.

In this edition, you only have to install on one node in a cluster. At first run, it will detect other nodes in the cluster and offer to push the slave installation for you. This went through without incident for me. It does push the entire server installation to all the other nodes, but this is not a bad thing. If you connect to one of them and try to run the Altaro console, it will indicate which node holds the master role and allow you to transfer the master role to the node you’re on. I didn’t try this out.

Microsoft Shadow Copy Component Error 4104

All of my initial backups failed with “Backup failed due to an error in the Microsoft Shadow Copy component. (4104)”. I did a little searching, and everything publicly available about this error with Altaro for Hyper-V concerns backing up a Small Business Server virtual machine, which I am not doing. I did find a few other links that got me on the right path. To make a long story short, I discovered that something had disabled “automount” on my hosts. I’m pretty sure I know what it was; I had installed another backup program to test, and I believe that it disabled automount at install but neglected to re-enable it at uninstall. Anyway, the fix is remarkably simple. At the Hyper-V host, run “DISKPART”. At its prompt, run “AUTOMOUNT”. If it says it’s disabled, that’s your problem. Run “AUTOMOUNT ENABLE” and then exit out of DISKPART.
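If you have several hosts to check, the same DISKPART steps can be scripted. The following is only a minimal sketch: it assumes DISKPART reports the state with a line containing the word “enabled” or “disabled” (the exact wording may differ between Windows versions), that it runs from an elevated prompt on the Hyper-V host, and the function names are made up for illustration.

```python
import os
import subprocess
import tempfile


def automount_is_disabled(diskpart_output: str) -> bool:
    """Naive check: does DISKPART's AUTOMOUNT report mention 'disabled'?

    Assumption: the report reads roughly like
    'Automatic mounting of new volumes disabled.'
    """
    return "disabled" in diskpart_output.lower()


def run_diskpart(commands):
    """Run DISKPART commands from a script file (diskpart /s) and return stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as script:
        script.write("\n".join(commands) + "\n")
        path = script.name
    try:
        result = subprocess.run(
            ["diskpart", "/s", path], capture_output=True, text=True
        )
        return result.stdout
    finally:
        os.remove(path)


def ensure_automount_enabled() -> bool:
    """Enable automount if it is reported as disabled; return True if changed."""
    if automount_is_disabled(run_diskpart(["automount"])):
        run_diskpart(["automount enable"])
        return True
    return False
```

Calling `ensure_automount_enabled()` on a host mirrors the manual steps: query first, and only change the setting when it is actually off.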

Improvements Over Version 2

The primary change is that cluster support is dramatically improved. You set up scheduling and VM selection in one pane for an entire cluster and Altaro just deals with it. VMs within a schedule are run according to the CSVs that they live on so that it doesn’t skip around with “Redirected Access” mode. It can now back up non-VSS aware clients (Linux, Windows XP) without taking them offline. It has a built-in “Backup Drive Swap Rotation” feature which is fairly self-explanatory. The scheduler has received some attention.
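To picture why CSV-aware ordering matters, here is a small sketch of grouping a schedule’s VMs by the volume they live on so that each CSV is visited once. This is not Altaro’s actual algorithm, which I have no visibility into; the VM and volume names are invented.

```python
from itertools import groupby


def group_by_csv(vms):
    """Group (vm_name, csv_name) pairs so VMs on the same CSV run back to back.

    sorted() is stable, so VMs keep their original relative order within a CSV.
    """
    ordered = sorted(vms, key=lambda vm: vm[1])
    return [
        (csv, [name for name, _ in group])
        for csv, group in groupby(ordered, key=lambda vm: vm[1])
    ]


schedule = [
    ("SQL1", "Volume2"),
    ("FS1", "Volume1"),
    ("WEB1", "Volume2"),
    ("DC1", "Volume1"),
]
for csv, names in group_by_csv(schedule):
    print(csv, names)
```

Backing up in this order means a CSV only has to enter “Redirected Access” once per run instead of flipping back and forth between volumes.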

The Good, the Bad, and the Ugly

The GUI

Most of the interface is very straightforward, and if you accidentally do things out of order, it will get you to where you need to go. You really shouldn’t have to spend a lot of time in the help manuals to work with this software. However, the language used is often inconsistent and some things are just plain clumsy. For instance, I remarked in the previous column about how the program gives you a button to “Select a Backup Drive” when you’ve already done it. That’s still there. However, on the “File Level Restore” area, you are presented with a button to “Start File Level Restore”. Based on the “Select a Backup Drive” experience as well as common convention, it follows that this button should be used to confirm your selections and start the file level restore process. That’s not at all what it does. It kicks off the process of making selections and setting options. There are a couple of other similar examples. Of course, it’s still fairly obvious what to do in each location and it won’t take anyone more than a single try to figure out what’s going on, so this is a very nit-picky criticism.

There are a couple of GUI behaviors that quickly become annoying. It is, to coin a term, “confirmy”. It seems like every time you issue a command, the software has a pathological need to throw up a dialog to let you know that it did as you asked. This very quickly leads to out-of-hand dismissal of dialogs that might have actually been important, but I wouldn’t know since I stopped reading them after the 50th. In all such cases, my suggestion to the developer is to only tell me when the program did not do as I asked. The other quirk is when you change something, don’t save, and try to close or move to another part of the application. It stops you to warn you, which is nice, but if you choose to cancel the context-switch, you still have to hit the save button before moving on. It would make a lot more sense if the buttons were more like, “Yes I’m sure, forget the changes”, “Save the changes and go where I told you”, and “Leave me here because I didn’t intend to go anywhere” or something.

The scheduler is pretty slick. You define schedules and plunk VMs into them. For instance, you design a “Weekdays at 5 PM” schedule and then drag in the VMs you want to run at 5 PM on weekdays. It figures out what sequence to back up the individual VMs in. I didn’t test overlapping runs, but it appears that’s what you’d have to do if you wanted to give it a prioritization hint. I’d like a little more control here, but overall it’s nice the way it works.

Initial setup of the retention settings is nothing short of agonizing if you have more than a few VMs. We’ve elected to use Altaro for seventeen VMs, and you must set up retention policies on a per-VM basis. That means seventeen dips into the UI and seventeen OKs to its proud proclamation that it has set the desired retention policy. We put all seventeen on the same basic schedule, at least as far as when the retention clean-up should run, so if that turns out to be a bad idea we’re going to have to go back in again… and wow do we really not want to have to do that. I’ll revisit retention as a feature.

The reporting module is really the Achilles’ heel of the GUI. There’s really no detail whatsoever, and there are some pretty blatant problems. When you access the “Backup History” portion, it generally marks all of them as successful, even when they weren’t; this behavior is not consistent. I had some backups that failed outright; they have red X’s as icons and the text is grayed out, but the “Result” column shows “Success”. I have e-mail notifications turned on so I know there’s an error, but if I delete that e-mail or don’t receive it or don’t turn it on, then the GUI will be of no help at all. I have another entry with a yellow exclamation icon that also shows success, but I didn’t get an e-mail on that one so I don’t know what happened. One of them has a red exclamation and does show an error in the result column, but it asks me to check the list of warnings; what list of warnings? Where? If I right-click on a successful item, I can get it to show a list of changed files. This is nearly useless. I can see which VHDs were changed and if the configuration files for the VM were changed, but I have no idea why I would ever want to know this. If I want to know what files were changed, what I’m most curious about is which files within the VM were changed. I’d like to see this part scrapped and replaced with a VM-level detailed history.

The “Backup Errors” section seems a little redundant. It doesn’t show me any more than what I can get from the History section, and it deletes itself when the next backup of that VM runs. I don’t know exactly when an error occurred, how far it got into the process, troubleshooting hints, or much of anything. It does have a bug though; when you access that section, the date stamp on the error is when you opened the console, not when the error actually occurred.

The Dashboard portion is kind of nifty, but alas, needs work. The two default charts work just fine, but most of the others are broken on my implementation. For instance, there’s a pie chart display for “Show Backup Size / VM”. I guess I have too many VMs, because this is what I see:

[image: screenshot of the “Show Backup Size / VM” chart]

Where’s the pie? Some of the other charts don’t display anything at all. Not really critical, because backups work, but this segment could use a little attention.

Cluster/CSV

The enhanced cluster support is very nice, and is what actually tipped us over into becoming paying customers, although it’s still not what I want. The most important thing is that Altaro does work with Hyper-V clusters and CSVs whereas most Hyper-V backup programs do not. You can even LiveMigrate a machine during a backup and it will keep going. What I don’t like is that I have to run the console from a Hyper-V host. This means that I can’t delegate backup management to a junior administrator or help desk support. If a user wants to restore a single file from the file server, it has to be escalated to infrastructure-level staff. Another problem with running a console right on the Hyper-V host is a matter of resource contention. I’d much rather have the data for a backup be sent to a dedicated backup server for processing and not have to share CPU cycles with my customer order-entry software. This behavior really needs to be changed. It seems to me that if the console is smart enough to run from just one of the hosts, then it shouldn’t be horribly difficult to have it run from another server altogether.

Support

I haven’t personally contacted their support staff, but I lurk their forums and it seems as though they get very high marks from customers. The software practically begged me to contact them a couple of times. I personally would prefer that the software nudge me into finding the solution myself before telling me to call support, but in this day and age, a responsive, helpful support staff is a shining beacon in utter darkness.

Retention

Retention clean-up runs on a schedule completely separate from backup, the “why” of which is sort of a mystery to me. The most logical time to run a retention clean-up is at the tail-end of a successful backup job, so why can’t I just pick that and have the backup scheduler kick off a retention clean-up when it’s done backing up? The other problem with retention is that it is on a length-of-time basis and not a number-of-backups basis. So, if I have a VM on a one-week retention basis and the backup fails every day for a week, it appears that I’ll lose all my backups. I’d rather have it let me choose the number of backups to keep.
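The difference between the two retention styles is easy to see in a toy model. This sketch is purely illustrative: the dates are invented, and whether Altaro really deletes the last good backup in this situation is something I have not confirmed.

```python
from datetime import datetime, timedelta


def prune_by_age(backups, now, max_age):
    """Time-based retention: keep only backups no older than max_age."""
    return [b for b in backups if now - b <= max_age]


def prune_by_count(backups, keep):
    """Count-based retention: keep the `keep` most recent backups, whatever their age."""
    return sorted(backups)[-keep:] if keep > 0 else []


# Hypothetical scenario: the last successful backups are 8, 9 and 10 days old
# because every nightly run since then has failed.
now = datetime(2012, 6, 15, 17, 0)
backups = [now - timedelta(days=d) for d in (10, 9, 8)]

by_age = prune_by_age(backups, now, timedelta(days=7))
by_count = prune_by_count(backups, keep=3)
print(len(by_age), len(by_count))  # prints "0 3"
```

With a one-week window, a week of failures leaves nothing to restore from; a keep-the-last-N policy still holds the three good copies.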

ReverseDelta

This is a feature that I really like, but I believe that it needs some tweaking, if possible. As mentioned in the previous article, a normal delta backup saves the original copy of a file on the first pass, then on each subsequent pass only saves the changes. ReverseDelta is appropriately named; it saves the latest copy in full and records the changes it would take to get back to the oldest backup. They claim that it speeds restores, because you typically want to restore a more recent copy as opposed to an older one. How often this is true is an unknown, but the premise is sound so I see no reason to challenge it. What I am not especially fond of in this feature is that you have to manage how it thinks. There’s a checkbox to keep a full copy of the file after a certain number of iterations. The problem with that is that I don’t really know what a good number is to put in there. I could give some ideas and examples, but still, there’s no guarantee. What I’d rather see is a percentage option. If more than, say, 80% of a file has changed, then tracking the deltas probably consumes more space than just dropping in a full copy. Whether that takes 10 changes or 100 changes will depend completely upon the file in question. The other thing that isn’t clear is whether it’s talking strictly about the VHD or about the individual files. My suspicion is that it means the VHD, which would mean that there’s probably not a lot of tweaking that can be done here.
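To make the idea concrete, here is a toy reverse-delta store. It is a sketch of the general technique only, not Altaro’s implementation: files are modeled as dictionaries of block number to content, and `should_store_full` is a hypothetical version of the percentage option wished for above.

```python
def diff(src, dst):
    """Return the block-level changes needed to turn `src` into `dst`."""
    changes = {k: v for k, v in dst.items() if src.get(k) != v}
    removed = [k for k in src if k not in dst]
    return changes, removed


def apply_delta(base, delta):
    """Apply a delta produced by diff() to `base` and return the result."""
    changes, removed = delta
    result = {k: v for k, v in base.items() if k not in removed}
    result.update(changes)
    return result


class ReverseDeltaStore:
    """Keeps the newest version in full plus deltas that walk backwards in time."""

    def __init__(self):
        self.latest = None
        self.reverse_deltas = []  # newest first; each one steps back one version

    def backup(self, version):
        if self.latest is not None:
            # Record how to get from the new version back to the previous one.
            self.reverse_deltas.insert(0, diff(version, self.latest))
        self.latest = dict(version)

    def restore(self, steps_back=0):
        """steps_back=0 is the newest backup and needs no delta work at all."""
        result = dict(self.latest)
        for delta in self.reverse_deltas[:steps_back]:
            result = apply_delta(result, delta)
        return result


def should_store_full(src, dst, threshold=0.8):
    """Hypothetical policy: store a full copy when most blocks changed."""
    changes, removed = diff(src, dst)
    return (len(changes) + len(removed)) / max(len(src), 1) > threshold
```

Restoring the newest backup is a straight copy, while each older version costs one more delta to apply, which is the trade-off the vendor’s claim rests on.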

Other Features

The “Fire Drill” feature is very nice. IT departments are supposed to periodically test to see if their backups are working, but most don’t. “Fire Drill” automates much of the process. I’d like to see this feature mature a little more, but just having it around at all is very nice.

The ability to directly restore an individual file from within a VHD is also very nice. In the current iteration, it’s somewhat limited because it can only dive into the last backup. I would like to see this feature mature.

Exclusions

I wouldn’t ordinarily give this its own section, but I made a really big deal out of it on the Veeam review so I’ll probably have to be really clear about this feature going forward. Altaro Backup for Hyper-V does not allow you to exclude any portion of a virtual machine for backup. Whether or not this is a big deal is up to you. What made it a big deal on Veeam was that they present an entire user interface section devoted to setting up exclusions, and 1) it doesn’t work, and 2) it doesn’t tell you that it doesn’t/didn’t work. Altaro doesn’t have the feature either, but they don’t pretend that they do and if any option in the software doesn’t work as intended, they at least took the time to wire up the error messaging system. As the product matures, some exclusion mechanism would be nice to have, but the reason I’m giving them a pass where I flunked out Veeam is because Altaro doesn’t tell me that something is going to work and then hide it from me when it doesn’t.

Problems

Backup of very large VMs can be problematic. I have one that is about 850GB in size because it contains about four years of document images. There is a lot of static data and very little data change. The “Data Backed Up / Day” reporting appears to be a little erratic here. Some days, it appears to be churning nearly 70% per day – in truth, it’s definitely less than 1%. However, this particular VM is the one that fails periodically with the “check the warnings” message, so it’s difficult to say what’s going on. No matter what, this VM has to be copied over to the backup destination in its entirety each day to have the comparison run, so this one VM takes up about 2 hours each day and needs 850GB of available storage at all times, but has never backed up more than 12GB in one run. In our case, we’re going to solve this by using an in-VM backup tool. I’m going to go out on a limb here and say that I suspect this has more to do with the nature of backing up a VHD and not a limitation within the Altaro software, although I could be wrong. Edge cases like this are where things tend to break down, so there’s a limit to how critical I can be. The more normal size VMs (under 100GB total) are being handled just fine.

As mentioned in the synopsis, the program is very optimistic at times and very pessimistic at others. It assumes everything will work, and when it doesn’t, it just sort of panics. In my last review, I commented that the retries seemed a little weird because it schedules them but doesn’t tell you when. It doesn’t look like it retries anymore; it just bombs out. On one day, I had ten of the seventeen VMs fail their backups and send me the following message: “Altaro Hyper-V Backup lost communication to the node on which this machine is running, or the node was restarted before the backup of the VM completed. (4096)”. I know that node didn’t reboot. I wasn’t awake when the disconnect occurred so I don’t know what happened on the network. What I do know is that disconnects are going to happen in any TCP/IP network. When I came in the following morning, I kicked off all those backups by hand and they all went through just fine. To address this going forward, we have moved all critical VMs to the master Altaro node just to be sure they hit their schedule on time. I want my retries back, though.

Wish List

The product is maturing nicely and Altaro is taking a very active role in listening to customers and moving forward with development. So, to wrap up this review, I present my Wish List for version 4 (or, seeing as how demanding I am, 5… or 6).

  • Install the console anywhere. I want it on my desktop. Actually, I want it on the desktops of my help desk staff so they can get back user files and on my junior administrators’ desktops so they can manage servers. So, it appears I also want some security granularity over who can do what. All Active Directory-integrated, of course. This would naturally be expanded into a one-to-many ratio of Altaro backup consoles to Altaro-protected hosts/clusters.
  • Install the backup server component on a server other than the Hyper-V host. I want it to sit on a Windows Server that I built just for this purpose that handles the connection to the backup NAS target. I want my console that I installed on my desktop to talk to that, and in turn it should talk to a little agent that sits on the Hyper-V hosts. If your other customers want to install their consoles and control servers on top of their Hyper-V hosts, let them. I suppose I’m asking to split the software into three pieces: the agent on the Hyper-V host, the server component on the backup server machine, and the console that sits wherever I want.
  • Redesign the dashboard. Let me have nearly full-screen graphs, and make them scrollable if there’s not enough screen real estate.
  • Redesign the entire reporting module. You really only need one section, but it should contain all sorts of drill-down details.
  • Distinguish between recoverable errors and unrecoverable errors; let the unrecoverables bomb (e.g. 4104s) and let the recoverables retry (e.g. 4096). Let me have a GUI control to let it know how many times I want it to retry before giving up, and maybe even let me tinker with the inter-retry timer. Where possible, give me a hint on how to solve the problem.
  • Tapes! I want to be able to run an annual backup off to tape.
  • If I can’t have tapes, err, well, even if I can, I’d like the ability to take a one-off backup to the location of my choosing. Sort of like the backup rotation feature that was added in this version, only without the rotation. I’d like to go in, select some VMs, and tell them to back up to a network location (or a tape!) just that one time only, so no ReverseDeltas or anything.
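
The retry item on the list above could look something like the following sketch. It is only an illustration of the requested behavior: the classification of 4096 as recoverable and everything else as fatal is this review’s suggestion, and `BackupError` and `run_with_retries` are invented names, not anything in the product.

```python
import time

# Assumption for illustration: only "lost communication" (4096) is worth retrying.
# Anything else, such as the Shadow Copy error (4104), fails immediately.
RECOVERABLE = {4096}


class BackupError(Exception):
    def __init__(self, code):
        super().__init__(f"backup failed ({code})")
        self.code = code


def run_with_retries(job, max_retries=3, delay_seconds=300, sleep=time.sleep):
    """Run `job`; retry recoverable failures up to max_retries times.

    `sleep` is injectable so the wait between attempts can be tuned or skipped.
    """
    attempt = 0
    while True:
        try:
            return job()
        except BackupError as err:
            if err.code not in RECOVERABLE or attempt >= max_retries:
                raise  # unrecoverable, or out of retries: let it bomb
            attempt += 1
            sleep(delay_seconds)
```

The two knobs, `max_retries` and `delay_seconds`, correspond to the retry count and inter-retry timer asked for in the wish list.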

2 responses to “Review of Altaro Backup for Hyper-V 3.0”

  1. Marc Gregory May 31, 2013 at 10:53 am

    Excellent review. Testing the newest version 3.5.42.0. Any updates to your thoughts from last year? Thanks


    • Eric Siron May 31, 2013 at 11:28 am

      I actually haven’t had much time to look at it over the past year, in all honesty.
      4.0 is currently in beta and as I understand it, they would like to release it within a few weeks.
