In this series of blog articles I will share with you six steps that will help make your business failure-tolerant. See also: The Complete Series
Once you have made sure that your website is permanently online, you should also provide redundant systems for your office and back-office IT.
For years we have been using n+1 redundancy in all our IT operations. For us, n+1 means that we always purchase and run each piece of hardware at least twice. If we need one server to do something, we always put an identical server right next to it so we are able to switch at any time.
This may sound crazy at first glance, but it has many benefits:
This makes even more sense when you combine this approach with virtualization, which we use heavily at Paessler. For our daily operations we run several VMware clusters. Each set of hypervisors is always designed to allow for one full hypervisor node to go down without affecting our business at all. Software upgrades of the hypervisors cause no downtime either.
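To make the "one full node may go down" rule concrete, here is a minimal sketch of an n+1 capacity check. It is only an illustration, not how VMware sizes a cluster: the function name, the use of memory in GB as the only resource, and the example numbers are all assumptions, and it compares totals only, without checking that each individual VM fits on a specific host.

```python
def tolerates_one_host_failure(host_capacity_gb, vm_demand_gb):
    """Return True if all VMs still fit (in total) after losing any one host.

    Simplified n+1 check: compares total VM demand against the capacity
    that remains when the largest host is gone. Hypothetical helper,
    memory in GB is the only resource considered.
    """
    if len(host_capacity_gb) < 2:
        # A single host can never survive its own failure.
        return False
    # Losing the largest host is the worst case for remaining capacity.
    remaining = sum(host_capacity_gb) - max(host_capacity_gb)
    return sum(vm_demand_gb) <= remaining


# Three hypothetical hypervisors with 256 GB each.
hosts = [256, 256, 256]
print(tolerates_one_host_failure(hosts, [128, 128, 96, 64]))   # 416 GB needed, 512 GB left
print(tolerates_one_host_failure(hosts, [256, 200, 128, 64]))  # 648 GB needed, 512 GB left
```

In this sketch the first cluster passes the check and the second does not: it runs fine while all three hosts are up, but it has no room for a host to fail.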
Our office is based in Nuremberg, Germany (see the blog post "Paessler is Moving-Again!"). Inside our office we have a data center with two racks of mostly non-critical software testing systems and our VoIP phone system, along with one of our domain controllers. The rest of our IT equipment, especially the mission-critical systems, is housed in colocation racks in a professional data center.
Again we have applied redundancy in many ways:
Since all servers are virtualized, the Auto-Healing is effectively performed by VMotion. As soon as one of the hypervisors crashes, all of its virtual machines are moved to the remaining ones.
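The idea of moving virtual machines from a failed hypervisor to the remaining ones can be sketched as a toy model. This is not VMware's algorithm or API: the function `fail_over`, the greedy "largest VM first, emptiest host first" strategy, and the host and VM names are all assumptions made for illustration.

```python
def fail_over(placement, capacity_gb, failed_host):
    """Redistribute the failed host's VMs onto the surviving hosts.

    placement:   {host: {vm_name: memory_gb}}
    capacity_gb: {host: memory_gb}
    Returns a new placement without the failed host. Greedy toy model:
    largest VM first, always onto the survivor with the most free memory.
    """
    survivors = {h: dict(vms) for h, vms in placement.items() if h != failed_host}
    if not survivors:
        raise RuntimeError("no surviving host left")
    free = {h: capacity_gb[h] - sum(vms.values()) for h, vms in survivors.items()}

    orphans = sorted(placement[failed_host].items(), key=lambda kv: -kv[1])
    for vm, size in orphans:
        target = max(free, key=free.get)  # survivor with the most free memory
        if free[target] < size:
            raise RuntimeError(f"no surviving host can take {vm}")
        survivors[target][vm] = size
        free[target] -= size
    return survivors


# Hypothetical cluster: three hosts with 64 GB each.
placement = {
    "esx1": {"web": 16, "db": 32},
    "esx2": {"mail": 8},
    "esx3": {"ts1": 24},
}
capacity_gb = {"esx1": 64, "esx2": 64, "esx3": 64}

print(fail_over(placement, capacity_gb, "esx1"))
```

With these example numbers, `db` lands on `esx2` and `web` on `esx3`. If the cluster were sized without a spare node, the function would raise an error instead, which is the situation the n+1 design is meant to rule out.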
Here at Paessler most employees are not actually working on their personal PCs any more. When they log in in the morning, they start a Remote Desktop session and connect to one of our Terminal Servers (which are again virtualized servers on one of our VMware clusters). They are effectively working on the terminal servers and do not store any data on their desktops. Most desktops are thin clients with Atom CPUs (which keeps energy usage down, too). And still everybody at Paessler has two 24-inch screens on their desk for a comfortable work environment. Even most of our developers use code editors, compilers, and IDEs on virtualized machines, plus a second VM as a permanent test system.
Because the data we work with is always stored on our highly redundant hardware in a professional data center, we don't even bother with backups of the desktop PCs. If a desktop PC breaks, we call Dell to fix it on the next business day. The employee simply moves over to another desk, reopens their previous remote desktop session, and keeps on working without losing anything. A nice side effect is that we also have the same work environment when we log into our VPN from our home offices.
In essence the Auto-Healing for office PCs means that you move over to a free desk in the office to continue working.
The next blog post will give insight into good backup solutions and disaster recovery plans.
At Paessler we have been selling software online for 15 years, and we have had hardware, software, and network failures just like everybody else. We tried to learn from each one of them and to change our setup so that the same failure would never happen again.