
How Provider Tried to Lock in Our Customer

24.05.2018 · 8 minutes

Here is a short, almost comic story that happened to one of our customers in real life. One day their IT infrastructure provider decided to relocate its data center, with three days of downtime to follow. All customers were notified six months in advance, but thanks to the provider's fuss and red tape, some of them could not get ready in time.

Imagine you are a CIO with no budget for a backup site and no spare legacy equipment to fall back on. The healthcare company you work for can't afford even an hour of downtime: one hour longer and your customers feel the pain, while a full day of downtime would cause financial and reputational losses comparable to the company's annual revenue.

And the finishing shot: every move you make waits for month-long approvals from the very provider that created the problem. You pay the provider so much, so why would it let you go?

The provider's point

It's simple. The customer had been hosting its entire infrastructure there for donkey's years, and if it survived the downtime, it would stay put for many years ahead and keep paying all that time. So the provider tried to prevent churn by putting sand in the customer's wheels right until the end of the migration. The customer has nowhere to go, does it? Offer a little compensation for the inconvenience under the SLA, or a tiny discount, and be done with it.

The customer's point

The customer didn't find it funny. It saw through the game right away and approached us, asking to move the entire infrastructure as quickly as possible; building out two sites of its own could wait until the rush was over.

We were tasked with exporting all the data and redeploying everything in our cloud, which runs on two fault-tolerant data centers certified by the Uptime Institute. Normally that is a three-day routine, provided there are no exotic operating systems or hardware dongles. And indeed, no exotic OS, dongles, legacy storage arrays or resource-hungry servers were found. Just 5 TB of data, that's all.

However, it was not as simple as it seemed to be.

Challenges

According to the SLA, the provider had to respond within three days, and an hour before each deadline we would get an Indian-helpdesk-style reply asking for some tiny, irrelevant detail before they could proceed. Then another two days and 23 hours of silence.

The Internet link ran below 50 Mbps, and at least half of the channel was consumed by the customer's live production traffic, so a tolerable speed was available only in the dead of night. The provider refused to increase the bandwidth, even for an extra charge.

There were only two weeks left. “Checkmate”, their admin might have thought. “Nuts to you”, we decided. 

Double-Take

There is a product called Carbonite (formerly Double-Take), designed for asynchronous replication. Its migration-focused solution, Carbonite Move, uses one-time licensing that remains valid until you finish migrating everything.

Installed alongside the OS, the solution watches write activity. The agent on the machine being copied pushes data into the channel and keeps track of new writes made after synchronization starts. Once the first bulk of data reaches the target infrastructure, Carbonite Move throttles the software at the operating-system level, computes the difference between the current data and what has actually been written to the target, and sends over the new data that appeared while the bulk was being copied.
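The pattern above can be sketched in a few lines of Python. This is a toy model of my own, not Carbonite's actual implementation: bulk-copy a snapshot while the system keeps running, track blocks dirtied by writes in the meantime, and ship only the dirty blocks during a short cutover pause.

```python
# Toy model of agent-based asynchronous replication (illustrative only).

source = {i: f"block-{i}" for i in range(8)}  # live volume: block id -> data
target = {}                                   # replica on the target site
dirty = set()                                 # blocks written after sync start

def write(block: int, data: str) -> None:
    """An application write on the source; the agent marks the block dirty."""
    source[block] = data
    dirty.add(block)

# Phase 1: bulk copy of the initial snapshot (production keeps working).
snapshot = dict(source)
for block, data in snapshot.items():
    target[block] = data

# Writes landing during the bulk copy are tracked, not lost.
write(3, "updated-3")
write(5, "updated-5")

# Phase 2: short cutover window, replay only what changed since the snapshot.
for block in sorted(dirty):
    target[block] = source[block]
dirty.clear()

assert target == source  # replica has converged; safe to switch over
```

The point of the two-phase scheme is that the long phase runs without downtime, and only the small delta has to be applied while the system is paused.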

 

To catch up on a week's worth of difference, we only needed to survive about an hour of downtime, plus another hour to bring up the new infrastructure. We planned four hours for the cutover but finished much faster. Four hours at night were quite acceptable; eight would not have been.

Why a week's worth? Because that is exactly how long it took to pull the data from the agents through the tragically slow channel.
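A back-of-envelope calculation, with illustrative numbers rather than project figures, shows why the channel itself set the pace:

```python
# Raw transfer time for 5 TB over a constrained link, ignoring compression
# and protocol overhead. Numbers are illustrative.

def transfer_days(data_tb: float, usable_mbps: float) -> float:
    """Days needed to push data_tb terabytes through usable_mbps megabits/s."""
    bits = data_tb * 1e12 * 8               # decimal TB -> bits
    return bits / (usable_mbps * 1e6) / 86400

print(round(transfer_days(5, 50), 1))        # whole 50 Mbps channel: 9.3 days
print(round(transfer_days(5, 50 * 0.5), 1))  # half the channel: 18.5 days
```

Even owning the entire pipe around the clock would mean more than nine days for a naive copy, which is why compressing the stream and syncing only actual writes mattered so much.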

Then it was time to deploy and go live. After quick tests and a day of running in production, the customer shut down the virtual machines on our opponent's site.

What a surprise for the provider! We had stopped flooding their admins with emails, and I bet they thought we had given up.

What was the software I mentioned?

Here is the link to the software page: https://www.carbonite.com/products/doubletake-migration-software/.

The home page claims near-zero downtime, and that is true as long as not too many changes occur during the copy, meaning you either have plenty of bandwidth or can migrate systems one at a time. Neither applied in our case.

 

You can also use this software to test a new site before going live into production (the license permits it), or employ another solution in the suite to set up Active-Passive replication, and that's it.

 

The solution is also easy to manage via its console.



Everything is copied, with no connectors needed: Double-Take simply takes everything sector by sector (or, more precisely, LUN by LUN), tracking successful writes and their locations. The channel is secured during the migration, so nobody can see what is actually being transferred.

 

What couldn’t be migrated

Not everything could be migrated this way. There was a heavily loaded virtual machine (close to 100% CPU) running a hot database. Once installed, the agent started eating into that capacity and affected database performance. The load eased a little at night, but the database was huge and the window too small: we realized that night-only copying would take too long to finish before the deadline.

So we took a backup copy of the VM, physically carried it to our site on a drive, and uploaded it to the cloud. Then we deployed the VM from the backup and caught up on the missing changes using native MS SQL utilities. That's it! It sounds easy, but we had to do it quickly and on the first try.
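The catch-up idea is the classic backup-plus-log pattern: restore a full backup taken at time T, then replay the changes recorded after T. The real migration used native MS SQL backup/restore tooling; the dicts below are only my illustration of the concept.

```python
# Conceptual sketch of backup-plus-log catch-up (illustrative only).

full_backup = {"orders": 100, "patients": 40}   # state captured at time T
tail_log = [("orders", 112), ("patients", 41)]  # writes recorded after T

replica = dict(full_backup)     # step 1: restore the full backup in the cloud
for table, value in tail_log:   # step 2: replay the log tail to catch up
    replica[table] = value

assert replica == {"orders": 112, "patients": 41}
```

The heavy part (the full backup) travels offline on a physical drive, so only the small log tail has to cross the slow channel.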

I wish you the same luck in your migrations!
