Friday, February 25, 2011

Where's your data? – Google and Microsoft Data centers

Are you worried about where you keep your money? The answer is obviously “Yes”. In the modern information society, data is as important as money, if not more so. By data I mean the information gathered on computers. But how many of you know where your data is stored? We know about the data stored on our own computers or backup devices. But what about the data stored on the internet?

Today most people use internet-based email services, picture albums, video hosting facilities, document management systems and a lot more. A considerable amount of our digital information is therefore out there on the internet. We use Gmail, Yahoo Mail and Live Mail for email, Flickr and Picasa for sharing pictures, YouTube for videos, and Google Docs and Office online for documents. But where all of this is stored, and what happens behind the scenes, is a complete mystery to a lot of us.

All the data used and gathered by internet services and applications is stored in the huge data centers of the service providers (or, failing that, in third-party data centers). What follows is a quick look at the data centers used by two information giants: Google and Microsoft.

Google, what happens there?

Google's data centers host the data for its vast number of internet services: Google search, Gmail, Google Docs, Picasa, YouTube, Blogger, Maps, and the list keeps growing. According to Google's own web site, the company has seven major data centers, two of which are scheduled to become operational within this year.

Google’s Data center in Oregon

But according to Data Center Knowledge, Google isn’t revealing all of its data centers (“…but many of its older data center locations remain under wraps” – Data Center Knowledge). By their count there are more than 30 data centers around the globe, though not all on the same scale. Some of them just host the country-specific versions of the Google search engine.

Google is not very interested in revealing information. It believes disclosure would threaten the security of its users’ data, and that keeping its data center operations secret gives it a competitive advantage. So it doesn’t say much. The concern is not only about the data itself, but also about details such as size and power usage, which could be valuable to competitors. Google is said to have the most efficient infrastructure, with its data centers consuming only half the energy of a typical data center.
“…we don’t reveal every detail of what goes on at our facilities, or where every data center is located. For one thing, we invest a lot of resources into making our data centers the fastest and most efficient in the world, and we’re keen to protect that investment.” – Google
Even though Google doesn’t say much about the exact locations, a map is available that displays all the data centers mentioned by Data Center Knowledge.

Though Google tries to keep its data centers hidden, you can read about what happens inside them. That is, the mechanism they use to store data has been published. Read this paper by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, who designed and implemented the Google File System.
Google File System (GFS) is a scalable distributed file system designed for large, distributed, data-intensive applications that are used by a large number of clients. Its main concerns are fault tolerance and availability on inexpensive hardware. GFS offers a single file system that spans a huge amount of storage (hundreds of terabytes) across thousands of machines and thousands of disks, which lets Google store all of its information in one of the most sophisticated facilities around.
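To make the fault tolerance idea a bit more concrete, here is a minimal sketch in Python of how a file could be cut into fixed-size chunks with each chunk copied to several machines. The 64 MB chunk size and three replicas are the defaults described in the GFS paper; everything else (the function names, the server names, and the simple round-robin placement) is my own assumption for illustration and is not how Google actually places replicas.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the GFS paper
REPLICAS = 3                   # default number of copies described in the paper


def plan_chunks(file_size, servers, chunk_size=CHUNK_SIZE, replicas=REPLICAS):
    """Return a list of (chunk_index, [server, ...]) placements for one file.

    Hypothetical helper: placement is plain round-robin, a simplification
    and not the policy used by the real system.
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    # Ceiling division: a partially filled last chunk still needs a slot.
    chunk_count = max(1, -(-file_size // chunk_size))
    placements = []
    for index in range(chunk_count):
        # Pick `replicas` distinct servers, starting at a rotating offset.
        chosen = [servers[(index + r) % len(servers)] for r in range(replicas)]
        placements.append((index, chosen))
    return placements


def readable_after_failure(placements, failed_server):
    """True if every chunk still has at least one copy on a live server."""
    return all(
        any(server != failed_server for server in chosen)
        for _, chosen in placements
    )
```

For example, `plan_chunks(200 * 1024 * 1024, ["s1", "s2", "s3", "s4"])` splits a 200 MB file into four chunks, and `readable_after_failure` on that plan stays true when any single server is lost, because every chunk has copies on other machines. That is the basic reason cheap hardware that fails often can still add up to a reliable store.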

Microsoft way...

Just like Google, Microsoft is not very interested in talking about the low-level technical details of its data centers. However, it’s no longer a secret that the company is enthusiastic about improving its data centers, as it has big hopes for the ‘cloud’.

Microsoft doesn't think the locations matter much to users. “In most cases, data center location has no effect on users.” – Microsoft. So far the known sites with Microsoft data centers include Chicago, Dublin, San Antonio, Quincy (Washington) and northern Virginia, each with 500,000 square feet of space.

Microsoft Data center at Chicago

However, according to an interview that Kevin Timmons (general manager of data center services at Microsoft) gave to GigaOM, Microsoft envisions setting up tiny data centers in countries that need to ensure stored information stays within their geographic boundaries. So we should expect a future where tiny data centers exist alongside large ones.

Ina Fried at CNET News reports on Microsoft's plans for new data centers. According to her report, Microsoft data centers will be shipped in as pre-manufactured units. Only the concrete building that protects the data center and the cooling system need to be built at the site. This is a new approach to data center installation.

Microsoft has several independent data centers located in different parts of the world. According to Microsoft, these data centers are not backups for each other. They are separate, and they work in collaboration with the other data centers. As for the inner workings, each of them maintains its own ‘Active Directory directory service’, its own ‘Microsoft Online Services Administration Center’ and ‘My Company Portal’, and its own resource management and other tools.

Here is a good video by Microsoft showing the integration and the inner workings of a data center.
