Wednesday, December 3, 2014

Cloud Computing and Big Data - Two Peas in a Pod?

There have been a multitude of blog posts and articles about Cloud Computing and Big Data, but usually as individual topics. As the use of cloud computing increases, the processing of Big Data becomes more prominent. Why is that?


 


Figure 1: Ruggedized Server Cloud-In-A-Case System


Cloud computing is the latest buzzword for the future of computing, but it is not really a new term. The first acknowledged use of “cloud computing” has been traced to 1996 and Compaq Computer. Their vision was detailed and timely: not only would all business software move to the web, but what they termed “cloud computing-enabled applications,” like consumer file storage, would become common (Technology Review, 10/11). Of course, in 1996 network and computer technology were not yet at the point of implementing many of the ideas of cloud computing.


 


Fast forward ten years, and Amazon.com introduced the Elastic Compute Cloud (EC2) – three words that describe expandable computing power located in some other space. In 2006 Amazon released a beta version of EC2 to the public on a first-come, first-served basis. EC2 went into production in October 2008, when the beta label was removed and EC2 was offered as a supported service by Amazon. Now Amazon cloud and web services are used not only by commercial enterprises but also by government and military users.


 


Said Jeff Bezos at an Amazon shareholder meeting: “We’re really focused on what we call infrastructure Web services… Amazon Web Services is focused on very deep infrastructure. It has the potential to be as big as our retail business. It’s a very large area and right now it’s done, in our opinion, in a very inefficient way. Whenever something big is done inefficiently, that creates an opportunity.”


 


Big data is another term with a long history. There are several references to the term Big Data, but the one most often cited is from John Mashey in the mid-1990s, when he was Chief Scientist at Silicon Graphics. An example of his use of the term “Big Data” is available in the public domain from a presentation given at USENIX in 1999, “Big Data and the Next Wave of Infrastress.” Mr. Mashey also seems to have coined another term appropriate for current times: Infrastress, or stress on the infrastructure.


 


The uses of big data are everywhere, with examples as diverse as the Library of Congress storing “tweets” for future review and study. As of March 2013 the library had stored 170 billion tweets and was adding 150 million a day. Another example is the military capturing real-time video data of an area of interest. One sensor system the military uses is Gorgon Stare, a spherical array of nine cameras attached to an aerial drone. Used as a wide-area surveillance and sensor system, Gorgon Stare can generate several terabytes of data per minute. Mounted on a drone with 24-hour loiter capability, the amount of data available is staggering.
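A quick back-of-envelope calculation puts those figures in perspective. The tweet counts come from the post above; the average tweet size and the exact sensor rate (taken here as 2 TB per minute for "several terabytes") are assumptions for illustration only.

```python
# Rough scale check using the figures quoted above.
# avg_tweet_bytes and the 2 TB/min sensor rate are assumptions.

TB = 10**12  # bytes in a terabyte (decimal)

# Library of Congress tweet archive (March 2013 figures)
tweets_stored = 170_000_000_000      # 170 billion tweets
avg_tweet_bytes = 200                # assumed average size per tweet

archive_tb = tweets_stored * avg_tweet_bytes / TB
print(f"Tweet archive: ~{archive_tb:.0f} TB")

# Gorgon Stare-style wide-area sensor (assumed 2 TB/minute)
sensor_tb_per_min = 2
daily_pb = sensor_tb_per_min * 60 * 24 / 1000  # TB/day -> PB/day
print(f"Sensor output over a 24-hour loiter: ~{daily_pb:.1f} PB/day")
```

Even under these modest assumptions, a single sensor produces in one day roughly a hundred times the entire tweet archive – far beyond what a remote user could move or process locally.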


 


Analyzing that data and presenting intelligent results requires network bandwidth and computing power beyond what is available to the average user. The concept of virtualization has provided a way to integrate the compute part of cloud computing with the storage of massive amounts of data. By locating the compute resource in the cloud, the user is given access to a virtual computer sized to run the desired application. Chassis Plans wrote an article in COTS Journal about ruggedized virtual access for military applications, “Ruggedized Servers for the Data Centric Military Environment.” In the article, the author describes the type of server needed to access the cloud and big data.
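The idea of "sizing" a virtual computer to an application can be sketched in a few lines. The instance catalog below is hypothetical, not any real provider's offering; the point is simply matching a workload's requirements to the smallest resource that fits.

```python
# Sketch of sizing a virtual compute resource to a workload.
# The catalog is hypothetical, for illustration only.

from dataclasses import dataclass
from typing import Optional

@dataclass
class InstanceType:
    name: str
    vcpus: int
    ram_gb: int

# Ordered smallest to largest
CATALOG = [
    InstanceType("small",  2,   8),
    InstanceType("medium", 8,  32),
    InstanceType("large", 32, 128),
]

def smallest_fit(need_vcpus: int, need_ram_gb: int) -> Optional[InstanceType]:
    """Return the smallest catalog entry that satisfies the workload."""
    for inst in CATALOG:
        if inst.vcpus >= need_vcpus and inst.ram_gb >= need_ram_gb:
            return inst
    return None  # nothing in the catalog is big enough

# A video-analytics job needing 6 vCPUs and 24 GB of RAM
choice = smallest_fit(6, 24)
print(choice.name)  # medium
```

Cloud providers automate exactly this kind of matching at scale, which is what makes the compute side "elastic": the user requests capacity, not hardware.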


 


Today’s limits on network access for remote users and on local computing resources make the cloud an appropriate technology. By co-locating the data and the compute resource in the same area, the best use can be made of the infrastructure, and the user can access interpreted data on devices such as tablets and smartphones. In other words, without the cloud the use of big data is very limited, and without the cloud to store big data, data-mining applications would not be available. Two peas in a pod.


 


What are the next hurdles? There are several challenges facing users today. The first is developing the applications necessary to process the data: the data is now available, but the algorithms to process it are not. Beyond applications, security is a major issue. With the latest hacks of both government and private databases, encryption and cyber defense are fast becoming strategic to future technology advances and to the expansion of cloud computing.


 


The big news last week was the hack of Sony Pictures, which crippled the company’s email services and led to the release of several unreleased movies. Cyber defense is not just a concern for the government and military.


