Tuesday, June 9, 2009
Cloud computing - a programming perspective
This is a repost of a blog entry created for the OOPSLA 2009 official blog.
Cloud computing is the “new hot” topic. Simply put, various business pressures, a multitude of pain points, and the maturity of a series of Web technologies (networking, APIs, and standards) have made it possible and cost-effective for businesses, small and large, to host their data centers and application centers entirely virtually... in the cloud, if you will.
Cloud computing providers, e.g., Amazon, reuse their expertise in efficiently managing and hosting their own Web systems and applications, and expose that core expertise as a set of Web APIs. Using the Amazon Web Services Elastic Compute Cloud (EC2), anyone with a credit card and some programming skill can provision a server instance, install a Web application on it, and thus immediately have a presence on the Web. By combining economies of scale for server hardware with virtual machine technologies, expertise in automating data centers and application centers, and extensive instrumentation, Amazon is able to provide that service globally for pennies an hour. There are no binding contractual agreements, and Amazon will only bill you for the hours you have used.
In addition to compute instances, Amazon also provides various other compute resources on its cloud platform, e.g., storage (file and block), message queues, batch data processing, and others. Following Amazon’s lead, various companies, including Google, IBM, and Microsoft, are also exposing frameworks, services, platforms, and applications to a worldwide audience from within a Web browser and with simple Web APIs. Cloud computing is no less than a democratization of compute resources. With cloud computing, vast compute resources no longer require huge, long-term investments but instead can be had and consumed, as Amazon chairman Jeff Bezos likes to say, “by the drink”.
Whether cloud computing will fulfill the high expectations that many are advocating is still to be determined. Various challenges remain and, in our opinion, we are reaching the peak of the typical hype curve that new technologies follow. However, regardless of whether cloud computing will be a bust or continue to be the hit that it has certainly been so far, there is one undeniable truth that some seem to ignore... The current success of cloud computing and, we believe, its future successes are heavily tied to how easy the cloud and cloud applications are to program, as well as to maintain and to scale. And this is precisely why OOPSLA matters to cloud computing advocates, users, and providers, and vice versa.
As we mentioned, with the cloud, computing resources are cheap and widely available. In a matter of minutes, one can provision hundreds of server instances on the Amazon EC2 cloud, along with terabytes of storage and more aggregate MIPS than is available on most recent mini-computers. All of this for around $10 an hour. While almost anyone could afford such computing capacity at these price points, what is hard for most is to take advantage of that cheap capacity. The problem is no longer one of provisioning the resources, but rather one of using them, and of using them efficiently.
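To make the pay-per-hour arithmetic concrete, here is a small sketch in Python. The price of 10 cents per instance-hour is an assumption chosen for illustration so that 100 instances come to roughly the $10 an hour mentioned above; it is not a quoted price, and the function names are our own.

```python
# Assumed price, for illustration only: 10 cents per instance-hour.
ASSUMED_CENTS_PER_INSTANCE_HOUR = 10


def hourly_cost_cents(instances, cents_per_instance_hour):
    """Cost of running a fleet for one hour, in cents."""
    return instances * cents_per_instance_hour


def total_cost_dollars(instances, cents_per_instance_hour, hours_used):
    """Pay-as-you-go bill: only the hours actually used are charged."""
    cents = hourly_cost_cents(instances, cents_per_instance_hour) * hours_used
    return cents / 100


# 100 instances for one hour, then the same fleet for a 3-hour batch job.
one_hour = total_cost_dollars(100, ASSUMED_CENTS_PER_INSTANCE_HOUR, 1)
three_hours = total_cost_dollars(100, ASSUMED_CENTS_PER_INSTANCE_HOUR, 3)
print(f"1 hour: ${one_hour:.2f}, 3 hours: ${three_hours:.2f}")
```

Working in whole cents avoids floating-point rounding until the final conversion to dollars.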
We are at the beginning of a new evolution of programming, one that is taking place with this move to cloud computing. For lack of a better moniker, we call it cloud programming. It is about being able to scale programs to take advantage of these on-demand cloud compute resources. Programming distributed nodes of computation has always been one of the classic ongoing problems of computer science. The cloud, it seems, has thrust this problem and its corollary issues to the forefront...
While cloud programming bears some resemblance to old-style distributed programming, supercomputing, or multicore programming, it is a different problem because the core assumptions and constraints have changed. On the cloud, most compute resources are essentially server instances with virtual compute capacities or virtualized services. The network is the Internet, and assumptions about co-location, latency, and errors cannot be made. The same concerns one has with real servers in one's own data centers still persist. That is, securing, upgrading, automating, and managing these virtual instances are still very much part of the programming that one must do to reap the benefits of new cloud infrastructures. Scripting languages, e.g., Ruby, Python, and Groovy, are already taking center stage to solve some of these issues.
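As one illustration of programming without assumptions about latency and errors, here is a minimal Python sketch of retrying a remote call with exponential backoff. This is a common general technique, not something specific to any cloud provider; the names `call_with_retries` and `flaky_service`, and the choice of `ConnectionError` as the transient failure, are hypothetical and chosen only for the example.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Double the wait on each failure and add random jitter so that
            # many clients do not all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


# Hypothetical stand-in for a remote service that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_service():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = call_with_retries(flaky_service, base_delay=0.01)
```

The `sleep` parameter exists only so the waiting behavior can be replaced when testing.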
Additionally, now that storage can shrink and grow on demand and at very low cost, while keeping a reasonably good quality of service, the other issue is how to manipulate the vast amount of data that one can now store. Google had a similar concern years ago as it improved its search engine while managing the expense of growing its data centers to match the unprecedented growth of the Web. Google engineers and scientists cleverly figured out how to parallelize data computation over large clusters of cheap and replaceable compute nodes. The MapReduce programming model is specifically designed to help engineer algorithms that can scale and run on the resulting big data that one now accumulates...
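To give a flavor of the MapReduce programming model, here is a minimal single-process sketch in Python that counts words. It only imitates the shape of the model (a map step, a grouping step, a reduce step) and is not Google's implementation or any framework's API; the function names are our own, and a real framework would distribute the map and reduce calls across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all the values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    # Each map_phase call depends on one document only, so the calls could
    # run on different nodes; here they simply run one after another.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["the cloud scales", "the cloud is cheap"])
print(counts)
```

The point of the model is that the programmer writes only the map and reduce functions, while splitting the input, grouping by key, and scheduling the work are left to the framework.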
Programming for massive scale is the key challenge. We firmly believe that new styles of programming, new programming frameworks, and new programming languages may be one of the key sources of innovation for the cloud. Imagine cloud frameworks and cloud programming environments that provide, in near real time, the cost, the energy impact, and the automation facilities that a cloud computing infrastructure enables. Now imagine also being able to program these multiple cloud nodes either in batch or in real time, while satisfying best practices of Web security and privacy. The combined result would be the nirvana of Web programming: automatically scaling your compute resources in a cost-efficient and environmentally friendly fashion while managing the resulting deluge of data and potential influx of users...
Surely there are many PhD theses to be had in addressing some of the fundamental scientific and engineering issues involved in achieving such an idealized state of Web computing. In some ways we may be vastly simplifying the issues, and many of the challenges involved have been studied in various branches of computer science and software engineering for the past 30 years. However, the point here is not to claim that cloud computing is the assured next wave of computing (we don’t know); rather, we would like this post to simply serve as a reminder that the various issues in system, data, and distributed computing that cloud computing brings to the forefront could be addressed through innovations in frameworks, programming styles, and programming languages... OOPSLA, it seems, with its long track record of groundbreaking innovations in this space, may be a natural choice for the genesis of some of these future eureka moments.
06/01/09 - fixed typos: accumulate => accumulates