Saturday, June 27, 2009

Social software and media, the virtues of agile development, and the new Iranian revolution

This is a repost of a blog entry created for the OOPSLA 2009 official blog.

The Web is now social. Increasingly, people young and old are spending a substantial part of their social lives on the Web. Social networking sites like Facebook, MySpace, YouTube, Twitter, and many others give people the ability to maintain social interactions and connections with friends, family, and acquaintances. Indeed, on Facebook one often hears of people reconnecting with old friends and making new ones. Twitter has quickly become one of the best means of sharing real-time information on all kinds of topics across the Web. YouTube’s videos can transform the fortunes of unknown, talented individuals in the most remote parts of the world, as well as shed light on issues and facts that would otherwise have been completely ignored.

As the world bears witness to the recent unrest in Iran, the power of social media has never been clearer or more manifest. Twitter, Facebook, and YouTube are giving a voice, a face, and a communication channel to the people on the streets of Tehran, despite the efforts of the Iranian regime to shut down media and reporters across the country. While it is uncertain how this new revolution in Iran will end, and as the world continues to watch intensely, one truth is undeniable: the unintended impact of social media. As Clay Shirky puts it, social media has enabled the democratization and amplification of the voices of the people...

But what does social media or social software have to do with OOPSLA? Surely they are software systems, but what else does the conference bring to this new wave of Web software?
It is true that social software is about connecting and empowering people, and thus it appears to be a purely social phenomenon that is simply enabled and constructed using basic Web software principles and approaches... There is, however, an important and subtle aspect that is hard to see on the surface and that has its roots in OOPSLA. In a nutshell, it is about agility... Looking more deeply at the social media sites mentioned above, one other thing becomes amply clear: many of the uses and social consequences of these sites were mostly, if not completely, unintended.

Most of these sites started with an initial idea of connecting people but ended up with emerging usage patterns that are truly powerful and consequential. The creators of Twitter did not set out to create the new voice of modern democratic revolutions; that role emerged accidentally. So the question becomes: how does one create software that can have as profound a social impact, and as much success, as Twitter? There is no “cookie cutter” solution; however, one well-known approach (one that was used at Twitter) is to use software frameworks and principles that are in line with the principles of agile software development.

At a recent talk at IBM’s Almaden Research Center, Twitter co-founder Jack Dorsey was asked by Almaden Ph.D. intern Ajith Ranabahu whether, in light of the scaling issues Twitter has recently experienced, Dorsey and the other original Twitter founders would have chosen a different language or framework to build their company. While Dorsey acknowledged some limitations of the Ruby on Rails platform they use, he was quick to say that, given another chance, he would still make the same framework choice... Dorsey's reasoning is simply that the key factor is not scaling and architecture, but rather agility and speed of development.

Rails is well known for providing both development speed and agility in abundance. Because they could materialize their ideas quickly, Dorsey and the other Twitter co-founders created a working version of their site in a few weeks, which let them observe their initial users and continuously iterate toward their subsequent micro, and now macro, successes. With each group of users, new patterns emerged, and Dorsey and his team could quickly iterate, adjust, and malleably modify their software to match the emerging usage patterns. Without the agile virtues of Ruby on Rails, it is doubtful that Twitter would be the phenomenon it has become today.

As perhaps the preeminent incubator of agile software development thinking over the last decade and a half, and the place where agile went mainstream, the OOPSLA conference, it seems, has been a key enabler of the agile movement that inspired frameworks like Rails and, indirectly, sites such as Twitter. The agile practices of test-driven development, pair programming, continuous integration, and the Scrum team organization approach have their roots, directly or indirectly, at OOPSLA, or have originators who frequently attend and participate in the conference. And if we go even further, the virtues of rapid, instant prototyping and of having “the customer as the driver”, as Kent Beck likes to say, may now be taking their natural course in social media and the crowdsourcing of content.

And while social software is undoubtedly a phenomenal success, some serious challenges remain, and this is also where the OOPSLA community could help further. First, there are the programming challenges. The amplification attributes of Twitter and Facebook occur because the Web is now programmable. Using APIs and simple scripts, it is easy to create aggregation points, as well as new data sinks and sources for the information flowing through social media channels. This is how a tweet from the streets of Tehran can flow, be retweeted, and end up, almost instantaneously, on the television screens of millions in the United States and Europe. The challenge is to make it quick and easy for anyone to collect, filter, and aggregate social media information.
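To make the collect, filter, and aggregate steps a little more concrete, here is a minimal sketch under loudly stated assumptions: the Post record, the two in-memory feeds, and the helper names are all hypothetical stand-ins for data that would really be fetched from a site's Web API, which this sketch does not call. It merges several sources into one stream, keeps posts mentioning a keyword, and counts hashtags across what remains.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    # Hypothetical message record; real posts would come from a site's
    # Web API, which this sketch deliberately does not call.
    author: str
    text: str


HASHTAG = re.compile(r"#(\w+)")


def collect(*sources):
    """Merge several sources (iterables of Post) into a single stream."""
    for source in sources:
        yield from source


def filter_posts(posts, keyword):
    """Keep only posts whose text mentions the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return (p for p in posts if keyword in p.text.lower())


def aggregate_hashtags(posts):
    """Count hashtag occurrences (lowercased) across the given posts."""
    counts = Counter()
    for post in posts:
        counts.update(tag.lower() for tag in HASHTAG.findall(post.text))
    return counts


# Two hypothetical feeds standing in for different social media sources.
feed_a = [Post("alice", "Crowds in #Tehran tonight #iranelection"),
          Post("bob", "Lunch was great")]
feed_b = [Post("carol", "RT @alice: Crowds in #Tehran tonight #iranelection")]

counts = aggregate_hashtags(filter_posts(collect(feed_a, feed_b), "tehran"))
print(counts.most_common())
```

Because each stage is a generator or a plain function over iterables, stages can be recombined freely, which is roughly the "quick and easy for anyone" property the paragraph asks for.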

Second, and perhaps most important, are the challenges around the provenance of the information flowing through social media channels. Now that everyone has a voice, it becomes increasingly difficult to distinguish authentic voices from those trying to manipulate the system. Here, research in data provenance, data mining, and data filtering for massive amounts of real-time and streaming data is key. Real-time and stream programming pose significant and fundamental challenges that call for help from systems, frameworks, and programming languages.
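As one small illustration of stream filtering, the sketch below flags authors who post more than a given number of messages within a sliding time window. This is a hypothetical, deliberately naive heuristic chosen only to show the shape of a one-pass streaming filter; real provenance analysis would combine many signals, and bursts of activity are often perfectly authentic. It assumes timestamps arrive in non-decreasing order per author.

```python
from collections import defaultdict, deque


class BurstFilter:
    """Flag authors posting more than max_posts within `window` seconds.

    A naive, hypothetical heuristic for illustration only. It assumes
    timestamps arrive in non-decreasing order for each author.
    """

    def __init__(self, max_posts, window):
        self.max_posts = max_posts
        self.window = window
        # author -> timestamps of that author's posts inside the window
        self._recent = defaultdict(deque)

    def observe(self, author, timestamp):
        """Record one post; return True if the author now exceeds the limit."""
        times = self._recent[author]
        times.append(timestamp)
        # Evict timestamps that have slid out of the window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        return len(times) > self.max_posts


f = BurstFilter(max_posts=3, window=60)
stream = [("a", 0), ("a", 10), ("a", 20), ("a", 30), ("b", 30), ("a", 100)]
flags = [f.observe(author, t) for author, t in stream]
print(flags)
```

Each post is processed once and only the in-window history is kept per author, which is the kind of bounded, incremental computation that real-time stream programming tends to require.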

Finally, as in all computing for open systems (such as the Web), privacy and security remain paramount concerns. It is now well accepted that these persistent issues cannot be addressed after the fact; they must be addressed in the early stages of development. There is a clear need to share best practices and uncover patterns that help overcome these challenges...

So while social media and social software are helping transform the fabric of social interactions, from the hills of Silicon Valley to the bars of Austin, the cafés of Paris, and the streets of Tehran, remember that many of these social consequences were not planned; rather, they emerged from empowering software that is itself made possible by the virtues of agile software development practices... And together with the community that produced agile practices and helped them go mainstream, we can help address some of the important remaining social media challenges, so that the new voice of the people can persist and remain strong and authentic.

Go here to watch Dorsey's talk at Almaden; Ajith's question comes toward the end.

1. Initial post on 06/27/2009

Tuesday, June 9, 2009

Cloud computing - a programming perspective

This is a repost of a blog entry created for the OOPSLA 2009 official blog.

Cloud computing is the “new hot” topic. Simply put, various business pressures, a multitude of pain points, and the maturity of a series of Web technologies (networking, APIs, and standards) have made it possible and cost-effective for businesses, small and large, to host their data and application centers entirely virtually... in the cloud, if you will.

Cloud computing providers, e.g., Amazon, reuse their expertise in efficiently managing and hosting their own Web systems and applications, and expose that core expertise as a set of Web APIs. Using the Amazon Web Services Elastic Compute Cloud (EC2), anyone with a credit card and some programming skill can provision a server instance, install a Web application on it, and thus immediately have a presence on the Web. By combining economies of scale for server hardware with virtual machine technologies, automation expertise for data and application centers, and extensive instrumentation, Amazon is able to provide that service globally for pennies an hour. There are no binding contractual agreements, and Amazon bills you only for the hours you have used.

In addition to compute instances, Amazon also provides various other resources on its cloud platform, e.g., storage (file and block), message queues, batch data processing, and others. Following Amazon’s lead, various companies, including Google, IBM, and Microsoft, are also exposing frameworks, services, platforms, and applications to a worldwide audience through the Web browser and simple Web APIs. Cloud computing is nothing less than a democratization of compute resources. With cloud computing, vast compute resources no longer require huge, long-term investments; instead they can be acquired and consumed, as Amazon chairman Jeff Bezos likes to say, “by the drink”.

Whether cloud computing will fulfill the high expectations of its many advocates remains to be seen. Various challenges remain and, in our opinion, we are reaching the peak of the typical hype curve that new technologies follow. However, regardless of whether cloud computing turns out to be a bust or continues to be the hit it has certainly been so far, there is one undeniable truth that some seem to ignore... The current success of cloud computing and, we believe, its future successes are heavily tied to how easy the cloud and cloud applications are to program, maintain, and scale. And this is precisely why OOPSLA matters to cloud computing advocates, users, and providers, and vice versa.

As we mentioned, with the cloud, computing resources are cheap and widely available. In a matter of minutes, one can provision hundreds of server instances on the Amazon EC2 cloud, along with terabytes of storage and more aggregate MIPS than most recent minicomputers offer, all for around $10 an hour. While almost anyone can afford computing capacity at these price points, what is hard for most is taking advantage of that cheap capacity. The problem is no longer one of provisioning the resources, but of using them, and using them efficiently.

We are at the beginning of a new evolution in programming, one that is taking place with this move to cloud computing. For lack of a better moniker, we call it cloud programming: being able to scale programs to take advantage of these on-demand cloud compute resources. Programming distributed nodes of computation has always been one of the classic ongoing problems of computer science. The cloud, it seems, has thrust this problem and its corollary issues to the forefront...

While cloud programming bears some resemblance to old-style distributed programming, supercomputing, or multicore programming, it is a different problem because the core assumptions and constraints have changed. On the cloud, most compute resources are essentially server instances with virtual compute capacity, or virtualized services. The network is the Internet, and no assumptions can be made about co-location, latency, or errors. The concerns one has with physical servers in one's own data center also still persist: securing, upgrading, automating, and managing these virtual instances are still very much part of the programming one must do to reap the benefits of new cloud infrastructures. Scripting languages, e.g., Ruby, Python, and Groovy, are already taking center stage in solving some of these issues.

Additionally, now that storage can shrink and grow on demand and for very low costs, while keeping reasonably good qualities of service, the other issue is how to manipulate the vast amount of data that one can now store. Google had a similar concern years ago as it improved its search engine while managing expenses in growing its data centers to match the unprecedented growth of the Web. Google engineers and scientists cleverly figured out how to parallelize data computation over large clusters of cheap and replaceable compute nodes. The MapReduce programming model is specifically designed to help engineer algorithms that can scale and run on the resulting big data that one now accumulates...
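For readers who have not seen the MapReduce programming model, the classic illustration is word counting. The sketch below runs the map, shuffle (group by key), and reduce phases sequentially in a single Python process; it shows only the shape of the model and is not Google's implementation or any particular framework's API. In a real system the framework would distribute the map and reduce calls across many nodes and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for every word in one document.

    The input key (doc_id) is unused in word counting but kept to mirror
    the (key, value) signature of the model.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum all partial counts for one word."""
    return word, sum(counts)


def map_reduce(documents):
    """Run map, shuffle, and reduce sequentially over {doc_id: text}."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


docs = {"d1": "the cloud scales", "d2": "the web scales the cloud"}
print(map_reduce(docs))  # {'the': 3, 'cloud': 2, 'scales': 2, 'web': 1}
```

The appeal of the model is that the programmer writes only the two small, side-effect-free functions; because each map call and each reduce call is independent, the framework is free to run them in parallel on cheap, replaceable nodes.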

Programming for massive scale is the key challenge. We firmly believe that new styles of programming, new programming frameworks, and new programming languages may be among the key sources of innovation for the cloud. Imagine cloud frameworks and cloud programming environments that provide, in near real time, the cost, the energy impact, and the automation facilities that a cloud computing infrastructure enables. Now also imagine being able to program these many cloud nodes, either in batch or in real time, while satisfying best practices of Web security and privacy. The combined result would be the nirvana of Web programming: automatically scaling your compute resources in a cost-efficient and environmentally friendly fashion while managing the resulting deluge of data and potential influx of users...
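As a toy illustration of pairing automatic scaling with immediate cost feedback, the sketch below sizes a pool of identical instances for an observed request rate and reports the resulting hourly cost. Every number here is a hypothetical assumption (capacity per instance, 20% headroom, and a price of 10 cents per instance-hour), not any provider's actual pricing, and the decide function is an invented helper rather than part of any cloud API. A real autoscaler would also have to deal with instance startup delays, noisy load measurements, and scale-down policies.

```python
from dataclasses import dataclass


@dataclass
class ScalingDecision:
    instances: int
    hourly_cost_cents: int


def decide(load_rps, rps_per_instance, headroom_pct, price_cents_per_hour,
           min_instances=1, max_instances=1000):
    """Size a pool of identical instances for the observed load.

    All inputs are hypothetical. Integer arithmetic (percentages and
    cents) avoids floating-point rounding at capacity boundaries.
    """
    required = load_rps * (100 + headroom_pct)
    per_instance = rps_per_instance * 100
    n = -(-required // per_instance)  # ceiling division for non-negative ints
    n = max(min_instances, min(max_instances, n))
    return ScalingDecision(n, n * price_cents_per_hour)


for load in (0, 450, 1050, 5000):
    d = decide(load, rps_per_instance=100, headroom_pct=20,
               price_cents_per_hour=10)
    print(load, d.instances, d.hourly_cost_cents)
```

Reporting the cost alongside every scaling decision is one simple way a framework could surface the "cost in near real time" that the paragraph imagines; an energy estimate could be attached the same way.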

Surely there are many PhD theses to be had in addressing some of the fundamental scientific and engineering issues involved in achieving such an idealized state of Web computing. In some ways we may be vastly simplifying the issues, and many of the challenges involved have been studied in various branches of computer science and software engineering for the past 30 years. However, the point here is not to claim that cloud computing is assuredly the next wave of computing; we don’t know. Rather, we would like this post simply to serve as a reminder that the various issues in systems, data, and distributed computing that cloud computing brings to the forefront could be addressed by innovations in frameworks, programming styles, and programming languages... OOPSLA, it seems, given its long track record of groundbreaking innovations in this space, may be a natural choice for the genesis of some of these future eureka moments.

06/01/09 - fixed typos: accumulate => accumulates