HPC on the (relative) cheap using public cloud providers

For the past several years, I’ve been working on leveraging high-performance computing techniques for high-throughput, data-intensive processing on desktop computers, for things like image and video processing. It’s been fun tracking what the multi-processing end of HPC has been doing, where the Top 500 supercomputer list has been very competitive and very active. Countries, IHVs, and universities vie for who can generate more teraflops, spending millions and millions of dollars on the cooling plants alone for their dedicated data centers. These supercomputers exist to solve the BIG PROBLEMS of computing, and aren’t really useful beyond that.

At the same time, I’ve been following the public computing clouds like Amazon’s EC2, Google’s App Engine, and Rackspace’s Public Cloud. These have been interesting for providing compute at the other end of the spectrum: occasional compute tasks, or steady average workloads with the ability to absorb an occasional spike (like web apps). The public clouds are made up of thousands of servers and certainly rival or best the supercomputers in number of cores and raw compute power, but they exist for a different purpose.

This article in The Register really got me excited. Especially when I read this:

Stowe tells El Reg that during December last year, Cycle Computing set up increasingly large clusters on behalf of customers to start testing the limits. First, it did a 2,000-core cluster in early December, and then a 4,096-core cluster in late December. The 10,000-core cluster that Cycle Computing set up and ran for eight hours on behalf of Genentech would have ranked at 114 on the Top 500 computing list from last November (the most current ranking), so it was not exactly a toy even if the cluster was ephemeral.

The cost of running this world-class supercomputer?

Genentech loaded up its code and ran the job for eight hours at a total cost of $8,480, including EC2 compute and S3 storage capacity charges from Amazon and the fee for using the Cycle Computing tools as a service.
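
For a sense of scale, here is a quick back-of-envelope calculation using the figures quoted above (my own arithmetic, not from the article):

    # Rough cost per core-hour for the Genentech run, using the numbers
    # quoted above: 10,000 cores for 8 hours at a total cost of $8,480.
    cores = 10000
    hours = 8
    total_cost_usd = 8480.0

    core_hours = cores * hours                      # 80,000 core-hours
    cost_per_core_hour = total_cost_usd / core_hours

    print("%d core-hours at about $%.3f per core-hour" % (core_hours, cost_per_core_hour))
    # prints: 80000 core-hours at about $0.106 per core-hour

Roughly a dime per core-hour, with no capital outlay and no cooling plant to build.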

Real-world HPC is now coming into price points where it is accessible even to small companies or research groups. This seems like a ripe opportunity for companies that can apply HPC techniques to solve real problems for others, and for tools vendors who can make these ephemeral clusters easier to use for companies that want to take advantage of them without having to build up high-end expertise in-house.

Server-based DRM solutions are hostile to consumers

I have a long history with DRM (Digital Rights Management): I worked on the Windows Media 7 Encoder team; I worked at two different internet video startups; and as the owner of a record label, I experimented with some of the very first paid digital download solutions (all long lost to internet history at this point).

When I first learned about the DRM mechanism where the player would “phone home” periodically to make sure that you were still licensed to the content, I immediately realized that this was a really fragile way to license media. I’m not talking about subscription content (like Rhapsody), streaming media (like Hulu/YouTube/Flash Media Server), or rentals (like Amazon/iTunes rentals); I’m talking about content that is purchased by the consumer. The issue is that there are a thousand ways the user can lose access to their content without any ill intent on their part. This isn’t a problem as long as the licensor of the content is still in business and still supporting the licensing mechanism. However, even large companies sunset their DRM technology, screwing over their customers (see Google Video and Microsoft’s PlaysForSure, for example). Depending on how onerous the original licensing scheme is and how it was implemented, buying a new computer, changing the hardware configuration, upgrading system software, the company dropping support for the DRM, the licensing company’s servers going down, or just the user being without internet access can cause a user to lose access to content that they paid for and legally own.
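
To make the fragility concrete, here is a minimal sketch of what a phone-home license check looks like. This is my own illustration, not any vendor’s actual scheme; the server URL and function names are hypothetical.

    import urllib.request
    import urllib.error

    # Hypothetical license server; in a real scheme this belongs to the
    # company that sold you the content (or licensed the DRM to them).
    LICENSE_SERVER = "https://license.example.com/check"

    def can_play(content_id, device_id):
        """Ask the vendor's server whether this device may still play the content.

        Playback depends on every link in this chain: the vendor is still in
        business, the server is still running, and the user is online.
        """
        url = "%s?content=%s&device=%s" % (LICENSE_SERVER, content_id, device_id)
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                return response.read().strip() == b"LICENSED"
        except (urllib.error.URLError, OSError):
            # Server retired, DRM sunset, or no internet connection: the
            # consumer is locked out of content they paid for, through no
            # fault of their own.
            return False

Every failure path in that except clause takes the paying customer’s library away from them, and none of them involve piracy or bad intent.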

Maybe the user got some warning and could back up their content to some other format (if the licensing scheme even allows that, and it often doesn’t); but maybe they didn’t see or understand the warning. Then it is too late. Is it the consumer’s fault? No, it is never the consumer’s fault. They purchased digital content with the expectation of owning it forever, just as when they purchased their media as hard goods.

Onerous DRM has been put in place by media companies desperate to avoid piracy, but as has been written in so many other places, DRM creates more pirates than it prevents. It makes things harder for the people who want to get their content legally, by adding roadblocks between them and their purchases, and it doesn’t stop the pirates, who bypass the whole thing. I wonder how many PlaysForSure customers went to an illegal site to re-download content that they had already purchased when they lost access to it. I wonder if any of them felt like they were breaking the law at that point. I doubt it. They had paid for something and had been denied access to it. Maybe they were mad at Microsoft, but they were probably more mad at the record labels, because it was the labels’ product they had purchased. Microsoft was just the store.

I was thinking about this again today when I went to purchase a song from iTunes and found that Apple had lost my Apple ID. This was the Apple ID I had spent years buying content from iTunes with. Sure, Apple has moved to make its music DRM-free, but I haven’t completely updated my catalog yet, and there is a lot of video I have paid money for as well that is still subject to Apple’s DRM. While their mechanism still allows me to play my content on my authorized computers (as far as I can tell so far), it will not permit me to authorize a new computer. If Apple isn’t able to fix this problem, what happens to the content I purchased over the years? If I can’t access it anymore through no fault of my own, am I legally in the wrong to download it from a file-sharing site?

DRM models have continued to evolve over the years, but I think that the audio model has shown the way for purchased content. It is high time for media owners to allow the people who pay for a full copy of their content to own that content outright, with nothing that could prevent the consumer from accessing the content they paid for, including the ability to transcode as media formats change over time. Otherwise, they will alienate their customers as those customers find they cannot have what they paid for.

note: I avoided mentioning the new licensing models that have sprung up, where when you “buy” a copy of a song or movie the license agreement says that you don’t really own it. This is becoming more common as a way to avoid legal issues when users circumvent DRM to make fair-use copies, or so that they cannot sue when they cannot access their content. I avoided mentioning it because:
A) it muddies the discussion.
B) I think it is evil.

I guess it is still too early… Lively bites the dust

There is a bit of schadenfreude here on my part. Lively was reviving concepts from the mid-to-late ’90s and passing them off as something new (including one I worked on). All of the efforts of that era died a slow death, and the thinking was that we (they) were ahead of the curve. Lively’s lack of uptake slams the door on graphical chat once and for all, I guess.

Official Google Blog: Lively no more

That’s why, despite all the virtual high fives and creative rooms everyone has enjoyed in the last four and a half months, we’ve decided to shut Lively down at the end of the year. It has been a tough decision, but we want to ensure that we prioritize our resources and focus more on our core search, ads and apps business. Lively.com will be discontinued at the end of December, and everyone who has worked on the project will then move on to other teams.

We’d encourage all Lively users to capture your hard work by taking videos and screenshots of your rooms.

Nice discussion of white box and developer-driven testing in Google Chrome comic book

I debated copying the scans from blogoscoped.com or referencing the images here, but I decided instead to refer you to the appropriate pages to be a good blog citizen.

Getting Scott McCloud to write a comic book announcing your product is a great idea. He did a great job distilling some complicated stuff into a very accessible piece. Tons of people will talk about the Google Chrome announcement, what it means for Microsoft, and its use of multiple processes for tabs.

One of the things that struck me, though, was the nice discussion of white-box automated testing, along with a very simple and concise description of developer-driven testing. I have been a huge proponent of these principles since I first worked on an XP project eight years ago and became an XP coach. Every project I’ve worked on since has had a large test-driven development component and hard-core white-box QEs (when I’ve had the resources). Doing automated stress testing on a browser is a no-brainer; Internet Explorer has been doing it forever. Google isn’t doing anything new or different here: fuzzing inputs isn’t new, and neither is reducing the test space to make automation run faster and keep the results relevant for users. However, McCloud’s comic does a great job of explaining these ideas in a very simple manner. It’s a great tool for developers, QEs, or engineering managers trying to explain why these things are important to others in their organization.
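
As a toy illustration of what “fuzzing inputs” means here (this is my own sketch, not Google’s actual test harness; render_page and the stress loop are hypothetical), the idea is to throw huge numbers of randomly mangled pages at the browser and verify that it never crashes or hangs:

    import random
    import string

    def fuzz_html(seed_html, mutations=10):
        """Return a randomly mangled copy of seed_html by inserting,
        deleting, or replacing characters."""
        chars = list(seed_html)
        for _ in range(mutations):
            operation = random.choice(("insert", "delete", "replace"))
            junk = random.choice(string.printable)
            if operation == "insert" or not chars:
                chars.insert(random.randint(0, len(chars)), junk)
            elif operation == "delete":
                del chars[random.randrange(len(chars))]
            else:
                chars[random.randrange(len(chars))] = junk
        return "".join(chars)

    # Hypothetical stress loop: render_page() stands in for whatever entry
    # point a browser's white-box harness exposes.
    # for _ in range(10000):
    #     page = fuzz_html("<html><body><p>hello</p></body></html>")
    #     render_page(page)  # must not crash, hang, or leak

Reducing the test space is the complementary trick: instead of fuzzing arbitrary bytes, you start from the kinds of pages users actually hit, so the automation runs faster and the failures it finds matter.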

Check it out on pages 9, 10, and 11 of the Google Chrome comic book.