Posted my GTC 2010 talk on using GPGPU techniques in commercial software to SlideShare

The best part of this talk was getting a nice write-up on AnandTech (one of my go-to sites). You can also find this talk synced to a recording of me speaking on NVIDIA’s site.

Technical Debt

Something I said in an e-mail thread today that I thought was worth reposting here:

The biggest part of paying technical debt is letting the dev team know that they are empowered to fix things and that they can take the time to do things right. Good developers hate looking at crappy code; their first response is always to fix it. If they don’t believe that they can (and are expected to), they will just hack around bad code with more bad code.

Development is more fun with kittens – three fun placeholder tools

Place Kitten gives you placeholder images that make you wonder if you should ever bother replacing them, like:

Cupcake Ipsum generates much better Lorem Ipsum text than your run-of-the-mill tools, like:
Cupcake ipsum dolor sit amet. Toffee I love cake I love gummi bears cotton candy I love cookie. Wafer dragée lemon drops jelly-o jelly I love lollipop.
Fruitcake lollipop sweet roll muffin caramels. Cake I love macaroon biscuit candy canes dessert pie. Sweet apple pie lollipop jelly beans cheesecake gummies biscuit. Wypas I love croissant macaroon halvah.
Sweet roll tart toffee lemon drops candy canes soufflé bonbon. Ice cream tart cupcake I love icing tootsie roll jelly. Soufflé biscuit topping topping caramels pudding sugar plum cheesecake.
Halvah ice cream macaroon lollipop donut. Dessert gingerbread toffee gummies I love gingerbread applicake. Icing marshmallow cupcake.
Topping jelly beans fruitcake tootsie roll. Faworki soufflé chocolate cake. Dessert sesame snaps biscuit tiramisu cookie I love sesame snaps.

Placehold.it is where placekitten.com got their idea. It is also useful, but not quite as fun.

(via Chuck Rose)

[Update 2/29/12]
Adding also PLACESHEEN.COM, yow!

(via Bob Archer)

Speaking this week at the SC11 Conference in Seattle

Cross-posted from my old Adobe blog

I’m privileged to once again be speaking at the SC conference. For those who don’t know it, “SC is the International Conference for High Performance Computing, Networking, Storage and Analysis.” If you are attending, I’ll be on a panel entitled Parallelism, the Cloud, and the Tools of the Future for the next generation of practitioners. I’ll be joining some of my compatriots in the Educational Alliance for a Parallel Future to once again discuss the skill sets that collegiate computer science programs should be (and mostly aren’t) imparting to their students in the areas of parallel programming.

The abstract for the panel is as follows:

Industry, academia and research communities face increasing workforce preparedness challenges in parallel (and distributed) computing, due to the onslaught of multi-/many-core and cloud computing platforms. What initiatives have begun to address those challenges? What changes to hardware platforms, languages and tools will be necessary? How will we train the next generation of engineers for ubiquitous parallel and distributed computing? Following on from the successful model used at SC10, the session will be highly interactive, combining aspects of BOF, workshop, and Panel discussions. An initial panel will lay out some of the core issues in this topic with experts from multiple areas in education and industry. Following this will be moderated breakouts, much like collective mini-BOFS, for further discussion and to gather ideas from participants about industry and research needs.

If this sounds similar to the session from the Intel Developer Forum in September, there is good reason. It was the second most popular session of that conference. The IDF panel and breakout sessions covered some really interesting ground, and I really liked the format. I felt like the discussions I had with the people in my subgroup at IDF were deeper, more specific and more productive than a traditional panel format would have been.

While the speakers in this panel are different from the ones in September, I think we’ll still end up splitting on the axis of using abstractions to teach fundamentals vs teaching from first principles up. Which camp you are in seems at least somewhat determined by the fact that a number of panelists produce abstractions over the low-level elements as part of their work. I am very much in the fundamentals camp, as I think that understanding what the abstractions are built on is fundamental to choosing the right abstraction, much as artists tend to start with representative figure drawing. What will make an interesting difference from IDF is the number of audience members who come from outside of computer science (HPC is used more by scientists for whom the computation is only a means to the end of solving a problem in a non-computational discipline). Those audience members are less likely to understand the fundamentals, or to care about them. For them parallelism is just a tool to get their answer faster. This should really make for a lively debate!

My statement for the panel is as follows (yes, I did crib the last paragraph from my earlier position):
The team I manage is building a single, modern, software product. A few years ago, that would have meant a desktop application written primarily in C++, most likely single-threaded. Today, it means software that runs on the desktop, but also on mobile devices and in the cloud. Working in my organization are developers who write shaders for the GPU, developers who write SIMD code (SSE on x86 and NEON on ARM), developers using distributed computing techniques on EC2 and threads everywhere throughout the client and server code. We write code in C, C++, ObjC, assembly, Lua, Java, C#, Perl, Python, Ruby and GLSL. We leverage Grand Central Dispatch, pThreads, TBB and boost threads. How many of the technologies that we use today in professional software development existed when we went to school? Nearly none. How many will still be used a few years from now? Who knows. The reason we can continue to work in the field is that our education was grounded not just in programming techniques for the technology of the time, but also in computer architecture, operating systems, and programming languages (high level, low level and domain-specific).

Learning GPGPU was much easier for me because I could understand the architecture of graphics processors. I was able to understand Java’s garbage collection because I understood how memory management worked in C. I chose TBB over Grand Central Dispatch to solve a specific threading problem because I could evaluate both technologies given my experience with pThreads.

We’re doing students a disservice if we teach them the concepts using only high-level abstractions or teach them only a single programming language. Having an understanding of computer architecture is also critical to a computer science education.

These fundamentals of computer science do not necessarily need to be broken out into their own classes. They can and should be integrated throughout the curriculum. Threading should be part of every course. It is a critical part of modern software development. Different courses should use different programming languages to give students exposure to different programming models.

If I were a Dean of Computer Science somewhere, I’d look to creating a curriculum where parallel programming using higher-level abstractions was part of the introductory courses using something like C++11, OpenMP or TBB. Mid-level requirements would include some computer architecture instruction. Specifically, how computer architecture maps to the software that runs on top of it. This may also include some lower level instruction in things like pThreads, race conditions, lock-free programming or even GPU or heterogeneous programming techniques using OpenCL. In later courses focused more on software engineering, specific areas like graphics, or larger projects, I’d encourage the students to use whichever tools they found most appropriate to the tasks at hand. This might even include very high level proprietary abstractions like DirectCompute or C++AMP as long as the students could make the tradeoffs intelligently because of their understanding of the area from previous courses.

You can read the position statements from the rest of the panel here.

Speaking once again on Parallelism and Computer Science Education at the Intel Developer Forum

Cross-posted from my old Adobe Blog

As a hiring manager building teams working on modern computer software, I’ve often been disappointed in the lack of a proper foundation in parallel algorithms and architectures being taught in current Computer Science curricula. To that end, I’ve been working with a group called the Educational Alliance for a Parallel Future that aims to improve Computer Science curricula in this critical area. The EAPF is once again convening a panel of educators and industry representatives to talk about this important issue and once again I am delighted to participate.

The panel is entitled: Parallel Education Status Check – Which Programming Approaches Make the Cut for Parallelism in Undergraduate Education? Unlike previous iterations of this panel where we spoke in generalities, this time we’ll be diving a bit deeper into specific technologies that we think are good starting places for educators to introduce to their students.

Here is an excerpt of the abstract:
The industry and research communities face increasing workforce preparedness challenges in parallel (and distributed) computing, due to today’s ubiquitous multi-/many-core and cloud computing. Underlying the excitement over technical details of the newest platforms is one of the thorniest questions facing educators and practitioners — What languages, libraries, or programming models are best suited to make use of current and future innovations? This panel will confront this conundrum directly through discussions with technical managers and academics from different perspectives. The session is convened by the Educational Alliance for a Parallel Future (EAPF), an organization with wide-ranging industry/academia/research membership, including Intel, ACM, AMD, and other prominent technology corporations.

The panel will be presented on September 15th, 2011 at 10:15am as part of the Intel Developer Forum 2011 at the Moscone Center in San Francisco, California. There are free passes for interested educators. Register now for a free IDF day pass using promo code DCPACN1.

My specific take has always been that I am not as interested in grounding in a specific parallelism library or abstraction. The pace of change in this area has only increased over the last few years with the rise of multi-core, GPGPU, HPC and heterogeneous computing. Techniques and libraries have arisen, gained adoption, and fallen out of favor one after another.

A developer who only understands how algorithms can be mapped to OpenMP-style libraries is not as useful once the team moves to Grand Central Dispatch or OpenCL. A grounding in traditional task-level parallelism as well as data-parallelism techniques is a starting point. It is important to understand not only what each of them is but also the different types of problems that each is applicable to.
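A minimal sketch of the distinction between the two, written in Python with the standard library’s concurrent.futures. The function names (brighten, histogram, average) and the pixel data are hypothetical, chosen only for illustration. In CPython, threads will not speed up CPU-bound work like this, so the sketch shows the shape of each approach, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, amount=10):
    """Hypothetical per-pixel operation, clamped to the 8-bit range."""
    return min(pixel + amount, 255)


def data_parallel(pixels):
    # Data parallelism: the same function applied to many independent elements.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(brighten, pixels))


def histogram(pixels):
    """Count how often each pixel value occurs."""
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    return counts


def average(pixels):
    """Mean pixel value."""
    return sum(pixels) / len(pixels)


def task_parallel(pixels):
    # Task parallelism: different, unrelated jobs running at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        hist_future = pool.submit(histogram, pixels)
        avg_future = pool.submit(average, pixels)
        return hist_future.result(), avg_future.result()
```

The data-parallel version splits the data and keeps the work identical; the task-parallel version keeps the data whole and splits the work. Image filters tend to fit the first shape, while an application doing analysis, I/O and UI updates at once tends to fit the second.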

Higher level abstractions like OpenMP are good for introductory courses. However, it is important to understand fully how high-level abstractions map to lower level implementations and even the hardware itself. Understanding the hardware your software runs on is critical to getting the best performance from your code. It is also critical to understanding why one particular higher level library might work better than another for a particular task on specific hardware.

Once you understand things like hyperthreading, pThreads, locking mechanisms, and why OpenCL or CUDA map really well to specific problems, but not to others, then you can return to using higher level abstractions that let you focus on your algorithm and not the details.

If I were a Dean of Computer Science somewhere, I’d look to creating a curriculum where parallel programming using higher-level abstractions was part of the introductory courses using something like C++11, OpenMP or TBB. Mid-level requirements would include some computer architecture instruction. Specifically, how computer architecture maps to the software that runs on top of it. This may also include some lower level instruction in things like pThreads, race conditions, lock-free programming or even GPU or heterogeneous programming techniques using OpenCL. In later courses focused more on software engineering, specific areas like graphics, or larger projects, I’d encourage the students to use whichever tools they found most appropriate to the tasks at hand. This might even include very high level proprietary abstractions like DirectCompute or C++AMP as long as the students could make the tradeoffs intelligently because of their understanding of the area from previous courses.

Given that the panel consists of representatives from Intel, AMD, Microsoft, Georgia Tech as well as myself, I’m expecting this to be a very spirited conversation. I hope to see you there.

More information:
Paul Steinberg’s blog post about the panel
Ben Gaster’s post

Speaking on the “Teach Parallel” show on IntelTV tomorrow

[crosspost from my adobe.com blog]

Tomorrow morning, I’ll be speaking with Paul Steinberg of Intel and Tom Murphy of Contra Costa College about the criticality of understanding parallel programming techniques for industry.

In my previous role on the Adobe Image Foundation, it was an obvious requirement for our hiring candidates. We were building tools for an insanely parallel problem: image and video processing. Now that I’m working on a new product, it might seem that it would not be as important. In fact, our threading models are even more complicated than in my previous group. My expectations around threading knowledge for incoming candidates are just as high.

Even the most modest mobile hardware is going (or has gone) parallel. In addition, users’ expectations around interactivity with their applications have never been higher. A laggy touch interface is death to an application (or a platform). Going to get coffee while your image renders on a desktop is a thing of the past. Users’ expectations of the software we write are higher than ever, and it is nearly impossible to get this interactivity without taking advantage of multi-threading on today’s multi-core processors.

The tools continue to improve, but the threading models continue to evolve. A fundamental understanding of multi-threading is critical for anyone moving into Software Engineering or looking to stay current in their field.

I always enjoy talking with Paul and Tom, and expect that we’ll have a lively conversation.

Tune in live on May 17, 10:00 AM PDT

Here is Paul’s post on the subject.

Speaking at the AMD Fusion Developer Summit – June

If you are planning on attending the AMD Fusion Developer Summit in Bellevue, WA in June, come see me talk about Pixel Bender (probably for the last time!) with Bob Archer. Here is the description of the session:

Pixel Bender is a domain-specific image processing language created by the Adobe Image Foundation, and includes a runtime designed to work well across heterogeneous hardware, scaling efficiently for multiple cores. This runtime currently ships in a number of Adobe’s flagship products. Bob Archer, Technical Lead, and Kevin Goldsmith, Engineering Manager, will talk about the design of the language, compilers, and runtime. They will also discuss how the Adobe system can incorporate complementary technologies like OpenCL and can scale to accommodate new hardware paradigms like the AMD Fusion processors.

Hope to see you there!

HPC on the (relative) cheap using public cloud providers

For the past several years, I’ve been working on leveraging high-performance computing techniques for high-throughput data intensive processing on desktop computers for stuff like image and video processing. It’s been fun tracking what the multi-processing end of HPC has been doing, where the top 100 super-computer list has been very competitive and very active. Countries, IHVs and universities vie for who can generate more teraflops, spending millions and millions of dollars on the cooling plants alone for their dedicated data centers. These super computers exist to solve the BIG PROBLEMS of computing, and aren’t really useful beyond that.

At the same time, I’ve been following the public computing clouds like Amazon’s EC2, Google’s App Engine and Rackspace’s Public Cloud. These have been interesting for providing compute on the other end of the spectrum: occasional compute tasks, or higher average workloads with the occasional spike capability (like web apps). The public clouds are made up of thousands of servers and certainly rival or best the super computers in numbers of cores and raw compute power, but they exist for a different purpose.

This article in The Register really got me excited. Especially when I read this:

Stowe tells El Reg that during December last year, Cycle Computing set up increasingly large clusters on behalf of customers to start testing the limits. First, it did a 2,000-core cluster in early December, and then a 4,096-core cluster in late December. The 10,000-core cluster that Cycle Computing set up and ran for eight hours on behalf of Genentech would have ranked at 114 on the Top 500 computing list from last November (the most current ranking), so it was not exactly a toy even if the cluster was ephemeral.

The cost of running this world-class super computer?

Genentech loaded up its code and ran the job for eight hours at a total cost of $8,480, including EC2 compute and S3 storage capacity charges from Amazon and the fee for using the Cycle Computing tools as a service.

Real world HPC is now coming into price points where it is accessible to even small companies or research groups. This seems like a ripe opportunity for companies who can apply HPC-techniques to solve real problems for others, and for tools vendors who can make using these ephemeral clouds easier for companies who want to take advantage of them without having to build up high-end expertise in-house.
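To put the quoted figures in per-unit terms, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers from the article (10,000 cores, eight hours, $8,480 total) and assumes all cores were in use for the full run; the result is an all-in figure that folds the S3 storage and Cycle Computing fees into the rate.

```python
# Figures quoted from The Register article above.
total_cost_usd = 8480
cores = 10_000
hours = 8

# Assumes every core was busy for the whole eight hours.
core_hours = cores * hours
cost_per_core_hour = total_cost_usd / core_hours

print(f"{core_hours} core-hours at ${cost_per_core_hour:.3f} per core-hour")
# prints "80000 core-hours at $0.106 per core-hour"
```

Roughly eleven cents per core-hour, with no data center to build or cool.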

On Test-Driven Development

I was having a conversation with someone the other day about unit testing. OK, actually I was interviewing someone for a Quality Engineering position on my team. We were talking about the difference between white-box tests that quality engineers write and tests that developers write.

I suggested that good white-box testers test the functionality and the failure cases (the intent of the function) and developers test the code that they’ve written (the function as coded). This then led me to a new revelation around test-first development methodologies (or possibly reminded me of something I had forgotten).

I have been a proponent of writing tests first since I first started doing Extreme Programming and read Kent Beck’s original book, Extreme Programming Explained: Embrace Change, while working at Bootleg Networks (thanks Carmine for making me do that, by the way). Admittedly, like many developers, I haven’t always been that rigorous about following that rule.

What I like about writing the tests before the function is that it clarifies my thinking about what the function should do, it alerts me to the corner cases, it gives me reasons to consider if the function is doing too much, and it gives me a way to instantly know if the function works once it is written. Writing the tests first also makes sure that the tests are written at all. Once the function is coded, it sometimes gets tempting to move on to the next bit of coding work with the intention of filling in the tests later.

What I hadn’t considered about writing the tests before the code is that it puts me into a quality mindset without having any bias to the code as I’d written it. I’m divorced from my own blind-spots around my coding. This actually leads me to writing better tests because I have no assumptions about how the code should work or fail. I’m testing the functionality, not the code.
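A small sketch of what that looks like in practice, in Python. The clamp function and its contract are hypothetical, picked only to illustrate the order of work: the test is written first, from the intended behavior and its corner cases, and the implementation comes second.

```python
# Step 1: written before clamp() exists. It states the intent and the
# corner cases without any knowledge of how the code will be written.
def test_clamp():
    assert clamp(5, 0, 10) == 5      # value inside the range is unchanged
    assert clamp(-3, 0, 10) == 0     # below the range snaps to low
    assert clamp(42, 0, 10) == 10    # above the range snaps to high
    assert clamp(0, 0, 10) == 0      # boundaries are inclusive
    try:
        clamp(1, 10, 0)              # inverted range is a caller error
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an inverted range")


# Step 2: the implementation, written to satisfy the test above.
def clamp(value, low, high):
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


test_clamp()
```

The inverted-range case is the kind of thing that is easy to skip when the tests are written after the fact, because the code as written never made you think about it.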

Maybe I’d thought about this before, but I hadn’t really considered that benefit recently until that moment. Now, when I start to get lazy about writing my unit tests before my implementation, I’ll have a better reason to keep up my discipline.

A Couple New Pixel Bender Links

Frequent contributor to the Pixel Bender forums, Royi Avital, has released a new set of After Effects and Photoshop plug-ins written with Pixel Bender under the name Flixel Plugins. The first three are now available on aescripts.com

Flixel Plugins on aescripts.com

ApexVJ is a really beautiful Flash-based music visualizer that uses Pixel Bender

Simo Santavirta, the creator, wrote an article on his blog about it.