High-Performance Computing in Finance: A Customer’s Perspective [4/7]
14 Jun 2007
Excerpted from a paper I delivered on January 16, 2007 at the Microsoft High-Performance Computing in Financial Services event in New York.
What’s Hard, What Isn’t
Part of the problem is that the line between what is hard and what isn’t is unclear, and the information out there isn’t helping much at all. Implementing MPI, distributed synchronization objects, or job scheduling algorithms is reasonably hard and should probably be left to experts. But naively distributing a command-line executable or a method on a serializable class is cake.
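To give a feel for how little is involved in the easy case, here is a minimal sketch in Python using only the standard library. It is not Digipede or any .NET API. The `Portfolio` class, `value_one`, and `value_all` are hypothetical names invented for illustration, and the "valuation" is a stand-in sum. The point is only that a method on a plain serializable object can be farmed out with a map call:

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Portfolio:
    """Hypothetical unit of work: plain data, so it serializes (pickles) cleanly."""
    name: str
    positions: Tuple[Tuple[float, float], ...]  # (quantity, price) pairs

    def market_value(self) -> float:
        # Stand-in for an expensive valuation; it touches no shared state.
        return sum(qty * price for qty, price in self.positions)


def value_one(portfolio: Portfolio) -> float:
    # Module-level function so worker processes can look it up by name.
    return portfolio.market_value()


def value_all(portfolios: List[Portfolio],
              executor: Optional[Executor] = None) -> List[float]:
    """Value every portfolio independently; results come back in input order."""
    if executor is not None:
        # Caller supplied a pool (processes, threads, or something remote).
        return list(executor.map(value_one, portfolios))
    # Default: one local worker process per CPU. Each portfolio is copied
    # to a worker, valued there, and only the result is copied back.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(value_one, portfolios))
```

Calling `value_all(book)` on a list of portfolios is the whole job. On platforms that start worker processes by spawning a fresh interpreter (Windows, for one), the call needs to sit under an `if __name__ == "__main__":` guard. Everything hard (scheduling, moving state, collecting results) lives inside the executor, which is where it belongs.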
Currently, it’s too easy to get distracted by the watchworks behind a high-performance computing solution. Just like you don’t need to actually solder a north bridge or lay out a PCB motherboard to write a blog entry, you don’t need to drive yourself mad generating different portfolios on different machines. What you need is guidance: guidance to know which problems are proudly parallel, which units of parallelization are appropriate, which data should be shared versus copied, which configurations create bottlenecks, how to get your code and data out to the compute nodes, and when to keep things simple and when to roll up your sleeves and get dirty. In many ways, distributed computing is simpler than multithreading since you’ve got better insulation between your processes and have to be more explicit about moving state around.
At Bridgewater, we’ve actually created a number of intern projects lately out of distributing existing .NET systems using a product called the Digipede Network from Digipede Technologies. I wouldn’t trust a single one of these interns to write high-quality multithreaded code or design caching strategies or implement distributed matrix multiplication using MPI, but on the other hand they’re able to roll out incredible distributed applications that work great with less than a week’s exposure to Digipede. They can do this because the problems are well-suited to the solution, the product is appropriately targeted to our company’s platform, and the appropriate samples, tools, and guidance exist to get the job done.
A Horrible Clang
If anything, that’s the reason why high-performance computing has such a bad rap. Until recently, high-performance computing meant Unix and clusters and fancy interconnects, all allied with a masochistic appreciation for open-source, thesis projects, and outdated PostScript documentation. The samples, tools, and guidance we’ve inherited have not been aimed at us, the predominant enterprise developer on Windows shifting wholesale from COM to .NET, VB to C#. MPI and OpenMP don’t target this audience. They target the hardcore C++ set. As much as I personally love C++ (almost as much as Python), it’s counterproductive for me to introduce it to my organization just to take advantage of vectorizing compilers, OpenMP, and MPI. I’d sooner settle for NullReferenceExceptions and reference semantics than GPFs and copy constructors any day of the week. Products like Digipede are a step in the right direction, but overall the message of high-performance computing on Windows is muddled, aimed at a narrow market that may not be interested.
At Ellington, my previous firm that specialized in mortgage-backed securities, we had a 256-node high-performance cluster built on Linux, GCC, and MPI. It would be a plum for Compute Cluster Server. Yet, I can’t think of one reason why I, as a principal software engineer, could recommend it to them. There is no point. They have too much invested in their current infrastructure, and there aren’t enough clear-cut savings and advantages that might warrant the costs and resultant risks. Perhaps if they were starting fresh or integrating some third-party analytics packages that only offered support for C++ on Windows, it would make sense. But that isn’t the case. Target their nascent .NET trading desk analytics with something other than MPI, though, and maybe you’ve got a customer.
The Excel Services for SharePoint 2007 story is another mixed bag from a high-performance computing perspective. It’s a fabulous product from the perspective of centrally sharing and managing workbooks at the enterprise level. I can guarantee that it will play a major part in Bridgewater’s Excel-heavy infrastructure. However, from the perspective of integrating your quantitative analysts into your engineering process, it’s a miss.
Another Microsoft product, Microsoft Expression Blend (formerly Expression Interactive Designer or “Sparkle”), demonstrates a great way to directly integrate non-engineering contributors (such as illustrators and UI designers) into the Microsoft Visual Studio 2005 development process. The project artifacts they create are full-fledged members of the solution. Engineers and illustrators work in parallel, and the solution is constantly updated.
We need the analog for our analysts. They need to work in their development environment of choice, Microsoft Excel, and we need to have their work immediately accessible as compiled libraries. The UI is irrelevant; it’s the math and models we want. The UI is just the vehicle for our analysts to develop and test their methods. I don’t want my engineers shoehorning distributed computing code into an analyst’s spreadsheet. I want analysts to compile their spreadsheets and have our engineers reference them as class libraries that can be inserted into our broader high-performance computing infrastructure.
Call it Microsoft Visual Excel.
Imagine if an analyst could declare one or more worksheets as a class and highlight particular cells as class properties, function inputs, and return values, with method bodies filled in by Excel worksheet functions and calls to methods or macros written in the .NET programming language of your choice. Imagine a PowerShell-like metaphor, where everything is an object. And now imagine that you can compile the whole thing into an assembly that can be directly referenced by a .NET project. That would be a better building block for our high-performance computing applications than Excel Services, as it better addresses our engineers and our engineering process.
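No such product exists, so what follows is purely an illustration of the shape of the idea, written in Python rather than as a .NET assembly. Assume an analyst has built a small loan worksheet with three input cells and two formula cells. The class name `LoanSheet` and every member on it are hypothetical. The only real-world anchor is that `monthly_payment` uses the same arithmetic as Excel’s `PMT` worksheet function with no future value and end-of-period payments:

```python
from dataclasses import dataclass


@dataclass
class LoanSheet:
    """Hypothetical class an analyst's worksheet might compile to."""

    # Input cells the analyst highlighted as properties.
    principal: float
    annual_rate: float  # e.g. 0.06 for 6%
    years: int

    @property
    def monthly_payment(self) -> float:
        # Output cell. Same arithmetic as =-PMT(annual_rate/12, years*12, principal).
        periods = self.years * 12
        rate = self.annual_rate / 12
        if rate == 0:
            # Zero-interest case: straight-line repayment.
            return self.principal / periods
        return self.principal * rate / (1 - (1 + rate) ** -periods)

    @property
    def total_interest(self) -> float:
        # Second output cell, defined in terms of the first.
        return self.monthly_payment * self.years * 12 - self.principal
```

An engineer would then treat the worksheet like any other library type, for instance `LoanSheet(100_000, 0.06, 30).monthly_payment`, and could hand a list of such objects to whatever distribution layer is already in place. The analyst never sees that layer, and the engineer never opens the spreadsheet.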