Excerpted from a paper I delivered on January 16, 2007 at the Microsoft High-Performance Computing in Financial Services event in New York.

Know Thy Developer

To make us customers and help us drive high-performance computing through our infrastructure, you have to understand that our engineers prioritize business first and technology second. It’s a mandate. The technology serves the business goal, not the reverse. We attract and retain brilliant developer talent. We shower them with education and learning opportunities. At the end of the day, though, we are grooming them to be generalists, not specialists. We care more that they understand their menu of options and know how to choose a solution appropriate to the problem than that they become expert on the inner workings of any one technology. If we demand any specialized knowledge at all, it’s usually of finance and economics, not distributed computing, algorithms, program analysis, or detailed performance optimization. I know that they can learn it, but in practice it can’t be their emphasis. Instead, they need to use that mindshare to ensure that we’re doing right by our customers every step of the way.

Yet, this emphasis gives rise to rather pathological situations. Last week I led a code review of an important class in our data infrastructure. It was about 300 lines long, and it was written by one of our most senior and productive engineers. The class takes in a large matrix of Excel-like data extraction and manipulation formulas, evaluates each formula, and passes back the same matrix with each formula overwritten by its result. It’s used widely throughout the company by both end-users and automated processes.

In his attempt to improve the performance of this class on multiprocessor hosts, the engineer decided to parallelize evaluation of the formulas. He constructed a list of threads, appended onto it one thread per matrix element, and then used a semaphore to ensure that only n threads were running at a time. In other words, if you passed in a matrix with 10,000 cells, he’d create 10,000 threads, only eight of which (by default) would be runnable. He had locks in all the right places, test cases, a strong public interface, and copious comments. It even worked. But his poorly thought-out design would bring a server class system to its knees in seconds, and he didn’t know why. After I showed him how to rewrite it in a more conventional producer/consumer pattern with a fixed number of threads, calls to this class which used to take ten minutes were now taking less than ten seconds apiece.
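To make the fix concrete, here is a minimal sketch of the fixed-pool producer/consumer shape, written in Python rather than the .NET of the original class. The names `evaluate_matrix` and `toy_evaluate` are my own illustrations, not the real code, and the toy evaluator only understands formulas like "=3+4". The point is the structure: a fixed number of workers pull cells from one queue, instead of one thread being created per cell.

```python
import queue
import threading


def evaluate_matrix(matrix, evaluate, workers=8):
    """Return a new matrix with every formula replaced by evaluate(formula)."""
    tasks = queue.Queue()
    results = [[None] * len(row) for row in matrix]

    def consume():
        # Each worker pulls cells until it sees the None sentinel.
        while True:
            item = tasks.get()
            if item is None:
                break
            r, c, formula = item
            # Every task writes a distinct cell, so no lock is needed here.
            results[r][c] = evaluate(formula)

    # A fixed number of threads, no matter how many cells there are.
    threads = [threading.Thread(target=consume) for _ in range(workers)]
    for t in threads:
        t.start()

    # Producer: enqueue one task per cell, then one sentinel per worker.
    for r, row in enumerate(matrix):
        for c, formula in enumerate(row):
            tasks.put((r, c, formula))
    for _ in threads:
        tasks.put(None)

    for t in threads:
        t.join()
    return results


def toy_evaluate(formula):
    # Hypothetical stand-in for the real formula engine: "=a+b" -> a + b.
    left, right = formula.lstrip("=").split("+")
    return int(left) + int(right)


matrix = [[f"={r}+{c}" for c in range(100)] for r in range(100)]
grid = evaluate_matrix(matrix, toy_evaluate)
print(grid[3][4])  # 7
```

Ten thousand cells still mean only eight threads. In CPython specifically, threads mostly help when evaluation waits on I/O, so treat this as a sketch of the pattern rather than a benchmark. A production version would also need to decide what happens when a formula fails to evaluate.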

Now this guy is smart. He’s a great coder. He is excellent at picking appropriate technology for a given problem. In fact, he even designed and implemented most of the data infrastructure this class was part of. But when it comes to threading, he’s a rank beginner. He just didn’t know of a better way to do it.

That is the paradox of our engineers. They’re wicked smart. They are as capable as anyone of pulling off a MacGyver with duct tape and baling wire and making a workable system of non-integrated pieces. But our developers need frameworks, patterns, and comprehensive tools for parallelizing and distributing their business logic. Without them, they’ll start making it up on their own. We all know that’s not where they should be placing their efforts. With all the other things they need to do, they won’t do it with excellence, and they won’t think through all the things they don’t know.

What’s Hard, What Isn’t

Part of the problem is that there is no clear separation between what is hard and what isn’t, and the information out there isn’t helping much. Implementing MPI or distributed synchronization objects or job scheduling algorithms is reasonably hard and should probably be left to experts. But naively distributing a command-line executable or a method on a serializable class is cake.

Currently, it’s too easy to get distracted by the watchworks behind a high-performance computing solution. Just as you don’t need to solder a north bridge or lay out a PCB motherboard to write a blog entry, you don’t need to drive yourself mad generating different portfolios on different machines. What you need is guidance: guidance to know which problems are proudly parallel, which units of parallelization are appropriate, which data should be shared versus copied, which configurations create bottlenecks, how to get your code and data out to the compute nodes, and when to keep things simple and when to roll up your sleeves and get dirty. In many ways, distributed computing is simpler than multithreading since you’ve got better insulation between your processes and have to be more explicit about moving state around.
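As a rough illustration of what "explicit about moving state around" means, consider the sketch below. Everything in it is hypothetical: `PortfolioJob` and `compute_node` are names I made up, and there is no real network involved, only a local function standing in for a remote machine. What it shows is that the job carries a copy of everything it needs, so the "node" shares nothing with the caller.

```python
import json


class PortfolioJob:
    """Hypothetical unit of work: everything it needs travels with it."""

    def __init__(self, name, quantities, prices):
        self.name = name
        self.quantities = quantities
        self.prices = prices

    def to_wire(self):
        # State is copied explicitly; nothing is shared with the sender.
        return json.dumps(
            {"name": self.name, "quantities": self.quantities, "prices": self.prices}
        )

    @classmethod
    def from_wire(cls, payload):
        return cls(**json.loads(payload))

    def run(self):
        # Toy calculation: market value of the positions.
        return sum(q * p for q, p in zip(self.quantities, self.prices))


def compute_node(payload):
    """Stand-in for a remote node: it sees only the text it was sent."""
    job = PortfolioJob.from_wire(payload)
    return json.dumps({"name": job.name, "result": job.run()})


jobs = [
    PortfolioJob("rates", [100, 50], [2, 3]),
    PortfolioJob("fx", [10], [7]),
]
replies = [json.loads(compute_node(job.to_wire())) for job in jobs]
print(replies)
```

Because the only way in or out of `compute_node` is a serialized payload, there are no locks to get wrong. The cost is that you have to decide up front which data gets copied to each node.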

At Bridgewater, we’ve actually created a number of intern projects lately out of distributing existing .NET systems using a product called the Digipede Network from Digipede Technologies. I wouldn’t trust a single one of these interns to write high-quality multithreaded code or design caching strategies or implement distributed matrix multiplication using MPI, but on the other hand they’re able to roll out incredible distributed applications that work great with less than a week’s exposure to Digipede. They can do this because the problems are well-suited to the solution, the product is appropriately targeted to our company’s platform, and the appropriate samples, tools, and guidance exist to get the job done.

A Horrible Clang

If anything, that’s the reason why high-performance computing has such a bad rap. Until recently, high-performance computing meant Unix and clusters and fancy interconnects, all allied with a masochistic appreciation for open-source, thesis projects, and outdated PostScript documentation. The samples, tools, and guidance we’ve inherited have not been aimed at us, the predominant enterprise developer on Windows shifting wholesale from COM to .NET, VB to C#. MPI and OpenMP don’t target this audience. They target the hardcore C++ set. As much as I personally love C++ (almost as much as Python), it’s anti-productive for me to introduce it to my organization just to take advantage of vectorizing compilers, OpenMP, and MPI. I’d sooner settle for NullReferenceExceptions and reference semantics than GPFs and copy constructors any day of the week. Products like Digipede are a step in the right direction, but overall the message of high-performance computing on Windows is muddled, aimed at a narrow market that may not be interested.

At Ellington, my previous firm that specialized in mortgage-backed securities, we had a 256-node high-performance cluster built on Linux, GCC, and MPI. It would be a plum for Compute Cluster Server. Yet, I can’t think of one reason why I, as a principal software engineer, could recommend it to them. There is no point. They have too much invested in their current infrastructure, and there aren’t enough clear-cut savings and advantages that might warrant the costs and resultant risks. Perhaps if they were starting fresh or integrating some third-party analytics packages that only offered support for C++ on Windows, it would make sense. But that isn’t the case. Target their nascent .NET trading desk analytics with something other than MPI, though, and maybe you’ve got a customer.

The Excel Services for SharePoint 2007 story is another mixed bag from a high-performance computing perspective. It’s a fabulous product from the perspective of centrally sharing and managing workbooks at the enterprise level. I can guarantee that it will play a major part in Bridgewater’s Excel-heavy infrastructure. However, from the perspective of integrating your quantitative analysts into your engineering process, it’s a miss.

Another Microsoft product, Microsoft Expression Blend (formerly Expression Interactive Designer or “Sparkle”), demonstrates a great way to directly integrate non-engineering contributors (such as illustrators and UI designers) into the Microsoft Visual Studio 2005 development process. The project artifacts they create are full-fledged members of the solution. Engineers and illustrators work in parallel, and the solution is constantly updated.

We need the analog for our analysts. They need to work in their development environment of choice, Microsoft Excel, and we need to have their work immediately accessible as compiled libraries. The UI is irrelevant; it’s the math and models we want. The UI is just the vehicle for our analysts to develop and test their methods. I don’t want my engineers shoehorning distributed computing code into an analyst’s spreadsheet. I want analysts to compile their spreadsheets and have our engineers reference them as class libraries that can be inserted into our broader high-performance computing infrastructure.

Call it Microsoft Visual Excel.

Imagine if an analyst could declare one or more worksheets as a class, highlight particular cells as class properties, function inputs and return values, with method bodies filled in by Excel worksheet functions and calls to methods or macros written in the .NET programming language of your choice. Imagine a PowerShell-like metaphor, where everything is an object. And now imagine that you can compile the whole thing into an assembly that can be directly referenced by a .NET project. That would be a better building-block for our high-performance computing applications than Excel Services, as it better addresses our engineers and our engineering process.

High-Performance Rocket Science

So why then am I not here telling you today that high-performance computing is as common as regular expressions at my company or that we’ve got a roadmap to parallelize everything that would benefit from it? Why are we here today trying to persuade you to adopt a high-performance computing solution rather than you telling us you’re already using one? If there is truly that much parallelism potential in the financial industry, why aren’t we already parallel?

We could talk about costs and ROI and the IT headaches involved, but I really don’t think that’s it. Architectural changes either bubble up from the engineers or trickle down from management, and I believe that right now neither group is pulling the trigger, nor is either sure that it can. There is fear. There is uncertainty. There is doubt.

A couple years ago, when processor clock speeds and on-die caches were still growing all the time, all we had to do to get better performance was buy a new machine and redeploy our application. Most of the time, we didn’t even have to recompile. But, as Herb Sutter said, “the free lunch is over.” To get performance improvements, we now have to use concurrency, and concurrency requires engineering effort and introduces risk. On the face of it, despite various reports from vendors and pundits, high-performance computing seems hard. Really hard. But no one is totally sure.

Bad Reputation

I think that there is an inaccurate perception that high-performance computing is harder than it actually is. It has a reputation a lot like rocket science. When we say something is hard, “but it’s not rocket science”, we’re implying that rocket science is at the apex of hard. It conjures the image of extraordinary talent, technique, knowledge, know-how, raw genius, all brought to bear, in the face of great risk and pressure, on a task so awesome that it seemed before just a dream. It is a battle between adversity and science. It is Don Quixote in a lab coat. I think that’s the way high-performance computing comes across.

All of us here today hope to persuade you that high-performance computing is now easier and more attainable than ever. And, really, it is. Nonetheless, I am certain that you will leave today with some small trace of disgust and disappointment that it is still harder and more confusing to create a high-performance computing solution than you had hoped. Perhaps the tools target the wrong programming language or platform. Perhaps a solution will require you to upgrade your hardware or install a new operating system. Perhaps it requires a slew of different products and a bunch of duct tape and interoperability artifice just to get “hello, world” working. Regardless of the reason, it’s easy to get dismayed.

Our dream, after all, isn’t to put a man on the moon. We just want instantaneous results. Super-linear scalability. Orders of magnitude performance increases just by configuration, recompilation, or redeployment of our applications to new hardware. We want to spend less, get more, and feel more confident that our systems will shoulder ever more burgeoning loads without buckling under the weight. And we want easy. We want it so easy that we don’t have to worry, as often as we do, about the myriad ways we could be getting it wrong.

Today, we can buy processors with more cores, motherboards with more processors, racks of more and more servers all connected by faster flavors of Ethernet, for less money than a one-year service contract on a Cray 90 supercomputer cost fifteen years ago. Yet we do not realize this dream. For most, a recompile of line-of-business applications won’t yield any performance gains, and most engineering managers aren’t sure they have on staff the parallel and distributed computing expertise to make the hardware break a sweat. They worry that they will make our systems more complex and more expensive without making them faster. They worry that the existing products, tools, and samples will require too much adaptation, too much interpretation, too much experimentation to be either useful or cost-effective.

Too often, they’re right.

The Insatiable Need

Though I’d expect that you would take it on faith that our voracious appetite for computing power is warranted, it’s important to understand why. While other industries also attempt to mine line-of-business data in service of generating profit, in finance that transformation is burdened by two very deep, very strong, opposing forces that, when combined, exert a unique influence on our business:

  1. We are constantly trying to compress the amount of time it takes to get things done.
  2. We are constantly trying to expand the amount of work we can get done in that time.

Our business is a black hole, pulling ever more mass into ever less volume, churning out greater and greater computational density that only ever seems to grow.

You see, unlike the search for extraterrestrial intelligence or the simulation of a nuclear mushroom cloud, when we arrive at a result is every bit as important as the result itself. For many laboratories, eventually is good enough. Any result, either in their lifetimes or before the funding runs out, is a good result. For us, though, a late result is about as useful as no result at all.

As an example, take the Australian bond market. Like most markets, it isn’t open 24 hours a day. It opens at 10am, closes at four. Currently, there is a sixteen-hour time difference between Sydney and New York, but come March we’ll lose two hours due to daylight saving time going in different directions in the southern and northern hemispheres. We have no control over this. Of course, it wouldn’t be so bad if it weren’t for the fact that the Aussie bond market only really trades in volume for the first hour or so of the day. I can offer no explanation for it, other than to say perhaps the pubs open at eleven. But whatever the reason, by noon in Sydney the market has become largely illiquid. This means that if you want to trade effectively in Sydney, you need to be at the ready with your shopping list, waiting for the door to open like it were Wal-Mart on the day after Thanksgiving. If you’re late, your trades are shot. And when daylight saving time rolls around in March, you’ll suddenly have to do the whole thing two hours sooner.

The engineers among you may ask, “If you can’t stand to be late, why not be early? Why not calculate the trades, your complete shopping list, in advance and leave yourself some time to spare?”

It’s a good idea, and, in fact, that’s what we normally do, but it is far from optimal. You see, the stimulus of this whole cycle is the data. When we take our market snapshot, we have to clean it, check it, store it, and inject it into the pipeline that starts the whole trade generation process. But once we take a snapshot of the market, all the data that became our input is suddenly disconnected, forever decoupled from the real-time stochastic process we’re trying to make a profit on. We can’t pause the market. We can’t stop the market from telling us that it’s changing. We just have to stop listening. And once we stop listening, we might procrastinate too long making our picks and soon find that the things we wanted to buy are ultimately no longer for sale.

Years ago, I took a marketing class at the NYU Stern School of Business, and while I unfortunately don’t remember the name of the professor, I remember one particular lecture he gave on the subject of fresh flowers. Fresh flowers all begin their lives at the grower, basking in the sun on some rolling green field, in Hawaii or South Africa, growing from seed into budded stalks. Just as the flowers start to bloom, the grower snips them off at the root, killing them right there on the spot. Sure, they’ll keep blooming, just like a ceiling fan will continue turning even after you’ve turned off the power, but let’s not kid ourselves: those flowers are dead. And as they’re packed into cargo containers, rolled onto an airplane, sent off to a stateside distributor, well, they’re still dead. They will make a long journey through airplanes, trucks, warehouses, and supermarket floral departments to finally end up on your dinner table. And by the time they do, these so-called “fresh flowers” are anything but.

Market data works just the same way. From the moment we capture it, it starts to grow stale. The data itself at first looks valid enough, but as time elapses, the world the data implies is becoming fictional. We’re building models, generating trades, running risk controls, making execution schedules, and all the while the market conditions that inspired our desired trades are softly descending into history. The very market conditions that made certain trades favorable will, in short order, no longer be there.

Which leaves us between a rock and a hard place with events like the Aussie open. While we could generate our trades hours ahead of time, those trades can never be as good as the ones we would generate if we could capture the data and calculate our trades in the instant the market opened. Since we’re not yet at the point where we can construct global investment portfolios in a blink, we have to start ahead of time. We can’t be too early, but we can’t be late. As much as we’d like to be perfectly punctual, the earth stops spinning for no one. Instead, we sacrifice a little accuracy each and every day in order to save us from the big miss every now and again.

The engineers among you again might ask, “Why not get a bigger hammer? Why not make an intense effort to make this process as fast as possible?”

Again, a good idea. We could buy more machines, bigger machines with more memory and more cores, make more and more of our system parallelizable. We could fine-tune our use of data structures and algorithms, reduce bandwidth, make better use of CPU architecture, leverage GPGPU programming, and design a whole host of other optimizations. If our problems were relatively stable and we could focus entirely on performance instead of features, that’s exactly what we would do. We’d throw ourselves into it and could conceivably get the whole thing down to a coffee break.

But here’s the rub: just as we’re coping with less and less time to get things done, we’re constantly trying to squeeze a bunch of new things in. By the time we could get any individual piece nicely optimized, that piece could be changed, replaced, subsumed, or scrapped altogether. Our business will march into new markets and attack new asset classes. We’ll engage new clients with new strings attached. We’ll hit limits and dead-ends and exhaust the money-making potential of strategies that worked great for us in the past. Our analysts will revisit their models and algorithms in order to eke out more market advantages just as others are being chipped away.

That’s just the nature of the business.

In the world of finance, the truth changes constantly, and our responsibilities—and thus our computing problems—grow in many dimensions. If we can’t figure out a way to scale out, we can’t scale.

The Perfect Problem

Fortunately, most of these dimensions are cleanly divisible into largely independent units. For example, the trade pipeline at Bridgewater is already split by department and business goals, with research, portfolio generation, and trade execution all insulated from each other and communicating only through well-defined connection points. Depending on what we’re trying to achieve, we can process any stage of our pipeline one step, one asset class, one account, one market, or one instrument at a time. Even when things get more complicated, such as when we throttle into a desired position over some time horizon in order to minimize transaction costs, taking into consideration the total combined impact we’ll have on each market for each trading session over many days, this is still a proudly parallel calculation that can simply switch its unit of parallelization a few times with a barrier sync at each switch.
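Here is a small sketch of what switching the unit of parallelization with a barrier in between might look like. The orders, the position limit, and the four-session schedule are invented for illustration and are not how our pipeline actually sizes trades. Phase one works one order at a time, a regrouping step collects the results by market, and phase two works one market at a time.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical desired trades: (account, market, quantity).
orders = [
    ("acct-A", "AUD-bonds", 300),
    ("acct-B", "AUD-bonds", 500),
    ("acct-A", "JPY-bonds", 200),
    ("acct-B", "JPY-bonds", -200),
]

LIMIT = 400  # assumed per-order size cap, purely illustrative


def size_order(order):
    # Phase 1 unit: one order, independent of every other order.
    account, market, quantity = order
    capped = max(-LIMIT, min(LIMIT, quantity))
    return market, capped


def schedule_market(item, sessions=4):
    # Phase 2 unit: one market, spreading the combined quantity evenly
    # across trading sessions.
    market, total = item
    return market, [total / sessions] * sessions


with ThreadPoolExecutor(max_workers=4) as pool:
    # list(...) drains the iterator, so it acts as the barrier:
    # nothing below runs until every phase 1 task has finished.
    sized = list(pool.map(size_order, orders))

    # Regroup: the unit of work changes from order to market.
    totals = defaultdict(int)
    for market, quantity in sized:
        totals[market] += quantity

    schedules = dict(pool.map(schedule_market, totals.items()))

print(schedules)
```

The only coordination between workers is the wait at the end of each phase. Within a phase, no task needs to know that any other task exists.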

And that’s just the so-called “front-office” portion of our business. Our Operations and Client Services departments in the back-end also have highly parallel processes, from securing trade fills to generating real-time and periodic financial reports. Needless to say, if all this doesn’t sound tailor-made for high-performance computing, I don’t know what would. And while Bridgewater is a unique company in a lot of ways, I think most in the financial industry would recognize this fundamentally parallel structure in their own business.

Proudly Parallel

Software engineers like myself, who work in finance and have a keen interest in high-performance computing, are kids in a candy store. Everywhere we look — from front-office to back, from portfolio construction to trade execution, from data capture to quarterly reports — we see algorithms and business processes that are inherently parallel and deeply amenable to high-performance computing. We delight in the fact that our business comes pre-sliced into portfolios, asset classes, markets, securities, trading sessions, and ticks. And even when we get all fancy and use things like Monte Carlo analysis, genetic algorithms, and simulated annealing, we’re still dealing with a rain of independent calculations that rarely rendezvous. They call these types of problems embarrassingly parallel in the business. I can tell you, though, there’s nothing embarrassing about them. Frankly, if you want to do high-performance computing, they’re the types of problems you’d like to have.
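To show what a "rain of independent calculations" looks like in code, here is a toy Monte Carlo sketch. The quantity being estimated, the expected value of max(Z, 0) for a standard normal Z, is a stand-in I chose because its true value is known to be 1/sqrt(2π), roughly 0.3989. The batch sizes and seeds are arbitrary assumptions. Each batch has its own random number generator and shares nothing with the others until the final average.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_batch(seed, paths=10_000):
    """One independent task: average max(Z, 0) over `paths` normal draws."""
    rng = random.Random(seed)  # private generator, so batches never interact
    total = 0.0
    for _ in range(paths):
        total += max(rng.gauss(0.0, 1.0), 0.0)
    return total / paths


def estimate(batches=8, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so the result is reproducible
        # for a given set of seeds.
        means = list(pool.map(simulate_batch, range(batches)))
    # The only rendezvous: combine the batch averages at the end.
    return sum(means) / len(means)


value = estimate()
print(round(value, 3))
```

In CPython, threads will not speed up pure computation like this. Swapping in `ProcessPoolExecutor`, which offers the same `map` interface, is the usual way to spread the batches across cores. The shape of the problem does not change either way.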

But it isn’t all rosy. At my firm, we’ve only parallelized a fraction of what is possible. Despite having an optimal problem set, we are still hampered in our efforts to put all this parallelism potential into production. As it turns out, high-performance computing in finance has some peculiarities that are not entirely addressed by the stable of tools currently available. We are making progress, but only by writing services and infrastructure totally orthogonal to our primary business of investing money.

Historically, high-performance computing has been the stock in trade for research and academic institutions doing “grand challenge”-type problems. Simulations of nuclear weapon explosions. Computational fluid dynamics. Brute-force attacks on cryptographic keys. Analysis of extraterrestrial radio signals. These efforts carried the flag of high-performance computing into the public consciousness and created MPI and grid computing and other foundational technologies and patterns that many of us use today. But just as knowing Latin doesn’t get you cross-town in Mexico City, replicating these classic efforts doesn’t quite deliver the functionality we need in finance today.

In some ways, our problems are simpler. Broadly speaking, there aren’t as many mesh-like algorithms in finance that require constant coherence of shared state. We either operate at less granular levels of parallelism or have truly independent tasks that have no shared state at all. On the flip-side, we deal with heterogeneous data sources, security requirements, and configuration and deployment concerns that, while largely absent in the classic applications, are painfully prominent in ours.

Regardless of the obstacles, though, all of our roadmaps point to high-performance computing. It’s inescapable. More literally than any other industry, finance is all about turning data into information, information into action, and action into profit. And that transformation gorges on computing power. The volume, variety, and relative unreliability of the data that we have to work with is always proliferating. We are constantly evolving more sophisticated financial models and weaving ever more complex and conditional business rules. We are expanding into new markets and dealing in new instruments. We are constantly trying all we can to get a leg up in an age of extreme market and information efficiency and find new ways to transmute data into gold.
