High-Performance Computing in Finance: A Customer’s Perspective [2/7]
12 Jun 2007
Excerpted from a paper I delivered on January 16, 2007 at the Microsoft High-Performance Computing in Financial Services event in New York.
The Insatiable Need
Though I’d expect that you would take it on faith that our voracious appetite for computing power is warranted, it’s important to understand why. While other industries also attempt to mine line-of-business data in service of generating profit, in finance that transformation is burdened by two very deep, very strong, opposing forces that, when combined, exert a unique influence on our business:
- We are constantly trying to compress the amount of time it takes to get things done.
- We are constantly trying to expand the amount of work we can get done in that time.
Our business is a black hole, pulling ever more mass into ever less volume, churning out greater and greater computational density that only ever seems to grow.
You see, unlike the search for extraterrestrial intelligence or the simulation of a nuclear mushroom cloud, when we arrive at a result is every bit as important as the result itself. For many laboratories, eventually is good enough. Any result, either in their lifetimes or before the funding runs out, is a good result. For us, though, a late result is about as useful as no result at all.
As an example, take the Australian bond market. Like most markets, it isn’t open 24 hours a day. It opens at 10am, closes at four. Currently, there is a sixteen-hour time difference between Sydney and New York, but come March we’ll lose two hours due to daylight saving time going in different directions in the southern and northern hemispheres. We have no control over this. Of course, it wouldn’t be so bad if it weren’t for the fact that the Aussie bond market only really trades in volume for the first hour or so of the day. I can offer no explanation for it, other than to say perhaps the pubs open at eleven. But whatever the reason, by noon in Sydney the market has become largely illiquid. This means that if you want to trade effectively in Sydney, you need to be at the ready with your shopping list, waiting for the door to open as if it were Wal-Mart on the day after Thanksgiving. If you’re late, your trades are shot. And when daylight saving time rolls around in March, you’ll suddenly have to do the whole thing two hours sooner.
The engineers among you may ask, “If you can’t stand to be late, why not be early? Why not calculate the trades, your complete shopping list, in advance and leave yourself some time to spare?”
It’s a good idea, and, in fact, that’s what we normally do, but it is far from optimal. You see, the stimulus of this whole cycle is the data. When we take our market snapshot, we have to clean it, check it, store it, and inject it into the pipeline that starts the whole trade generation process. But once we take a snapshot of the market, all the data that became our input is suddenly disconnected, forever decoupled from the real-time stochastic process we’re trying to make a profit on. We can’t pause the market. We can’t stop the market from telling us that it’s changing. We just have to stop listening. And once we stop listening, we might procrastinate too long making our picks and soon find that the things we wanted to buy are ultimately no longer for sale.
Years ago, I took a marketing class at the NYU Stern School of Business, and while I unfortunately don’t remember the name of the professor, I remember one particular lecture he gave on the subject of fresh flowers. Fresh flowers all begin their lives at the grower, basking in the sun on some rolling green field, in Hawaii or South Africa, growing from seed into budded stalks. Just as the flowers start to bloom, the grower snips them off at the root, killing them right there on the spot. Sure, they’ll keep blooming, just like a ceiling fan will continue turning even after you’ve turned off the power, but let’s not kid ourselves: those flowers are dead. And as they’re packed into cargo containers, rolled onto an airplane, sent off to a stateside distributor, well, they’re still dead. They will make a long journey through airplanes, trucks, warehouses, and supermarket floral departments to finally end up on your dinner table. And by the time they do, these so-called “fresh flowers” are anything but.
Market data works just the same way. From the moment we capture it, it starts to grow stale. The data itself at first looks valid enough, but as time elapses, the world the data implies is becoming fictional. We’re building models, generating trades, running risk controls, making execution schedules, and all the while the market conditions that inspired our desired trades are softly descending into history. The very market conditions that made certain trades favorable will, in short order, no longer be there.
Which leaves us between a rock and a hard place with events like the Aussie open. While we could generate our trades hours ahead of time, those trades can never be as good as the ones we would generate if we could capture the data and calculate our trades in the instant the market opened. Since we’re not yet at the point where we can construct global investment portfolios in a blink, we have to start ahead of time. We can’t be too early, but we can’t be late. As much as we’d like to be perfectly punctual, the earth stops spinning for no one. Instead, we sacrifice a little accuracy each and every day in order to save us from the big miss every now and again.
The engineers among you again might ask, “Why not get a bigger hammer? Why not make an intense effort to make this process as fast as possible?”
Again, a good idea. We could buy more machines, bigger machines with more memory and more cores, make more and more of our system parallelizable. We could fine-tune our use of data structures and algorithms, reduce bandwidth, make better use of CPU architecture, leverage GPGPU programming, and design a whole host of other optimizations. If our problems were relatively stable and we could focus entirely on performance instead of features, that’s exactly what we would do. We’d throw ourselves into it and could conceivably get the whole thing down to a coffee break.
But here’s the rub: just as we’re coping with less and less time to get things done, we’re constantly trying to squeeze a bunch of new things in. By the time we could get any individual piece nicely optimized, that piece could be changed, replaced, subsumed, or scrapped altogether. Our business will march into new markets and attack new asset classes. We’ll engage new clients with new strings attached. We’ll hit limits and dead-ends and exhaust the money-making potential of strategies that worked great for us in the past. Our analysts will revisit their models and algorithms in order to eke out more market advantages just as others are being chipped away.
That’s just the nature of the business.
In the world of finance, the truth changes constantly, and our responsibilities—and thus our computing problems—grow in many dimensions. If we can’t figure out a way to scale out, we can’t scale.
The Perfect Problem
Fortunately, most of these dimensions are cleanly divisible into largely independent units. For example, the trade pipeline at Bridgewater is already split by department and business goals, with research, portfolio generation, and trade execution all insulated from each other and communicating only through well-defined connection points. Depending on what we’re trying to achieve, we can process any stage of our pipeline one step, one asset class, one account, one market, or one instrument at a time. Even when things get more complicated, such as when we throttle into a desired position over some time horizon in order to minimize transaction costs, taking into consideration the total combined impact we’ll have on each market for each trading session over many days, this is still a proudly parallel calculation that can simply switch its unit of parallelization a few times with a barrier sync at each switch.
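To make the idea of switching the unit of parallelization concrete, here is a minimal Python sketch. It is not Bridgewater's system: the trade list, the `DAILY_CAP` limit, and every function name are hypothetical, invented purely for illustration. The first stage works one market at a time to decide how many days the combined demand should be spread over; the second stage works one account at a time to cut each trade into daily slices. Collecting every result of the first stage before starting the second plays the role of the barrier sync.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical desired trades: (account, market, quantity).
DESIRED = [
    ("acct_a", "AU_BOND", 90),
    ("acct_b", "AU_BOND", 60),
    ("acct_a", "US_BOND", -30),
    ("acct_b", "US_BOND", 70),
]
# Assumed limit on gross quantity traded per market per day (illustrative only).
DAILY_CAP = 50


def group_by(rows, key_index):
    """Group trade rows by one column; each group is one unit of parallel work."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return list(groups.items())


def days_for_market(item):
    """Stage 1 unit: one market. Days needed to keep combined demand under the cap."""
    market, rows = item
    gross = sum(abs(qty) for _, _, qty in rows)
    return market, max(1, math.ceil(gross / DAILY_CAP))


def slices_for_account(item, horizon):
    """Stage 2 unit: one account. Even daily slice of each trade over its market's horizon."""
    account, rows = item
    return account, {market: qty / horizon[market] for _, market, qty in rows}


def build_schedule(desired):
    with ThreadPoolExecutor() as pool:
        # Stage 1: parallel by market.
        horizon = dict(pool.map(days_for_market, group_by(desired, 1)))
        # Building the dict consumes every stage 1 result, so nothing below
        # starts until all markets are done: this is the barrier.
        # Stage 2: parallel by account, reading the completed stage 1 output.
        slices = dict(
            pool.map(lambda item: slices_for_account(item, horizon),
                     group_by(desired, 0))
        )
    return horizon, slices


if __name__ == "__main__":
    horizon, slices = build_schedule(DESIRED)
    print(horizon)
    print(slices)
```

A thread pool keeps the sketch short and self-contained. A real workload of this shape would more likely fan out across processes or cluster nodes, but the structure stays the same: partition by one key, wait for everything, then repartition by another.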
And that’s just the so-called “front-office” portion of our business. Our Operations and Client Services departments in the back-end also have highly parallel processes, from securing trade fills to generating real-time and periodic financial reports. Needless to say, if all this doesn’t sound tailor-made for high-performance computing, I don’t know what would. And while Bridgewater is a unique company in a lot of ways, I think most in the financial industry would recognize this fundamentally parallel structure in their own business.