Dataflow is about creating a software architecture that models a problem on the functional relationship between variables rather than on the sequence of steps required to update those variables. It’s about shifting control of evaluation away from code you write toward code written by someone else. It’s about changing the timing of recalculation from recalculate now to recalculate when something has changed. Sure, it’s a distinction that may have more to do with emphasis and point of view than with paradigm, but it can be a liberating distinction for certain problems in financial modeling.

If you work in finance, chances are you’re already expert in today’s preeminent dataflow modeling language: Microsoft Excel. Excel is the undisputed workhorse of financial applications, taught in every business school, run on every desk, wired into the infrastructure of nearly every bank, fund, or exchange in existence. The reason for Excel’s singularity in the black hole of finance is its ability to emancipate modeling from code (and thus developers) and empower analysts and business types alike to create models as interactive documents. Make no mistake — writing workbooks is still very much software development. But Excel’s emphasis on data rather than code, on relationships rather than instructions, fits the work this industry does and the people who do it.

Briefly, when you model in Excel, you specify a cell’s output by filling it with either a constant value or a function. Functions are written in a lightweight language that allows function arguments to be either constant values or references to another cell’s output. In the typical workbook, cells may reference cells that in turn reference other cells, and so on, resulting in an arbitrarily sophisticated model that can span multiple worksheets and workbooks. The point, though, is that rather than specifying your model as a sequence of steps that get executed when you say go, you describe your model’s core data relationships to Excel, and Excel figures out how and when to execute them.
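To make that recalculation model concrete, here’s a minimal sketch of the idea in Python. This is not Excel’s actual engine, and every name in it is hypothetical; it just shows the shape of the thing: a cell holds either a constant or a formula over other cells, and changing a cell pushes recalculation out to its dependents.

```python
# A toy "cell" in the spirit of Excel: it holds either a constant or a
# formula over other cells. Setting a new constant pushes recalculation
# to every dependent -- recalculate-on-change rather than on demand.

class Cell:
    def __init__(self, value=None, formula=None, inputs=()):
        self.formula = formula        # None for constant cells
        self.inputs = list(inputs)    # cells this cell references
        self.dependents = []          # cells that reference this cell
        for cell in self.inputs:
            cell.dependents.append(self)
        if formula is None:
            self.value = value
        else:
            self.value = formula(*(c.value for c in self.inputs))

    def set(self, value):
        # Like typing a new constant into a cell: update, then propagate.
        self.value = value
        self._notify()

    def _recalculate(self):
        self.value = self.formula(*(c.value for c in self.inputs))
        self._notify()

    def _notify(self):
        for dependent in self.dependents:
            dependent._recalculate()

# Usage: a price cell that depends on a clock cell. We never call
# _recalculate ourselves; advancing the clock does it for us.
clock = Cell(value=0)
price = Cell(formula=lambda t: 100 + t, inputs=[clock])
clock.set(1)
print(price.value)  # 101
```

This naive push model will recalculate a diamond-shaped dependency twice; a real engine like Excel’s walks the dependency graph in topological order instead. But the division of labor is the point: you declare the relationships, and the engine owns evaluation.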

Example: An Equities Market Simulation

Let’s say that we are writing a simulation for an equities (stock) market. Such a simulation could be used for testing a trading strategy or studying economic scenarios. The market is composed of many equities, and each equity has many properties, some that change slowly over time (such as ticker symbol or inception date), and some that change frequently (such as last price or volume). Some properties may be functions of other properties of the same equity (such as high, low, or closing price), while others may be functions of properties on other equities (as with haircuts, derivatives, or baskets).

As a starting point, we introduce a simulation clock. Each time the clock advances, the prices of all equities get updated. To update prices, we use a random walk driven by initial conditions (such as initial price S0, drift r, and volatility σ), a normally distributed random variable z, and a recurrence equation over n intervals of t years:

S_n = S_{n-1} \cdot \exp\left( (r - \tfrac{1}{2}\sigma^2)\, t + z \sigma \sqrt{t} \right)

Note: This equation provides a lognormal random walk [1,2], which means that instead of getting the next price by adding small random price changes to the previous price, we’re multiplying small random percentages against the previous price. This makes sense for things like prices since a) they can’t be negative, and b) the size of any price change is proportional to the magnitude of the current price. In other words, penny stocks tend to move up and down by fractions of a penny, while stocks trading at much higher prices tend to move up and down in dollars.
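As a sanity check on the equation, here is a direct transcription of the recurrence into Python. It’s a sketch only: the function name and parameters are mine, chosen to mirror the symbols above, with numpy supplying the normal draws z.

```python
# Lognormal random walk:
#   S_n = S_{n-1} * exp((r - sigma^2/2) * t + z * sigma * sqrt(t))

import numpy as np

def simulate_prices(s0, r, sigma, t, n, seed=None):
    """Return the simulated path S_1 .. S_n from initial price s0."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)  # one normal draw per interval
    log_returns = (r - 0.5 * sigma**2) * t + z * sigma * np.sqrt(t)
    # Summing log returns and exponentiating applies the recurrence
    # multiplicatively, step by step.
    return s0 * np.exp(np.cumsum(log_returns))

# For example: daily steps (t = 1/252) for one year, 5% drift, 20% vol.
path = simulate_prices(s0=100.0, r=0.05, sigma=0.20, t=1/252, n=252, seed=42)
print(path[-1])
```

Note that because the exponential can never be negative, the simulated prices can’t be either, which is exactly the property the lognormal walk buys us.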

In Excel, you could model this market by plopping the value of the clock into a cell, setting up other cells to contain the initial conditions, and then initializing a slew of other cells with functions that reference the clock and initial-condition cells and calculate a new price for each virtual equity using the equation above. And then hit F9.

But how would you write this in code? Would you just update the clock and then exhaustively recalculate all of the prices? If you had to incorporate equity derivatives or baskets, would your architecture break? How would you allow non-programming end users to declaratively design their own simulated markets and the instruments within them?

Recently, one of our financial services clients at Lab49 has been trying to solve a similar problem in .NET, and I suggested to them that the problem is analogous to how Microsoft Windows Presentation Foundation (WPF) handles the flow of data from controller to model to view. Dependency properties, which form the basis of data binding in WPF applications, implement a dataflow model similar to Excel’s, and what I had in mind at first was a solution merely inspired by WPF. But the more I discussed the analogy with the client, the more I realized that we didn’t just have to use WPF as inspiration; we could actually use WPF.

In this series, I’ll dive further into creating the equities market simulation and look at how to use WPF data binding to create a dataflow implementation. Note that there are several considerations to this approach, and, under the category of just because you can doesn’t mean you should, we’ll evaluate whether or not this method has legs.

[to be continued]

Last week, Lab49 showed up in force at the Microsoft Financial Developers Conference. One of our founders and managing directors, Daniel Chait, trotted out some powerful Windows Presentation Foundation (WPF) data visualization demos for the financial services industry.

I got a chance to sneak in and out of several sessions, but overall the conference was ho-hum from a developer’s perspective. Not much new, not much technical. Not even much in the way of vendor swag. I mean, really, how many Microsoft-branded over-the-shoulder messenger bags can one person use? Yawn. Praise be for the vast aquifers of coffee and candy-coated apples.

I did learn at least one new thing, though: Platform Symphony now supports distributing tasks only as executables. According to a guy named Rene, a Platform Computing tech representative roaming the audience answering questions, the product used to support distributing tasks as libraries, but they’ve jettisoned that feature for the sake of simplicity and performance.

It seems rather archaic to me to have such limited choice in designing your distributed applications. The Digipede Network, for example, allows you to distribute executables, libraries, and even in-memory objects. Being locked into wrapping your logic into an executable (even when all you want to distribute is a function call) reminds me of the coding awkwardness of PVM and MPI.

Is Platform Computing old guard or just getting old?

Last week my family and I returned from a visit with the in-laws and boarded a fully booked Continental Airlines flight at Sacramento Airport bound for New York’s LaGuardia. Typical of air travel these days, it took nearly a half-hour to shuffle some 140 passengers on board. There were crying kids, whimpering animals in pet carriers, amorphous lines, bustling mobs, and a cavalcade of renegades who believe boarding by row number can’t possibly apply to them.

From a civilian’s perspective, this was just one more example of the headache and disorganization we’ve come to expect while being herded onto passenger jets like Guernseys onto a cattle car. From an engineer’s perspective, though, it’s actually an interesting and tantalizingly thorny optimization problem.

Nope, it’s not from Microsoft Research. It doesn’t hail from Cambridge or Mountain View, nor is it some underbelly technology in Windows Vista that only Mark Russinovich and the responsible SDE are aware of. Rather, it’s a new service-oriented application model built on two overlapping technologies: Decentralized Software Services (DSS) and the Concurrency and Coordination Runtime (CCR). It currently ships as part of the Microsoft Robotics Studio (more on that later) and is poised to disrupt the way we think about the Windows Communication Foundation (WCF) and the way we design, architect, and implement distributed applications. It is a work of genius.

From ACM Queue:

PeakStream Founder and CTO Matthew Papakipos explains how the PeakStream Virtual Machine provides automatic parallelization of programs written in C/C++ so that developers can focus on their application logic — and not the intricate details of parallelizing the application — and ultimately improve the performance of HPC applications when running on multi-core processors.

Matt is a friend of a friend, and I spoke with him several months ago about his company and its flagship product, the PeakStream Platform. PeakStream provides enabling technologies that let developers more quickly take advantage of general-purpose computation on graphics hardware (also known as GPGPU). Current NVIDIA and ATI graphics cards have incredible power to perform certain types of mathematically intensive calculations in a streaming, data-parallel fashion; however, taking advantage of that power requires deep programmer sophistication. While efforts such as HLSL, Cg, CUDA, and CTM have made GPGPU programming more accessible, a cursory look at the documentation for these technologies proves that it still isn’t anywhere near easy.
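For a feel of what “streaming, data-parallel” means, here’s a toy illustration in Python, with numpy standing in for the GPU runtime. This is emphatically not PeakStream’s API, just the general shape of the idea: express the computation as whole-array operations and let the runtime decide how to parallelize them.

```python
# Contrast scalar-style and data-parallel-style computation. This is an
# illustration only; numpy here plays the role a GPU runtime would play.

import numpy as np

spots = np.random.default_rng(0).uniform(50, 150, size=100_000)

# Scalar style: an explicit loop, one element at a time. The parallelism
# is buried inside the loop body, invisible to any runtime.
shifted_loop = np.empty_like(spots)
for i in range(len(spots)):
    shifted_loop[i] = spots[i] * 1.01

# Data-parallel style: one bulk operation over the whole array. No loop
# in user code, so the runtime is free to vectorize or distribute it.
shifted_bulk = spots * 1.01

assert np.allclose(shifted_loop, shifted_bulk)
```

PeakStream’s pitch, as the blurb above describes it, is that you keep writing something close to the first style in C/C++ and their virtual machine extracts the second style’s parallelism for you.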
