Dataflow via Data Binding, Part 1: Introduction

24 Mar 2008

Dataflow is about creating a software architecture that models a problem on the functional relationship between variables rather than on the sequence of steps required to update those variables. It’s about shifting control of evaluation away from code you write toward code written by someone else. It’s about changing the timing of recalculation from recalculate now to recalculate when something has changed. Sure, it’s a distinction that may have more to do with emphasis and point of view than with paradigm, but it can be a liberating distinction for certain problems in financial modeling.

If you work in finance, chances are you may already be expert in today’s preeminent dataflow modeling language: Microsoft Excel. Excel is the undisputed workhorse of financial applications, taught in every business school, run on every desk, wired into the infrastructure of nearly every bank, fund, or exchange in existence. The reason for Excel’s singularity in the black hole of finance is its ability to emancipate modeling from code (and thus developers) and empower analysts and business types alike to create models as interactive documents. Make no mistake — writing workbooks is still very much software development. But Excel’s emphasis on data rather than code, relationships rather than instructions, is something that fits with the work this industry does and the people that do it.

Briefly, when you model in Excel, you specify a cell’s output by filling it with either a constant value or a function. Functions are written in a lightweight language that allows function arguments to be either constant values or references to another cell’s output. In the typical workbook, cells may reference cells that in turn reference other cells, and so on, resulting in an arbitrarily sophisticated model that can span multiple worksheets and workbooks. The point though is that, rather than specifying your model as a sequence of steps that get executed when you say go, here you describe your model’s core data relationships to Excel, and Excel figures out how and when it should be executed.

Example: An Equities Market Simulation

Let’s say that we are writing a simulation for an equities (stock) market. Such a simulation could be used for testing a trading strategy or studying economic scenarios. The market is comprised of many equities, and each equity has many properties, some that change slowly over time (such as ticker symbol or inception date), and some that change frequently (such as last price or volume). Some properties may be functions of other properties of the same equity (such as high, low, or closing price), while others may be functions of properties on other equities (such as with haircuts, derivatives, or baskets).

As a starting point, we introduce a simulation clock. Each time the clock advances, the price of all equities gets updated. To update prices, we use a random walk driven by initial conditions (such as initial price S0, drift r, and volatility σ), a normally distributed random variable z, and a recurrence equation over n intervals of t years:

$S_{n} = S_{n-1} \cdot \exp(r t - 0.5 \sigma^2 t + \mathbf{z} \sigma \sqrt{t} )$

Note: This equation provides a lognormal random walk [1,2], which means that instead of getting the next price by adding small random price changes to the previous price, we’re multiplying small random percentages against the previous price. This makes sense for things like prices since a) they can’t be negative, and b) the size of any price changes is proportional to the magnitude of the current price. In other words, penny stocks tend to move up and down by fractions of a penny while stock trading at much higher prices tend to move up and down in dollars.

In Excel, you could model this market by plopping the value of the clock into a cell, setting up other cells to contain initial conditions, and then have a slew of other cells initialized with functions that reference the clock and initial conditions cells and that calculate a new price using the above equation for each virtual equity. And then hit F9.

But how would you write this in code? Would you just update the clock and then exhaustively recalculate all of the prices? If you had to incorporate equity derivatives or baskets, would your architecture break? How would you allow non-programming end-users to declaratively design their own simulation markets and the instruments within?

Recently, one of our financial services clients at Lab49 has been trying to solve a similar problem in .NET, and I had been suggesting to them that the problem is analogous to how Microsoft Windows Presentation Foundation (WPF) handles the flow of data from controller to model to view. Dependency properties, which form the basis of data binding in WPF applications, implement a dataflow model similar to Excel, and what I had in mind at first was a solution inspired by WPF. But the more I discussed this analogy with the client, the more I realized that we didn’t just have to use WPF as inspiration; we could actually use WPF.

In this series, I’ll dive further into creating the equities market simulation and look at how to use WPF data binding to create a dataflow implementation. Note that there are several considerations to this approach, and, under the category of just because you can doesn’t mean you should, we’ll evaluate whether or not this method has legs.

[to be continued]

2 Responses to “Dataflow via Data Binding, Part 1: Introduction”

1. […] A Dash of Insight wrote an interesting post today onHere’s a quick excerptIn other words, penny stocks tend to move up and down by fractions of a penny while stock trading at much higher prices tend to move up and down in dollars…. […]

2. Kalani Says:

You might be interested in this recent discussion about Excel and (if you get past the gamasutra bit, into the comments) dataflow languages:

http://lambda-the-ultimate.org/node/2710

One of the guys there (Peter van Roy) wrote a well-regarded book (sometimes called the new “Structure and Interpretation of Computer Programs”) and programming language (Oz) that covers the subject thoroughly (and cleans up a few problems with the Excel model).

From the standpoint of theoretical logic, this kind of problem is a great motivator for “coinductive proof”, whereby we can apply a type system to these things and weed out erroneous programs with races, non-termination, and so on (the pi-calculus being the most famous example of the value of coinduction).