About three weeks ago, I had the opportunity to sit down with Bill Bain of ScaleOut Software and the two Joes, Joe Cleaver and Joe Rubino, from Microsoft’s Financial Services Industry Evangelism team after I gave my presentation on distributed caches at Microsoft’s 6th Annual Financial Services Developer Conference. The two Joes recorded a podcast of our conversation.
Bill, Joe, and Joe, thanks for the opportunity to talk with you guys.
24 Mar 2008
Dataflow is about creating a software architecture that models a problem on the functional relationship between variables rather than on the sequence of steps required to update those variables. It’s about shifting control of evaluation away from code you write toward code written by someone else. It’s about changing the timing of recalculation from recalculate now to recalculate when something has changed. Sure, it’s a distinction that may have more to do with emphasis and point of view than with paradigm, but it can be a liberating distinction for certain problems in financial modeling.
If you work in finance, chances are you may already be expert in today’s preeminent dataflow modeling language: Microsoft Excel. Excel is the undisputed workhorse of financial applications, taught in every business school, run on every desk, wired into the infrastructure of nearly every bank, fund, or exchange in existence. The reason for Excel’s singularity in the black hole of finance is its ability to emancipate modeling from code (and thus developers) and empower analysts and business types alike to create models as interactive documents. Make no mistake — writing workbooks is still very much software development. But Excel’s emphasis on data rather than code, relationships rather than instructions, is something that fits with the work this industry does and the people that do it.
Briefly, when you model in Excel, you specify a cell’s output by filling it with either a constant value or a function. Functions are written in a lightweight language that allows function arguments to be either constant values or references to another cell’s output. In the typical workbook, cells may reference cells that in turn reference other cells, and so on, resulting in an arbitrarily sophisticated model that can span multiple worksheets and workbooks. The point though is that, rather than specifying your model as a sequence of steps that get executed when you say go, here you describe your model’s core data relationships to Excel, and Excel figures out how and when it should be executed.
Example: An Equities Market Simulation
Let’s say that we are writing a simulation for an equities (stock) market. Such a simulation could be used for testing a trading strategy or studying economic scenarios. The market is comprised of many equities, and each equity has many properties, some that change slowly over time (such as ticker symbol or inception date), and some that change frequently (such as last price or volume). Some properties may be functions of other properties of the same equity (such as high, low, or closing price), while others may be functions of properties on other equities (such as with haircuts, derivatives, or baskets).
As a starting point, we introduce a simulation clock. Each time the clock advances, the price of all equities gets updated. To update prices, we use a random walk driven by initial conditions (such as initial price S0, drift r, and volatility σ), a normally distributed random variable z, and a recurrence equation over n intervals of t years:
Note: This equation provides a lognormal random walk [1,2], which means that instead of getting the next price by adding small random price changes to the previous price, we’re multiplying small random percentages against the previous price. This makes sense for things like prices since a) they can’t be negative, and b) the size of any price changes is proportional to the magnitude of the current price. In other words, penny stocks tend to move up and down by fractions of a penny while stock trading at much higher prices tend to move up and down in dollars.
In Excel, you could model this market by plopping the value of the clock into a cell, setting up other cells to contain initial conditions, and then have a slew of other cells initialized with functions that reference the clock and initial conditions cells and that calculate a new price using the above equation for each virtual equity. And then hit F9.
But how would you write this in code? Would you just update the clock and then exhaustively recalculate all of the prices? If you had to incorporate equity derivatives or baskets, would your architecture break? How would you allow non-programming end-users to declaratively design their own simulation markets and the instruments within?
Recently, one of our financial services clients at Lab49 has been trying to solve a similar problem in .NET, and I had been suggesting to them that the problem is analogous to how Microsoft Windows Presentation Foundation (WPF) handles the flow of data from controller to model to view. Dependency properties, which form the basis of data binding in WPF applications, implement a dataflow model similar to Excel, and what I had in mind at first was a solution inspired by WPF. But the more I discussed this analogy with the client, the more I realized that we didn’t just have to use WPF as inspiration; we could actually use WPF.
In this series, I’ll dive further into creating the equities market simulation and look at how to use WPF data binding to create a dataflow implementation. Note that there are several considerations to this approach, and, under the category of just because you can doesn’t mean you should, we’ll evaluate whether or not this method has legs.
[to be continued]
Nope, it’s not from Microsoft Research. It doesn’t hail from Cambridge or Mountainview, nor is it some underbelly technology in Microsoft Vista that only Mark Russinovich and the responsible SDE is aware of. Rather, it’s a new service-oriented application model built on two overlapping technologies: Decentralized Software Services (DSS) and the Concurrency and Coordination Runtime (CCR). It is currently shipping as part of the Microsoft Robotics Studio (more on that later) and is poised to disrupt the way we think about the Windows Communication Framework (WCF) and the way we design, architect, and implement distributed applications. It is a work of genius.
1 Apr 2007
Boo is a new object oriented statically typed programming language for the Common Language Infrastructure with a python inspired syntax and a special focus on language and compiler extensibility.
Depending on how conversant you are with programming language taxomonies, this description may not mean anything to you. Suffice it to say, it’s a nice, concise, legible programming language that has a lot potential power to be embedded or integrated into larger applications. If Boo were a Hollywood actor, it’d be Jeff Goldblum — not particularly verbose, a tad kooky but good-hearted and capable of making a scene pop whether a supporting player or lead.
Note: To put that analogy into perspective, I’d say C# would have to be Brad Pitt (sexy, popular, and omnipresent, but strangely awkward in some roles); Java, Tom Cruise (big in the day, still everywhere, but descending into schizophrenia); C++, Harrison Ford (a stalwart, serious and leathered); and PHP, Steve Buscemi (an ugly mug, but universally seems a satisfying fit).
Many folks forget that programming languages are tools and that some tools are better than others for certain jobs. Sure, I’ve hung a picture hook by banging on it with a flathead screwdriver, and I’ve used a claw hammer as a poor man’s crowbar, but that was purely out of temporal convenience, not because I didn’t know better. Similarly, there are a slew of things that I’d sooner do in PowerShell (or even VBScript!) than belabor in C#, simply because it would take less time, require fewer lines of code, and place more emphasis square on the business problem.