Saturday, December 09, 2006

Lambda, It's not a Fraternity

I did not study software engineering or computer science in school.  When I began writing software for a living, I thought this was a liability.  I had a good long conversation over a bottle of Oban that changed my mind.  My good friend Wade has a BS in Computer Science and an MS in Industrial Engineering.  He also has plenty of bona fides in industry.

I expressed my concern about my lack of formal training to Wade, then asked what he had learned in his CS studies.  His reply could not have pleased me more.  He told me that he came up with his own language and compiler implemented in Pascal.  I asked if this experience had directly contributed to his career, and he replied that it had not.

While I believe credentials are important at the beginning of a career, I am more interested in what a job candidate has done than what they have studied.  I also knew then that despite having an MS in Education, I had learned the majority of what I found to be useful knowledge in that field through my own reading and experience.

That is why, dear reader, I was able to march boldly into a career in which I had no formal training and thrive.  I stopped seeing my lack of credentials as a liability.  I was already producing working software at my job, and I was studying like a fiend at home.  This continues to be my practice, and I have never regretted my lack of academic credentials, until now.

I like to follow the news on Reddit.  It seems there are a bunch of alpha-geeks who hang out there, and as a result there is an abundance of links relating to Lisp, Haskell, and other "exotic" programming topics.  I read these articles out of curiosity, but I never found an application for the features that were promoted as world-changing.

One of the topics that I have found particularly interesting is functional programming.  I don't fully understand it, but I have some idea of how it differs from the imperative programming that I do every day.  Recently some terminology from functional programming has popped up in my "normal" programming studies. 

Microsoft is currently cooking up an extension of its ADO.NET data access technologies called LINQ.  This is a very interesting development, and I have been following along without actually trying the technologies yet.  One concept that shows up in both functional programming and LINQ is that of lambda expressions.

This is where my lack of formal training shows up.  I never studied Lambda Calculus, and Joel Spolsky had me convinced that it was a useless pursuit anyway.  Now I have a keen interest in the application of Lambdas in my work.  I will attempt to share what I have gathered on the topic, as it pertains to practical software.  Any correction or clarification of this discussion is welcome and encouraged.

My current take is that lambdas are a way of expressing a value without actually storing that value in memory.  The expression preserves the information about how the value is derived, and about how the values used to calculate it are derived in turn.

At any point then, we can evaluate the whole thing and output a value, but we don't store the value explicitly.  The first thing I can relate this to is function pointers in C.  A pointer can point to the address of a value in memory, or the address of a function.  That's a rough association, but I need to start somewhere.
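
Here is a minimal sketch of that idea in C#.  The names are mine, invented purely for illustration, so take this as my current mental model rather than anything authoritative: the lambda records how a value would be derived, and no value is produced until we ask for one.

    using System;

    class LambdaSketch
    {
        static void Main()
        {
            int rate = 3;

            // The lambda describes how to arrive at a value; nothing is computed yet.
            Func<int, int> price = quantity => quantity * rate;

            // Only when we invoke it is a value actually produced.
            Console.WriteLine(price(10));   // 30
            Console.WriteLine(price(25));   // 75
        }
    }

Like a function pointer, price refers to behavior rather than to a stored result, but it also carries along the rate it closed over, which is the "how it was derived" part.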

Another parallel I see is to dynamic evaluation at runtime.  I program daily in Visual FoxPro, and when I work in C# or VB I pine for the dynamic evaluation available in VFP.  While most loosely typed, or duck typed, languages can perform dynamic, runtime evaluation, VFP has true macros, and I miss them when they are gone.  Could it be that LINQ will allow me to interpret, compile and evaluate expressions (code) on the fly in .NET?
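
From what I have gathered about the LINQ previews, expression trees may be the closest answer.  A rough sketch, assuming the System.Linq.Expressions types behave the way the documentation describes:

    using System;
    using System.Linq.Expressions;

    class ExpressionSketch
    {
        static void Main()
        {
            // The lambda is captured as data (an expression tree), not as compiled code.
            Expression<Func<int, int>> doubler = x => x * 2;

            // We can inspect it as a structure...
            Console.WriteLine(doubler);            // prints something like: x => (x * 2)

            // ...and then compile and evaluate it on the fly.
            Func<int, int> run = doubler.Compile();
            Console.WriteLine(run(21));            // 42
        }
    }

That is not a true macro in the VFP sense, but it is code treated as data, which is the part I miss.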

It is not only my study of LINQ that has piqued my interest in lambdas.  I have lately been pondering a problem in my free time, even though no one has assigned me the task.  When I find an interesting problem, I wonder how I would solve it, and I suspect most software developers do the same.  The problem is this: how would I implement a long-running (hours or days) process so that changing inputs and/or constraints does not require re-running the entire process to get the new results?

My first reaction is to store all interim calculations in a data cube, so that I can recover as much of the previous work as possible before I have to recalculate.  This is a reasonable approach up to a point, but I cannot calculate all possible outputs in my original run.  I can only roll back to the point where the outputs begin to change and re-run the process from there, and in some cases that would mean running it from the beginning.
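
The data cube idea is really just caching interim results by their inputs.  A stripped-down sketch of that, where the names and the "expensive" step are made up:

    using System;
    using System.Collections.Generic;

    class InterimCache
    {
        // Interim results keyed by the input that produced them.
        static readonly Dictionary<int, long> cache = new Dictionary<int, long>();

        static long ExpensiveStep(int input)
        {
            long saved;
            if (cache.TryGetValue(input, out saved))
                return saved;                  // recovered from a previous run

            long result = Compute(input);      // the hours-or-days part
            cache[input] = result;
            return result;
        }

        static long Compute(int input)
        {
            return (long)input * input;        // stand-in for the real work
        }

        static void Main()
        {
            Console.WriteLine(ExpensiveStep(12));   // computed
            Console.WriteLine(ExpensiveStep(12));   // recovered from the cache
        }
    }

The weakness is exactly the one described above: the cache only helps up to the point where the new run diverges from the old one.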

It seems that in this scenario the business rules, or constraints, matter as much as the explicit inputs.  In fact, the business rules are inputs to the expression-building engine, while the data inputs are passed to the built expressions.
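
A sketch of that separation, using the expression tree types mentioned earlier.  The rule itself, a simple approval threshold, is hypothetical:

    using System;
    using System.Linq.Expressions;

    class RuleBuilder
    {
        // The business rule is an input to the expression-building step...
        static Func<decimal, bool> BuildRule(decimal threshold)
        {
            ParameterExpression amount = Expression.Parameter(typeof(decimal), "amount");
            return Expression.Lambda<Func<decimal, bool>>(
                Expression.GreaterThan(amount, Expression.Constant(threshold)),
                amount).Compile();
        }

        static void Main()
        {
            Func<decimal, bool> needsApproval = BuildRule(1000m);

            // ...while the data inputs are passed to the built expression.
            Console.WriteLine(needsApproval(250m));    // False
            Console.WriteLine(needsApproval(5000m));   // True
        }
    }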

I conceive of this like a query execution plan in a SQL engine.  Once the system has determined how best to evaluate the query, much of the work is done.  You can then change the inputs to the query and the server will evaluate it the same way it did before.  My hypothetical long-running, complex process would benefit from a similar approach.  Determining how best to evaluate the inputs is half the work, and that is defined by the constraints in the same way that table schemas and indexes determine how a SQL engine will evaluate a query.

It now appears imperative, when executing a long-running process with high CPU requirements, that we record how we solved the problem as well as what the solution was.  In this way we could run the process from the start with new inputs and simply re-evaluate the pre-built expressions.  We could also change our constraints and rebuild only the expressions affected by the changes.
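
In code, "recording how we solved it" might look like keeping the built steps around by name, so that new inputs simply flow through them, and a changed constraint means rebuilding only the step it touches.  Another sketch, with made-up step names:

    using System;
    using System.Collections.Generic;

    class RecordedProcess
    {
        // How we solved it: the built steps, kept by name.
        static Dictionary<string, Func<decimal, decimal>> steps =
            new Dictionary<string, Func<decimal, decimal>>
            {
                { "discount", amount => amount * 0.95m },
                { "shipping", amount => amount + 12m }
            };

        // What the solution was: run the data through the recorded steps.
        static decimal Run(decimal input)
        {
            return steps["shipping"](steps["discount"](input));
        }

        static void Main()
        {
            Console.WriteLine(Run(100m));          // 107.00

            // A changed constraint rebuilds only the expression it affects.
            steps["discount"] = amount => amount * 0.90m;
            Console.WriteLine(Run(100m));          // 102.00
        }
    }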

This approach also allows us to solve the problem iteratively.  By setting thresholds on how detailed we want our expressions to be, we could form estimates, and then run the process again building ever more detailed expression trees, asymptotically approaching a precise answer.

This idea of expressing values as buildable, re-evaluatable expressions rather than as stored, mutated state appears to be what functional programming is built on.  This lack of "side effects" also has other practical advantages, such as thread safety and a high degree of parallelizability.  At least one functional programming advocate is willing to argue that these advantages qualify functional programming as a silver bullet.

In this context, a silver bullet means that the technology provides an order of magnitude improvement in productivity and/or performance.  This is a bold claim, but some of the arguments are persuasive, if not for a full order of magnitude, then for significant improvements.  My favorite argument from the essay linked above applies directly to my hypothetical long-running process:

The order in which statements are executed, and hence the execution trace of the program, is completely irrelevant. Thus execution order is revealed as a major source of accidental complexity which Brooks mistook for essential complexity, but which is eliminated in pure functional programs.

If execution order is truly inconsequential, then we can modify our constraints and inputs constantly in order to arrive at the optimal result.  It remains to be seen if I can design a solution of this type in .NET with the new LINQ extensions, or whether I'll need to dive into Lisp, Scheme, Haskell, Erlang, et al.  One thing that is certain is that I'll find out, and I'll share my understanding here.  Now, if you'll excuse me, I have some reading to do.

++Alan
