Shiro Programming Language: Why I love hand-coded parsers

A lot of modern compiler and interpreter implementations start from tools that generate parsers, lexers, and sometimes even rudimentary code representations from grammars or GUI constructs. I've always hand-coded my parsers, usually using recursive descent or Pratt parsing, and just tonight I was reminded of why I prefer that to any automated alternative.

I have an unhealthy love of the ternary operator (that's the ? : construct, if you never knew the name). How Shiro has gone so long without it I don't know, but it's been nagging me for a while that it's not in there. It just sort of fits the semantic philosophy of the language. So tonight, after a few hours of SWTOR, I cracked open Shiro and decided to add the operator. It boiled down, basically, to one function:

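    protected Token GetTernaryLevelExpression()
    {
        // Parse the condition at the next-tighter precedence level.
        Token work = GetComparisonCompound();
        Token toke;
        bool keepRunning = true;

        while (keepRunning)
        {
            toke = PeekToken();
            if (toke == null)
            {
                // Out of tokens: the expression ends here.
                keepRunning = false;
                continue;
            }

            switch (toke.token)
            {
                case "?":
                    PopToken();
                    // The double Not appears to normalize the condition to a "true"/"false" token.
                    if (comb.Not(comb.Not(work)).token == "true")
                    {
                        // Condition holds: keep the first branch, then parse and discard the second.
                        work = GetExpressionValue();
                        if (!PeekAndDestroy(":"))
                            Error.ReportError("Ternary operator must include colon");
                        GetExpressionValue();
                    }
                    else
                    {
                        // Condition fails: parse and discard the first branch, keep the second.
                        GetExpressionValue();
                        if (!PeekAndDestroy(":"))
                            Error.ReportError("Ternary operator must include colon");
                        else
                            work = GetExpressionValue();
                    }
                    break;

                default:
                    // Not a ternary operator: hand the value back up the chain.
                    keepRunning = false;
                    return work;
            }
        }

        return work;
    }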

A recursive-descent expression parser exactly mirrors the grammar it's parsing, if it's put together right. Adding an operator is as simple as identifying its place in the order of operations (in this case, a whole new level, the very last one processed in an expression) and adding a parse handler for it. You can find every available operation in Shiro just by tracing through the order-of-operations levels and looking at the operators each one handles. Keyword handlers in the Parser.cs file are basically the same. It's the most extensible parser I've ever written, sort of the culmination of all my other language projects in the past. It's also pretty damn fast, although there are areas I know need improvement -- it does what the code tells it to do as soon as it knows how.
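To make the "one method per precedence level" idea concrete, here is a minimal sketch of a toy evaluator, not Shiro's actual code. Everything in it is an assumption made for illustration: the MiniParser class, its method names, integer-only values, and whitespace-separated tokens. Each method handles one level and calls the next-tighter one, so the ternary sits at the top of the chain as the last thing processed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy evaluator (not Shiro's parser): one method per precedence level.
public class MiniParser
{
    private readonly List<string> tokens;
    private int pos;

    public MiniParser(string source)
    {
        // Assumption: tokens are separated by spaces, which keeps the lexer trivial.
        tokens = new List<string>(
            source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }

    private string Peek() { return pos < tokens.Count ? tokens[pos] : null; }
    private string Pop() { return tokens[pos++]; }

    // ternary := comparison [ "?" ternary ":" ternary ]
    public int ParseTernary()
    {
        int condition = ParseComparison();
        if (Peek() != "?") return condition;
        Pop();
        int whenTrue = ParseTernary();
        if (Peek() != ":")
            throw new FormatException("Ternary operator must include colon");
        Pop();
        int whenFalse = ParseTernary();
        // Both branches are evaluated and one is kept; nonzero counts as true.
        return condition != 0 ? whenTrue : whenFalse;
    }

    // comparison := additive { ("<" | ">") additive }
    private int ParseComparison()
    {
        int left = ParseAdditive();
        while (Peek() == "<" || Peek() == ">")
        {
            string op = Pop();
            int right = ParseAdditive();
            left = (op == "<" ? left < right : left > right) ? 1 : 0;
        }
        return left;
    }

    // additive := primary { ("+" | "-") primary }
    private int ParseAdditive()
    {
        int left = ParsePrimary();
        while (Peek() == "+" || Peek() == "-")
        {
            string op = Pop();
            int right = ParsePrimary();
            left = op == "+" ? left + right : left - right;
        }
        return left;
    }

    // primary := integer | "(" ternary ")"
    private int ParsePrimary()
    {
        string t = Peek();
        if (t == null) throw new FormatException("Unexpected end of input");
        Pop();
        if (t == "(")
        {
            int inner = ParseTernary();
            if (Peek() != ")") throw new FormatException("Expected )");
            Pop();
            return inner;
        }
        return int.Parse(t);
    }
}
```

With this sketch, new MiniParser("1 + 2 > 2 ? 10 : 20").ParseTernary() evaluates to 10. Adding another operator would mean either extending the loop at an existing level or slotting a new method between two levels, which is the extensibility property described above.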

I've sort of reached the edge of what I can push in the language right now. I need time to catch the libraries up, make ShiroChan (the IDE) much better, and begin the never-ending task of optimization. While the core parser runs like lightning and the variable symbol table keeps itself quite clean, there's work to be done around function dispatch and how lambdas are represented in memory. I'm playing around with a LISP built on the DLR (more of a Clojure than a normal LISP, really), and one of the early decisions that made it incredibly easy and flexible was representing the code and the data as the same object type. That innovation made it into Shiro when I got rid of the hideous VarSym and FuncSym types (which were basically Tokens by another name), but there's a deeper level of integration still to do, namely storing anonymous functions in a more optimal way (as token lists in tokens) and accessing them through the symbol table. The function table, especially as it relates to anonymous functions, is a nightmare right now because anonymous functions never go out of scope the way variables do.
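As a rough sketch of what "token lists in tokens" behind the symbol table could look like, here is one possible shape. None of this is Shiro's real code: the Token fields, the Lambda helper, and the scope-stack SymbolTable are hypothetical, and the sketch assumes that binding a lambda in a scope is enough for it to be dropped when that scope is popped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes for illustration only, not Shiro's real classes.
public class Token
{
    public string Text;        // literal text for plain values
    public List<Token> Body;   // non-null when this token carries a function body

    public bool IsFunction { get { return Body != null; } }

    public static Token Value(string text)
    {
        return new Token { Text = text };
    }

    // A lambda is just a token whose payload is a list of other tokens.
    public static Token Lambda(params string[] bodyTokens)
    {
        var body = new List<Token>();
        foreach (string t in bodyTokens) body.Add(Value(t));
        return new Token { Text = "<lambda>", Body = body };
    }
}

public class SymbolTable
{
    private readonly Stack<Dictionary<string, Token>> scopes =
        new Stack<Dictionary<string, Token>>();

    public SymbolTable() { PushScope(); }

    public void PushScope() { scopes.Push(new Dictionary<string, Token>()); }

    // Dropping a scope drops every binding in it, lambdas included.
    public void PopScope() { scopes.Pop(); }

    public void Define(string name, Token value) { scopes.Peek()[name] = value; }

    public Token Lookup(string name)
    {
        // Stack<T> enumerates from the most recently pushed scope outward.
        foreach (var scope in scopes)
        {
            Token found;
            if (scope.TryGetValue(name, out found)) return found;
        }
        return null;
    }
}
```

In this arrangement variables and anonymous functions share one storage path, so a lambda defined after PushScope() stops being reachable once the matching PopScope() runs, the same as a variable would.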

So, changes coming soon: IDE improvements, anonymous function memory use and performance optimizations, and some work on the website (needs more documentation and a binary installer at the very least).

Fortunately, the hard work on this (removing those deprecated classes and using Tokens for everything) is already done. I've just been too busy lazing about for the holidays and playing with the HTTP integration and language-level ORM (well, kind of) to focus on the gritty stuff. Well, my New Year's resolution is to get this thing shippable, so no more procrastination.


Tuesday, January 3, 2012
