Shiro Programming Language: January 2012

Monday, January 9, 2012

CLFATP #1: The Execution Operator

This is the first of my Cool Language Feature of the Arbitrary Time Period pieces, covering a particular snippet of Shiro code that shows off something it does neatly. The snippet goes up on the Google Code Home Page first (a lame reward for those who visit I guess), and once I decide to replace it with a new one, I blog about the old one. How bloody exciting...</deadpan>

So the previous snippet was a short one, but a good one:

def getp(o, propName is String){
    return o~('.' + propName)
}

thing = {name: "bob", age: 27}
print getp(thing, "name")

This is about the simplest way to show off one of my favorite Shiro operators, that squiggly thing. It's called the Execution Operator (although I usually just say exec-op) and what it does is insert dynamic stuff into the token stream (i.e. your source code) each time it's encountered. It literally lets you make dynamic Shiro and just run it whenever, wherever, however. And because of how it's implemented, you take only a minimal performance hit no matter what context you use it in, so it's basically a "free" operator.

In case you're not following, the getp method above returns an object's property by name. It could have just accessed it using array/tuple notation, but that wouldn't be any fun at all, so instead we build a '.<property>' string and just thrust it into the source code, right after a name. Shiro inserts the .<property> part and is now happy with the syntax, so it returns the value. Easy as that. The crazy thing about the operator is that it can appear literally anywhere, and as long as inserting the value of the result creates valid Shiro, it will run.

There are two different ways you can use this operator in Shiro. If you follow the tilde with a name only, it will insert the value of that variable into the token stream. If, as in the example above, you follow it with a parenthesis, it will evaluate the expression inside the parenthesis, then insert that into the token stream.
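To make the splice concrete, here is a toy model in Python. It is only a sketch and not Shiro's actual lexer or parser: the names tokenize, expand, evaluate and run are made up for illustration, the "language" only understands names, quoted strings, '+' and '.', and the expansion runs as one pass over the token list, whereas the real exec-op is described above as acting each time it's encountered.

```python
import re

# Hypothetical toy grammar: numbers, names, 'quoted strings', and a few symbols.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|'[^']*'|[~.()+])")

def tokenize(src):
    """Split source text into a flat list of token strings."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"bad input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def eval_concat(tokens, env):
    """Evaluate a tiny expression: string literals and names joined by '+'."""
    parts = []
    for tok in tokens:
        if tok == '+':
            continue
        parts.append(tok[1:-1] if tok.startswith("'") else str(env[tok]))
    return ''.join(parts)

def expand(tokens, env):
    """Replace each exec-op with the tokens of the text it produces."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] != '~':
            out.append(tokens[i])
            i += 1
            continue
        if tokens[i + 1] == '(':
            # Form 2: ~( expression ). No nested parentheses in this toy.
            close = tokens.index(')', i + 2)
            text = eval_concat(tokens[i + 2:close], env)
            i = close + 1
        else:
            # Form 1: ~name inserts the value of that variable.
            text = str(env[tokens[i + 1]])
            i += 2
        out.extend(tokenize(text))  # splice the new tokens into the stream
    return out

def evaluate(tokens, env):
    """Evaluate a name followed by any number of .property accesses."""
    value = env[tokens[0]]
    i = 1
    while i < len(tokens):
        if tokens[i] != '.':
            raise SyntaxError(f"unexpected token {tokens[i]!r}")
        value = value[tokens[i + 1]]
        i += 2
    return value

def run(src, env):
    return evaluate(expand(tokenize(src), env), env)

env = {"thing": {"name": "bob", "age": 27}, "propName": "name", "suffix": ".age"}
print(run("thing~('.' + propName)", env))  # bob
print(run("thing~suffix", env))            # 27
```

The only requirement in this model, as in the description above, is that the stream is valid once the inserted tokens are in place; what produced them doesn't matter.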

How this little gem differs from the ParamOp is worthy of a small chapter in the Quick Start Guide, but it will probably show up in a CLFATP one of these years.

Bugfixes and Platform Builds

A host of long-overdue bug fixes has been my most recent work in Shiro. I'm rebuilding the language test suite that used to belong to Merlin (Shiro's predecessor), translating it to use the new syntax and libraries. By exercising really bizarre fringe scenarios I'm getting at areas which haven't been tested enough. It's slow and sometimes aggravating work, but it's quite doable as long as you have an intimate-enough understanding of how the parser flows. It's also helping me identify areas that need refactoring -- like the ParseName and ParseRootName conundrum that I resolved a week or so ago. The devil with refactoring is, of course, that it's hard to trust it without unit tests; thus my renewed work on the language tests.

The pre-built binaries for ShiroChan are also not working -- or at least they're x64 only. The shcl and interpreter-only packages both work, but the configuration setting that would let me build solid x86 and x64 versions doesn't seem to be exposed by the version of C# Express I'm using. As soon as I get that ironed out I'll be fleshing out the download page a bit and generally sprucing up the site.

I'm also working on a reference-count garbage collection routine for anonymous functions -- I think I've just about got it figured out in my head, so that'll be a fun little side project for one of these evenings. The symbol table keeps remarkably clean (I had so many problems with scope early on with the Python-style scoping syntax in Merlin that it's been pruned and trimmed and worked at for years now), but the function table not so much. The ParseName-ParseRootName merger has also introduced a lingering bug or two. I think they need a new parse-route for handling each tier of a given object chain (like O.O2.O3.F() or something obtuse like that). Not too hard, and it will make both symbol information and handling problems like the implicit-this and cleanup much easier.
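For readers who haven't met reference counting, here is roughly what such a routine could look like, sketched in Python. This is a guess at the general shape, not the routine planned for Shiro: FunctionTable, Scope and all their methods are hypothetical names, and a function body is just a list of strings standing in for tokens.

```python
class FunctionTable:
    """Toy table of anonymous functions, freed when nothing refers to them."""

    def __init__(self):
        self._bodies = {}   # function id -> body (a token list in this toy)
        self._counts = {}   # function id -> number of live references
        self._next_id = 0

    def __len__(self):
        return len(self._bodies)

    def register(self, body):
        """Store a new function with zero references and return its id.

        A real implementation would also need to deal with temporaries
        that are never bound to anything; this toy ignores them.
        """
        fid = self._next_id
        self._next_id += 1
        self._bodies[fid] = body
        self._counts[fid] = 0
        return fid

    def retain(self, fid):
        self._counts[fid] += 1

    def release(self, fid):
        self._counts[fid] -= 1
        if self._counts[fid] == 0:
            # Last reference is gone: drop the function from the table.
            del self._bodies[fid]
            del self._counts[fid]


class Scope:
    """A set of variables that releases what it holds when it closes."""

    def __init__(self, table):
        self.table = table
        self.vars = {}      # variable name -> function id

    def bind(self, name, fid):
        old = self.vars.get(name)
        self.table.retain(fid)        # retain first so rebinding to itself is safe
        if old is not None:
            self.table.release(old)
        self.vars[name] = fid

    def close(self):
        for fid in self.vars.values():
            self.table.release(fid)
        self.vars.clear()


table = FunctionTable()
outer, inner = Scope(table), Scope(table)
fid = table.register(["x", "*", "2"])
inner.bind("double", fid)
outer.bind("keep", fid)
inner.close()
print(len(table))   # 1, because outer still refers to the function
outer.close()
print(len(table))   # 0
```

One known limit of plain reference counting is that functions referring to each other in a cycle never reach a count of zero, so a scheme like this needs some other answer for that case.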

I've also started work on "Let's Build an Interpreter", my puny attempt to write something like the wonderful "Let's Build a Compiler" series that got me started. One of my main purposes in developing Shiro was that it be structured in such a way as to lend itself to such a series. One day I hope someone will find my tutorial by accident, like how I stumbled on Dr. Crenshaw's, and it will spark in them what LBaC sparked in me.

The New Year seems to have come with a burst of energy; lots of fun stuff going on. I hope that the upcoming site update and some new content will help communicate some of the massive, stabilizing changes going on under the hood in Shiro.

Tuesday, January 3, 2012

Why I love hand-coded parsers

A lot of modern compiler/interpreter implementations begin with several tools that generate parsers, scanners, lexers and even sometimes rudimentary code representations based on grammars and even GUI constructs. I've always hand-coded my parsers, usually using recursive descent or Pratt parsing, and just tonight I was reminded of why I prefer it to any automated alternative.

I have an unhealthy love of the ternary operator (that's the ? : construct if you never knew the name). How Shiro has gone so long without it I don't know, but it's been nagging me for a while that it's not in there... it just sort of fits the semantic philosophy of the language. So tonight after a few hours of SWTOR I cracked open Shiro and decided to add the operator. It boiled down, basically, to one function:

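protected Token GetTernaryLevelExpression()
{
    // Parse the next-lower precedence level first; this is the condition if a '?' follows.
    Token work = GetComparisonCompound();
    Token toke;
    bool keepRunning = true;

    while (keepRunning)
    {
        toke = PeekToken();
        if (toke == null)
        {
            keepRunning = false;
            continue;
        }

        switch (toke.token)
        {
            case "?":
                PopToken();
                // The double negation appears to normalize the condition to a true/false token.
                if (comb.Not(comb.Not(work)).token == "true")
                {
                    // Condition true: keep the first branch, then parse and discard the second.
                    work = GetExpressionValue();
                    if (!PeekAndDestroy(":"))
                        Error.ReportError("Ternary operator must include colon");
                    GetExpressionValue();
                }
                else
                {
                    // Condition false: parse and discard the first branch, keep the second.
                    GetExpressionValue();
                    if (!PeekAndDestroy(":"))
                        Error.ReportError("Ternary operator must include colon");
                    else
                        work = GetExpressionValue();
                }
                break;

            default:
                // Anything else is not part of a ternary, so this level is done.
                keepRunning = false;
                return work;
        }
    }

    return work;
}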

A recursive-descent expression parser exactly mirrors the grammar it's parsing, if it's composed right. Adding an operator is as simple as identifying its place in the order of operations (in this case, a whole new step, the very last one processed in an expression) and adding a parse handler for it. You can track down every available operation in Shiro just by tracing through the order-of-operations levels and looking at the operators each one handles. Keyword handlers in the Parser.cs file are basically the same. It's the most extensible parser I've ever written, sort of the culmination of all my other language projects in the past. It's also pretty damn fast, although there are areas I know need improvement -- it does what the code tells it to do as soon as it knows how.
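As a rough illustration of the "one handler per level" idea, here is a small self-contained sketch in Python. It is not a port of Shiro's parser: the Parser class, its method names and the tiny integer-only grammar are all assumptions made for the example. Each method handles one precedence level and calls the next-tighter one, so the ternary is just one more method sitting above the rest.

```python
import re

class Parser:
    """Toy recursive-descent evaluator with one method per precedence level."""

    def __init__(self, src):
        # Integers plus a handful of operators; anything else is silently skipped.
        self.tokens = re.findall(r"\d+|==|[-+*()<>?:]", src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def pop(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.pop() != tok:
            raise SyntaxError(f"expected {tok!r}")

    # ternary := comparison ('?' ternary ':' ternary)?
    def ternary(self):
        cond = self.comparison()
        if self.peek() == '?':
            self.pop()
            if_true = self.ternary()    # both branches are parsed and evaluated,
            self.expect(':')
            if_false = self.ternary()   # then one result is thrown away
            return if_true if cond else if_false
        return cond

    # comparison := additive (('<' | '>' | '==') additive)*
    def comparison(self):
        left = self.additive()
        while self.peek() in ('<', '>', '=='):
            op = self.pop()
            right = self.additive()
            left = {'<': left < right, '>': left > right, '==': left == right}[op]
        return left

    # additive := term (('+' | '-') term)*
    def additive(self):
        left = self.term()
        while self.peek() in ('+', '-'):
            op = self.pop()
            right = self.term()
            left = left + right if op == '+' else left - right
        return left

    # term := primary ('*' primary)*
    def term(self):
        left = self.primary()
        while self.peek() == '*':
            self.pop()
            left = left * self.primary()
        return left

    # primary := NUMBER | '(' ternary ')'
    def primary(self):
        tok = self.pop()
        if tok == '(':
            value = self.ternary()
            self.expect(')')
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)


def evaluate(src):
    parser = Parser(src)
    value = parser.ternary()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * 4"))                     # 14
print(evaluate("1 < 2 ? 10 : 20"))               # 10
print(evaluate("1 > 2 ? 10 : 3 == 3 ? 7 : 8"))   # 7
```

Reading the methods from top to bottom gives the precedence table, loosest first, which is the property the paragraph above is describing.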

I've sort of reached the edge of what I can push in the language right now; I need time to catch the libraries up, make ShiroChan the IDE much better, and begin the never-ending task of optimization. While the core parser runs like lightning and the variable symbol table keeps itself quite clean, there's work to be done around function dispatch and how lambdas are represented in memory. I'm playing around with a LISP built on the DLR (more of a Clojure than a normal LISP, really) and one of the early decisions that made it incredibly easy and flexible was representing the code and the data as the same object type. That innovation made it into Shiro when I got rid of the hideous VarSym and FuncSym types (which were basically Tokens by another name), but there's a deeper level of integration still to do, namely storing anonymous functions in a more optimal way (as token lists in tokens) and accessing them through the symbol table. The function table, especially as it relates to anonymous functions, is a nightmare right now because they never go out of scope like variables do.
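The "token lists in tokens" idea can be pictured with a short Python sketch. This is only a guess at the shape, not Shiro's real Token class or symbol table: the field names and the SymbolTable API below are invented for the example. The point is that once a function is just another token stored under a name, it leaves with its scope like any variable.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    """One type for code and data: a plain token or a function token."""
    kind: str                                     # "number", "name", "op" or "function"
    text: str = ""
    params: list = field(default_factory=list)    # parameter names (functions only)
    body: list = field(default_factory=list)      # nested Tokens (functions only)


class SymbolTable:
    """Stack of scopes; variables and functions live in the same place."""

    def __init__(self):
        self.scopes = [{}]

    def push(self):
        self.scopes.append({})

    def pop(self):
        # Everything defined in the scope, lambdas included, goes away here.
        self.scopes.pop()

    def set(self, name, token):
        self.scopes[-1][name] = token

    def get(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(name)


table = SymbolTable()
table.push()
table.set("double", Token(
    "function",
    params=["x"],
    body=[Token("name", "x"), Token("op", "*"), Token("number", "2")],
))
print(table.get("double").kind)        # function
print(len(table.get("double").body))   # 3
table.pop()                            # "double" is gone along with its scope
```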

So, changes coming soon: IDE improvements, anonymous function memory use and performance optimizations, and some work on the website (needs more documentation and a binary installer at the very least).

Fortunately, the hard work on this (removing those deprecated classes and using Tokens for everything) is already done. I've just been too busy lazing about for the holidays and playing with the Http integration and language-level ORM (well, kind of) to focus on the gritty stuff. Well, my New Year's resolution is to get this thing shippable, so no more procrastination.