Tuesday, March 24, 2009

f(e) has moved to wordpress

I decided to change things up and jump ship here at blogger. I'll leave the archives up for as long as they let me, but I've already imported all the old posts into the new blog at:

http://rezoant.wordpress.com/.

Sunday, March 22, 2009

Loops and bugkilling

I investigated some ideas for less common loop types from the Wikipedia article on control flow, and decided to integrate some into the design of Fera.

The first is the concept of a loop with a condition somewhere in the middle of it's body. This can already be done in C# (and indeed any language with a while loop) with:


while (true) {
// x x x
if (condition) break;
// x x x
}


In Fera, you can do the same thing with a little less typing and a bit more style:


do {
// x x x
break when (condition);
// x x x
}


This also allows for the programmer to perform infinite loops, though the compiler will emit a warning, letting you know that you've forgot to put in some kind of condition for your loop. You can also use the 'when' construct with 'next' and 'continue' where ever they can be used. The compiler simply reverses the construct, so 'next when (x);' turns into 'if (x) next;'.

Originally I went with 'break if (x);'. Some languages allow if() tails on any instruction, but they are usually not whitespace-agnostic. I looked at this case to decide against it:


if (x)
doSomething ();
else
doSomethingElse () // OOPS! forgot my semicolon!

if (x)
doSomethingTotallyDifferent ();


Forgetting semicolons happens all the time, and this error wouldn't even be acknowledged by the compiler, making this bug very hard to find. Adding the 'when' keyword avoids this ambiguity, and fits in nicely with the do{} loop syntax. I'll be further investigating whether it's a good idea to allow this for all instructions (or most anyway).

There are some specific error conditions for the infinite 'do' loop; for one, the construct:

do;


causes a compile-time error, not because we disfavor pure infinite loops, but rather because it looks retarded. Also, only certain instructions are allowed to be put as the action of an infinite do loop. Curly brace groups, standalone expressions which have known/unknown side effects, conditionals, or loops are allowed. So this construct 'do echo "annoying!";' is thankfully not allowed.

If you want a pure infinite loop, just use 'loop;'. This will always emit a warning, telling you it should never be used in production code (unless you explicitly ignore it, or elevate the warning to be treated like an error).

Another type of loop (which is not implemented yet by my compiler) is do/exits.


var int x = 0;
var int y = 3;
do {
// x x x
} exits {
case x > 10:
// x x x
case y < 0:
// x x x
}


This is syntactically very different from similar constructs in other languages, as it reuses the switch() syntax and logic and tosses it in with the do loop. The result is a pretty unobscured syntax that is easy to remember once you've learned the nuances of both.

Using 'break' in this loop will cause the exits to be evaluated, before control returns outside the loop. Using 'next' will also cause the exits to be evaluated, except that a lack of any exits will result in a new loop iteration. Within the 'exits' block, the 'continue' and 'break' work exactly like they do in switch/branch.

Owing to the fact that we use 'next' for loops and 'continue' for switches, you can use 'next' within an exit block to have the loop return to iteration. This is a unique property among the loop types that Fera offers, something which must be done with if() in C#.

Naturally the test casing marathon continues. Currently there are a total of 90 tests within the test suite, and 71 of them pass. 10 are expected fails, and the other 9 are regressions. I tested many of the earliest work on this project by hand, so now that I've added testcases for them, I discovered many bugs in the Fera grammar and of course the compiler itself.

One of the most perplexing problems was something I thought was happening in the compiler. The problem was happening in the do/while testcases which used a compound condition like so:


var int x = 0;

do {
// x x x
++x;
} while (x > 0 && x < 5);


This particular formulation ensures that the compiler isn't creating the FWhileLoop instruction which represents the loop with PostCondition = false. Doing so would result in the loop behaving like a while() loop, in which case it would never do the loop action.

The problem was that it was running infinitely, and there is a Console.Write(x) within it, which resulted in an infinite sea of digits across the console window.

I quickly figured out it was the compound nature of the condition, so I turned my attention to test 'if3', which tested the if() statement with a similar condition. It happened to be 'x == 3 && y == 4'. The testcase failed because the action of the if() never runs though it should.

Inspecting the CIL showed the comparison code to be something like this:


0 ldloc.0 ; load x to the stack
1 ldc.i4.3 ; load constant zero to the stack
2 dup ; save the top value
3 brfalse 8 ; skip the second condition if false
4 pop ; pop the unused save value
5 ldloc.1 ; load y
6 ldc.i4.4 ; load 4
7 ceq ; compare equality
8 nop ; label


Naturally, there should be a ceq inserted at #2. The way it is now, the first condition would always end up being true. It looked like the FComparison expression operand was not outputting 'ceq'. However upon inspecting that code it was fine and I was able to verify that it in fact was running the branch that emitted the instruction.

Only after much head scratching did I realize that the 'ceq' was there, but was far later in the body then it should be. At a hunch I checked the token listing for the file being compiled and sure enough, it was treating the expression like this:

x == (3 && y == 4)


Which would cause a more complete compiler to yelp but not here. So the tree of expression operands looked like:


FComparison
| FVariableAccess
| FBinaryBoolean
| | FConstant
| | FComparison
| | | FVariableAccess
| | | FConstant


I had to add a new level in the expression grammar called 'CValue' which is comprised of a 'Value' term and an optional set of ComparisonTails. Binary logic operations are handled in the level above it, causing comparisons to be treated as children of the binary logic operators. The tree thus became:


FBinaryBoolean
| FComparison
| | FVariableAccess
| | FConstant
| FComparison
| | FVariableAccess
| | FConstant


Pretty.

Friday, March 20, 2009

My adventure with Unicode identifiers in Grammatica

For people who don't care about multilingual support in their grammars, a token type like this might be sufficient:


IDENTIFIER = <<[A-Za-z][A-Za-z0-9_]*>>


I wanted to expand this to support more than just the basic latin alphabet, hoping it would be as easy as reformulating this expression into character classes, now that Grammatica 1.5 apparently supports unicode regular expressions. First I tried something like:


IDENTIFIER = <<[[:alpha:]][[:alnum:]]*>>


However it appears that Grammatica entirely ignores this type of structure, instead treating it like a character set composed of :s and the letters a, l, p, h, a, etc. So I found out about the \p{Class} formulation for property classes...


IDENTIFIER = <<[\p{L&}][\p{L&}]*>>


The \p formulations like \p{L&} for these weren't working until I consulted Java's own set of property classes, which list Alpha and Alnum, so it turned into:


IDENTIFIER = <<[\p{Alpha}][\p{Alnum}]*>>


This made it past the grammar build, targetting .NET for the tokenizer code. But when the compiler runs, from all appearances I gather that it is using .NET's regular expression library, which does not accept {Alpha} and {Alnum} but prefers the Unicode block names instead like {L&}.

At a prime moment of head scratching I came up with a solution (though it's not as pretty as the formulations above:


IDENTIFIER = <<[\p{Ll}\p{Lu}\p{Lt}][\p{Ll}\p{Lu}\p{Lt}\p{Nd}]*>>


This makes it through grammar compilation and parsing, matching the correct input. It matches any alphabetic letter (upper, lower, title case) as the first character, and any alphanumeric character for the rest.

The inability to use the Java style classes looks like a bug/oversight in the C# port of Grammatica, unless I miss a more elegant way to pull this off?

In any case it works, so that's what I'm using. Grammatica has a very long release cycle and they *just* released version 1.5. It's unlikely there will be a new version for quite awhile. So for anyone who needs Unicode-level identifier tokens, your welcome!

Wednesday, March 18, 2009

More testcases! .... & translations

I revamped Fera's testcase system to differentiate between expected failures/passing. The way it's set up, it treats failed tests which were expected to pass as "regress", and the opposite is marked as "fixed". Example:


(pass) decl-field1 (Declarations: Field (test 1))
(pass) decl-field2 (Declarations: Field (test 2, verified result))
(regress) decl-property1 (Declarations: Property (test 1))
Caught exception 'Object reference not set to an instance of an object.'
(regress) decl-property2 (Declarations: Property (test 2, verified result))
Caught exception 'Object reference not set to an instance of an object.'
(pass) helloworld (Hello, world!)
(pass) echo1 (Echo (test 1))
(regress) echo2 (Echo (test 2: conversion to string))
Unexpected program console output (padded to align to [U]nexp...):
{no output}
(pass) partial-types1 (Partial types)
(pass) variables1 (Variable support (simple definition))
(pass) variables2 (Variable support (initialization))
(pass) variables3 (Variable support (assignment))
(pass) variables4 (Variable support (variable as instance))
(fail) field-get1 (Field access (test 1))
Caught exception 'OpCode ldsfld have null operand'
(fail) field-set1 (Field assignment (test 1))
Compile was not successful (1 errors)
(fail) field-set2 (Field assignment (test 2, verified result))
Compile was not successful (1 errors)
(fail) property-get1 (Property access (test 1))
Compile was not successful (2 errors)
(fail) property-set1 (Property assignment (test 1))
Compile was not successful (1 errors)
(regress) mutate1 (Mutation (++x))
Caught exception 'Operation is not valid due to the current state of the object.'
(pass) methodcall1 (Method call (no parameters, void return))
(pass) methodcall2 (Method call (1 parameter, void return))
(pass) methodcall3 (Method call (1 parameter, throw-away return))
(pass) methodcall4 (Method call (1 parameter, used return))
(pass) methodcall5 (Method call (incorrect parameters, missing 1))
(pass) methodcall6 (Method call (incorrect parameters, 1 extra))
(pass) bool-constants1 (Boolean constants (true))
(pass) bool-constants2 (Boolean constants (check))
(pass) bool-not (Boolean unary operator (!))
(pass) bool-and (Boolean binary operator (&&))
(pass) bool-or (Boolean binary operator (||, test 1))
(pass) bool-or2 (Boolean binary operator (||, test 2))
(pass) bool-or3 (Boolean binary operator (||, test 3))
(pass) invalid-standalone-expr1 (Invalid standalone expressions (x + 3))
(regress) invalid-standalone-expr2 (Invalid standalone expressions (MethodWithoutSideEffects()))
Unexpected program console output (padded to align to [U]nexp...):
success

(pass) imports-structure1 (Imported Namespaces (unqualified type for method return))
(pass) imports-structure1 (Imported Namespaces (unqualified type for method return))
(pass) imports-imperative1 (Imported Namespaces (unqualified type for variable definition))
(pass) switch1 (Switch test 1 (case variable scope))
(pass) switch2 (Switch test 2 (dynamic cases))
(pass) invalid-type-variable (Unknown type (variable))
(pass) missing-variable-initializer (Missing variable initializer (var x;))
(pass) valuetype-instance1 (Passing by-value types as instance parameter (int x; x.ToString ()))
(pass) valuetype-instance2 (Passing by-value types as instance parameter (addressof parameter))
(pass) valuetype-constant-instance1 (Passing by-value constants as instance parameter (38.ToString ()))
(pass) arrays1 (Arrays (array def/construction))
(pass) arrays2 (Arrays (element assignment))
(pass) arrays3 (Arrays (element access))
(pass) arrays4 (Arrays (use of Length))
(fail) variable-params1 (Variable parameters (basic test 1))
Compile was not successful (1 errors)
(fail) generics1 (Generics (test 1))
Compile was not successful (1 errors)


I also implemented translation support for all errors and messages in the Fera compiler (Fc). Here's a sample run of a testcase in my (probably incorrect) Google Translate-based Spanish translation:


C:\fera>fc invalid-standalone-expr2.f /lang:es

invalid-standalone-expr2.f(13,4): Advertencia: 'FeraLanguageTest.Program.MethodWithoutSideEffects' El método no tiene efectos secundarios y devuelve un valor. Este uso del método e
s, probablemente, no.

Compilación éxito (invalid-standalone-expr2.exe): 1 advertencias


Mmm fun. Of course I also implemented a pseudo translation which makes Fera's messages super brief and concise, useful for people who know what they're doing and want the compiler to yap as little as possible.

Tuesday, March 17, 2009

Optimizations & Testcases

I got a little ahead of myself when I decided to start implementing optimizations in Fera... it has code flow graph generation now but it given my lack of access to some of the more efficient algorithms for doing anything beyond that, it makes more sense to get all the language features done and in place (and testcased).

So I did some of that the last couple of days, fixing a critical scoping bug for resolving imported types on declaration entries and implementing arrays (which went so smoothly that I had to pat myself on the back with this blog entry).

I also implemented inspection of method side effects, culminating in this warning which appears whenever you use a side-effect-free method as a standalone expression:

invalid-standalone-expr2.f(13,4): Warning: The method 'FeraLanguageTest.Program.MethodWithoutSideEffects' does not have side effects and returns a value. This usage of the method is probably not intended.

I'm developing the code in a very agile/test-driven way. Here is the Fera testcase suite results for today:

(pass) helloworld (Hello, world!)
(pass) echo1 (Echo (test 1))
(pass) partial-types1 (Partial types)
(pass) variables1 (Variable support (simple definition))
(pass) variables2 (Variable support (initialization))
(pass) variables3 (Variable support (assignment))
(pass) variables4 (Variable support (variable as instance))
(fail) mutate1 (Mutation (++x))
Caught exception 'Operation is not valid due to the current state of the object.'
(pass) bool-constants1 (Boolean constants (true))
(pass) bool-constants2 (Boolean constants (check))
(pass) bool-not (Boolean unary operator (!))
(pass) bool-and (Boolean binary operator (&&))
(pass) bool-or (Boolean binary operator (||, test 1))
(pass) bool-or2 (Boolean binary operator (||, test 2))
(pass) bool-or3 (Boolean binary operator (||, test 3))
(pass) methodcall1 (Method call (no parameters, void return))
(pass) methodcall2 (Method call (1 parameter, void return))
(pass) methodcall3 (Method call (1 parameter, throw-away return))
(pass) methodcall4 (Method call (1 parameter, used return))
(pass) methodcall5 (Method call (incorrect parameters, missing 1))
(pass) methodcall6 (Method call (incorrect parameters, 1 extra))
(pass) invalid-standalone-expr1 (Invalid standalone expressions (x + 3))
(pass) invalid-standalone-expr2 (Invalid standalone expressions (MethodWithoutSideEffects()))
(pass) imports-structure1 (Imported Namespaces (unqualified type for method return))
(pass) imports-structure1 (Imported Namespaces (unqualified type for method return))
(pass) imports-imperative1 (Imported Namespaces (unqualified type for variable definition))
(pass) switch1 (Switch test 1 (case variable scope))
(pass) switch2 (Switch test 2 (dynamic cases))
(pass) invalid-type-variable (Unknown type (variable))
(pass) missing-variable-initializer (Missing variable initializer (var x;))
(pass) valuetype-instance1 (Passing by-value types as instance parameter (int x; x.ToString ()))
(pass) valuetype-instance2 (Passing by-value types as instance parameter (addressof parameter))
(pass) valuetype-constant-instance1 (Passing by-value constants as instance parameter (38.ToString ()))
(pass) arrays1 (Arrays (array def/construction))
(pass) arrays2 (Arrays (element assignment))
(pass) arrays3 (Arrays (element access))
(fail) arrays4 (Arrays (use of Length))
Caught exception 'Unknown error'
(fail) variable-params1 (Variable parameters (basic test 1))
Compile was not successful (1 errors)
(fail) generics1 (Generics (test 1))
Compile was not successful (1 errors)
Press any key to continue . . .

As you see, there's still some bugs in arrays and variable parameters / generics are totally broken. As a nod to test driven development: the mutate testcase fails because of the changes to the Assignable expression model, I wouldn't have known without this great unit test framework!

Monday, March 9, 2009

Fera: a new .NET language

So over spring break I worked tirelessly, even through my severe rhinovirus infection at the end on a compiler for a new programming language that I'm developing which I call Fera.

I'm doing it purely for educational purposes right now, but if it turns out good maybe I'll release it. It's very similar to C#, but it just outright drops most of the historical baggage that it brought along from C.

The best example of this is the 'switch' construct in the language, which is something that pisses me off about C#. In C# switch statements you are required to place a 'break' instruction at the end of each case, with the exception of cases with no instructions, which fallthrough into the next case.

Example:

switch (x) {
case 0:
case 1:
int y = 4;
Console.WriteLine ("x is either 0 or 1");
break; // this MUST be here
case 2:
int y = 6; // compile-time error, 'y' already exists in the current scope
Console.WriteLine ("x is 2");
break; // this TOO must be here
}

Yes, even that last break has to be there, supposedly to make sure a developer never adds a case without adding the 'break' between the old last case and the new case. If you think about the logic of that it's pretty retarded, and really is only there to make it look like Microsoft has improved it. As you see with the multiple definitions of 'y', the whole inside of a switch statement is within the scope it is placed in, meaning in order to define similarly-named variables within multiple case statements, you must place {}s around the cases like so:


switch (x) {
case 0:
case 1: {
int y = 4;
Console.WriteLine ("x is either 0 or 1");
break; // this MUST be here
} case 2: {
int y = 6; // compile-time error, 'y' already exists in the current scope
Console.WriteLine ("x is 2");
break; // this TOO must be here
}
}


And boy, doesn't that look retarded. All the current IDEs get confused too when you do this, changing the indent on lots of stuff while you hammer out the curly braces.

So with that said, here's Fera's interpretation of the switch statement.


switch (x) {
case 0:
case 1:
int y = 4;
Console.WriteLine ("x is either 0 or 1");
case 2:
int y = 6;
Console.WriteLine ("x is 2");
}


As you see, each case statement is given it's own variable scope, and a break at the end of each case is implicit. To fallthrough to the next statement, you use the 'continue' construct like so:


switch (x) {
case 0:
case 1:
int y = 4;
Console.WriteLine ("x is either 0 or 1");
continue;
case 2:
int y = 6;
Console.WriteLine ("x is 0, 1, or 2");
}


The 'break' construct works like it does in C# and other languages, using it causes the instruction pointer to exit the whole switch statement, allowing the next instruction to execute.

Another interesting thing about Fera's switch statement is that it's totally dynamic. The compiler is (well, isn't yet but will be) capable of determining the static nature of the case statements for a given switch statement, so using static cases will allow the compiler to turn it into a constant time branch (CTB). But if it can't do that it will just compile it as a simple cascading set of if/else branches.


int x = 3;
int y = 2;
switch (x) {
case y + 1:
case y + 2:
int y = 4;
Console.WriteLine ("x is 1 or 2 more than y");
case 2:
Console.WriteLine ("x is 2");
break; // this TOO must be here
}


Naturally there are times when you want to ensure that the compiler can use static analysis to improve compilation of your switch. For this purpose, Fera provides the 'branch' construct:


branch (x) {
case 0:
case 1:
Console.WriteLine ("x is either 0 or 1");
case y+1: // this causes a compile-time error
Console.WriteLine ("x is 1 more than y");
}


This doesn't guarantee that the construct will compile to a constant time branch any more than C#'s does, but at least it gives the compiler a chance while making sure the programmer doesn't get any fancy ideas and use dynamic expressions for it's cases.

So there's tons more stuff I'm working to implement, but this is a particularly interesting bit which is already mostly implemented. I think I'll post a few more blogs with info about some of the other features, like contract-based design, aspect-oriented programming, advanced language-level features and more.

I've got a crazy little theory of my own

So the crazies are at it again, but I can't say I didn't expect it.

http://www.newsweek.com/id/169192?digg=1

But I have a crazy theory of my own. What is Obama isn't the Antichrist-- what if the battle WAS the election, and Jesus Christ has already won. Yes that's right, what if Obama was the resurrection of your lord? Is it so hard to believe because he's black? Do you think that God sees the white man as his son when he made black first? Africa is the cradle of human life.

So, assuming Obama is actually Christ, how do we define the armies? Perhaps the Republicans are the armies of the "Antichrist", being the intention of ill will toward the world. After eight years of degrading human rights and terrorizing the world under the guise of it's protection, I think it fits. Maybe the end of your world of suffering is already coming to pass. Maybe Obama will go on to unite us all in preparation for this next thousand years of peace and prosperity.

I'm not religious, but that's a good thing. I'm not clouded by idealogy, only principles. And tell me that hoarding our prosperity in the US is RIGHT and I'll show you a greedy son of a bitch. People in hundreds of countries starve to death everyday, their children die from easily curable diseases, and the best the conservative Christian group will do is send a dollar a day to them.

Jesus had a saying for this: "Give a man a fish and he'll eat for a day, teach a man to fish and he'll eat for a lifetime". Sending money isn't the solution. We have to honestly and seriously help our neighbors as we would like to be helped if it were us in that situation.