Home
Parsing
Eval

eval3

The extra facilities here consist of
  1. doing something about the errors we detect,
  2. adding a unary plus and minus,
  3. predefining the symbolic constants pi and e,
  4. Evaluating the expression instead of re-printing it.
The grammar is that of eval2 but with unary plus and minus. This is still primitive, but adequate to do something useful, so it is time we actually evaluated the expressions.
Eval3 source.
Eval3 executable.
Syntax grammar 3
statement -> expression "\n"
expression -> term [ add_op term ]*
term -> factor [ mult_op factor ]*
factor -> [ "+" | "-" ] ( primary | "(" expression ")" )
mult_op -> "*" | "/"
add_op -> "+" | "-"
primary -> number | name
Lexical grammar 3
number -> [ digit ]+ [ "." [ digit ]* ] [ ( "e" | "E" ) [ "+" | "-" ] [ digit ]+ ]
name -> letter [ letter | digit ]*

Error handling

Error recovery in C can take three forms:
  1. Passing a status value back to the caller, checking the return status of everything we call and returning immediately if anything went wrong.
  2. Using setjmp() and longjmp().
  3. Trying to guess what was meant and 'doing the right thing'.
The first of these leaves us lots of error handling code that obscures what the real intention of the code is, so I don't really want to do that. The second approximates to throwing an exception in C++ or raising an exception in Ada, but they are generally frowned upon by 'real' (non-C) programmers as they do despicable things that are completely outwith the definition of C itself, so I don't really want to do that. The last of these is both difficult and liable to still be wrong, so we can now completely forget that I ever mentioned it. This leaves us with either continuing our crash-and-burn policy or converting to C++ just to use exceptions. Since I can't imagine anyone having a C compiler that isn't also a C++ compiler these days, and anyone working in a language other than C or C++ won't be able to use setjmp() and longjmp(), I'm going to go for the C++ option of throwing exceptions.

Everything we are interested in at this stage comes on a single line, so our error recovery strategy can be print an error message, discard the current line and start on the next one. This is simple to implement and perfectly adequate for our purpose.

An error handling routine has been added that prints one or two strings that have been passed to it and then throws an exception. The first message should explain the error and the second shows the token that the error was detected at. The exception type is passed to the error handler so that we retain the option of halting the program if the error is severe enough.

We now have the beginnings of an error handler, but more could be done. We could for instance print out the current line and indicate where on the line the error was detected, and we could make the error messages more flexible, but this will do for now.

Unary signs

A unary operator is one which takes only one operand, e.g. -7 is not just a negative number, but the result of applying the unary minus operator to the positive number seven. An operator which takes two operands is a binary operator - e.g. 1+2*3 is interpreted as 1+(2*3) where the binary operator "*" is applied to the 2 and the 3, and the binary operator "+" is applied to the 1 and the result of the first operation.

Users would be bound to be disappointed and annoyed if they entered something simple like -7 and it didn't work, so the unary minus is essential.

The lexer

The lexical analyser has to be made to understand numbers properly, not just as a sequence of characters, and we must also note whether the token we just read was a symbolic or literal value so that we know how to push it onto the evaluation stack.

The "%f" in the sscanf() format string indicates that we want to read in a floating point number and the "%n" indicates that we want to know how many characters were scanned to get the number. sscanf() returns the number of items read from the input. The TokenLen is not read from the line so it doesn't count towards the return value and sscanf() should return one item read.

Note that numbers are not signed. They are all treated as positive. To get a negative number, we apply a unary minus operation to a positive number during evaluation. The treatment of a signed literal constant is therefore the same as the treatment of a signed symbolic value or a signed parenthesised expression.

This approach allows us to decide that we are about to read in a number on the basis of a single character of look-ahead. This sometimes simplifies writing a lexical analyser although it's not difficult to cope with in this particular case. If we allow a sign as part of the number we would still have to read the first digit as well to be sure that we are reading a number and not a multi-character operator such as -= or --.

I am following the C way of doing things here. I don't think all languages do things this way, but if you are trying to follow what's going on this might have confused you. The user can't tell and doesn't care anyway as long as the right answer comes out at the end.

The lexical grammar shows the structure of a floating point number just in case you have to evaluate these yourself.

Symbols

For the restricted case of handling predefined constants, as opposed to variables, handling symbols is easy. The lexer has already recognised the fact that it is a symbol rather than a number, and the parsing part of ParsePrimary() consists of one switch statement. The contents of each case in the switch is entirely action for the item found. For numbers and constants, the action consists of pushing the value onto the stack. The only difference is how we retrieve the value to push.

Each entry in the symbol table contains the name of the symbol and its value. To retrieve the value of a symbol, we search through the symbol table for the name and return the value from that record. For such a small symbol table, a linear search through an array is perfectly adequate. With a language that doesn't yet have the ability to assign values to variables, there isn't much scope for having a large symbol table, so a linear search is adequate for the forseeable future.

Solving the problem of when and how to create new symbols can wait until assignments are part of the language.

Expression evaluation

At last, we have some actions that do something useful. When we come across a value - in ParsePrimary() - we push it onto the stack. When we come across an operator - in ParseExpression(), ParseTerm() or ParseFactor() - we store it until after all the values it needs have been built up on the stack. We can then pop the appropriate operands off the stack, carry out the operation, and push the result back on the stack for later use.

One minor point to look out for though is the order in which we are popping things. Note that in ParseTerm() and ParseExpression() we pop b first then a. When we scan a + b, we come across a first then b, so a goes onto the stack first then b. b will therefore come off the stack first, then a. With addition and multiplication this doesn't matter, but it does with subtraction and division.


Eval
Parsing
Home
Valid HTML 4.01 Any comments or queries, write to me.
Site last updated 4 July, 2004