| statement | -> expression "\n" | |
| expression | -> term [ add_op term ]* | |
| term | -> factor [ mult_op factor ]* | |
| factor | -> [ "+" | "-" ] ( primary | "(" expression ")" ) | |
| mult_op | -> "*" | "/" | |
| add_op | -> "+" | "-" | |
| primary | -> number | name |
| number | -> [ digit ]+ [ "." [ digit ]* ] [ ( "e" | "E" ) [ "+" | "-" ] [ digit ]+ ] | |
| name | -> letter [ letter | digit ]* |
setjmp() and longjmp().
setjmp() and longjmp(),
I'm going to go for the C++ option of throwing exceptions.
Everything we are interested in at this stage comes on a single line, so our error recovery strategy can be print an error message, discard the current line and start on the next one. This is simple to implement and perfectly adequate for our purpose.
An error handling routine has been added that prints one or two strings that have been passed to it and then throws an exception. The first message should explain the error and the second shows the token that the error was detected at. The exception type is passed to the error handler so that we retain the option of halting the program if the error is severe enough.
We now have the beginnings of an error handler, but more could be done. We could for instance print out the current line and indicate where on the line the error was detected, and we could make the error messages more flexible, but this will do for now.
-7 is not
just a negative number, but the result of applying the unary minus operator to
the positive number seven. An operator which takes two operands is a binary
operator - e.g. 1+2*3 is interpreted as 1+(2*3) where
the binary operator "*" is applied to the 2 and the
3, and the binary operator "+" is applied to the
1 and the result of the first operation.
Users would be bound to be disappointed and annoyed if they entered something
simple like -7 and it didn't work, so the unary minus is essential.
The "%f" in the sscanf() format string indicates that
we want to read in a floating point number and the "%n" indicates
that we want to know how many characters were scanned to get the number.
sscanf() returns the number of items read from the input. The
TokenLen is not read from the line so it doesn't count towards the
return value and sscanf() should return one item read.
Note that numbers are not signed. They are all treated as positive. To get a negative number, we apply a unary minus operation to a positive number during evaluation. The treatment of a signed literal constant is therefore the same as the treatment of a signed symbolic value or a signed parenthesised expression.
This approach allows us to decide that we are about to read in a number on
the basis of a single character of look-ahead. This sometimes simplifies
writing a lexical analyser although it's not difficult to cope with in this
particular case. If we allow a sign as part of the number we would still
have to read the first digit as well to be sure that we are reading a number
and not a multi-character operator such as -= or --.
I am following the C way of doing things here. I don't think all languages do things this way, but if you are trying to follow what's going on this might have confused you. The user can't tell and doesn't care anyway as long as the right answer comes out at the end.
The lexical grammar shows the structure of a floating point number just in case you have to evaluate these yourself.
ParsePrimary()
consists of one switch statement. The contents of each case in the switch is
entirely action for the item found. For numbers and
constants, the action consists of pushing the value onto the stack. The only
difference is how we retrieve the value to push.
Each entry in the symbol table contains the name of the symbol and its value. To retrieve the value of a symbol, we search through the symbol table for the name and return the value from that record. For such a small symbol table, a linear search through an array is perfectly adequate. With a language that doesn't yet have the ability to assign values to variables, there isn't much scope for having a large symbol table, so a linear search is adequate for the forseeable future.
Solving the problem of when and how to create new symbols can wait until assignments are part of the language.
ParsePrimary() - we push it onto the stack. When
we come across an operator - in ParseExpression(),
ParseTerm() or ParseFactor() - we store it until after
all the values it needs have been built up on the stack. We can then pop the
appropriate operands off the stack, carry out the operation, and push the result
back on the stack for later use.
One minor point to look out for though is the order in which we are popping
things. Note that in ParseTerm() and ParseExpression()
we pop b first then a. When we scan a + b,
we come across a first then b, so a goes
onto the stack first then b. b will
therefore come off the stack first, then a. With addition
and multiplication this doesn't matter, but it does with subtraction and division.