There is no longer an obvious, single value to return to the user. For the moment, we're going to concentrate on printing the values of expressions but we'll come back to the subject of retrieving results later.
| statement | -> ( "print" expression | assignment | "run" ) "\n" | |
| assignment | -> name "=" expression | |
| expression | -> term [ ( "+" | "-" ) term ]* | |
| term | -> factor [ ( "*" | "/" ) factor ]* | |
| factor | -> [ "+" | "-" ] ( primary | "(" expression ")" ) | |
| primary | -> number | ( name [ "(" ExpressionList ")" ] ) | |
| ExpressionList | -> expression [ "," expression ]* |
| number | -> [ digit ]+ [ "." [ digit ]* ] [ ( "e" | "E" ) [ "+" | "-" ] [ digit ]+ ] | |
| name | -> letter [ letter | digit ]* |
The largest structure we enter still has to be on one line. Suppose we want to
be able to enter an expression on several lines. At the time that we run out of
input and need to get more, the call sequence might be quite deep. If it is
ParseExpression() : ParseTerm() :
ParseFactor() : ParsePrimary() :
GetNextToken(), each procedure may be holding information about
some pending operation. We cannot return to the caller without losing that
information, and we couldn't recover our state afterwards. We have the choice
of insisting on only complete structures being passed, or of
installing a callback procedure that allows the lexer to get more input. For
the moment, insisting on single line structures will do, but we will need to
do something about this when we add control structures.
Introducing a print statement seems a rather backward step from the point of view of usability. It might be nice to be able to write a mixture of expressions, whose value we want printed, and assignments, whose value we don't want printed. At first sight,
| statement | -> ( expression | assignment ) "\n" | |
| assignment | -> name "=" expression | |
| expression | -> term [ ( "+" | "-" ) term ]* | |
| term | -> factor [ ( "*" | "/" ) factor ]* | |
| factor | -> [ "+" | "-" ] ( primary | "(" expression ")" ) | |
| primary | -> number | name [ "(" ExpressionList ")" ] | |
| ExpressionList | -> expression [ "," expression ]* |
If a statement starts with a number, it is obviously an expression, but if
it starts with a name, we still don't know whether we are looking at an
assignment or an expression. We have to look at the second token to decide.
An assignment will have "=" as the second token, otherwise we have
an expression. Unfortunately, if it is an expression we want the calling
sequence to be ParseExpression() : ParseTerm() :
ParseFactor() : ParsePrimary()
to get the correct action for finding a name in an expression, but the way of
getting there means backtracking two symbols. We therefore have two choices.
We can write the ParseStatement() routine and the lexical analyser
in such a way that they can cope with backtracking multiple symbols, or we can
re-structure the grammar so that we can decide what to do on the basis of one
token.
Possible changes to the grammar include i) having the assignment operator return a value just like any arithmetic operator so that it is part of an expression, and ii) introducing a 'print' keyword. The former would be in keeping with the C way of doing things and would allow statements like
a = b = c = 0 to initialise several variables at
once. When mis-used though, this can lead to obscure and easily missed
assignments which make programs harder to read than they need be. Also, we
don't want to print the value of an expression if it consists of just an
assignment, and coming up with a neat grammar which treats "=" as
an arithmetic operator and which doesn't require an untidy bodge in the
semantics to suppress the printing of the value is not easy. This is not an
option to be undertaken lightly.
Adding an explicit print statement only requires us to replace the first of the grammar rules with
statement -> ( "print" expression | assignment ) "\n"
to get an easy-to-parse grammar that forces us to write relatively clean
and tidy code. We have good confirmation that the statement was supposed to be
an assignment if it has an "=" in the right place, or was supposed to be an
expression value to be printed if it starts with "print". This
sounds like a good option. We could always offer '?' as shorthand for 'print'
as in BASIC.
struct would certainly keep
everything together, but converting the interpreter into a C++ class would be
better. It keeps the notation inside the routines simple as we have direct
access to the members of the structure. The alternative, if you are not using
C++, would be to pass the structure - or a pointer to it - from routine to
routine and to access the members within it. We can keep down the clutter by
using C++.
Properly encapsulated, we can associate particular proglets with particular aspects of our application program.
parse() routine, and the finer detail is best built up in
AddInstr(). This would all use considerably more memory than we
have used up until now, but it still doesn't amount to much really. It all
depends on how accurate you want your messages.
Note that the order of execution does not correspond to a left-right scan of the
source lines because of the infix to postfix conversion we do for expressions.
I won't expand on the current error messages until we can reasonably be expected to write proglets sufficiently complex that we will need to be able to save and restore them.
if statements. Partly as a compromise, and partly because it can
be useful in its own right, I have added a parameter to the run()
routine that allows the caller to set the value of a 'predefined constant' for
the proglet. I have chosen to call this 'time' because that is
what it stood for in a real application that employed this technique. The
proglet can then be executed as a procedure with one parameter.
The user may wish to set the conditions under which the proglet will be run, so
the 'run' statement could be extended to take parameters. These parameters would
be left on the stack and could be retrieved and passed back to the calling
program. For example, run 1, 0.1, 10 might indicate that the
function should be called for values of 1, 1.1, 1.2, ..., 10
There is a problem with this idea though. When 'run' is entered by the user, we are in the 'parse' phase of operation. We don't know what values have been entered, so it seems we can't pass these values back. It is not too difficult to fudge things to get the result we want though. If we make a temporary proglet out of the parameter list, we can do a one-off execution of that proglet and then pop the results off the stack to return them.
The use of commas to separate the parameters is not vital, but it may reduce
the incidence of end-user mistakes about exactly where one expression ends and
the next one starts. It also allows us to re-use the ParseArgList()
routine. At present, that routine expects a ")" at the end of the argument list,
but we could easily pass in a parameter that told it to expect some other token
instead. The calling routine has no trouble knowing which terminator to look
for so this is not a problem.
The precise syntax of the run command doesn't really matter. Use whatever you feel happy with, or think up some other way of terminating the proglet source that suits your application.
run() doesn't return any information to the caller, so
its use is rather limited, but we could pass values back out from
run() or we could retrieve the values of particular variables.
However we return values, run() should always return a status value
to indicate whether the others have any meaning or not. Run-time errors could
invalidate all our results.
Note that any return variables must not be declared as predefined or they will be treated as constants.