Do the simplest thing possible, but not the simpler.

Albert Einstein



Wanna go
Home to part 2 to part 3

Introduction

When I started college back in the beginning of the nineties, the subject "compiler" was still considered a kind of arcane art, only alchemists could possess such knowledge. I remember when groups of students were reunited due to projects research or extra class work, every one used to speak with a deep sense of respect when the subject was compilers designing and implementation. I remember the impact the first time I started to flip through the pages of the "Dragon Book". The informations inside that book were actually scary.

A lot has changed since then. On these days, just doing some simple search on the web it's possible to find a lot of updated and friendly informations about compiler designing. What was once mysterious is now interesting and a subject which I'm constantly trying to learn more by myself.

It's strange, but there is a sense of joy when programmers see code being executed by tools developed by themselves.

The first time I had the opportunity to read the (ANSI C) source code for a recursive descent expression parser, built to calculate very simple math expressions, it was a hit. I immediately wanted to add new features to it, I started to search for more informations about infix to postfix conversion, lexers, parsers, stacks, recursivity, everything became clearer after I understood that source code details.

I hope the little BASIC interpreter presented here could be interesting to someone, as that expression parser was for me.

There was a time I made a living as a Delphi programmer. I had the opportunity to use a small but powerful library to integrate scripting in Delphi applications, the script language was based on a subset of BASIC. Due to its simplicity it was easy to the final customers, even those without background in programming, to create new formulas and run simple calculations in the main application, just by editing the scripts themselves. We can speak a lot of bad things about BASIC, but not that it isn't simple to learn, it actually worked very well.

That Delphi library is the inspiration for this project. Until the end of this material, I hope to show you a fully functional (in the sense that it will work as expected, not related to the programming paradigm) interpreter to execute a subset of BASIC language, that can be extended and embedded it in C++ applications.

First things first...

Let's talk about the anatomy of our BASIC interpreter. The idea is to have it working according to the diagram below:

Interpreter

Note that the interpreter actually behaves like a compiler. The BASIC source code is not directly interpreted, but converted to a postfix notation and executed by a stack machine. The main advantages of such architecture are:

The BASIC Language

The interpreter is going to handle the unstructured and structured BASIC formats.

The unstructured format is the old one based on numbered lines, such like:

10 REM old fashioned BASIC
20 LET a=1
30 PRINT a
40 LET a=a+1
50 IF a <= 10 THEN GOTO 30
60 PRINT 'That'+chr$(39)+'s all folks'

The structured format is the one recommended. It makes use of function definitions and calls, avoiding the use of jump commands like GOTO:

FUNCTION mulstr$(txt$, n) LOCAL i, s$
    IF n <= 1 THEN RETURN txt$
    s$ = ''
    FOR i=1 TO n
        s$ = s$ + txt$
    NEXT
    RETURN s$
END FUNCTION

REM line below displays AaAaAaAaAa
PRINT mulstr$('Aa', 5)

Variables

The variables in our BASIC language support three different data types:

Strings can be used together with operators to perform string expressions.

Strings can handle multi-line text, a text line can be accessed for reading or writing through the use of brackets:

REM operator "/" concatenates with a newline
txt$ = 'Line-1' / 'Line-2' / 'Line-3'
PRINT txt$[1] : REM prints "Line-2"
PRINT : REM print without parameters shows a newline
txt[1]$ = 'New line'
PRINT txt$[1] : REM shows 'New line'
PRINT
REM now shows the entire text
PRINT txt$

BASIC variables, as well as BASIC commands and functions are not case sensitive.

REM the code below show a simple use of variables
REM "age", "AGE", "aGe", "agE", etc.. are all the same variable
age = 43
print Age

Arrays are not directly supported, but data collection can be implemented through the interpreter extension methods.

Operators

There are numeric and string operators.

Numeric operators

From higher to lower precedence:

OperatorDescription
( ) 
Unary minus
Power
* / mod Multiplication, division, modulo
+ - ?> ?<  Addition, subtraction, higher, lower

Examples:

REM num = 10
num = 10 ?> 5

REM num = 5
num = 10 ?< 5

num = 5 ^ 2
num = math.abs(-1)
num = (3+2)*10

String operators

Applied in string expressions.

All string operators have the same precedence in string expressions.

OperatorDescription
+  Concatenation
-  Tail deletion
/  Concatenation with newline

Examples:

REM s$ = 'HelloWorld'
s$ = 'Hello' + 'World'

REM s$ = 'BAS'
s$ = 'BASIC' - 2

REM s$ = 'Hello' <newline> 'World'
s$ = 'Hello' / 'World'

Commands

Each BASIC command is separated by a newline. Multiple commands can be placed in the same line using a colon.

INPUT 'Enter your name: ', 30, name$
PRINT name$
REM
REM same thing ...
REM
INPUT 'Enter your name: ', 30, name$ : PRINT name$

Keywords

The BASIC interpreter uses the following identifiers as keywords:

IFORTO
ANDCLSENDFORLETMODNOTREM
ELSEGOTONEXTSTEPTHENTRUE
BREAKENDIFFALSEGOSUBINPUTLOCALPRINTUNTILWHILE
ENDFORREPEATRETURN
CONTINUEFUNCTIONENDWHILE
ENDFUNCTION

Comments

Comments are defined using the keyword REM.

All the text after the keyword REM until the end of the line is discarded as a comment.

Functions

Functions can be declared in the BASIC source program or imported from libraries.

Further in this material I will detail the BASIC interpreter source files, including the one where its standard libraries are located, it will be easier to understand the external functions when we examine the source code contents.

Functions can have its return value discarded or could just not return a value at all, working as a subroutine. Parameters, local variables and recursion are all supported.

Syntax:

FUNCTION <name> ( [parameters] )[ LOCAL <local vars> ]
     ...
    [RETURN <result> ]
     ...
END FUNCTION

Function names define its return type, a function name finished with a $ defines a string function, if the name ends with a # it is a pointer function.

As anteriorly mentioned, recursion is fully supported:

FUNCTION factorial(num)
    IF num <= 1 THEN RETURN 1
    RETURN num * factorial(num - 1)
END FUNCTION

The BASIC interpreter allows the use of the '.' character in functions names, this is helpful for example, to classify functions according to some nomenclature defined by the programmer.

FUNCTION customer.name$()
    ...
    ...
    RETURN c_name$
END FUNCTION

FUNCTION customer.phone$()
    ...
    ...
    RETURN c_phone$
END FUNCTION

FUNCTION customer.age()
    ...
    ...
    RETURN c_age
END FUNCTION

The interpreter engine uses a "signature" made by the function name and argument types to keep track of each function declaration. This process allows overloading by just changing the function type, arguments type or total number of arguments.

All declarations below could be present in the same BASIC program:

REM signature = 'overload@n'
FUNCTION overload(num)
    ...
    RETURN ...
END FUNCTION

REM signature = 'overload@n$'
FUNCTION overload(num, str$)
    ...
    RETURN ...
END FUNCTION

REM signature = 'overload$@n'
FUNCTION overload$(num)
    ...
    RETURN ...
END FUNCTION

The &( ) operator and indirect calls to functions

Functions can be called indirectly by using the &( ) operator. The general syntax is:

&( <function signature> [, expression1 [, expression2, ... expression<n>]] )

If the calling function has no parameters just ignore the part after signature. The signature must exist at the dictionary and could be either a user declared function or a function imported from C++.

FUNCTION mulstr$(txt$, num) LOCAL i, s$
  IF num <= 1 THEN RETURN txt$
  s$ = ''
  FOR i=1 to num
    s$ = s$ + txt$
  NEXT
  RETURN s$
END FUNCTION

FUNCTION hello$()
  RETURN 'Hello world.'
END FUNCTION

PRINT 'Calling mulstr$ using its name: ';mulstr$('Aa',5)
PRINT
PRINT 'Indirect call to mulstr$: '; &('mulstr$@$n', 'Bb', 5)
PRINT
PRINT '"Far" functions can also be called: '; &('math.abs@n', -10) * 2
PRINT
PRINT 'Indirect call to a function without parameters: '; &('hello$@')
PRINT
PRINT 'That'+chr$(39)+'s it.'

After compile and execute the program above, the output should be:

Calling mulstr$ using its name: AaAaAaAaAa
Indirect call to mulstr$: BbBbBbBbBb
"Far" functions can also be called: 20
Indirect call to a function without parameters: Hello world.
That's it.

We can use the operator &( ) as part of an expression just like a regular function call, its type in the expression is defined by the called function return type.

The @ operator and functions entry points

The @ operator is used to disclosure the memory entry point of an existing function.

FUNCTION hello$()
  RETURN 'Hello world.'
END FUNCTION

PRINT 'Function hello$() memory entry point (near function): '; @'hello$@'
PRINT
PRINT 'Function abs() memory entry point : (far function): '; @'math.abs@n'

Running the program above should produce a result similar to:

Function hello$() memory entry point (near function): 2
Function abs() memory entry point : (far function): 1.34717e+08

BASIC commands

I/O commands

PRINT

Sends data to the C++ standard output stream represented by std::cout.

PRINT [ expression [, expression] [; expression] ]

Expressions types are numeric or string and must be separated by a semicolon or a comma. If semicolon is used, no spaces are added between the expressions results. If comma is used, four spaces will be added between each expression result. Numbers are always preceded and succeeded by one space.

a=10
PRINT 'a * 2 =';a*2

INPUT

Gets data from the C++ standard input stream represented by std::cin.

INPUT <Question>, <size>, <variable>

Question is a string expression that is shown just before the data input; size is the amount of bytes to read if the input data is in string format; variable is the name of the variable where the input data will be stored.

REM the code below gets a name with no more than 10 characters.
INPUT 'Enter your name: ', 10, name$
PRINT name$
REM the size value is ignored when the input variable is numeric.
INPUT 'Enter your age: ', 0, age
PRINT age

Note: The size parameter is ignored if the storage variable is not of type string, it can hold any valid numeric value without influence in the command result.

Loop commands

FOR - NEXT

FOR <variable> = <expression> TO <expression> [STEP <constant>]
...
...
...
NEXT

FOR <variable> = <expression> TO <expression> [STEP <constant>]
...
...
...
END FOR

FOR i=10 TO 1 STEP -1
    PRINT i
NEXT

REM "NEXT", "END FOR" or even "ENDFOR" are all the same keyword
FOR x=1 TO 10
    FOR y=1 TO 15
        num = num + x*y
    NEXT
END FOR

REPEAT - UNTIL

REPEAT
...
...
...
UNTIL <condition = TRUE>

LET i = 0
REPEAT
    PRINT i
    i = i + 1
UNTIL i = 10

WHILE - END WHILE

WHILE <condition = TRUE>
...
...
...
END WHILE

REM "ENDWHILE" and "END WHILE" are the same keyword
LET i = 1
WHILE i <= 10
    PRINT i
    i = i + 1
END WHILE

BREAK

Immediately terminates a loop. In the example below, the loop stops when the variable num becomes greater than 5

FOR num=1 TO 10
    PRINT num
    IF num > 5 THEN BREAK
NEXT

CONTINUE

Starts a new loop cycle

FOR num=1 TO 10
    PRINT num
    IF num > 5 THEN CONTINUE
    PRINT 'Display only if num <= 5'
NEXT

Conditional commands

IF-THEN-ELSE

The IF .. THEN statement examines the result of a logical expression and defines the flow of the program based on the result of the analysed expression.

IF <logical expression> THEN
    ...
    ...
[ELSE IF <logical expression> THEN]
    ...
    ...
[ELSE]
    ...
    ...
END IF

Compact syntax:

IF <logical expression> THEN <command> : <command> ...

Note: The ELSE IF and ELSE statements are not allowed in the compact syntax.

a=5
IF a < 5 THEN
    PRINT 'a < 5'
ELSE IF a > 5 THEN
    PRINT 'a > 5'
ELSE
    PRINT 'a = 5'
END IF

a=10
IF a = 10 THEN PRINT 10 : PRINT 20 : PRINT 30

Jump commands

GOTO

The so hated GOTO command is implemented because, well... this is BASIC.

The GOTO command immediately jumps to a label. It requires coding in old style BASIC and should be avoided due to the BAD programming practice.

10 REM ******************************
20 REM Program: delusional narcissism
30 REM ******************************
40 LET i = 1
50 INPUT 'Enter your name: ', 60, name$
50 PRINT name$
60 PRINT
70 LET i = i + 1
80 IF i < 10 THEN GOTO 50
90 PRINT 'Aaahhhh...'
100 END

GOSUB - RETURN

Are used to create subroutines in old BASIC programming style. It's far better to create and call a FUNCTION.

10 x = 3 : y = 6
20 GOSUB 100
30 PRINT 'Result:'; r
40 END
100 r = sqrt(x^2 + y^2)
110 RETURN

Extensibility

One interesting aspect of the BASIC interpreter is the ability to extend it by linkage with C++ functions and objects. I will detail this feature in the next part, when we will look through the guts of the interpreter.


The C++ compiler and the BASIC interpreter classes...