The Script Language
Terminology
This chapter introduces basic concepts of computing and programming in general. Readers with no previous programming experience should read it with special care, because the later chapters assume familiarity with basic computing terminology and vocabulary.
Statements
Each program consists of functions and variables. A function definition starts with its name and the argument list. The code in the body contains statements that specify the computing operations to be done, and variables store values during the computation. A statement is ended by the semicolon character. Statements can be put into blocks. The outermost block is always the function body. In the CADMATIC script language, { and } mark the beginning and the end of a block (sometimes called a "compound statement"). A block does not need a semicolon after the closing brace, but the last statement inside a block must end with a semicolon (unless it is also a block). Blocks are used with the control operators described later.
Examples of statements:
This statement calls a function:
U_MESSAGE("Hello world");
These are assignment statements:
x = compute_value( 3.145, 2, 1);
a = 5;
s = "this is a string";
The previous statements evaluate the right side of the assign operator ( = ) first and then write the computed value to the variable on the left side.
A block consisting of three statements:
{
PI=3.14;
radius = 200.;
area = PI*radius*radius;
}
The assign operator is not considered to be part of an expression, so that multiple assignments within the same statement are not allowed. Thus practically speaking there are two basic statements: assignment of a computed value, and function call.
Arithmetic expressions
Arithmetic expressions use the arithmetic operators "+-*/^". The "^" operator is exponentiation.
The value type of the expression depends on the data types of the operands.
E.g.:
x = I * j;
"I * j" is an arithmetic expression. I and j are the operands of multiplication. "*" is the binary (two operands) multiplication operator. If "I" and "j" are integers then "x" will be an integer.
Data types and variable typing are discussed in more detail later.
Logical expressions
if(answer==1)
This line starts a branch. The general form is:
if( logical expression )
statement 1
else
statement 2
The logical expression is evaluated first. Its value is either "true" or "false"; statement 1 is executed if it is true, and statement 2 otherwise. In the script language (as in C), 0 means false and any other value means true.
Data types
Computers manipulate data in chunks of binary digits (bits). We are seldom interested in accessing our data as bits. We need useful data types such as integers, floating point numbers, and character strings (text); it is up to the compiler to convert these types to bits and bytes.
A binary digit has only two values: 1 and 0. An eight-bit chunk is called a byte. A byte can have 256 different values (0–255). A byte is big enough to map characters but too small for most arithmetic operations. In computers an integer number usually occupies four bytes (32 bits). Floating point numbers can be four bytes (32 bits) or eight bytes (64 bits, "long"). The script language floats are 32-bit floats, and they give an accuracy of about six significant decimal digits.
Arrays are not an intrinsic part of the script language but are implemented as external functions of the DM module (see the functions DM_VECTOR_* and DM_2D_ARRAY_* ).
The script system does not implement structs or objects, but due to the implicit typing arrays are widely used to implement compound data types with columns and rows.
Applications such as Plant Modeller implement in their kernel their own fixed data objects. These can be accessed via the script extern API of the application. For example, Plant Modeller implements the object type STANDARD COMPONENT, and the script programmer can access its properties via the script extern API functions of Plant Modeller as follows:
STANDARD COMPONENT:
x coordinate | (float)
y coordinate | (float)
local x-direction | (float)
local y-direction | (float)
local z-direction | (float)
component model | (string)
(And so on.)
A typical workflow in a script is that first an extern function is called to return a pointer – a handle to the object data inside the application. The script language provides no operators, other than assign, for handles. Thus getting and setting is always done using script extern functions of the specific application.
Address | Contents of 4 Byte memory word | Explanation of contents
10000 | 123456789 | Integer number
10004 | 3.1415678 | Floating point number
10008 | "TEXT" | 4 Characters
10012 | " Str" | 4 Characters
10016 | "ing\0" | 3 Characters + string terminator
etc. | ... | ...
11024 | 10008 | Memory address (pointer to "TEXT String")
Implicit typing and arithmetic operators
The script language is built to support implicit typing, i.e. the data type of a variable is defined on the fly depending on how the variable is used.
The operations "*", "+" and "-" between integers are handled as integer operations, since an int result is expected. However, division is performed in floating point if the result is less than 8000000. For numbers from 8 million to 4 billion, integer representation is more accurate than float, so truncated integer division is performed. Another reason for float results from integer division is backwards compatibility, since in older versions of the script compiler all number constants were floats:
scale = 1/10; /* this would be 0 in "C" language */
If a truncated result is wanted, the result of the integer division should be assigned to a variable declared as int. See the example under Explicit typing.
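The division rule above can be modelled outside the script system. The following Python sketch is only an interpretation of the text: it assumes that "the result" means the magnitude of the quotient, and the names to_float32 and script_int_divide are hypothetical helpers, not part of the script language.

```python
import struct

FLOAT_DIVISION_LIMIT = 8_000_000  # threshold quoted in the text above


def to_float32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def script_int_divide(a, b):
    """Model of dividing two integers as the text describes it (an assumption)."""
    if abs(a / b) < FLOAT_DIVISION_LIMIT:
        # Small quotients keep their fractional part, stored as a 32-bit float.
        return to_float32(a / b)
    # Large quotients are truncated toward zero and stay integers.
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


print(script_int_divide(1, 10))             # a float close to 0.1, not 0 as in C
print(script_int_divide(1_000_000_000, 3))  # 333333333
```

The second call stays an integer because a 32-bit float cannot represent every integer of that size.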
Arithmetic operations between strings and numbers are allowed if the string can be converted to a number. This trick allows conversion of numbers typed in as strings. If the string looks like a float then the other operand will be converted to float also.
The following example shows how the implicit type conversion works with numbers:
/* cnvnum.mac */
#include include/dmutil.h
main()
{
v1 = 3 + 4;
v2 = 3 + "4";
v3 = 3 + "4.1";
U_MESSAGE( ITOASCII(v1) );
U_MESSAGE( ITOASCII(v2) );
U_MESSAGE( FTOASCII(v3) );
/* Outputs into the message window:
7 (int)
7 (int)
7.100000 (float) */
}
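For readers who know Python, the conversion between a number and a numeric string can be sketched as follows. This models only the number-with-string case described above; coerce_operand and script_add are hypothetical names, and the real conversion rules of the script executor may differ in details.

```python
def coerce_operand(value):
    """Convert a numeric-looking string to int or float; pass numbers through."""
    if isinstance(value, str):
        try:
            return int(value)    # "4" looks like an integer
        except ValueError:
            return float(value)  # "4.1" looks like a float
    return value


def script_add(left, right):
    """Add a number and a numeric string after converting the string."""
    return coerce_operand(left) + coerce_operand(right)


print(script_add(3, 4))      # 7
print(script_add(3, "4"))    # 7
print(script_add(3, "4.1"))  # a float close to 7.1
```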
Explicit typing
Data types can also be defined explicitly. For example, it is good practice in most cases to explicitly define the types of the function arguments. The type of a variable can be explicitly defined at the beginning of the block in which the variable is used:
/* trunc.mac */
#include include/dmutil.h
main()
{
float fv;
fv = 3.141;
trunc( fv );
/*
** will print into the message window the truncated
** float value: 3.000000
*/
U_MESSAGE( FTOASCII(fv) );
}
trunc( float f )
{
int i;
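i = f; /* assigning to an int drops the fraction */
f = i; /* the caller's variable receives the truncated value */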
}
This example shows how a single untyped variable ("a") behaves in different contexts:
#include include/dmutil.h
main()
{
int i; float f;
/* variable "a" is an "undeclared" variable. I.e it gets its data type implicitly at run time depending on how it is used */
a = 7 / 3;
/* value of "a" is 2.333333, a float result from int divide */
U_MESSAGE( FTOASCII(a) );
i = 7 / 3;
/* value of "i" is 2, truncated to int since the data type of "i" was EXPLICITLY set to "int" */
U_MESSAGE( ITOASCII(i));
a = 1000000000.0/3.0;
/* value of "a" is now 333333344.000 (float operation) */
U_MESSAGE( FTOASCII(a) );
a = 1000000000/3;
/* value of "a" is now 333333333 if printed out as an "int", since int op keeps more bits */
U_MESSAGE( ITOASCII(a) );
/* If converted to ascii "as float" loses precision: "a" = 333333344.000 */
U_MESSAGE( FTOASCII(a) );
f = a;
/* value of "f" is now 333333344.000 lost precision */
U_MESSAGE( FTOASCII(f) );
a = "Goodbye!";
/* assigning a string constant to the variable "a" retypes it to "string" */
U_MESSAGE(a);
}
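The precision loss in the example above follows from the 24-bit mantissa of a 32-bit float. The Python sketch below reproduces the two numbers; to_float32 is a hypothetical helper that rounds a value to 32-bit precision, and the sketch illustrates the arithmetic only, not the script executor itself.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


exact = 1_000_000_000 // 3                # integer arithmetic keeps every digit
as_float = to_float32(1_000_000_000 / 3)  # 32-bit floats near 3.3e8 are 32 apart

print(exact)     # 333333333
print(as_float)  # 333333344.0
```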
Data type declarations
The data type integer is defined with the reserved word int, and a floating point number with float.
Besides these, a numeric variable can be typed to just number, which can be integer or float. Using number makes sense e.g. when typing a function argument so that it can be either a floating point number or an integer but not a handle or a character string.
The data type character string (row of characters) is declared with the reserved word string ( or STRING).
Besides the former basic types there is the type handle, which accepts coded values from applications but complains if you attempt to do arithmetic with it. Handles can be declared with the reserved word handle. A handle should be thought of as a "handle" to some object: the contents of an object can only be accessed from scripts with its unique handle via some extern function. Applications can use different types of handles, and the types may be checked at run time. So, passing a handle to a standard component to an extern function that expects a file handle will produce a run-time error. Often handles are internally pointers, i.e. the value is a memory address. The value of this address has meaning only within the application program.
The only operations allowed with handles are assignment, use as a parameter, and testing for equality or inequality with zero. External functions which return a handle on success usually return integer 0 when they fail. This saves one argument which otherwise would be needed to return the status. The reason to use 0 instead e.g.
When dealing with pointers, you have to remember that pointers are variables storing an address of data, not variables storing the data.
E.g. if you create a DB image:
image = DB_FULL_IMAGE(tbl);
image2 = image;
"image2" will be a pointer to the same data chunk as "image", so changing the data through "image2" changes what "image" refers to as well.
Variables and constants
Variables may hold values of any type unless declared to be a specific type at the beginning of a function block. Such declarations would be used to provide stricter type-checking at run time.
If not typed then a variable is "declared" by simply using it on the left side of an assignment.
Normally variables exist only for the duration of a function and disappear when the function is done. If values need to be retained between calls then variables can be declared outside the function body (usually at the beginning of the file). They are accessible to all the functions that follow them in the script file. The initial value of a global can be set so that it is ready to use even before the first script executes.
After the declaration, it is illegal to declare another variable or function with the same name. Globals must be typed and must use the keyword "global".
Variable names follow the usual identifier rules: start with an alphabetic character, then use letters or digits or underscore "_". Names are considered distinct only if they differ in the first 14 characters, but there is no limit on the allowed length. Both upper and lower case are allowed and considered distinct.
Variables should be initialized before use. It is a run-time error to use uninitialized variables.
Example of using a global variable:
global int IsInitialized = 0;
DoTrick()
{
if( IsInitialized == 0 ){
initialize_system();
IsInitialized = 1;
}
DoSomething();
}
Note: Storage for global variables is kept per executing environment; running a script allocates one executing environment, and globals are stored until the script is terminated and control passed to the application. Globals in scriptlib are stored until the application is terminated.
Operators
Arithmetic operators
Arithmetic operators are:
^ | exponentiation
* | multiplication
/ | division
+ | addition
- | subtraction
In a series of operations, multiplication and division are performed before addition and subtraction. Parentheses can be used to force earlier evaluation of the enclosed part. A series of multiplications and divisions is performed left to right, so be careful of errors such as this example:
density = mass / width * length * height;
when what is wanted is:
density = mass / width / length / height;
or readably:
density = mass / (width * length * height);
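Python evaluates a chain of "*" and "/" left to right as well, so the pitfall can be checked with a few made-up numbers. The values below are illustrative only.

```python
mass, width, length, height = 120.0, 2.0, 3.0, 4.0

wrong = mass / width * length * height      # ((mass / width) * length) * height
chained = mass / width / length / height    # divides by each dimension in turn
grouped = mass / (width * length * height)  # divides by the volume

print(wrong)    # 720.0
print(chained)  # 5.0
print(grouped)  # 5.0
```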
Logical operators
Logical operators are used (surprise!) in logical expressions. Logical expressions control branching (if) and looping (for, while). A logical expression can have only two values, "true" or "false". True and false are represented by integer values: in the script language the value 0 means "false" and any other value means "true".
Conditional operators are used to compare values and the result of comparison is one (true) or zero (false). Testing for exact equality with floats is not advised because of round-off errors during computation.
Logical operators are:
! | not
<= >= < > | less or equal, greater or equal, less than, greater than
== != | equal, not equal
& | and
| | or
All arithmetic operators have precedence over logical operators.
C programmers note: the logical "and" and "or" are single symbols, not double symbols; there are no bit operators.
String operators
String operations are implemented via intrinsic functions, but also the '+' operator is allowed for concatenation.
Note that "12" + "34" = "1234" in this context.
Precedence
Operators of the script language have the same precedence order as in many other programming languages. The idea is that expressions evaluate in a "reasonable" order even without parentheses.
PRECEDENCE | OPERATORS | ASSOCIATIVITY
1 | ( ) | left to right
2 | ! | right to left
3 | * / | left to right
4 | + - | left to right
5 | < <= > >= | left to right
6 | == != | left to right
7 | & | left to right
8 | | | left to right
The following script demonstrates the meaning of operator precedence:
/* prec.mac */
#include include/dmutil.h
main()
{
e = 5 + 10 / 5 * 2;
U_MESSAGE( FTOASCII(e) );
/* 9.000000 equals to : 5 + ((10 / 5) * 2) */
e = ! 5 + 10 / 5 * 2;
U_MESSAGE( FTOASCII(e) );
/* 4.000000 equals to : ! (5) + ((10 / 5) * 2) */
e = ! (5 + 10 / 5 * 2);
U_MESSAGE( FTOASCII(e) );
/* 0.000000 ("false") equals to : ! (5 + ((10 / 5) * 2)) */
e = 4 - 5 | 2 - 2;
U_MESSAGE( FTOASCII(e) );
/* 1.000000 equals to : (4 - 5) | (2 - 2); equals to : -1 | 0 equals to : 1 (== "true". Any value except 0 means "true") */
e = 1 | 0 & 0;
U_MESSAGE( FTOASCII(e) );
/* 1.000000 equals to : 1 | (0 & 0) equals to : 1 | 0 equals to : 1 */
}
Control Structures
Looping
Repetitions are accomplished with either the "while" loop or the "for" loop. Technically speaking, only a single statement can be repeated, but the use of curly brackets allows a group of statements to be treated as a single statement. If the loop body is a single statement it may be used without brackets, but the use of brackets is encouraged for a clear demarcation of which statements are repeated.
while(LogicalExpr) {
statement; statement;
....
}
This performs the statement block until "LogicalExpr" evaluates to 0 (false). If the expression is 0 initially then the block is not executed at all.
The expression may be any arithmetic value, but usually a comparison with a search value or a counter will be used.
concentric() /* draw 6 concentric circles */
{
r = 10;
while (r <= 60) {
CIRCLE(0, 0, r);
r = r + 10;
}
}
The "for" version of the looping operator displays both the test and re-evaluation of the test on the same line (as well as the initial value):
for (i=0;i<100;i=i+1;) { body; }
This is equivalent to:
i = 0;
while (i < 100) { body; i = i + 1; }
The "for" loop is usually used for stepping a counting variable, but any other statements may be used in place of the example statements.
Note: The final expression in the "for" can have a semicolon, but a C-style expression without the terminating semicolon is also accepted.
Nesting of loops is allowed; statements within the loop body may also be loops.
There is also an obsolete operator LOOP(n) which repeats the body "n" times. The value of the counter is inaccessible. Use of this function in new scripts is discouraged, since it will not be supported in the future.
Branching
Conditional evaluation is performed by:
if (e) {
statement; statement;
...
}
or
if (e) {
statement; statement;
...
}
else {
statement; statement;
...
}
which performs the first statement block if the expression "e" is nonzero, or the "else" part if it exists and the expression is 0. A single statement can be used in place of the blocks, but if there are nested ifs an "else" clause always associates with the closest previous "if" part. It is safest to use curly brackets to group nested ifs, especially when some have "else" parts and others don't.
Example drawing rows and columns of circles with rotating radius except in first row:
extern POLY;
extern CIRCLE;
circles()
{
x = 100; /* initial values set before entering loop */
for (i = 0; i < 6; i = i + 1; ) {
y = 100;
for (angle = 0; angle < 60; angle = angle + 10;) {
CIRCLE( x, y, 50 );
if ( y > 100) {
POLY(x, y, x + 50 * COS(angle), y + 50 * SIN(angle));
}
y = y + 160; /* y spacing */
}
x = x + 130; /* next column */
}
}
Functions
One can say that everything in the script language is functions. There cannot be a script without any functions. A script must always define the function main() which is the entry point of the script.
If a function has no parameters empty parentheses are still required. If executing a script and the function main has parameters defined, then the values for the parameters are requested from the user. The curly braces define the beginning and end of the statement list for the function. Statements are terminated by semicolons (the last statement also needs a semicolon).
Parameters are handled as if they were pre-initialized variables. If a known data type is expected then this can be declared in the formal parameter list (see the examples above). Assignments to parameters are allowed, and if the value sent in a parameter was a simple variable name, then assigning to it means the caller will get the new value back. This call-by-name operation makes the language behave more like FORTRAN than "C", and allows multiple results to be returned.
Function names and argument names follow the same rules as variable names.
In the next example the first script gets the date from another script with four values returned. Note that the variables must be set before use; this is a precaution in case the called function needs to see some values. The number of arguments to local functions is checked at compile time. At run time, calls are checked for too many or too few arguments. Extern functions usually check that argument types match the expected types.
func(arg, arg2, etc)
{
string month; int date; string day; int year;
/* set before using */
month = " "; date = 0; day = " "; year = 0;
/* script calls another script*/
get_date(month, date, day, year);
print(month); print(date); /* etc... */
}
get_date(string month, int date, string day, int year)
{
/*...*/ /* get date from somewhere */
month = "July"; /* example date as constants */
date = 15;
day = "Wed";
year = 1992;
}
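Python does not pass variables this way, so the effect of the get_date example above has to be imitated with a small mutable cell. The Ref class below is a hypothetical stand-in for "a variable owned by the caller"; the sketch only illustrates how several results can come back through parameters.

```python
class Ref:
    """A mutable cell standing in for a variable passed by the caller."""

    def __init__(self, value):
        self.value = value


def get_date(month, date, day, year):
    # Assigning through the cell changes what the caller sees afterwards.
    month.value = "July"
    date.value = 15
    day.value = "Wed"
    year.value = 1992


# The caller sets every variable before the call, as the text recommends.
month, date, day, year = Ref(" "), Ref(0), Ref(" "), Ref(0)
get_date(month, date, day, year)
print(month.value, date.value, day.value, year.value)  # July 15 Wed 1992
```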
Any application-supplied functions must be declared in an extern declaration, which must appear in the file earlier than its first use. A good convention is to put all external declarations at the top of the file. The use of "include" files for standard external lists is encouraged to prevent errors (see "Includes" below). Directory /usr/pms410/include stores these extern declaration include files.
If a function call appears and that function has not been declared, it is assumed that it will be found somewhere lower in the same script file. A function should not call itself either directly or indirectly through another function, because a function's variables are allocated only once.
There is no specification about the order of evaluation of the arguments to a function, so using expressions with interdependent side effects is not advised.
Besides arguments the function return value can be used to pass values back to the calling function. Typically the return value indicates the error status. Nearly all extern functions which do not return handles return integer 0 when succeeded and
The next function tries to find a string in a file. It returns logical "true" if the string is found, otherwise "false":
FindString( string s )
{
end_of_file = 0;
s2 = "";
while( ! end_of_file){
end_of_file = get_str_from_file(s2);
if( s2 == s ){
return(1); /* true */
}
}
return(0); /* false */
}
Intrinsic functions
The following functions are built in to the script executor and are therefore available to all applications.
Arithmetic
SQRT(num)
returns the square root of the float arg
SIN(num) COS(num) TAN(num)
argument is degrees
ASIN(num), ACOS(num), ATAN(num)
result is degrees
EXP(num) , LOG(num)
base "e" 2.71828...
ATAN2(y,x)
returns the phase angle (in degrees) to the point (x,y), like atan(y/x) but without divide overflow when x is 0. Thus ATAN2( SQRT(3), 1 ) = 60.0 and ATAN2(1,0) is 90. Note the argument order.
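The same values can be checked with Python's math module, which works in radians and therefore needs an explicit conversion to degrees. The function name atan2_degrees is a hypothetical helper for this sketch.

```python
import math


def atan2_degrees(y, x):
    """Phase angle of the point (x, y) in degrees; y is the first argument."""
    return math.degrees(math.atan2(y, x))


print(round(atan2_degrees(math.sqrt(3), 1), 6))  # 60.0
print(round(atan2_degrees(1, 0), 6))             # 90.0, where atan(y/x) would divide by zero
```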
Strings
Strings work internally by using indices into a string space. String operations create new strings for all new results, so the original string is unchanged. String arguments to functions may be string variables, string constants, or string-typed parameters.
An untyped variable can be assigned a string value even if it previously held a numerical value.
In the following examples, variables "s", "s2", "patt" and "patt2" are of type string, and "n" is integer.
SUBSTRING(s, n)
tail of string starting at s[n]
For example:
new_string = SUBSTRING("abc",1); => new_string = "bc"
HEAD(s,n)
first n chars of s
For example:
new_string = HEAD("abc",1); => new_string = "a"
TAIL(s,n)
last n chars of s
For example:
new_string = TAIL("abc",1); => new_string = "c"
SEARCH(s, patt)
tail of string starting with pattern
For example:
new_string = SEARCH("ab/c","/"); => new_string = "/c"
STRINGTERM(s, patt)
head of string ending before pattern
For example:
new_string = STRINGTERM("ab/c","/"); => new_string = "ab"
TRANS(s,patt,patt2)
replace patt char with patt2 char in s. patt and patt2 can contain several chars which are replaced one by one.
For example:
new_string = TRANS("abc","ab","bd"); => new_string = "bdc"
STRLEN(s)
number of chars in s
For example:
nr = STRLEN("abc"); => nr = 3 (int)
APPEND(s, s2)
returns s and s2 combined. Prefer the "+" operator with assignment instead: s = s + s2;
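For comparison, the documented examples can be reproduced with Python slicing and string methods. This is a sketch of the behaviour shown in the examples only; the lower-case function names are hypothetical, and the results for a pattern that is not found (empty string for search, the whole string for stringterm) are assumptions, since the text does not specify them.

```python
def substring(s, n):
    return s[n:]                     # tail of s starting at index n


def head(s, n):
    return s[:n]                     # first n characters


def tail(s, n):
    return s[-n:] if n > 0 else ""   # last n characters


def search(s, patt):
    i = s.find(patt)
    return s[i:] if i >= 0 else ""   # assumption: empty result when not found


def stringterm(s, patt):
    i = s.find(patt)
    return s[:i] if i >= 0 else s    # assumption: whole string when not found


def trans(s, patt, patt2):
    # Replace each character of patt with the character at the same position in patt2.
    return s.translate(str.maketrans(patt, patt2))


print(substring("abc", 1), head("abc", 1), tail("abc", 1))  # bc a c
print(search("ab/c", "/"), stringterm("ab/c", "/"))         # /c ab
print(trans("abc", "ab", "bd"), len("abc"))                 # bdc 3
```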
Miscellaneous
ISINT(arg), ISFLOAT(arg), ISSTRING(arg)
These are typically used in logical expressions and return 1 if argument is of indicated type, otherwise 0.
TRAP()
TRAP function halts the script executor and pops up the script debugger. See Debugging Scripts.
Include
Lines of script code that are used repeatedly may be stored in a separate file and then "called in" to another file with an "include" statement.
The most common use of this feature is including lists of standard external function declarations. Standard include files are found in a directory named "include" which in turn is under the directory named by environment variable PMS_HOME.
The syntax of an include line is:
#include pathname/filename /* possible comment */
If the pathname starts with "/" then it is an absolute pathname. Otherwise the value of PMS_HOME is prepended to the path, as mentioned above. Environment variables can also be used in the include path. So normally the include line looks like:
#include include/dbase.h /* standard header for database */
or
#include $PMS_PROJROOT/$PMS_PROJNAME/sde/src/db/db_stuff.h
Include lines are processed prior to scanning the script language, and do not enjoy the same free format as the rest of the language. The include command must be entirely on one line, and the '#' symbol must be the first character on the line. The file name may be enclosed in quotation marks if desired. A comment may be placed after the file name, but must not extend over the end of the line.
The effect of the include is that the named file is inserted into the main file at the point where the include line is. Include files may themselves include other files, up to a maximum nesting of 10 sub-includes.
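The path rules above can be summarised in a short Python sketch. It is not the script compiler's implementation: resolve_include is a hypothetical function, the environment is passed in as a plain dictionary, and the directory values used in the calls are example values only.

```python
import re


def resolve_include(line, env):
    """Return the file path named by one '#include' line (illustrative only)."""
    if not line.startswith("#include"):
        raise ValueError("'#' must be the first character on the line")
    rest = line[len("#include"):]
    rest = re.sub(r"/\*.*?\*/", "", rest).strip()  # drop a trailing comment
    path = rest.strip('"')                         # quotation marks are optional
    # Replace each $NAME with its value from the environment dictionary.
    path = re.sub(r"\$(\w+)", lambda m: env[m.group(1)], path)
    if path.startswith("/"):
        return path                                # absolute pathname
    return env["PMS_HOME"].rstrip("/") + "/" + path


env = {"PMS_HOME": "/usr/pms410", "PMS_PROJROOT": "/projects", "PMS_PROJNAME": "demo"}
print(resolve_include("#include include/dbase.h /* standard header for database */", env))
print(resolve_include("#include $PMS_PROJROOT/$PMS_PROJNAME/sde/src/db/db_stuff.h", env))
```

With the example environment, the first call yields /usr/pms410/include/dbase.h and the second /projects/demo/sde/src/db/db_stuff.h.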
Note: When executing scripts the system compiles the source if it is newer than the binary. But if the source includes another file then that file is not checked against the binary.
Run-time Errors
Errors in scripts usually do not crash the application (e.g. Plant Modeller). However, there is one exception:
Important: No reference count is kept inside the script system for memory chunks allocated and referenced via pointers.
A typical fatal error could be:
p1 = get_handle_to_allocated_memory();
p2 = p1;
free_allocated_memory( p1 );
use_contents_of_allocated_memory( p2 );
Value of p2 is an address to already freed memory and thus using it can crash not only the script but the application too. Usually all CADMATIC objects are dynamically allocated and the create functions return a pointer (i.e. the object handle) to allocated memory.
Another example would be the A_* routines and their derivatives, which let the script programmer directly manage dynamic memory.
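The danger of a copied handle can be illustrated with a small Python model. The HandleTable class is hypothetical and, unlike the real application, it detects the stale handle and raises an exception; in the application the same mistake reads freed memory and may crash.

```python
class HandleTable:
    """Toy model of objects that are reachable only through integer handles."""

    def __init__(self):
        self._objects = {}
        self._next_handle = 1

    def allocate(self, data):
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = data
        return handle

    def free(self, handle):
        del self._objects[handle]

    def read(self, handle):
        if handle not in self._objects:
            raise RuntimeError("handle refers to freed memory")
        return self._objects[handle]


table = HandleTable()
p1 = table.allocate("component data")
p2 = p1                # copies the handle value, not the object
table.free(p1)         # the object is gone, but p2 still holds the old value
try:
    table.read(p2)
except RuntimeError as error:
    print(error)       # handle refers to freed memory
```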
Certain conditions arising during execution may produce messages to the user. These will appear in an application-provided message pane or on your terminal. The line number in the script file is given along with the error message.
For arithmetic errors the script execution continues with any assigned result treated as an uninitialized variable. Arithmetic underflow is substituted with 0.0.
The possible messages are listed below; most of them are self-explanatory:
- variable name used before set
- attempt to assign to constant
- divide by 0
- overflow in name function (result too large for the machine to handle)
- bad argument to name function (domain error, as in SQRT(-5) or ATAN2(0,0))
- unable to compute function name (some other problem with math function)
- incompatible operand types (assigning string to float, etc.)
- call to unbound function name (application did not provide requested function)
- not enough params (called function expected more arguments)
- Out of mem while loading script file
- mysterious internal error