`* * * NOTICE * * *`

`* * * WARRANTY * * *`

This software program(s) is warranted to perform as documented when used on the specified hardware operating under the specified disk operating system as shown on the accompanying documentation. If within 90 days of the date of purchase the program is found to be defective due to a bug in the code, the publisher will, upon request, provide a patch to correct the bug or will update the program diskette with a corrected copy within a reasonable time period after return of the program diskette to the publisher. If within 90 days of the date of purchase the documentation proves defective due to missing pages, the publisher will provide substitutes for the missing pages upon request.

The publisher shall have no liability or responsibility to the purchaser or any other person, company, or entity with respect to any liability, loss, or damage caused or alleged to have been caused by this product, including but not limited to any interruption of service, loss of business and anticipatory profits, or consequential damages resulting from the operation or use of this program.

`* * * ATTENTION * * *`

This program package is copyrighted with all rights reserved. The distribution and sale of this program is intended for the personal use of the original purchaser only and for use only on the computer system noted herein. Furthermore, copying, duplicating, selling, or otherwise distributing this product is expressly forbidden. In accepting this product, the purchaser recognizes and accepts this agreement. The purchaser is entitled to make as many working copies of this disk as is needed for his or her personal use.

`MISOSYS,Inc.` `P.O.
Box 239` `Sterling, Virginia 22170-0239` `703-450-4181`

MC C-Language Compiler Reference Manual

Copyright (C) 1985 by MISOSYS, Inc., All rights reserved

Reproduction in any manner, electronic, mechanical, magnetic, optical, chemical, manual, or otherwise, without written permission, is prohibited.

Published by:
`MISOSYS, Inc.`
P. O. Box 239
Sterling, Virginia 22170-0239
703-450-4181

`* * * A T T E N T I O N * * *`

The MC compiler can be used to generate any software product, commercial or otherwise, without payment of any royalties to MISOSYS, with the exception of the following: when MC is used to generate another compiler product, no part of the MC-supplied libraries may be included in the run-time support of the generated compiler.

MC compiler: Copyright 1985 R. N. Deglin, All rights reserved.
MC Libraries (LIBA/REL, CLIB/REL, MATH/REL, IN/REL): Copyright 1985 R. N. Deglin and MISOSYS, Inc., All rights reserved.

MC is a trademark of MISOSYS, Inc.
LDOS is a trademark of Logical Systems, Inc.
TRS-80 and TRSDOS are trademarks of Tandy Corporation.
UNIX is a trademark of Bell Telephone Laboratories.

`Table of Contents`

Introduction
   MC Provided Files
   MC Environment
   Standard Input/Output
   Standard I/O Redirection
   Command Line Arguments
   Standard Header Files
   Function Libraries
   Runtime Error Control
   Closing Comments
Language Definition
   Program Environment - Functions
   Statements - Simple & Compound
   Data Representation - Constants
   Variable Names (Identifiers)
   Data Declarations
   Scope of Variables & Functions
   Storage Classes
   Expressions
   Unary Operators
   Binary Operators
   Statements
      IF
      SWITCH-CASE
      WHILE
      DO-WHILE
      FOR
      BREAK
      CONTINUE
      RETURN
      GOTO
   MC Preprocessor
      #define
      #undef
      #if
      #ifdef, #ifndef
      #else
      #endif
      #include
      #option
      #asm, #endasm
      #line
      Program examples: sortsym, dcal
MC Operators Guide (Running the Compiler)
   Keyboard Refresher
   MC Operation
   Using Job Control Language
   Invoking the MCP Preprocessor
   Invoking the MC Compiler
   Creating an Executable Command File
   Compile Time Directives
   Assembly of the ASM file
   Linking the Relocatable Object Module
MC Library
   General
   Functions [abort() to zero()]
Advanced Topics
   Runtime Options and I/O Control
   Call: DOS SVC Interface
   Separate Compilation of Modules
   Using extern and static
   Building and Maintaining Relocatable Libraries
   Programs with overlays
   When things go wrong
   Runtime error trapping
   Assembly Language Interfacing
   Program Memory Map
   MC Identifier Output
   Runtime variable storage format
   Register Utilization
   Argument passing
   Returning a value from a function
Appendix
   Error Messages

`First Edition`

`The MISOSYS C Language Compiler` `Copyright (c) 1985 MISOSYS, Inc., All rights reserved`

General

Congratulations! You have purchased the finest C compiler package available for your machine environment. MC will prove itself to be a valuable investment of your software dollars. Before diving into MC, we suggest that you follow the guidance given in the following paragraphs.

Read the entire introduction to get an idea of what MC is all about. This will provide considerable insight into the C programming power available to you with the MC compiler. It may also help you understand the content of the remaining chapters.
MC requires the use of a macro-assembler which generates Microsoft compatible relocatable object module files. An assembler is not included with the compiler package. You may have purchased the MISOSYS "Relocating Macro Assembler Advanced Development System" (MRAS) or Microsoft's M-80 assembler. If you have neither, you will not be able to use the compiler package until you obtain a copy of either MRAS or M-80. C source code is prepared using the editor which is included with your assembler package.

Make backup copies of the MC distribution diskette to use as a working master. The compiler package is released on a 40-track double density DOS data diskette. This diskette contains all of the files associated with the compiler. We suggest that you make one set of archival backups and store them away in a secure area (safe from dust, dirt, magnetic fields, etc.). Then make a working backup of the distribution diskette. The procedures for making backup copies can be located in the UTILITY section of your DOS user manual under "BACKUP".

If you are going to use MC with MRAS, delete "MC/MAC" and "MCMACS/MAC" from the working disk. Conversely, if you are going to use MC with M-80, delete "MC/ASM" and "MCMACS/ASM" from the working disk.

If you are using MC on a two drive floppy system, you will have to create a DOS system diskette with a maximum of free space. This "working system diskette" can be created by using the DOS PURGE utility on a fresh backup of DOS. You may remove all files except SYS0-SYS4, SYS6, SYS8 (SYS8 can be removed from TRSDOS 6.x), and SYS10-SYS12. A 40-track double density minimal system diskette prepared as above has about 144K free. Copy your assembler, linker, and editor from your assembler disk. Then copy the MC/JCL file and the compiler command files from the working backup to this system diskette. If there is still space left on your system disk, copy some of the other files from the work disk.
You will not be able to copy all of the files; in fact, most will still be on the working disk. If your machine has 128K and you are using a RAMdisk, you will find it beneficial to copy the library /REL files to your RAMdisk. Once this is done, remove the files from the working data disk that were copied to the system disk. This should leave work space on the data diskette in your second drive for C source files and the files generated during the programming session.

Notice that MC requires a two drive system. In fact, you may find it prudent to use MC on a three drive system - or one using two-sided drives - or even a hard disk environment.

If you have gotten this far, continue to read this chapter and discover everything that MC provides you with.

`Introduction - General` 1 - 1

MC Provided Files

MC is a complete C compiler. It adheres to the "standards" expressed by Brian Kernighan and Dennis Ritchie in their book, "The C Programming Language". MC includes an extensive UNIX System V compatible function library. All you need to generate executable CMD programs is a macro assembler which generates Microsoft compatible relocatable object modules.

There are many files on your distribution diskette. In fact, in order to make room for all of the files, the header files have been merged together into a single source archive file. The following C program (which may be on the disk, space permitting) will separate the archive into the individual header files:

    /* unarc.ccc - 11/29/85 */
    char aline[81];
    main()
    {   while (gets(aline))
        {   if(!strncmp(aline,"/*%",3) && gets(aline))
            {   if (!freopen(strcat(aline,":3"),"w",stdout))
                    exit(-1);
                else
                {   fputs("\nGenerating: ",stderr);
                    fputs(aline,stderr);
                }
            }
            else puts(aline);
        }
    }

Once you have established your working compilation and assembly system, the unarc program should be typed into a file named "unarc/ccc".
You may change the ":3" which appears in the sixth line of unarc to specify which output drive the header files should be written to. Then compile unarc with the DO command: "DO MC (N=UNARC)". After the command program has been successfully created, invoke unarc with the command:

    UNARC [...]

[...] ">", causes substitution of the standard output file specification, the ">>" causes standard output to be appended to the redirected file/device, and the number sign symbol, "#", causes substitution of the standard error file specification. Spaces are permitted between the redirection character and the file specification.

It may not be immediately obvious how this feature can be used. Here is an example C program that illustrates the straightforward use of standard I/O redirection. The following program can be used to copy any file to any other file (remember that "file" can be any device or DOS disk file).

    /* CLONE - copy standard input to standard output */
    #include <stdio.h>
    int c;
    main ()
    {   while ((c = getchar()) != EOF)
            putchar(c);
    }

The example program simply copies the standard input to the standard output until end of file is reached. Once this program is compiled, assembled, and linked, it can be used to copy any file to any other. For example:

    CLONE >*PR

lets the user type to the system printer. If disk file copying is needed, the command:

    CLONE <INFILE/ASM:1 >OUTFILE/BAK:2

will copy the file "INFILE/ASM:1" to the file "OUTFILE/BAK:2". If the user wishes to have a printed log of any error messages that a program puts out, use something like:

    MC TESTLIB #*PR

Any messages that MC outputs to the standard error file will be redirected to the printer device in lieu of the console display.

Command Line Arguments

When a C program is invoked, the command line is parsed into a list of arguments.
A single argument is represented by a continuous string of non-white space characters surrounded by white space. At a minimum, a command invocation will have one argument - the program name itself. The list of arguments is passed to the executing program through the "argc" and "argv" arguments of main(). "Argc" is an integer which contains the number of arguments while "argv" is an array of elements, each element of which is a pointer to a character string. These arguments will usually be declared as:

    main(argc,argv)
    int argc; char *argv[];

providing your program with a method of recovering various data entered on the command line by the invoker.

Command line arguments to a C program may be enclosed in single or double quotes. This allows the inclusion of special characters and whitespace in an argument. The quotes are stripped before the argument is passed to main(). To include a quote in an argument, precede it with a backslash "\". To include a backslash, use a double backslash "\\".

For example, in a command line such as:

    PROCESS INPUT/DAT:3 TEMPY/TXT:5 +O=:7 -L "+C=This is a message"

six arguments will be passed to main() and "argc" will be equal to 6. If main() was the program:

    main(argc,argv)
    int argc; char *argv[];
    {   while (argc--) puts(*argv++);
    }

it would output the strings, "PROCESS", "INPUT/DAT:3", "TEMPY/TXT:5", "+O=:7", "-L", and "+C=This is a message". Note that the last argument is a text string having embedded blanks; this is permitted when the command line argument is enclosed within quotes. Any redirection specifications will be processed before the command line arguments and will not appear in the argument list.

Standard Header Files

Standard header files are files which contain definitions peculiar to a system. They usually take the form of "#define" statements and "extern" statements within the header file.
In order to use certain libraries, a corresponding header file should be included (using the "#include" statement). The file extension of "/H" is used for MC header files to be consistent across versions of UNIX and other systems sporting C compilers.

A program to be compiled and linked with MC should usually have the file "STDIO/H" included to compile properly. STDIO/H also defines various system dependent parameters, such as end of file (EOF) and end of line (EOL). The FILE POINTERS stdin, stdout, and stderr are addresses in the standard library which do not need to be defined before use; however, the corresponding FILE DESCRIPTORS are defined in the header file.

MC includes many header files which are standard under UNIX. These files contain symbolic definitions of constants used in various functions in the standard library. It is absolutely essential that you use the symbols defined in the header files when noted in the documentation for the functions. Do NOT extract the symbolic constant's value and use the number in your program. Numbers are not necessarily portable across C installations; however, the symbolic NAMES are a part of the AT&T definition and are portable! Thus, by using the symbolic names defined in the appropriate header files, you will minimize any conflict in portability, not to mention compatibility with future releases of this compiler package. Any header file required by a function is documented with the function needing it.

Function Libraries

Commonly used functions are collected into FUNCTION LIBRARIES. The functions in a library can be used by the programmer without the need to rewrite, recompile, or reassemble the functions needed. Once a C program has been compiled and assembled, it can then be combined during the link phase with the functions it requires.
Only those functions necessary for the execution of the program are linked to the compiled program.

Certain functions required by many programs are included in a special library called the STANDARD LIBRARY. The standard library is the common denominator among all C language installations. Programs written using functions in the standard library are easily transported to any other computer supporting a C language system with the standard library implemented. The most important aspect of the standard library is that it allows the details of each system's peculiar operating environment to be hidden from the programmer's view. The standard library provides the functions for input/output, memory allocation, and character set manipulations.

What is typical in UNIX installations is to have the standard functions "callable" from C in a library named "LIBC". In addition, a collection of subroutines used by the compiled C program to perform basic operations but not directly callable from C is contained in a library named "LIBA". MC follows these standards. MC also incorporates the high-level math functions into a "MATH" library as is also found under UNIX systems.

Users can also create their own collections of often-used functions that can be used in the same manner as the standard library. These USER LIBRARIES reduce the programming time, compilation time, assembly time, and program complexity necessary in creating new programs. Functions, once defined, written, and tested, can be added to the user library and need only be referred to by name in later programs. The linking process brings the functions into subsequent programs without the need to recompile and reassemble.

Relocatable object module libraries are created and maintained using the MLIB librarian. This facility is included with the MRAS assembler package or is available separately for M-80 users. You can even build a user library with the APPEND command in the DOS by using the (STRIP) parameter.
More on library building is included in Chapter 5, "Advanced Topics".

Special purpose libraries may also be created for use in particular types of applications. For instance, the hardware-specific functions provided with your C package are in the special purpose library, IN/REL. This is an example of how the C language avoids the trap of non-standard extensions being included within the language.

Runtime Error Control

MC provides certain facilities for the detection and control of four types of runtime errors. These errors may be classified as DOS I/O errors, C environment errors, low-level floating point errors, and high-level floating point mathematical errors. This section will describe the function and purpose of the control the programmer has over these errors; however, the mechanics of implementing the control will be discussed in Chapter 5, "Advanced Topics".

To begin with, the standards of UNIX System V dictate a protocol for most functions to return a value indicating that an error occurred. Chapter 4, "Function Libraries", documents the error return conditions for each function which supports an error return code. It is up to the programmer to provide appropriate code to detect and act on that error return code.

The DOS I/O errors are characterized by problems in accessing files, reading from files, or writing to files. When such an error is detected in the MC I/O package, the DOS I/O error is stored in the File Control Area assigned to the file stream, provided the file has been successfully opened. This error number may be obtained through the ferror() function. Next, the error is passed to a routine in the I/O package which will optionally display the runtime error. The option() function provided in the standard library can be used to control the behavior of this I/O error display - or suppress it entirely.
Next, the DOS I/O error will be translated to an appropriate UNIX error number and stored in the global error variable, "errno". Finally, the error indication will be reported back through the highest library function invoked to return an indication of error to your program.

There are other types of errors which could be experienced. A memory allocation request could be unsatisfied because insufficient free memory was available. A request to obtain status on a file stream could be unsatisfied because the request was associated with a "character special device" (e.g. *DO) rather than a file. These types of errors have an appropriate UNIX error number assigned. Thus, any of these errors will store the designated UNIX error number in the global error variable, "errno". The error indication will be reported back through the highest library function invoked to return an indication of error to your program.

On larger computers, floating point errors such as overflow and underflow are usually trapped by hardware. When detected, they generate a hardware interrupt so software routines can be notified to take whatever action is desired. Since all of the floating point routines provided in MC are implemented in software, your computer cannot generate a hardware interrupt when a floating point error occurs. When MC does recognize a low-level floating point error, it stores an error code in "errno", the global error variable. In addition, MC provides a floating point vector, "_fltvec", which is called whenever such an error has been detected. This vector can be altered by your program to point to your function which takes whatever action you decide to implement. In its normal state, "_fltvec" does nothing but return.

The fourth facility for trapping errors is provided by "matherr()".
High-level floating point errors are characterized by such things as trying to take the square root of a negative number, trying to take the log of a negative or zero number, or trying to take the arc sine of a value which is not in the range extending from minus one through plus one. UNIX System V documents a floating point exception handler, called matherr(), which is called when such high-level mathematical errors are encountered. An exception structure contains information pertinent to the detected error at the time that matherr() is invoked. The global error variable, "errno", is also loaded with the appropriate error number. In addition, a uniform error message is optionally written to standard error. Your program may provide its own matherr() function to handle the detected error as you see fit. Illustrations of this facility appear in the documentation of those functions which support this high-level floating point exception handling.

It should be evident from this discussion that all error types make use of the global error variable, "errno". This variable is a UNIX System V feature. All of the error numbers are represented by symbolic values and appear in the "errno" header file. The "math" header file defines the exception structure and the symbolic definitions of math errors. These header files should be "#include'd" with your source program as appropriate in order to make use of the symbolic definitions.

Closing Comments

C encourages the use of structured programming methods. Unless one uses the "goto" statement heavily, C practically demands a structured approach to program construction.
This is not to say that writing programs in C will automatically make you a good, structured programmer. This is a skill that is developed by learning and applying the basics. Some understanding of structured design concepts is necessary in order to effectively use C.

Probably the first frustrating thing that novice C programmers will encounter, especially if their experience is limited to BASIC and assembly language, is the discouragement of the use of "goto". Kernighan and Ritchie, in THE C PROGRAMMING LANGUAGE, state that the "goto" is never necessary, and in practice it is almost always easy to write code without it. The concept to understand is that the "goto's" are hidden within the program statements. C provides, in a coherent, understandable form, the program constructs that you have been building out of "goto's".

Last but not least, many texts are available that can be part of your library. The following list is not to be considered all inclusive but lists those texts (alphabetically) that we have had at our disposal.

THE BIG RED BOOK OF C by Kevin Sullivan (published by Sigma Press).

THE C PRIMER by Les Hancock and Morris Krieger (published by BYTE Books).

C PRIMER PLUS by Michael Waite, Stephen Prata, and Donald Martin (published by Howard W. Sams & Co., Inc.).

C PROGRAMMER'S LIBRARY by Dr. Jack Purdum, Timothy C. Leslie, and Alan L. Stegemoller (published by Que Corp.).

C PROGRAMMING GUIDE by Dr. Jack Purdum (published by Que Corp.).

THE C PROGRAMMING LANGUAGE by Brian W. Kernighan and Dennis M. Ritchie (published by Prentice-Hall). We will refer to this book throughout this manual by the abbreviation, "K&R", for Kernighan and Ritchie.

THE C PUZZLE BOOK by Alan R. Feuer (published by Prentice-Hall).

INTRODUCTION TO C by Paul M. Chirlian (published by MATRIX).

LEARNING TO PROGRAM IN C by Thomas Plum (published by PLUM HALL).
SYSTEM V INTERFACE DEFINITION (published by AT&T).

THE UNIX PROGRAMMER'S MANUAL Volume I and Volume II by Bell Telephone Laboratories, Inc. (published by Holt, Rinehart and Winston).

`Language Definition - Program Environment` 2 - 1

Program Environment - Functions

The C language is, in a word, functional. The basic unit of program construction when using C is the function. Every C program is a collection of functions. Each function is a collection of statements that work together to achieve (hopefully) a useful, well-defined, purpose.

Each function can have information passed to it when it is invoked ("called"). The elements of information passed to the called function are denoted as arguments. In C, arguments are copied onto the stack. The function can then access and use the "local" (known only to the called function) arguments, leaving the original copy of the arguments unchanged. Each argument is defined at the start of the function.

Functions also return values to the functions that call them. In C this value can be an integer number, a long integer, a float, a double, or a pointer. The value returned can be compared to, placed in a variable, etc. Functions can appear in an arithmetic expression anywhere that a constant can.

Here is an example of a function:

    square(num)
    int num;
    {
        return num * num ;
    }

The function, square(), returns the square of a number; in other words, the argument, "num", is multiplied by itself and the result is returned. Arguments are listed in parentheses after the name of the function, separated by commas. These arguments must be passed by the calling function in the same order as they appear in this list.

The BODY of the function is the group of executable statements that are within the braces "{" and "}". Actually, the grouping of statements in between braces denotes a special kind of statement called the COMPOUND statement. The compound statement is fully explained in the section on C language statements.

Every C program has a special function called "main" which is always the entry point to the program. When referencing a function within this narrative, we will put "()" after the name to identify it as a function. This is close to the way it looks in a C program. The function, main(), calls other functions, which in turn call other functions, etc... Thus, each program is a hierarchical structure of functions, with main() at the top of the hierarchy.

The DOS command line which invokes the C program is passed to the function main() using two parameters, "argc" and "argv". One C program can invoke another program by using the system() function. When the called program finishes, a special function, exit(), is used to return a value to the calling program. Programs can call other programs, passing any information using "argc" and "argv" command line arguments as explained in Chapter 1, "Introduction". In a way, each program appears as a function to other C programs and to the DOS.

Please scrutinize the illustration of functions in the following example:

    main()
    {   /* The "main" function ... execution begins here! */
        say_hello();
        do_work();
        say_goodbye();
        exit(0);    /* a normal exit, no error code */
    }

    /* sorry, we can't "goto" any of the functions below. */

    say_hello()
    {   puts("Hiya!!!");
    }

    say_goodbye()
    {   puts("Bye y'all!!!");
    }

    do_work()
    {   while (not_quitting_time)
        {   attach(nut,bolt);
            pass_on(widget);
        }
    }

Statements - Simple and Compound

To create a C function, you have to state the action to be taken using C language STATEMENTS in the desired combination.
Certain special statements are built into the language to provide the necessary programming constructs (sequence, iteration, selection). You may be surprised, at first, by the limited number of statements built into the C language. The authors of the language wished to maintain the generality of the programming statements, forcing any special features to be outside of the programming language itself.

Other languages often have extensions in the form of statements to provide specialized features, leading to incompatible versions of the same language. BASIC is a well-known example of a language extended in far too many different ways. The C language avoids this situation by only providing those statements necessary for structuring the program's logical flow and by placing all special features into function LIBRARIES. Function libraries are nothing more than collections of commonly used functions.

Simple C statements always end with a semicolon ";", the STATEMENT TERMINATOR. The C compiler depends on the semicolon to tell when a simple statement ends. Any number of simple statements may be entered, one after the other, to form a SEQUENCE of statements that are executed one at a time, first to last.

The brace characters, "{" and "}", are used to enclose a sequence of statements to form a COMPOUND statement. A compound statement can be used anywhere a simple statement can be used. Thus, the body of a function (that portion enclosed in braces) is just a special form of compound statement. For example:

    nl = 0;

is a simple statement. However, the statement:

    {   h = h / 2;
        x0 = x0 + h / 2;
        y0 = y0 + h / 2;
        x = x0 + i * 32;
        y = y0 + 10;
        u = x;
        v = y;
        ++i;
        p( 1, i );
    }

is a compound statement.
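To see why the compound statement matters, consider the following minimal sketch. It is written in present-day standard C rather than in the MC dialect described in this manual, and the function names sum_to() and clamp() are hypothetical, chosen only for illustration. In each function the braces group several simple statements so that together they act as the single statement controlled by the "while" or the "if".

```c
/* Add up the integers 1 through n (hypothetical example function). */
int sum_to(int n)
{
    int i = 1;
    int total = 0;

    while (i <= n) {          /* compound statement as the loop body */
        total = total + i;    /* first simple statement  */
        ++i;                  /* second simple statement */
    }
    return total;
}

/* Limit value to at most limit and count how often that happened
   (hypothetical example function). */
int clamp(int value, int limit, int *times_clamped)
{
    if (value > limit) {      /* both statements depend on the test */
        value = limit;
        ++*times_clamped;
    }
    return value;
}
```

If the braces following the "while" were removed, only the first assignment would belong to the loop; "i" would never be incremented inside it, and for any "n" of one or more the loop would never end.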
Data Representation - Constants

Numbers and characters must be entered in your C program in certain ways in order for the compiler to understand them properly. A fixed value to be used in a C expression is called a CONSTANT.

Where an integer number is required, you enter it just as you write it. A leading zero indicates that the constant is in a base other than decimal. A leading zero followed by a string of digits indicates an OCTAL CONSTANT. A leading zero followed by 'X' or 'x' indicates that a hexadecimal constant follows. Thus, the decimal number, 255, can be represented as 0377 or 0xFF, as desired. A long integer constant should be terminated with the uppercase letter "L" or the lowercase letter "l" as in 1234567L or 0x2a009105L; however, an integer value greater than 65535 will be considered a long.

Floating point constants have the syntax of an optional sign, followed by a string of decimal digits possibly containing a decimal point, an optional exponent field containing an 'e' or 'E' followed by an optionally signed decimal integer.

If the variable to be assigned the constant is not big enough to contain the constant, only the least significant bits (LSB) of the number are stored. This is, in effect, storing the remainder of dividing the constant by 256 or 65,536 or 4,294,967,296 depending on the variable size. No warning is given when this happens (except in the case of floating point overflow errors which can be trapped by the program), so the programmer must be sure that the variable can hold the number.

CHARACTER CONSTANTS supply a way to specify the code for a character which does not depend on any particular character set. A character constant is a list of characters within single quotes (apostrophes). For instance, the character constant 'A' is stored in the computer as the number 65 (in decimal).
Again, it is up to the programmer to assure that the number of characters between apostrophes can fit into the variable being assigned. If more characters are specified than can fit, only the last one or two (as needed) are used.

When a sequence of characters is needed, a STRING can be specified by enclosing the characters between quotes (sometimes called "double" quotes - i.e. "This is a string"). C does not place all of these characters into a variable but rather uses the ADDRESS of the first character of the string. Thus, when the string, "testing, 1 2 3", is used in a C program, the characters between quotes are stored in memory, and the address of the first 't' is used in the expression where the string was specified. You can say that the number generated by C to represent the string really POINTS to the string. The subject of POINTER variables, which are handy for manipulating strings, will be discussed later.

There are certain control characters that are needed frequently in programs, but which differ from machine to machine. These can be represented in C programs using ESCAPE SEQUENCES, to provide a machine-independent constant. The backslash character, "\", is called the ESCAPE CHARACTER and denotes the beginning of an escape sequence. A letter following the escape character indicates which control code is being specified. Also, certain characters that would otherwise be difficult to represent in strings and character constants are generated by following the backslash with the character.
These escape sequences are shown in the following table:

 _____________________________________________________________
|                                                             |
|  Escape                                                     |
|  Sequence   Control Code          ASCII code used by C      |
|  --------   -----------------     --------------------      |
|                                                             |
|  \n,\N      NEWLINE character     x'0D'  CR                 |
|  \t,\T      horizontal tab        x'09'  HT                 |
|  \b,\B      backspace             x'08'  BS                 |
|  \r,\R      carriage return       x'0D'  CR                 |
|  \f,\F      form feed             x'0C'  FF                 |
|  \\         backslash             x'5C'  backslash          |
|  \'         single quote          x'27'  apostrophe         |
|  \0         null                  x'00'  null byte          |
|  \"         double quote          x'22'  double quote       |
|_____________________________________________________________|

In addition, any binary code can be represented in a string or character constant by following the backslash with a numeric constant. This is done by following the backslash with up to three octal digits. An extension which is not normally allowed in the C language is offered in MC as a convenience to microcomputer users who are only familiar with hexadecimal. The backslash may be followed by an 'x' and one or two hexadecimal digits. Either of these two methods results in an 8-bit character constant. For example, the character 'A' can be represented as '\x41' using a hexadecimal escape sequence, or as '\101' in an octal constant. Similarly, to place a carriage return at the end of a line, the following three methods could be used; however, the first is preferred:

    "An example of a normal escape: \n"
    "An example of a hexadecimal escape: \x0D"
    "An example of an octal escape: \015"

When a character escape sequence is used within a string, the actual value of the escape sequence is stored in the string (i.e., only one byte of data per escape). Thus, the string:

    "\n\x0d\015"

is only three bytes long in memory once the program is compiled and assembled.
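The rules for numeric constants and escape sequences can be checked with a few one-line functions. This is only a sketch for a current C compiler: the function names are hypothetical, the comparison with 'A' assumes an ASCII character set, and on most current systems "\n" is stored as x'0A' rather than the x'0D' that MC uses.

```c
#include <string.h>

/* Hypothetical checks; none of these names come from the MC libraries. */

/* Decimal, octal, and hexadecimal spellings of the same value. */
int same_255(void)
{
    return 255 == 0377 && 255 == 0xFF;
}

/* Hexadecimal and octal escapes for 'A' (assumes ASCII). */
int same_letter_a(void)
{
    return 'A' == '\x41' && 'A' == '\101';
}

/* Each escape sequence occupies one byte in the stored string. */
size_t escape_length(void)
{
    return strlen("\n\x0d\015");
}
```

Both comparison functions return 1, and escape_length() returns 3, matching the three-byte string described above.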
Variable Names

The names given to identify variables, functions, macros, and labels are called "identifiers" and all follow the same rules as to their format. C identifiers may be of any length (be practical) and must start with an alphabetic character ['A' through 'Z', 'a' through 'z'] or the underline ['_'], with the rest of the characters in the name consisting of upper-case or lower-case alphabetic characters ['A' through 'Z', 'a' through 'z'], numeric characters [0 through 9], or the underline character ['_'].

MC keeps all the characters of your identifier as significant; however, if the identifier is to be used as an extern, only the first seven (7) characters of an "extern" identifier will be used by the assembler and linker, so these first seven must be unique. Also, the assembler you use may limit the length of symbol names.

C is case-sensitive, i.e., recognizes the difference between lower-case and upper-case in identifiers. Thus, "EOF", "eof", and "Eof" are all different identifiers to C. However, identifiers which must be written out in assembler source code are converted to upper-case, since assemblers, in general, do not allow lower case assembly language code. A good, simple rule to follow is to use UPPER-case for macro constants only, and lower-case for all other identifiers. Since macro identifiers are not written to the assembly output file, they will not conflict with any other identifiers which are the same, except for case differences.

The C language reserves certain "words" which it uses as keywords. These keywords cannot be used as identifiers. The list of reserved words is:

    auto, break, case, char, continue, default, do, double, else,
    entry, enum, extern, float, for, goto, if, int, long, register,
    return, short, sizeof, static, struct, switch, typedef, union,
    unsigned, void, while.
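The naming rule suggested above (UPPER-case for macro constants, lower-case for everything else) can be sketched as follows. The identifiers are made up for illustration. On a current compiler both names are kept distinct all the way through linking, so this does not reproduce MC's conversion of external names to upper-case.

```c
#define LIMIT 10            /* upper-case: a macro constant, gone after preprocessing */

static int limit = 3;       /* lower-case: an ordinary variable, distinct from LIMIT  */

/* Hypothetical function showing that the two spellings never collide. */
int sum_limits(void)
{
    return LIMIT + limit;   /* 10 + 3 */
}
```

Because LIMIT is replaced by the preprocessor before the compiler proper sees the text, only "limit" survives as a symbol.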
Data Declarations

C variables must always be declared before use. The standard procedure is to declare variables at the beginning of the program (globals) and at the beginning of each function (locals). Locals must be declared before any executable statements. MC supports the following variable types and adjectives:

 _____________________________________________________________
|                                                             |
|  char     - an 8-bit unsigned character                     |
|  int      - a 16-bit signed integer                         |
|  float    - a 32-bit floating point                         |
|  double   - a 64-bit floating point                         |
|  struct   - the specifier of a structure                    |
|  union    - the specifier of a union                        |
|  typedef  - the operand of a typedef                        |
|  short    - usually applied to ints but ignored by MC       |
|  long     - used to specify 32-bit integers                 |
|  unsigned - applied to ints or long ints                    |
|_____________________________________________________________|

Type char

Character variables are stored in eight bits, or a byte. MC always treats a char as unsigned. The declaration:

    char c, string[81];

establishes a character variable named "c" and a singly dimensioned character array named "string" which can hold a string of maximum length equal to 80 characters. Arrays of one or more dimensions are allowed.

Type int [and short int]

Integer variables as well as short integer variables are stored in sixteen bits. The short declaration is provided in the interest of portability. Make no assumptions about the storage size of pointers. Although a pointer may be stored in either 16 bits or 32 bits, a pointer is not an int! The declarations:

    int a;
    short b;
    short int b2;

are all acceptable declarations, and all result in the same size integer field.
This is acceptable, since the C language does not guarantee that a "short" will be shorter than integers. Integers declared in this manner are signed, i.e., their most significant bit is regarded as a sign bit. Their values can range from -32,768 to 32,767 (decimal). Unsigned fields do not have a sign bit. They range from 0 to 65,535 (decimal) and are declared like this:

    unsigned u;
    unsigned int u2;

Type long int

Long integer variables are stored in thirty two (32) bits. The declarations:

    long int number;
    long datasize;

are both acceptable declarations, and each results in a 32-bit integer field. Long integers declared in this manner are signed, i.e., their most significant bit is regarded as a sign bit. The values can range from -2,147,483,648 to 2,147,483,647 (decimal). Unsigned fields do not have a sign bit. They range from 0 to 4,294,967,295 (decimal) and are declared like this:

    unsigned long lu;
    unsigned long int lu2;

Type float and type double

Float and double floating point variables are stored in thirty two (32) and sixty-four (64) bits respectively. The declarations:

    float fvalue;
    double dvar;

are acceptable declarations; the first results in a 32-bit floating point field and the second results in a 64-bit double precision floating point field. Floating point variables are always signed. Their value varies from approximately -1.7e+38 through +1.7e+38. Floats maintain about 6-7 digits of precision while doubles maintain about 15-16 digits of precision.

The only direct operations which may be performed on float and double variables are addition, subtraction, multiplication, division, comparison, logical not, negation, increment, decrement, and address_of. All others are illegal.

It is important to note that per the specifications in K&R pertaining to arguments of a function, "C converts all float actual parameters to double, so formal parameters declared float have their declaration adjusted to read double."
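The field sizes in this section, together with the least-significant-bits rule described under Constants, can be imitated on a current compiler with the fixed-width types of <stdint.h>. This is a sketch under that assumption: uint8_t and uint16_t stand in for MC's 8-bit char and 16-bit unsigned, and the function names are hypothetical.

```c
#include <stdint.h>

/* Keeps only the low 8 bits, i.e. the remainder after dividing by 256. */
uint8_t low8(unsigned long value)
{
    return (uint8_t)value;
}

/* Keeps only the low 16 bits, i.e. the remainder after dividing by 65,536. */
uint16_t low16(unsigned long value)
{
    return (uint16_t)value;
}
```

For example, low8(300) yields 44 (300 - 256) and low16(70000) yields 4464 (70000 - 65536), with no warning at run time in either case.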
Type struct structure_tag

A structure is a method of collecting one or more variables into one grouping using a single name. Where more than one variable is grouped together, they may be of the same or of different types. The grouping is commonly conceptualized as a "record". A structure template is declared with the syntax:

    struct structure_tag {
        type_1 member_1;
        type_2 member_2;
        type_n member_n;
    };

The structure_tag is optional and is generally used when many structures will be using that structure template. The closing brace of the structure template may be followed by a variable or list of variables, just like you can do for any of the types noted above (char, int, long, float, double). When no variable follows the closing brace, the structure template remains just a template and no space is reserved for the members. When a variable list does follow the closing brace, adequate storage for all of the structure's members is reserved for each element in that variable list. For example, the following declaration is a structure template which has two members, one of which is an array:

    struct keyword {
        char *name;
        int index[2];
    };

The member "name" is declared as a pointer_to_char in this structure declaration while the member "index" is an integer array of length two. Since this is only a template, no storage is reserved. The above structure template may be assigned to a structure by another structure declaration such as:

    struct keyword primary;

which declares "primary" to be a type "struct keyword".

Where a structure definition is only needed in a single module, it may be declared directly without the structure tag. For example:

    struct {
        int hours, minutes, seconds;
    } clock;

declares a structure named "clock" which contains three ints.

Each element of the structure is termed a member.
A member can only be accessed as part of the structure. For instance, in the "primary" structure defined above, the syntax "primary.name" refers to the member, "name". Likewise, the second element of the index array member is referenced by the syntax, "primary.index[1]".

A variable may be typed as an array of structures (not to be confused with an array member of a structure). Using the struct keyword template illustrated above, we can declare an array of type struct keyword with:

    struct keyword speech[10];

This declares "speech" to be an array of type struct keyword. The syntax for referencing an element of the structure is to bind the structure subscript to the name of the structure. For example, the "name" and "index[0]" members of the fifth structure of "speech" would be referenced with:

    speech[4].name
    speech[4].index[0]

A variable may be declared as a pointer_to_struct. Again following our example, if "ps" is declared via "struct keyword *ps;", then the members of the structure it points to would be referenced via the syntax:

    (*ps).name
    (*ps).index[1]

Since the structure dot operator has a higher precedence than the indirection operator, the parentheses are needed. This somewhat kludgy syntax can be replaced with the structure indirection shorthand using the "->" operator. The structure indirection operator is composed of a minus sign followed by a right angle bracket. The above two references would now be written as:

    ps->name
    ps->index[1]

A pointer_to_structure can be passed as an argument to a function; a structure cannot! Likewise, a function can return a pointer_to_structure but cannot return a structure. Another limitation of structures is that a member of a structure cannot be a structure of the same type (think about that); however, it can be a pointer to its type. This does not restrict members from being other structure types.
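The structure rules above can be pulled together in a short sketch. It reuses the "struct keyword" template from this section (with "const" added to the name pointer, which is an assumption made for current compilers) and adds a hypothetical "struct node" to show a member that points to its own structure type. The function names are invented for illustration, and the code is written for a current C compiler rather than MC.

```c
#include <stddef.h>

struct keyword {
    const char *name;       /* pointer_to_char member          */
    int index[2];           /* array member of length two      */
};

struct node {               /* hypothetical self-referencing type */
    int value;
    struct node *next;      /* a pointer to its own type is allowed */
};

/* The long form: parentheses are needed because "." binds tighter than "*". */
int second_index_long(const struct keyword *ps)
{
    return (*ps).index[1];
}

/* The "->" shorthand for the same reference. */
int second_index(const struct keyword *ps)
{
    return ps->index[1];
}

/* Only a pointer_to_structure is passed; the list is followed via "next". */
int list_sum(const struct node *head)
{
    int sum = 0;

    while (head != NULL) {
        sum += head->value;
        head = head->next;
    }
    return sum;
}
```

Both second_index functions return the same member. Passing the address of a structure, as done here, also respects MC's restriction that a structure itself cannot be passed to or returned from a function.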
Type union union_tag

A union provides a technique for accessing elements of a record in different ways at different times. A union is declared similarly to a structure. Also, you can only access a member or take the address of a union. The members of a union all have a zero offset from the union's origin. The actual amount of memory space assigned is the space required by its largest member. The union declaration is somewhat similar to the EQUIVALENCE statement of FORTRAN.

An interesting example of a union is found in the Z80REGS header file. In this union named "REGS", two members are declared - each a structure. One member is the structure named "wordregs" and the other is a structure named "byteregs". The storage space for "byteregs" completely overlaps the storage for "wordregs" thereby providing you with a convenient method of accessing the low-order or high-order register of a 16-bit register pair. This example has a union with two structure members. Alternatively a union could be a member of a structure.

Type void

The "void" type is used to declare a function which has no return value. It is beneficial to type "void" such functions because an attempt to use the value returned by a void function will be flagged as an error.

Arrays

Arrays of one or more dimensions are allowed for short, int, long, unsigned, float, double, or pointer types, as well as for structures and unions. An array is denoted by appending the dimension enclosed in square brackets to the identifier. For example, arrays of one, two, and three dimensions could be declared like this:

    char buffer[81];
    double grid[25][25];
    char bit_plane[8][24][80];

The first defines a character buffer of 81 elements. The second defines a two dimension array of doubles having 625 elements.
The third defines bit_plane to be a three dimensional character array - 8 planes of 24 rows by 80 columns.

Pointers

Pointers may be declared for any data type. Pointer variables are different from the types described so far, in that they normally contain the ADDRESS of a data item. For example,

    char *cp;

declares a pointer_to_char variable named "cp". The asterisk denotes INDIRECTION, i.e., that the data item is referred to indirectly through the pointer variable "cp". The address of the data item must be stored in the variable, "cp", before it is used as a pointer to access a data item. To refer to the data itself, an asterisk is placed before the name, e.g., *cp denotes the data item. An example of practical use follows:

    getit(cp)
    char *cp;
    {   int c;
        while ((c=getchar()) != EOL && c != EOF)
            *cp++ = c;
        *cp=NULL;
        return c;
    }

The function, getit(), inputs characters continually from the standard input until end-of-file or end-of-line characters are encountered. When getit() is called, the pointer argument, cp, contains the address of a buffer area. One by one the characters are placed in the buffer, (*cp++ = c), and the buffer pointer is incremented by the post increment operator (++).

A simple peek() and poke() set of functions can easily be written using pointers. Witness the following two functions:

    int peek(s)
    char *s;
    {
        return *s;
    }

    void poke(s,c)
    char *s;
    int c;
    {
        *s = c;
    }

When the actual argument to a function is an array, the formal argument declaration in the function may be made in two ways. It may be declared as a pointer or an array without the size in the array declaration. For example,

    funk(arg1,arg2)
    int arg1;
    char arg2[];

declares a character array, "arg2", of an unknown number of elements. This could also have been declared as:

    funk(arg1,arg2)
    int arg1;
    char *arg2;

to define arg2 as a pointer to a char.
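The two ways of declaring an array argument shown above can be compared side by side. In this sketch both functions receive the same kind of argument, one declared with empty brackets and one as a pointer. The function names are hypothetical, and the code uses the prototype style of a current C compiler rather than the old-style declarations that MC expects.

```c
#include <stddef.h>

/* Formal argument declared as an array without a size; uses subscripts. */
size_t length_array(const char arg2[])
{
    size_t n = 0;

    while (arg2[n] != '\0')
        ++n;
    return n;
}

/* Formal argument declared as a pointer; uses indirection and "++". */
size_t length_pointer(const char *arg2)
{
    const char *p = arg2;

    while (*p != '\0')
        ++p;
    return (size_t)(p - arg2);      /* pointer - pointer gives an element count */
}
```

Either function can be called with a character array or a string such as "funk", and both report a length of 4. In each case the function receives only the address of the first element.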
Pointers may be INDEXED to get to the "nth" item in an array. Using the example above, arg2 would contain the address of the beginning of an array of characters. "arg2[0]" denotes the first element in the array, and "arg2[22]" denotes the 23rd element. No matter how a pointer is declared, either method of using the pointer may be employed as the programmer sees fit. Thus, "*arg2" and "arg2[0]" refer to the same data item and may be used interchangeably in the same program. Using "*arg2" is a little more efficient, however.

The array declaration without the number of elements within the brackets is allowed only in external declarations and in argument declarations. Statics, globals, and locals declared this way are illegal.

Pointers may point to other pointers. This bombshell of a statement is probably too much for you after the last few paragraphs; it must be said, however. C allows pointers to have more than one LEVEL OF INDIRECTION. This can be declared several ways:

    shine()
    {
        char *names[];
        char *(*words);
        . . . .
        . . . .
        . . . .

Both of these declarations result in the same effect: a pointer which points to a pointer which points to a character field. Pointer variables may have up to 255 levels of indirection. However, the practical limit is the ability of the programmer to keep track of all this. In general, two levels of indirection are all most folks can take.

The operations which may be performed on pointers are severely restricted. The allowed operations are:

    (a) Pointer + offset or offset + pointer, offset is scaled.
    (b) Pointer - offset, offset is scaled.
    (c) Pointer - pointer, result is scaled.
    (d) Pointer = expression.
    (e) ++, --, logical not, ||, &&, and comparisons.

Initialization of Variables

C allows you to initialize variables within the declaration statement.
For instance, the declarative,

    int var = 100;

declares "var" to be of type "int" with an initial value of 100. A pointer to a string may be declared and initialized with the format,

    char *pstring = "This is an initialized string pointer";

You may declare a series of variables of the same type as well as their initializations with the format,

    float f1=1.0, f2=10.0, f3=100.0, f4=1000.0, f5=10000.0;

An array can be initialized by placing the initialization values within braces. For example,

    int table[5] = {1, 10, 100, 1000, 10000};

declares table to be an integer array of dimension 5. Table[0] is initialized to the value 1, table[1] to 10, table[2] to 100, table[3] to 1000, and table[4] to 10000.

Structures may also be initialized. For instance, the format,

    struct { char *name; } dow[5] = {"Mon","Tue","Wed","Thu","Fri"};

defines the array of structures, "dow", each consisting of one member, "*name". The array is of dimension 5 with each array element constituting the structure. Each structure element is initialized as a pointer to the corresponding 3-character day name. Thus, "puts(dow[0].name);" would print, "Mon".

Note that automatic structures and automatic arrays (these are considered "aggregates" by K&R) may not be initialized. You also cannot initialize a union.

Scope of Variables and Functions

Variables or functions which are declared outside of any function, i.e., which are not parameters to functions or declared within braces, are called "external". They are external to all functions. External variables and functions can be accessed only by the functions subsequently defined within the module being compiled. They are usually used to create static variables which are to be accessed only by a function or group of functions.
However, by using the "extern" statement in a separately compiled module, say module_B, an external variable or function of module_A may be accessed from that separately compiled module_B. Please do not confuse "extern" and external. External variables and functions are declared without the "extern" statement strictly by their position exterior to any function or compound statement.

Variables declared within a function are called "local". Functions may not be defined within another function, as is the case with the Pascal language. However, a function may be DECLARED "extern" so that it may be accessed within the currently defined function.

Local variables may not be accessed from any other functions. They only exist for the function in which they are declared. Even within the function, a local variable can only be accessed in the block in which it is declared. Remember, a block is a section of code contained within a matching pair of braces.

Local variables can have the same name as external variables, or local variables declared in different blocks. If a local variable has the same name as an external variable then the local variable is the one accessed when used within the local block. In the following example:

    int same;               /* this is an external variable */

    funk(same)
    {
        return same;        /* return local copy */
    }

    hunk()
    {
        if (block_1) {
            int same;
            /* some code could go here */
        }
        else {
            char same;
            /* some other code here */
        }
    }

every declaration of "same" was a unique variable. Although legal, the declaration of local variables with the same name within the same function is not recommended. This type of trickery, as shown in hunk(), needlessly causes confusion and is easily avoided.

Storage Classes

Variables and functions may be declared as being in certain classes. These classes specify where variables are to be stored.
The classes available in C are: auto, static, extern, register, and typedef. The storage class of an object is specified by placing the class name in front of the normal declaration:

    auto char c;
    static int ai[20][80];

Storage Class - auto

Variables which are declared "auto" are stored on the stack. This is the default for variables declared within a function, so the "auto" keyword may therefore be omitted. Local variables which are "auto" are created afresh each time the function in which they are declared is called. This allows functions to be re-entrant and recursive. Functions may not be declared with class "auto" since a function must be declared outside of any other function. As K&R say, the C compiler is incapable of compiling code onto the stack!

The scope of an auto variable is the block (within braces) in which it is declared. All other portions of the code being compiled are oblivious to the existence of the auto variable, and in fact there may exist other variables with the same name. The auto class is illegal for functions and other external definitions (any variables declared outside of a function). In terms of speed of access, auto variables are accessed the slowest; thus, if timing is important and your program does not require recursion, use register or static variables.

Storage Class - Register

Variables declared in the register class are treated similarly to auto variables by MC. The number of register variables permitted depends on the number of extra machine registers available for use. MC makes use of the two index registers of the Z-80 (IX and IY) which can be used for ints and pointers. Register variables are usually accessed faster than autos but not as fast as statics. Any register variables declared in excess of two are stored on the stack in the same manner as an auto and are also illegal outside of a function; however, each function is permitted up to two register variables.
The scope of register variables is the same as that for auto variables. The formal arguments of a function may also be declared register. Note that the "address_of" operator may not be applied to register variables.

Storage Class - Extern

The "extern" storage class allows an external variable declared in one module to be accessed from another module. A "module" is what is processed by one invocation of MC, i.e., one set of C source input. Let's say that the following declaration:

    int choice;

exists in module 1. If module 2 functions need to access this same variable, the declaration:

    extern int choice;

would allow the access needed. C will not reserve any storage for "choice" in module 2, since the storage class, "extern", tells C that storage has been reserved in another module.

The programmer MUST ensure that the declarations are compatible between modules. In other words, all "extern" declarations must match the external declaration (declaration without "extern") by having the same type, size, and amount of indirection. Otherwise, C may access the variable in incorrect ways.

The extern statement may also be used to declare what a function returns before it is defined in the program. This "forward" declaration allows a function which returns something other than a signed integer to be used before it is defined. If the forward declaration is not given and a function is as-yet-undefined, the compiler assumes that the function returns a signed integer, which may be incorrect for many functions.

Storage Class - Static

Static objects are stored in declared, fixed memory space. Their behavior is the same as that of external variables; their scope is more limited, however. Static variables declared outside of a function can only be accessed by functions within the module being compiled.
Other (separately compiled) modules cannot get to them by declaring them "extern". Static variables declared outside of all functions are accessible to all functions subsequently defined within the module. Static variables declared within a function are similar in scope to auto and register variables. They can only be accessed in the block in which they are declared. Thus, two static variables with the same name may be declared in different functions.

Functions may also be defined as "static", making them only accessible from within the current module. However, since MC is a one-pass compiler, the definition of a static function must precede any reference to the static function. This is because the compiler assumes that an as-yet-undefined function is an external function. Alternatively, you may use a forward declaration.

Storage Class - typedef

Typedef is provided in the C language not as a unique storage class, but as a means for creating new data type names. Note that typedef does not create new data types, but rather provides a method for giving special names to existing data types. This facility is useful to create customized names which bear some association to the class of objects being typed. For instance, the typedef statement:

    typedef char *POINTER;

declares "POINTER" to be a synonym for the type pointer_to_character. Thus, a statement such as the following can be used to clearly denote the meaning of the declarations:

    POINTER first, last;
    POINTER table[SIZE];

A typedef is somewhat similar to the preprocessor #define. It performs some textual substitution. It is a little more flexible, though, since the actual substitution can be more complex.
It is of great use in enhancing portability across compilers by introducing a single point definition of a variable type rather than having the actual definition scattered throughout many source modules. Thus, where a declaration type must be changed, it only need be changed in one place (usually in a header file).

Storage Class - Defaults

When a variable is declared by only stating the storage class:

    auto x1;
    register x2;
    extern x3;
    static x4;

the variable type is assumed to be "int". This is a perfectly acceptable shorthand way to make integer declarations.

When the declaration of a local (declared within a function) variable or argument has no storage class, C assumes that the variable is an auto variable. A function declared within another function body is assumed to have a storage class of external. The compiler regards the declaration as if an "extern" statement preceded it.

External declarations which do not have a storage class declared are special entities. They belong to the implicit class, "external", and may be referenced from other (separately compiled) modules which declare the variable as "extern".

Expressions

One of the most powerful features of the C language is its expression capabilities. The amount of work that can be done by one expression is sometimes mind-boggling. A quick example:

    (end_of_file = ((c=getc(file))==EOF)) ? fclose(file) : ++count ;

This convoluted statement will get a character from a file and place it in the variable, "c". The character is compared to the value "EOF" which indicates end of file; the result, true or false, is placed in the variable, "end_of_file". Finally, if it was the end of the file, the file is closed. Otherwise, a counter variable, "count", is incremented to provide a count of the characters read.
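For comparison, the steps packed into that single expression can be spelled out one statement at a time. The sketch below wraps them in a hypothetical function, count_chars(), which is not part of the MC libraries. It assumes a stream that has already been opened for reading and is written for a current C compiler.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters remaining in an open
   stream and closes the stream when end of file is reached. */
long count_chars(FILE *file)
{
    int c;
    int end_of_file = 0;
    long count = 0;

    while (!end_of_file) {
        c = getc(file);                 /* get a character from the file   */
        end_of_file = (c == EOF);       /* save the result of the compare  */
        if (end_of_file)
            fclose(file);               /* end of file: close the file     */
        else
            ++count;                    /* otherwise count the character   */
    }
    return count;
}
```

Each statement here does one of the jobs that the conditional expression does in passing. The result is longer, but each step can be read and debugged on its own.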
The example was a bit exaggerated, and expressions this complex can be quite hard to understand. Two statements must be made about the complexity of expressions in the C language. The programmer who does not fully know and use C's expression capabilities is seriously handicapped, unable to use the full power of the C language. On the other hand, a quotation from THE ELEMENTS OF PROGRAMMING STYLE by Kernighan and Plauger is appropriate:

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"

The word "maintain" could be substituted for "debug" in the quote above, and it would still be valid. You must be able to understand later what you wrote into your program. If others are going to have to maintain your program, the principle of KISS (Keep It Simple, Stupid) should prevail.

This is not intended to discourage the use of complex expressions. Just keep in mind that the more operators involved in an expression, the more difficult it is to properly place parentheses and keep the precedence of operators straight.

There are two kinds of expressions in many computer languages: logical expressions and arithmetic expressions. Logical expressions are usually for comparing things and for making choices. The result of a logical expression is either true or false. Arithmetic expressions result in a number. Usually an assignment to a variable is made to save the result of the arithmetic expression, or it is passed as an argument.

In many language implementations, only one type of expression may be used in certain contexts. For instance, the BASIC program statement:

    1000 A = ( C <= B )

attempts to assign to A the result of the comparison of C to B. This is not allowed in many implementations because they are expecting an arithmetic assignment.
Even if some BASICs allow it, it is best not to do this type of assignment, in order to keep programs relatively portable.

Another situation is shown in PASCAL:

    IF A := (B < C) THEN BEGIN

where the PASCAL compiler expects a boolean expression between IF and THEN. Even if A is a boolean variable this assignment is not allowed in most PASCAL compilers.

This is not intended to denigrate PASCAL. There are good reasons why the authors of PASCAL did things this way. However, the C language does not draw distinctions between types of expressions within the context of the program. The distinctions are made in the types of operators instead.

Primary Expressions

The elements which are manipulated by operators in an expression are called primary expressions. The basic elements which make up a primary expression are identifiers, constants, and strings. Identifiers are the names of variables and functions. Function and array identifiers effectively resolve to the address of the function or array, while all other variable identifiers resolve to the contents of the variable. Constants are character or numeric (decimal, hex, octal) values. Strings resolve to a character pointer which points to the first character of the string.

The operators which C provides for stating primary expressions group left to right. This means that the left-most operator is interpreted first. The five primary operators supplied by C are: isolating parentheses, subscripting, function invocation, structure/union ARROW, and structure/union DOT.

    _____________________________________________________________

     (expression)             isolating parentheses

     p_ex [expression]        subscripting

     p_ex (expression_list)   function invocation

     .                        DOT obtains member of structure/union

     ->                       ARROW obtains member indirectly through
                              structure/union pointer

     Note: "p_ex" stands for "primary_expression"
    _____________________________________________________________

Isolating Parentheses

When the order in which an expression is to be evaluated conflicts with the precedence of operators, the isolating parentheses provide a way around the conflict. The expression within parentheses is evaluated first, before the result of the enclosed expression is used in any expression outside the parentheses. For example, when predicting the percentage of up-time for any equipment, the following formula is used:

                          MTBF
        availability = -------------
                        MTBF + MTTR

        MTBF = mean time between failures
        MTTR = mean time to repair

When writing this formula into a C expression a conflict occurs because the division operator takes precedence over the addition operator. If the expression is written like this:

    up_time = mtbf / mtbf + mttr

the result will always be mttr plus one. This is because the division is done before the addition. To avoid this, the expression can be stated as follows:

    up_time = mtbf / (mtbf + mttr)

to achieve the correct result.

Parentheses can be used on either side of an assignment operator. At the risk of confusing the reader with as-yet undefined operators, we nevertheless provide an example using pointers. In certain cases during the use of pointer arrays, indirection must be performed before subscripting into the data item. Since subscripting takes precedence over indirection, this kind of expression must be written as follows:

    example(arg)
    char *arg[];    /* pointer to a char pointer array */
    {
        /* wrong way - accesses fourth pointer */
        /* instead of fourth character.        */
        *arg[3] = 0 ;

        /* right way - zeroes the fourth character of */
        /* first string                               */
        (*arg)[3] = 0 ;
    }

Subscripting

Subscripting is denoted by a subscript in brackets following a primary expression:

    primary_expression [subscript]
    primary_expression [subscript_1][subscript_2]

If the primary expression is an array name, or a pointer to an array, the subscripted expression returns the element denoted by the value of the subscript. C arrays are subscripted from zero, i.e., the first element in an array is numbered zero. If more than one dimension is specified, the storage of array elements is such that the rightmost subscript varies fastest as elements are accessed in storage order.

Function identifiers may not be subscripted. A primary expression denoting an array of pointers to functions may be subscripted. The primary expression must indicate the size of the object being subscripted (char, int, pointer) or the subscript will produce an error message. For example:

    x = 25[3];

is invalid.

Function Invocation

A primary expression followed by parentheses will cause the function denoted by the primary expression to be called. Arguments may be passed to the invoked function by placing them in the parentheses, separated by commas. Any number of arguments can be passed to the called function. Care must be taken that the number of arguments passed is the number that the function expects. Otherwise unpredictable behavior may result (certainly not correct behavior). If a variable number of parameters must be passed, then a control indicator must be passed to tell the called function how many arguments there are (for example, the fprintf() and printf() functions in the standard library).

Arguments can be any valid C expression, including other function calls.
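A minimal sketch of such a call follows. It reuses the availability formula from the discussion of isolating parentheses and passes both another function call and an arithmetic expression as arguments. The function names are invented for this illustration, and ANSI-style definitions are used so that the fragment compiles with current compilers.

```c
/* Availability formula from the isolating-parentheses discussion. */
double availability(double mtbf, double mttr)
{
    return mtbf / (mtbf + mttr);    /* parentheses force the addition first */
}

/* Hypothetical helper: converts a number of days to hours. */
double hours_from_days(double days)
{
    return days * 24.0;
}

/* Each argument is an expression; the first is itself a function call. */
double example_availability(void)
{
    return availability(hours_from_days(4.0), 2.0 + 2.0);
}
```

Here the called function receives 96 hours between failures and 4 hours to repair, so the result is 96 / 100, or about 0.96.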
The arguments are evaluated from right to left, i.e., the right-most expression is evaluated first. The programmer should not rely on this order of evaluation since some other implementations of the C language evaluate them left to right. Statements like this one:

    funk( arg++, arg2[arg] );

will cause different elements of arg2 to be passed to funk() when different C compilers are used. Stay away from this sort of trickery if you can.

. DOT [designate member of structure/union]

The structure/union dot operator is used to specify a member of a structure or union. Thus if, for example, there exists a structure named "time" which has members named "hour", "minutes", and "seconds", then the constructs "time.hour", "time.minutes", and "time.seconds" refer to the member objects when used in an expression. Other details concerning structures and unions are provided in the section on data declarations.

-> ARROW [designate member through structure/union pointer]

This structure/union operator is used when you have a pointer to a structure or union and you wish to reference one of its members. For example, if the identifier "pt" has been declared a pointer to the time structure noted above, then the constructs "pt->hour", "pt->minutes", and "pt->seconds" reference the objects within the structure. The construct "structure or union pointer -> membername" is equivalent to "(* structure or union pointer).membername" and is essentially a shorthand method of using the indirection operator.

Unary Operators

Unary operators operate on one object (hence the name). If more than one unary operator operates on the same object, the operators are evaluated right to left.
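A small sketch of this right-to-left rule, using the negation and one's complement operators: the operator nearest the operand is applied first. The function names are invented for this illustration, ANSI-style definitions are used, and the values quoted afterwards assume two's complement integers.

```c
/* - ~ x : the complement is applied first, then the negation. */
int negate_complement(int x)
{
    return - ~ x;
}

/* ~ - x : the negation is applied first, then the complement. */
int complement_negation(int x)
{
    return ~ - x;
}
```

With an argument of 5, negate_complement() yields 6 (the complement of 5 is -6, which is then negated) while complement_negation() yields 4 (the complement of -5).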
The unary operators supplied by C are:

    _____________________________________________________________

     OPERATOR     OBJECT       DESCRIPTION
     --------     ----------   ---------------------------------

     *            expression   indirection; means "object at..."

     &            lvalue       pointer; means "address of..."

     -            expression   negates the expression;
                               "minus expression"

     !            expression   logical complement;
                               "not expression"

     ~            expression   one's complement of expression

     ++           lvalue       increment and save in lvalue

     --           lvalue       decrement and save in lvalue

     (typename)   expression   cast the result to "typename"

     sizeof       expression   obtain the size of the result of
                               expression in "bytes"

     sizeof       (typename)   obtain the size of "typename"
                               in bytes
    _____________________________________________________________

All unary operators must appear before (prefix) the object, except the increment and decrement operators which may appear after (postfix) the object.

The term "lvalue" means an expression which evaluates to the address of a data element or pointer field. Constants, function identifiers, and array names are not lvalues. The term derives from the observation that "lvalues" are the only expressions allowed on the left side of an assignment expression.

'*' [asterisk - object at]

The indirection operator can only operate on a pointer expression. Its meaning is effectively "object at ..." The address contained in the pointer is the address of the object referred to by this type of expression.
For example,

    see_pointer (pointer)
    char *pointer;    /* a character pointer */
    {
        /* first print the address passed in pointer */
        printf("address is: %u ",pointer);

        /* now print the data at that address */
        printf("data is: %u ", *pointer);
    }

will print both the address (contents of the pointer variable) and the data at that address (result of the indirect expression).

'&' [ampersand - address of]

This unary operator effectively means "address of..." or "pointer to...". It evaluates to the address of the lvalue it precedes.

'-' [minus sign - negation]

When the unary negation operator precedes an expression, the result is the two's complement negative of the value of the expression. When the '-' precedes an unsigned or pointer expression, the one's complement of the value is taken.

'!' [exclamation point - logical NOT]

The unary logical complement operator, or "not" operator, evaluates to FALSE if the expression is true and to TRUE if the expression is false. FALSE is defined as 0 and any non-zero value is considered to be TRUE. However, all C operators which result in TRUE or FALSE produce one (1) as the value for TRUE. Thus, the least significant bit of the result indicates TRUE or FALSE.

'~' [tilde - bitwise complement]

The one's complement operator inverts every bit in the expression. No regard is given to the type of the expression.

'++', '--' [increment, decrement]

The increment and decrement operators may be used either before (prefix) the operand or after (postfix) the operand. The operand must be an lvalue or lvalue expression. In either case the contents of the lvalue are incremented or decremented and stored back into the lvalue. The difference between prefix and postfix is how the result of the expression is produced. Prefix means that the value after the increment or decrement is the result of the expression. Postfix means that the value returned by the expression is the value before the increment or decrement.

(typename) [type casts]

The "(typename) expression" is used to explicitly force the result of "expression" to be converted to the type specified by the cast. Casts are used when the variable or expression result type does not agree with what may be needed for continued evaluation. For instance, to take the log of an integer number, it is necessary to convert it to a double. Thus, the statement:

    dvalue = log( (double) number);

converts the value of the integer "number" to a double before placing it on the stack as the argument to the log function. Note that "number" is not itself changed, but rather the value is typed to conform to the cast. A cast is similar to the verbs CDBL, CSNG, and CINT commonly found in the BASIC language.

sizeof [obtain size of operand in bytes]

It is sometimes useful to be able to use the size of an object in an evaluation without actually knowing the physical size of the storage provided for the object. For instance, since the actual size in storage units of an int may vary from machine to machine, the programmer may need to compute a figure irrespective of the storage element size. The "sizeof" operator provides this facility. Since all sizes are known to the compiler at compile time, "sizeof object" may be used in the same way as a constant. For instance, the statements:

    int array[100];
    bytes = sizeof array;

assign to "bytes" the actual quantity of storage elements occupied by "array". "Sizeof" may also be used in the format "sizeof(typename)". Thus, sizeof(int) and sizeof(long) are valid operations.

There is significant use of the sizeof operator in calculating the size of a structure.
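The two forms of sizeof can be compared with a short sketch. The function names are invented for this illustration, ANSI-style definitions are used, and the result type size_t is an assumption taken from current C practice rather than from this manual, whose own example simply assigns to "bytes".

```c
#include <stddef.h>

/* Storage occupied by a 100-element int array, whatever the size of int. */
size_t array_bytes(void)
{
    int array[100];
    return sizeof array;                /* operand is an expression */
}

/* The same figure computed from the type name form. */
size_t array_bytes_from_type(void)
{
    return 100 * sizeof(int);           /* operand is a type name */
}

/* Number of elements, independent of the element size. */
size_t array_elements(void)
{
    int array[100];
    return sizeof array / sizeof array[0];
}
```

The first two functions return the same value on any machine, and the third returns 100 regardless of how many bytes an int occupies.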
Given the structure,

    struct functions {
        char *name;
        double (*func)();
    } builtins[] = {
        "sin", sin,
        "cos", cos,
        "tan", tan,
        "asin", asin,
        "acos", acos,
        "atan", atan,
        "exp", exp,
        "log", log,
        0, 0
    };

the number of structure entries is simply

    sizeof builtins / sizeof (struct functions)

Binary Operators

Binary operators act upon two expressions together. The type of the result depends on the type of the two expressions. If both types are not identical, the lower type will be promoted to the higher type. The order from lower to higher is: char, int, unsigned int, long, unsigned long, float, double.

When the operator is addition or subtraction, if only one expression is a pointer, the result of the expression is a pointer of the same type; if both expressions are pointers, the result is unsigned. These "usual arithmetic conversions" are documented in K&R page 184 (section 6.6) and are reproduced here for convenience:

"First, any operands of type char or short are converted to int, and any of type float are converted to double. Then, if either operand is double, the other is converted to double and that is the type of the result. Otherwise, if either operand is long, the other operand is converted to long and that is the type of the result. Otherwise, if either operand is unsigned, the other operand is converted to unsigned and that is the type of the result. Else, both operands must be int, and that is the type of the result."

When several binary expressions are concatenated together (without isolating parentheses) the order in which the binary expressions are evaluated depends on the precedence of the operators in the expression. In the expression,

    a + b * c

the evaluation of "b * c" precedes the evaluation of the addition, since multiplication has a higher precedence than addition.
The expression is evaluated like this:

    a + (b * c)

As previously described, isolating parentheses can be used to change the order of evaluation. To have the addition performed first, the expression can be written:

    (a + b) * c

Each class of operators is described below in order from the highest precedence to the lowest. When all the operators in a complex expression have the same level of precedence they are evaluated in a certain order: right to left or left to right. It can be said that a class of operators "groups" left to right, or right to left. If the order of evaluation between like operators does not matter, the operator is said to be associative. Here is an example of how the order of evaluation affects an expression:

    a / b / c / d

The division operator is said to group "left to right"; thus, this expression is evaluated as follows:

    (((a / b) / c) / d)

___________________________________________________________ | | | PRECEDENCE OF BINARY OPERATORS | | (Highest to lowest) | | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | | MULTIPLICATIVE OPERATORS - group left to right | | expression * expression multiplication | | expression / expression division | | expression % expression modulus (remainder) | | - - - - - - - - - - - - - - - - -