From: jerk (徐子陵), Board: Unix
Subject: Unix Unleashed -17
Posted at: 饮水思源站 (Fri Nov 20 20:18:16 1998), internal site mail
17 — The C Programming Language
By James Armstrong
The History of C
Creating, Compiling, and Executing Your First Program
An Overview of the C Language
Elementary C Syntax
Expressions
Comparison Expressions
Mathematical Expressions
Bitwise Operations
Statement Controls
Creating a Simple Program
Writing the Code
Compiling the Program
Executing the Program
Building Large Applications
Making Libraries with ar
Building Large Applications with make
Debugging Tools
Summary
C is the programming language most frequently associated with UNIX. Since the
1970s, the bulk of the operating system and applications have been written in C.
This is one of the major reasons why UNIX is a portable operating system.
The History of C
C was first designed by Dennis Ritchie for use with UNIX on DEC PDP-11
computers. The language evolved from Martin Richards's BCPL, and one of its
earlier forms was the B language, which was written by Ken Thompson for the DEC
PDP-7. The first book on C was The C Programming Language by Brian Kernighan and
Dennis Ritchie, published in 1978.
In 1983, the American National Standards Institute established a committee to
standardize the definition of C. Termed ANSI C, it is the recognized standard
for the language grammar and a core set of libraries. The syntax is slightly
different from the original C language, which is frequently called K&R—for
Kernighan and Ritchie.
Creating, Compiling, and Executing Your First Program
The development of a C program is an iterative procedure. Many UNIX tools are
involved in this four-step process. They are familiar to software developers:
1. Using an editor, write the code into a text file.
2. Compile the program.
3. Execute the program.
4. Debug the program.
The first two steps are repeated until the program compiles successfully. Then
the execution and debugging begin. Many of the concepts presented may seem
strange to non-programmers. This chapter endeavors to introduce C as a
programming language.
The typical first C program is almost a cliché. It is the "Hello, World"
program, and it prints the simple line Hello, World. Listing 17.1 is the source
of the program.
Listing 17.1. Source of Hello World.
#include <stdio.h>

int main(void)
{
    printf("Hello, World\n");
    return 0;
}
This program can be compiled and executed as follows:
$ cc hello.c
$ a.out
Hello, World
$
The program is compiled with the cc command, which creates an executable named
a.out if the code is correct. Typing a.out runs the program; if the current
directory is not in your search path, type ./a.out instead. The program defines
only one function, main. Every C program must have a main function; it is where
the program's execution begins. The statement that does the work is a call to the
printf library function, which is passed the string Hello, World\n. (Functions
are described in detail later in this chapter.) The last two characters of the
string, \n, represent the newline character.
An Overview of the C Language
As with all programming languages, C programs must follow rules. These rules
describe how a program should appear, and what those words and symbols mean.
This is the syntax of a programming language. Think of a program as a story.
Each sentence must have a noun and a verb. Sentences form paragraphs, and the
paragraphs tell the story. Similarly, C statements can build into functions and
programs.
For more information about programming in C, I recommend the following books
from Sams Publishing:
Teach Yourself C in 21 Days by Peter Aitken and Bradley Jones
Programming in ANSI C by Stephen G. Kochan
Elementary C Syntax
Like all languages, C deals primarily with the manipulation and presentation of
data. BCPL is typeless: it treats every value as a machine word. C goes one step
further and uses the concept of data types. The basic data types are character,
integer, and floating point numbers. Other data types are built from these three
basic types.
Integers are the basic mathematical data type. They can be classified as long
and short integers, and the size is implementation-dependent. On many systems,
an int is four bytes in length and can range from -2,147,483,648 to
2,147,483,647. In ANSI C, these limits are defined in the header limits.h as
INT_MIN and INT_MAX. The qualifier unsigned gives up the sign bit to extend the
positive range: an unsigned int runs from 0 to UINT_MAX, which for a four-byte
integer is 4,294,967,295, the equivalent of INT_MAX-INT_MIN.
Floating point numbers are used for more complicated mathematics. Integer
mathematics is limited to integer results. With integers, 3/2 equals 1. Floating
point numbers give a greater amount of precision to mathematical calculations:
3.0/2.0 equals 1.5. Floating point numbers can be represented by a decimal number,
such as 687.534, or with scientific notation: 6.87534E+2. For very large or very
small numbers, scientific notation is preferred. The type double provides greater
precision and a greater range than float. Again, specific ranges are
implementation-dependent.
Characters are usually implemented as single bytes, although some international
character sets require two bytes. One common set of character representations is
ASCII, and is found on most U.S. computers.
An array is used for a sequence of values that are often position-dependent. An
array is useful when a range of values of a given type is needed. Related to the
array is the pointer. Variables are stored in memory, and a pointer holds the
address of a location in that memory. A pointer and an array are similar in use,
but they differ in how storage is obtained. The space needed for the data of an
array is allocated when the routine that declares the array is invoked. For a
pointer, the programmer must allocate the space, or must assign the pointer the
address of an existing variable. The ampersand (&) takes the address of a
variable, and the asterisk (*) dereferences a pointer, that is, yields the value
pointed at. Here are some sample declarations:
int i;             Declares an integer
char c;            Declares a character
char *ptr;         Declares a pointer to a character
double temp[16];   Declares an array of 16 double-precision floating point
                   numbers
Listing 17.2 shows an example of a program with pointers.
Listing 17.2. An example of a program with pointers.
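int i;
int *ptr;

i = 5;
ptr = &i;                                    /* ptr now holds the address of i */
printf("%d %p %d\n", i, (void *)ptr, *ptr);  /* value, address, value via ptr */
The output resembles 5 0xf7fffa6c 5. The address, and the way %p formats it, vary
from system to system.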
NOTE: A pointer is just a memory address. Applying & to any variable yields the
address of that variable, which can then be stored in a pointer.
There is no specific type for a string. An array of characters, terminated by a
null character (\0), is used to represent strings. They can be printed using an
%s flag, instead of %c.
Simple output is created by the printf function. printf takes a format string
and the list of arguments to be printed. A complete set of format options is
presented in Table 17.1. Format options can be modified with sizes. Check the
documentation for the full specification.
Table 17.1. Format conversions for printf.
Conversion  Meaning
%%          Percent sign
%E          Double (scientific notation)
%G          Double (format depends on value)
%X          Hexadecimal (letters are capitalized)
%c          Single character
%d          Integer
%e          Double (scientific notation)
%f          Double of the form mmm.ddd
%g          Double (format depends on value)
%i          Integer
%ld         Long integer
%n          Count of characters written in current printf
%o          Octal
%p          Print as a pointer
%s          Character pointer (string)
%u          Unsigned integer
%x          Hexadecimal
Some characters cannot be included easily in a program. New lines, for example,
require a special escape sequence, because there cannot be an unescaped newline
in a string. Table 17.2 contains a complete list of escape sequences.
Table 17.2. Escape characters for strings.
Escape Sequence  Meaning
\"               Double quote
\'               Single quote
\?               Question mark
\\               Backslash
\a               Audible bell
\b               Backspace
\f               Form feed (new page)
\n               New line
\ooo             Octal number
\r               Carriage return
\t               Horizontal tab
\v               Vertical tab
\xhh             Hexadecimal number
A full program is a compilation of statements. Statements are terminated by
semicolons. They can be grouped in blocks of statements surrounded by curly
braces. The simplest statement is an assignment. A variable on the left side is
assigned the value of an expression on the right.
Expressions
At the heart of the C programming language are expressions. These are techniques
to combine simple values into new values. There are three basic types of
expressions: comparison, numerical, and bitwise expressions.
Comparison Expressions
The simplest expression is a comparison. A comparison evaluates to a TRUE or a
FALSE value. In C, TRUE is a non-zero value, and FALSE is a zero value. Table
17.3 contains a list of comparison operators.
Table 17.3. Comparison operators.
Operator  Meaning                  Operator  Meaning
<         Less than                >=        Greater than or equal to
>         Greater than             ||        Or
==        Equal to                 &&        And
<=        Less than or equal to    !=        Not equal to
Expressions can be built by combining simple comparisons with ANDs and ORs to
make complex expressions. Consider the definition of a leap year. In words, it
is any year divisible by 4, except a year divisible by 100 unless that year is
divisible by 400. If year is the variable, a leap year can be defined with this
expression.
((((year%4)==0)&&((year%100)!=0))||((year%400)==0))
On first inspection, this code might look complicated, but it isn't. The
parentheses group the simple expressions with the ANDs and ORs to make a complex
expression.
Mathematical Expressions
One convenient aspect of C is that expressions can be treated as mathematical
values, and mathematical statements can be used in expressions. In fact, any
statement—even a simple assignment—has values that can be used in other places
as an expression.
The mathematics of C is straightforward. Barring parenthetical groupings,
multiplication and division have higher precedence than addition and
subtraction. The operators are standard. They are listed in Table 17.4.
Table 17.4. Mathematical operators.
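Operator  Meaning          Operator  Meaning
+         Addition         /         Division
-         Subtraction      %         Integer remainder
*         Multiplication
C has no exponentiation operator. The ^ symbol is the bitwise exclusive OR; to
raise a number to a power, call the pow function declared in math.h.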
There are also unary operators, which affect a single variable. These are ++
(increment by one) and -- (decrement by one). These shorthand versions are quite
useful.
There are also shorthands for situations in which you want to change the value
of a variable. For example, if you want to add an expression to a variable
called a and assign the new value to a, the shorthand a+=expr is the same as
a=a+expr. The expression can be as complex or as simple as required.
NOTE: Most UNIX functions take advantage of the truth values and return 0 for
success. This enables a programmer to write code such as
if (function())
{
    /* handle the error condition */
}
The return value of a function determines whether the function worked.
Bitwise Operations
Because a variable is just a string of bits, many operations work on those bit
patterns. Table 17.5 lists the bit operators.
Table 17.5. Bit operators.
Operator  Meaning       Operator  Meaning
&         Bitwise AND   <<        Bit shift left
|         Bitwise OR    >>        Bit shift right
A bitwise AND compares the two operands bit by bit. Where both bits are 1, the
corresponding bit of the result is 1; otherwise, it is 0. For a bitwise OR, the
result bit is 1 if either bit is 1. Bit shift operations move the bits a number of
positions to the left or right. For unsigned values, and barring overflow,
shifting left by n positions is the same as multiplying by 2 to the power n, and
shifting right by n is the same as dividing by it, but circumstances exist where
the bit shift is preferred.
Bit operations are often used for masking values and for comparisons. A simple
way to determine whether a value is odd or even is to perform a bitwise AND with
the integer value 1. If the result is non-zero (TRUE), the number is odd.
Statement Controls
With what you've seen so far, you can create a list of statements that are
executed only once, after which the program terminates. To control the flow of
commands, three types of loops exist in C. The simplest is the while loop. The
syntax is
while (expression)
statement
So long as the expression between parentheses evaluates as non-zero—or TRUE in
C—the statement is executed. The statement actually can be a list of statements
blocked off with curly braces. If the expression evaluates to zero the first
time it is reached, the statement is never executed. To force at least one
execution of the statement, use a do loop. The syntax for a do loop is
do
statement
while (expression);
The third type of control flow is the for loop. This is more complicated. The
syntax is
for(expr1;expr2;expr3) statement
When the for statement is reached for the first time, expr1 is evaluated. Next,
expr2 is evaluated. If expr2 is non-zero, the statement is executed, followed by
expr3. Then, expr2 is tested again, followed by the statement and expr3, until
expr2 evaluates to zero. Strictly speaking, this is a notational convenience,
for a while loop can be structured to perform the same actions. For example,
expr1;
while (expr2) {
    statement;
    expr3;
}
Loops can be interrupted in three ways. A break statement terminates execution
in a loop and exits it. continue terminates the current iteration and retests
the loop before possibly re-executing the statement. For an unconventional exit,
you can use goto. goto changes the program's execution to a labelled statement.
According to many programmers, goto is poor programming practice, and you should
avoid using it.
Statements can also be executed conditionally. Again, there are three different
formats for statement execution. The simplest is an if statement. The syntax is
if (expr) statement
If the expression expr evaluates to non-zero, the statement is executed. You can
expand this with an else, the second type of conditional execution. The syntax
for else is
if (expr) statement else statement
If the expression evaluates to zero, the second statement is executed.
NOTE: The second statement in an else condition can be another if statement.
A nested if without an else of its own can make the structure look ambiguous:
if (expr) if (expr) statement else statement
C resolves the ambiguity by pairing each else with the nearest unmatched if.
As the code is written, the else is considered applicable to the second if. To
make it applicable to the first if, surround the second if statement with
curly braces. For example:
if (expr) {if (expr) statement} else statement
The third type of conditional execution is more complicated. The switch
statement first evaluates an expression. Then it looks down a series of case
labels to find one that matches the expression's value and executes the
statements following that label. A special label, default, is selected if no
other label matches. Execution does not stop at the next case label; it falls
through into the statements that follow. If you want only one set of statements
executed for each label, you must end the set with a break statement to leave
the switch statement.
This covers the simplest building blocks of a C program. You can add more power
by using functions and by declaring complex data types.
If your program requires different pieces of data to be grouped on a consistent
basis, you can group them into structures. Listing 17.3 shows a structure for a
California driver's license. Note that it includes integer, character, and
character array (string) types.
Listing 17.3. An example of a structure.
struct license {
    char name[128];
    char address[3][128];
    int zipcode;
    int height, weight, month, day, year;
    char license_letter;
    int license_number;
};
struct license licensee;
struct license *user;
Since California driver's license numbers consist of a single character followed
by a seven digit number, the license ID is broken into two components.
Similarly, the licensee's address is broken into three lines, represented by
three arrays of 128 characters.
Accessing individual fields of a structure requires two different techniques. To
access a member of a structure variable, you append a dot to the variable name,
then the field name. For example:
licensee.zipcode=94404;
To access a member through a pointer to the structure, you use -> instead. The
pointer must already point to a structure:
user=&licensee;
user->zipcode=94404;
Interestingly, if the structure pointer is incremented, the address is increased
not by 1, but by the size of the structure.
Functions are an easy way to group statements and to give them a name. These are
usually related statements that perform repetitive tasks such as I/O. printf,
described above, is a function. It is provided with the standard C library.
Listing 17.4 illustrates a function declaration, a function call, and a function
definition.
NOTE: The three-dot ellipsis simply means that some lines of sample code are not
shown here, in order to save space.
Listing 17.4. An example of a function.
int swapandmin(int *, int *);     /* Function declaration */
...
int i,j,lower;
i=2; j=4;
lower=swapandmin(&i, &j);         /* Function call */
...
int swapandmin(int *a, int *b)    /* Function definition */
{
    int tmp;
    tmp=(*a);
    (*a)=(*b);
    (*b)=tmp;
    if ((*a)<(*b)) return(*a);
    return(*b);
}
ANSI C and K&R differ most in function declarations and calls. ANSI requires
that function arguments be prototyped when the function is declared. K&R
required only the name and the type of the returned value. The declaration in
Listing 17.4 states that a function swapandmin will take two pointers to
integers as arguments and that it will return an integer. The function call
takes the addresses of two integers and sets the variable named lower with the
return value of the function.
When a function is called from a C program, the values of the arguments are
passed to the function. Therefore, if the function is to change a variable that
belongs to the calling routine, passing the variable itself is not enough; you
must pass its address. Likewise, inside the function you must assign the new
value through that address, not to the pointer itself.
In the function in Listing 17.4, the value pointed to by a is assigned to the
tmp variable. The value pointed to by b is assigned to *a, and tmp is assigned to
*b. *a is used instead of a to ensure that the change is reflected in the calling
routine. Finally, the values of *a and *b are compared, and the lower of the two
is returned.
If you included the line
printf("%d %d %d",lower,i,j);
after the function call, you would see 2 4 2 on the output.
This sample function is quite simple, and it is ideal for a macro. A macro is a
technique used to replace a token with different text. You can use macros to
make code more readable. For example, you might use EOF instead of (-1) to
indicate the end of a file. You can also use macros to replace code. Listing
17.5 is the same as Listing 17.4 except that it uses macros.
Listing 17.5. An example of macros.
#define SWAP(X,Y) {int tmp; tmp=(X); (X)=(Y); (Y)=tmp; }
#define MIN(X,Y) (((X)<(Y)) ? (X) : (Y))
...
int i,j,lower;
i=2; j=4;
SWAP(i,j);
lower=MIN(i,j);
The parentheses around each macro argument keep the expansion correct when an
argument is itself an expression. When a C program is compiled, macro
replacement is one of the first steps performed. Listing 17.6 illustrates the
result of the replacement.
Listing 17.6. An example of macro replacement.
int i,j,lower;
i=2; j=4;
{int tmp; tmp=(i); (i)=(j); (j)=tmp; };
lower=(((i)<(j)) ? (i) : (j));
The macros make the code easier to read and understand.
Creating a Simple Program
For your first program, write a program that prints a chart of the first ten
integers and their squares, cubes, and square roots.
Writing the Code
Using the text editor of your choice, enter all the code in Listing 17.7 and
save it in a file called sample.c.
Listing 17.7. Source code for sample.c.
#include <stdio.h>
#include <math.h>
int main(void)
{
    int i;
    double a;
    for(i=1;i<11;i++)
    {
        a=i*1.0;
        printf("%2d. %3d %4d %7.5f\n",i,i*i,i*i*i,sqrt(a));
    }
    return 0;
}
The first two lines include header files. The stdio.h file provides the function
declarations and structures associated with the C input and output libraries. The
math.h file includes the declarations of mathematical library functions. You need
it for the square root function.
The main function is the only function that you need to write for this example. It
takes no arguments. You define two variables. One is the integer i, and the
other is a double-precision floating point number called a. You wouldn't have to
use a, but you can for the sake of convenience.
The program is a simple for loop that starts with i at 1 and increments i by 1
each time through. When i equals 11, the test i<11 fails and the for loop stops
executing. You could have also written i<=10, because for integers the two
expressions have the same meaning.
First, you multiply i by 1.0 and assign the product to a. A simple assignment
would also work, but the multiplication reminds you that you are converting the
value to a double-precision floating point number.
Next, you call the printf function. The format string includes three integers of
widths 2, 3, and 4. After the first integer is printed, you print a period.
After the third integer is printed, you print a floating point number that is
seven characters wide with five digits following the decimal point. The
arguments after the format string show that you print the integer, the square of
the integer, the cube of the integer, and the square root of the integer.
Compiling the Program
To compile this program using the C compiler, enter the following command:
cc sample.c -lm
This command produces an output file called a.out. This is the simplest use of
the C compiler. It is one of the most powerful and flexible commands on a UNIX
system.
A number of different flags can change the compiler's output. These flags are
often dependent on the system or compiler. Some flags are common to all C
compilers. These are described in the following paragraphs.
The -o flag tells the compiler to write the output to the file named after the
flag. The cc -o sample sample.c command would put the program in a file named
sample.
NOTE: The output discussed here is the compiler's output, not the sample
program. Compiler output is usually the program, and in every example here, it
is an executable program.
The -g flag tells the compiler to keep the symbol table (the data used by a
program to associate variable names with memory locations), which is necessary
for debuggers. The -O flag, by contrast, tells the compiler to optimize
the code, that is, to make it more efficient. You can change the search path for
header files with the -I flag, and you can add libraries with the -l and -L
flags.
The compilation process takes place in several steps.
1. First, the C preprocessor parses the file. To parse the file, it
   sequentially reads the lines, includes header files, and performs macro
   replacement.
2. Next, the compiler parses the modified code for correct syntax. This builds
   a symbol table and creates an intermediate object format. Most symbols have
   specific memory addresses assigned, although symbols defined in other
   modules, such as external variables, do not.
3. The last compilation stage, linking, ties together different files and
   libraries and links the files by resolving the symbols that have not been
   resolved yet.
Executing the Program
The output from this program appears in Listing 17.8.
Listing 17.8. Output from the sample.c program.
$ a.out
 1.   1    1 1.00000
 2.   4    8 1.41421
 3.   9   27 1.73205
 4.  16   64 2.00000
 5.  25  125 2.23607
 6.  36  216 2.44949
 7.  49  343 2.64575
 8.  64  512 2.82843
 9.  81  729 3.00000
10. 100 1000 3.16228
NOTE: To execute a program, just type its name at a shell prompt. The output
will immediately follow.
Building Large Applications
C programs can be broken into any number of files, so long as no function spans
more than one file. To compile such a program, you compile each source file into
an intermediate object file before you link all the objects into a single
executable. The -c flag tells the compiler to stop after producing the object
file. During the link stage, all the object files should be listed on the command
line. Object files are identified by the .o suffix.
Making Libraries with ar
If several different programs use the same functions, they can be combined in a
single library archive. The ar command is used to build a library. When this
library is included on the compile line, the archive is searched to resolve any
external symbols. Listing 17.9 shows an example of building and using a library.
Listing 17.9. Building a large application.
cc -c sine.c
cc -c cosine.c
cc -c tangent.c
ar rc libtrig.a sine.o cosine.o tangent.o
cc -c mainprog.c
cc -o mainprog mainprog.o libtrig.a
Building Large Applications with make
Of course, managing the process of compiling large applications can be
difficult. UNIX provides a tool that takes care of this for you. make looks for
a makefile, which includes directions for building the application.
You can think of the makefile as being its own programming language. The syntax
is
target: dependencies
	command list
Each line of the command list must begin with a tab character.
Dependencies can be targets declared elsewhere in the makefile, and they can
have their own dependencies. When a make command is issued, the target on the
command line is checked; if no targets are specified on the command line, the
first target listed in the file is checked.
When make tries to build a target, first the dependencies list is checked. If
any of them requires rebuilding, it is rebuilt. Then, the command list specified
for the target itself is executed.
make has its own set of default rules, which are executed if no other rules are
specified. One rule specifies that an object is created from a C source file
using $(CC) $(CFLAGS) -c (source file). CFLAGS is a special variable; a list of
flags that will be used with each compilation can be stored there. These flags
can be specified in the makefile, on the make command line, or in an environment
variable. make checks the dependencies to determine whether a file needs to be
made. It uses the mtime field of a file's status. If a dependency has been
modified more recently than the target, the target is remade.
Listing 17.10 shows an example of a makefile.
Listing 17.10. An example of a makefile.
CFLAGS= -g
igfl: igfl.o igflsubs.o
	cc -g -o igfl igfl.o igflsubs.o -lm
igflsubs.o: igfl.h
clean:
	rm -f *.o
Listing 17.10 uses several targets to make a single executable called igfl. The
two C files are compiled into objects by implicit rules. Only igflsubs.o has an
explicit dependency, the header igfl.h. If igfl.h has been modified more recently
than igflsubs.o, a new igflsubs.o is compiled.
Note that there is a target called clean. Because there are no dependencies, the
command is always executed when clean is specified. This command removes all the
intermediate files. Listing 17.11 shows the output of make when it is executed
for the first time.
Listing 17.11. Output of make.
cc -g -target sun4 -c igfl.c
cc -g -target sun4 -c igflsubs.c
cc -g -o igfl igfl.o igflsubs.o -lm
Debugging Tools
Debugging is a science and an art unto itself. Sometimes, the simplest tool—the
code listing—is best. At other times, however, you need to use other tools.
Three of these tools are lint, prof, and sdb. Other available tools include
escape, cxref, and cb. Many UNIX commands have debugging uses.
lint is a command that examines source code for possible problems. The code
might meet the standards for C and compile cleanly, but it might not execute
correctly. Two things checked by lint are type mismatches and incorrect argument
counts on function calls. lint uses the C preprocessor, so you can use similar
command-line options as you would use for cc.
The prof command is used to study where a program is spending its time. If a
program is compiled and linked with -p as a flag, when it executes, a mon.out
file is created with data on how often each function is called and how much time
is spent in each function. This data is parsed and displayed with prof. An
analysis of the output generated by prof helps you determine where performance
bottlenecks occur. Optimizing compilers can speed up your programs, but
rewriting the bottlenecks that this analysis reveals often improves performance
further.
The third tool is sdb—a symbolic debugger. When a program is compiled with -g,
the symbol tables are retained, and a symbolic debugger can be used to track
program bugs. The basic technique is to invoke sdb after a core dump and get a
stack trace. This indicates the source line where the core dump occurred and the
functions that were called to reach that line. Often, this is enough to identify
the problem. It is not the limit of sdb, though.
sdb also provides an environment for debugging programs interactively. Invoking
sdb with a program enables you to set breakpoints, examine variable values, and
monitor variables. If you suspect a problem near a line of code, you can set a
breakpoint at that line and run the program. When the line is reached, execution
is interrupted. You can check variable values, examine the stack trace, and
observe the program's environment. You can single-step through the program,
checking values. You can resume execution at any point. By using breakpoints,
you can discover many of the bugs in your code that you've missed.
cpp is another tool that can be used to debug programs. It will perform macro
replacements, include headers, and parse the code. The output is the actual
module to be compiled. Normally, though, cpp is never executed by the programmer
directly. Instead it is invoked through cc with either a -E or -P option. -E
will put the output directly to the terminal; -P will make a file with a .i
suffix.
Summary
In this chapter, we've discussed the basics of the C language: building C
programs, running them, and debugging them. While this overview isn't enough to
make you an expert C programmer, you can now understand how programmers develop
their products. You should also be able to read a C program and know what the
program is doing.
--
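As the Sui dynasty fell into turmoil, the twin dragons roamed the land.
Of all the heroes under heaven, I alone stand supreme!
※ Source: 饮水思源站 bbs.sjtu.edu.cn [FROM: 202.120.5.209]
--
※ Modified: fzx edited this post on Aug 1 12:22:38. [FROM: heart.hit.edu.cn]
※ Forwarded: 紫 丁 香 bbs.hit.edu.cn. [FROM: chen.hit.edu.cn]
--
☆ Source: 哈工大紫丁香 bbs.hit.edu.cn. [FROM: jmm.bbs@bbs.hit.edu.]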