The ARM C Compiler
==================


Introduction
------------

This chapter is a reference guide to the ARM C cross compiler, and includes all 
the information required to make effective use of the ARM C system. It is not 
intended to be an introduction to C and does not try to teach programming in C; 
nor is it a reference manual for the C standard. These needs are addressed by 
widely available textbooks and by the ANSI C standard (see "<Recommended 
Texts>").

The ARM instruction set is documented separately in ARM data sheets. Reference 
should be made to the data sheet relating to the ARM variant that is being 
used.

Using the ARM assembler is documented in "<The ARM Assembler (armasm)>" 
starting on page 16 of the User Manual, and ARM assembly language is documented 
in "<ARM Assembly Language>" of the User Manual. However, if 
it is necessary only to understand the assembly language output by the C 
compiler, the data sheet for the ARM CPU being used should prove sufficient. 


About the ARM C Compiler
------------------------

The ARM C compiler is a mature, industrial-strength compiler, based on Codemist 
Limited's multi-target, multi-language compiler suite (also known as the 
<NorCroft> C compiler). Derived compilers are used by, or distributed by, 
amongst others: 

 *  Advanced RISC Machines (for the ARM processor);

 *  Acorn Computers (for their ARM-based personal workstations); 

 *  INMOS (for the Transputer); 

 *  Hitachi (internal use on IBM 370 compatibles); 

 *  Perihelion Software (for their <Helios> portable operating system).

By default, the ARM C Compiler compiles ANSI C as defined by <American National 
Standard for Information Systems - Programming Language C>, X3J11/90-013, Feb 
14, 1990.

The compiler also has a <pcc> mode, which accepts the dialect of C used by 
Berkeley Unix. In this mode, the compiler has been used to build a complete 
ARM-based BSD Unix system (the RISCiX system, marketed by Acorn Computers 
Limited, which has also achieved X/Open branding).

Pcc mode is selectable from the compiler's command line.

In its ANSI mode, the ARM C compiler has been tested against release 2.00 of 
the Plum-Hall C Validation Suite (CVS), which is widely considered to be the 
toughest C test suite available, and has been adopted by the British Standards 
Institution for C compiler validation in Europe. In the language conformance 
sections of the CVS, it fails in only two trivial ways, both failures to 
produce required diagnostics:

 *  An empty initialiser for an aggregate of complete type is not diagnosed, 
    e.g.

          int x[3] = {};

 *  Signed integer constant overflow is not diagnosed, but merely warned of, 
    e.g.

          case INT_MAX+1: ...

Wherever possible, the ARM C compiler adopts widely used command-line options 
which should be familiar to users of both Unix and DOS.


Recommended Texts
-----------------

As a guide to programming in C we recommend:

 *  Harbison, S.P. and Steele, G.L., <A C Reference Manual>, (second edition, 
    1987). Prentice-Hall, Englewood Cliffs, NJ, USA. ISBN 0-13-109802-0.

 *  Kernighan, B.W. and Ritchie, D.M., <The C Programming Language>, (second 
    edition, 1988). Prentice-Hall, Englewood Cliffs, NJ, USA. ISBN 
    0-13-110362-8.

Harbison and Steele is a very thorough reference guide to C, including a useful 
amount of information on ANSI C.

Kernighan and Ritchie is the original C bible, updated to cover the essentials 
of ANSI C. Many users prefer it to Harbison and Steele.

Because the ARM C compiler is a compiler for ANSI C, these books are especially 
relevant but, in each case, the second edition must be obtained for coverage of 
ANSI C. Note also that because the ANSI standard, formally adopted in February 
1990, is identical to the December 1988 draft standard, both of these texts are 
reasonably up to date.

We also recommend:

 *  Koenig, A., <C Traps and Pitfalls>, (1989). Addison-Wesley, Reading, Mass., 
    USA. ISBN 0-201-17928-8.

which explains in impressively few pages how to avoid the most common traps and 
pitfalls that ensnare even the most experienced C programmers. It provides 
informative reading at all levels of competence in C.

Finally, the definitive ANSI C reference is:

 *  <American National Standard for Information Systems - Programming Language 
    C>, X3J11/90-013, Feb. 14, 1990.

In the United Kingdom, the standard is available from:

    British Standards Institution,
    Foreign Sales Department, 
    Linford Wood
    Milton Keynes MK14 6LE

(Members of the BSI can order copies by telephone; non-members should send a 
cheque payable to BSI).

In other countries the standard is available from the national standards body 
(e.g. <AFNOR> in France, <ANSI> in the USA). In Europe, the standard may be 
more readily (and more cheaply) available as the corresponding draft ISO 
standard.

However, the coverage of ANSI C in this chapter, and in the books listed above, 
should be adequate for all but the most demanding requirements.


Using the ARM C Compiler
------------------------

For a description of how to invoke the ARM C Compiler see "<The ARM C 
Compiler>" of the User Manual, which includes full details of <armcc> 
command line options.


File Naming Conventions 
........................

The ARM C system uses suffix naming conventions to identify the classes of file 
involved in the compilation and linking processes:

    suffix        usage

    .c            C source file

    .h            C header file

    .o            ARM object file

    .s            ARM assembly language

    .lst          compiler listing file

For example, <something.c> names the C source of <something>.

Many host systems support suffix file naming conventions (Unix, MS-DOS, and the 
Macintosh under MPW all do), so the names used by the C system on the command 
line, and as arguments to the C preprocessor directive #include, map directly 
to host file names.

Some host systems have no file name extensions and no extension convention. On 
such systems, files may be stored in folders (sub-directories) named c, h, o 
and s. However, the compiler still understands the <something.c> notation, both 
on its command line and when processing the names of #include files, and it 
translates names written in standard form to host system file names. For 
example, under Acorn's RISC OS system the source <something.c> is actually 
stored in the file called <something> in subdirectory <c>. Note, however, 
that under RISC OS listing files are by default created in an <l> directory, 
not a <lst> directory as might be expected.

Portability is an increasingly important issue in the C world, especially since 
the standardisation of C. To this end, the ARM C system provides support for 
the use of multiple file-naming conventions on one host.

In each environment the ARM C system supports:

 *  native file names;

 *  pseudo Unix file names;

 *  Unix file names.

A pseudo Unix file name is one in the format:

    <host-volume-name>:/<rest-of-unix-file-name>

Determining how to parse a name is done heuristically.  Heuristics are applied 
as follows:

 *  A name starting with <volume-name>:/ is a pseudo Unix file name;

 *  Under RISC OS, a name starting with [<filing-system>:]:<mount>, $ or & is 
    a RISC OS file name;

 *  A name containing a '/' is a Unix file name.

Otherwise the name is a host name.
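
The heuristics above might be coded along the following lines. This is only an 
illustrative sketch, not the compiler's own implementation: the names 
<classify_name> and <name_kind> are hypothetical, and the RISC OS rule is 
omitted for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a file name, following the order of the
   heuristics in the text (the RISC OS rule is deliberately left out). */
enum name_kind { NAME_PSEUDO_UNIX, NAME_UNIX, NAME_HOST };

enum name_kind classify_name(const char *name)
{
    const char *colon = strchr(name, ':');

    /* <volume-name>:/... : a non-empty volume name, containing no '/',
       followed immediately by ":/". */
    if (colon != NULL && colon != name && colon[1] == '/' &&
        memchr(name, '/', (size_t)(colon - name)) == NULL)
        return NAME_PSEUDO_UNIX;

    /* Any other name containing a '/' is taken to be a Unix name. */
    if (strchr(name, '/') != NULL)
        return NAME_UNIX;

    /* Otherwise the name is left for the host to interpret. */
    return NAME_HOST;
}
```

Under these assumptions, a name such as c:/src/main.c would be classed as a 
pseudo Unix name, src/main.c as a Unix name, and main.c as a host name.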

Of course, such file name interpretation only has a chance of success if 
certain rules are adhered to by program authors. For example, under DOS, a name 
may not exceed 8 characters in length and character case is not significant. In 
general, portability is best served if the name of a file or directory is 
restricted to a maximum of 8 lower-case letters and digits, beginning with a 
letter; extensions should be no more than 3 letters and digits long; and 
embedded path names should be relative, rather than absolutely rooted.


Filename Validity 
..................

The compiler does not check whether the file names given are acceptable to the 
host's filing system. If the file name is not acceptable the compiler will 
report that it could not be opened, but will give no further diagnosis.


Object Files 
.............

By default, the object file(s) created by the compiler are stored in the 
current directory.

A C source file (<something.c>) is compiled into an object file (usually called 
<something.o>) written in ARM Object Format (AOF). AOF is defined in "<ARM 
Object Format>" of the Technical Specifications.


Included Files 
...............

During a compilation, the compiler may read included header files, 
conventionally given a '.h' suffix, or included C source files, usually given a 
'.c' suffix like the initial C source file.

A special feature of the ARM C system is that the ANSI library headers are 
built into the C compiler (in a special, textually-compressed, in-memory filing 
system) and are used from there by default. By placing a file name in angle 
brackets, you indicate that the included file is a system file and ensure that 
the compiler looks first in its built-in filing system. For example:

    #include <stdio.h>

By default, the ARM C compiler <does not> look for system files in the current 
directory.

By enclosing a file name in double quotes in its #include directive, you 
indicate that it is not a system file. For example:

    #include "myfile.h"

By default, the ARM C compiler <does> look for non-system files in the current 
directory.

The way the compiler looks for included files depends on three factors:

 *  whether the file name is rooted (i.e. it is an absolute file name, rather 
    than a relative file name);

 *  whether the file name in the #include directive is between angle brackets 
    or double quotes;

 *  use of the -I and -j flags and the special file name :mem.


The Search Path 

The order of directories on the search path is as follows:

 *  the compiler's own in-memory filing system (for filenames enclosed in angle 
    brackets, but only if the -j flag is not used);

 *  the current place (see <The Current Place>, below) (not for filenames 
    enclosed in angle brackets);

 *  arguments to -I flags, if used (for filenames enclosed in angle brackets or 
    double quotes);

 *  arguments to the -j flag, if used (for filenames enclosed in angle brackets 
    or double quotes);

 *  the compiler's own in-memory filing system (for filenames enclosed in angle 
    brackets, but only if the -j flag is used).

Note that the in-memory filing system can be specified explicitly by -I or -j 
by using the directory name <:mem>.


The Current Place 

The current place is the directory containing the source file (C source or 
#include header) currently being processed by the compiler. Often, this will be 
the current directory.

When a file is found relative to an element of the search path, the name of the 
directory containing that file becomes the new current place. When the compiler 
has finished processing that file it restores the old current place. At each 
instant, there is a stack of current places corresponding to the stack of 
nested #include's.

For example, suppose the current place is /me/include and the compiler is 
seeking the #include file "sys/defs.h". Now suppose this is found as 
/me/include/sys/defs.h. Then the new current place is /me/include/sys and any 
file #included by defs.h whose name is not rooted will be sought relative to 
/me/include/sys.

This is the search rule used by BSD Unix systems. If required, the stacking of 
current places can be disabled using the compiler option -fK, to get the search 
rule described originally by Kernighan and Ritchie in <The C Programming 
Language>. Then all non-rooted user includes are sought relative to the 
directory containing the source file being compiled.
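
The way a new current place follows from the name of a found file can be 
sketched as below. The helper <current_place_of> is hypothetical and assumes 
Unix-style names with '/' as the separator; it is not taken from the compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the directory part of `path` into `place`.
   Returns 0 on success, or -1 if `place` (capacity `size`) is too small
   or `path` contains no '/' at all. */
int current_place_of(const char *path, char *place, size_t size)
{
    const char *slash = strrchr(path, '/');
    size_t len;

    if (slash == NULL)
        return -1;

    /* Keep the leading '/' when the file lives in the root directory. */
    len = (slash == path) ? 1 : (size_t)(slash - path);
    if (len + 1 > size)
        return -1;

    memcpy(place, path, len);
    place[len] = '\0';
    return 0;
}
```

Applied to /me/include/sys/defs.h, this yields /me/include/sys, the current 
place used in the example above.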


Implementation Details 
-----------------------

This section gives details of those aspects of the compiler and C library which 
the ANSI standard for C identifies as implementation-defined, together with 
some other points of interest to programmers. The material is collected here by 
subject.

The later section "<Standard Implementation Definition>" lists the 
points which a conforming implementation must document, as set out in appendix 
A.6.3 of the ANSI C standard. The material of that section is organised to 
follow directly the structure of appendix A.6.3.

Note that there is significant overlap between this section and the following 
one.


Character Sets and Identifiers 
...............................

An identifier can be of any length. The compiler truncates an identifier after 
256 characters, all of which are significant (the standard requires a minimum 
of 31 significant characters).

The source character set expected by the compiler is 7-bit ASCII, except that 
within comments, string literals, and character constants, the full ISO 8859-1 
(Latin 1) 8-bit character set is recognised.

In its generic configuration, as delivered, the C library processes the full 
ISO 8859-1 (Latin-1) 8-bit character set, save that the default locale is the C 
locale (see "<Standard Implementation Definition>"). The 
<ctype> functions therefore all return 0 when applied to codes in the range 160 
to 255. By calling <setlocale(LC_CTYPE, "ISO8859-1")> you can cause the <isupper> 
and <islower> functions to behave as expected over the full 8-bit Latin-1 
alphabet, rather than just over the 7-bit ASCII subset.
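
A minimal sketch of selecting the Latin-1 classification follows. The locale 
name is the one given above for the ARM C library; other C libraries may not 
recognise it, in which case <setlocale> returns NULL and the C locale stays in 
force. The function names are hypothetical.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Try to select the Latin-1 character classification named in the text.
   Returns 1 if the library accepted the locale name, 0 otherwise. */
int select_latin1_ctype(void)
{
    return setlocale(LC_CTYPE, "ISO8859-1") != NULL;
}

/* Count the upper-case letters in a NUL-terminated string, using whatever
   LC_CTYPE locale is currently selected. */
size_t count_upper(const char *text)
{
    size_t n = 0;

    for (; *text != '\0'; text++) {
        /* The ctype functions expect an unsigned char value (or EOF). */
        if (isupper((unsigned char)*text))
            n++;
    }
    return n;
}
```

Calling <select_latin1_ctype> once at start-up and checking its result lets a 
program fall back to 7-bit ASCII behaviour when the locale is unavailable.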

Upper and lower case characters are distinct in all identifiers, both internal 
and external.

In pcc mode (-pcc option) and "limited pcc" or "system programming" mode (-fc 
option) an identifier may also contain a dollar character.


Data Elements 
..............

The sizes of data elements are as follows:

    Type                Size in bits

    char                8

    short               16

    int                 32

    long                32

    float               32

    double              64

    long double         64    (subject to future change)

    all pointers        32

Integers are represented in two's complement form.

Data items of type char are unsigned by default, though in ANSI mode they may 
be explicitly declared as signed char or unsigned char.

In the compiler's -pcc mode there is no <signed> keyword, so chars are signed 
by default and may be declared unsigned if required.
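
One practical consequence of the signedness of plain char concerns the value 
EOF, which is negative. If the result of <getc> is stored in an unsigned char 
it can never compare equal to EOF; if it is stored in a signed char, a data 
byte of 255 is mistaken for EOF. The following sketch (the function name 
<count_bytes> is hypothetical) avoids both problems by holding the result in 
an int.

```c
#include <stdio.h>

/* Count the bytes remaining on `stream`. */
long count_bytes(FILE *stream)
{
    long total = 0;
    int ch;                 /* int, not char: it must be able to hold EOF */

    while ((ch = getc(stream)) != EOF)
        total++;
    return total;
}
```

Written this way, the loop behaves identically in ANSI mode and in -pcc mode.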

Floating point quantities are stored in the IEEE format. In double and long 
double quantities, the word containing the sign, the exponent and the most 
significant part of the mantissa is stored at the lower machine address.


Arithmetic Limits (limits.h and float.h)
........................................

The ANSI C standard defines two header files, limits.h and float.h, which 
contain constants describing the ranges of values which can be represented by 
the arithmetic types. The standard also defines minimum values for many of 
these constants.

This subsection sets out the values in these two headers for the ARM, and gives 
a brief description of their significance.

Number of bits in the smallest object that is not a bit field (i.e. a byte):

    CHAR_BIT 8

Maximum number of bytes in a multibyte character, for any supported locale:

    MB_LEN_MAX 1

For the following integer ranges, the middle column gives the numerical value 
of the range's endpoint, while the right hand column gives the bit pattern (in 
hexadecimal) that would be interpreted as this value in ARM C. When entering 
constants you must be careful about the size and signed-ness of the quantity. 
Furthermore, constants are interpreted differently in decimal and 
hexadecimal/octal. See the ANSI C standard or any of the recommended textbooks 
on the C programming language for more details.

    Range                 End-point             Hex Representation
    
    CHAR_MAX                   255                    0xff
    CHAR_MIN                     0                    0x00
    
    SCHAR_MAX                  127                    0x7f
    SCHAR_MIN                 -128                    0x80
    
    UCHAR_MAX                  255                    0xff
    
    SHRT_MAX                 32767                  0x7fff
    SHRT_MIN                -32768                  0x8000
    
    USHRT_MAX                65535                  0xffff
    
    INT_MAX             2147483647              0x7fffffff
    INT_MIN            -2147483648              0x80000000
    
    LONG_MAX            2147483647              0x7fffffff
    LONG_MIN           -2147483648              0x80000000
    
    ULONG_MAX           4294967295              0xffffffff

Characteristics of floating point:

    FLT_RADIX         2
    FLT_ROUNDS        1

The base (radix) of the ARM floating-point number representation is 2, and 
floating-point addition rounds to nearest.

Ranges of floating types:

    FLT_MAX           3.40282347e+38F
    FLT_MIN           1.17549435e-38F
    
    DBL_MAX           1.79769313486231571e+308
    DBL_MIN           2.22507385850720138e-308
    
    LDBL_MAX          1.79769313486231571e+308
    LDBL_MIN          2.22507385850720138e-308

Ranges of base two exponents:

    FLT_MAX_EXP          128
    FLT_MIN_EXP        (-125)
    
    DBL_MAX_EXP         1024
    DBL_MIN_EXP       (-1021)
    
    LDBL_MAX_EXP        1024
    LDBL_MIN_EXP      (-1021)

Ranges of base ten exponents:

    FLT_MAX_10_EXP        38
    FLT_MIN_10_EXP      (-37)
    
    DBL_MAX_10_EXP       308
    DBL_MIN_10_EXP     (-307)
    
    LDBL_MAX_10_EXP      308
    LDBL_MIN_10_EXP    (-307)

Decimal digits of precision:

    FLT_DIG                6
    DBL_DIG               15
    LDBL_DIG              15

Digits (base two) in mantissa (binary digits of precision):

    FLT_MANT_DIG          24
    DBL_MANT_DIG          53
    LDBL_MANT_DIG         53

Smallest positive values such that (1.0 + x != 1.0):

    FLT_EPSILON            1.19209290e-7F
    DBL_EPSILON            2.2204460492503131e-16
    LDBL_EPSILON           2.2204460492503131e-16L
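
As an illustration of how these constants are typically used, the following 
sketch compares two doubles with a tolerance scaled from DBL_EPSILON. The 
function names and the factor of 4 are arbitrary choices made for this 
example, not values taken from the ARM C library.

```c
#include <float.h>

/* Absolute value, written out to avoid depending on the maths library. */
static double magnitude(double x)
{
    return x < 0.0 ? -x : x;
}

/* Return 1 if a and b differ by no more than a few units of relative
   rounding error, 0 otherwise. The factor 4.0 is an assumed tolerance. */
int nearly_equal(double a, double b)
{
    double scale = magnitude(a) > magnitude(b) ? magnitude(a) : magnitude(b);

    return magnitude(a - b) <= 4.0 * DBL_EPSILON * scale;
}
```

A test such as nearly_equal(0.1 + 0.2, 0.3) is more robust than comparison 
with ==, since the two sides may differ in their last few bits.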


Structured Data Types 
......................

The ANSI C standard leaves details of the layout of the components of a 
structured data type to each implementation. The following points apply to the 
ARM C compiler:

 *  Structures are aligned on word boundaries.

 *  Structures are arranged with the first-named component at the lowest 
    address.

 *  A component with a char type is packed into the next available byte.

 *  A component with a short type is aligned to the next even-addressed byte.

 *  All other arithmetic-type components are word-aligned, as are pointers and 
    integers containing bitfields.

 *  The only valid types for bitfields are (signed) int and unsigned int. (In 
    -pcc mode, char, unsigned char, short, unsigned short, long and unsigned 
    long are also accepted).

 *  A bitfield of type int is treated as unsigned by default (signed by default 
    in -pcc mode).

 *  A bitfield must be wholly contained within the 32 bits of an int.

 *  Bitfields are allocated within words so that the first field specified 
    occupies the lowest-addressed bits of the word (when configured 
    "little-endian", lowest addressed means least significant; when configured 
    "big-endian", lowest addressed means most significant).
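
The alignment rules for char, short and int components can be checked with 
the standard <offsetof> macro. The sketch below uses a hypothetical structure, 
<struct sample>, and assumes a 32-bit int as described in "<Data Elements>"; 
other compilers may lay the structure out differently.

```c
#include <stddef.h>

/* Hypothetical structure used only to illustrate the layout rules. */
struct sample {
    char  tag;      /* char: packed into the next available byte */
    short count;    /* short: aligned to the next even address   */
    int   total;    /* int: word-aligned                         */
};

/* Return 1 if the layout agrees with the rules given in the text. */
int sample_layout_matches(void)
{
    return offsetof(struct sample, tag)   == 0 &&
           offsetof(struct sample, count) == 2 &&
           offsetof(struct sample, total) == 4 &&
           sizeof(struct sample) % 4 == 0;
}
```

Code that depends on a particular layout can use a check of this kind to fail 
early when it is ported to a compiler with different rules.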


Pointers
........

The following remarks apply to pointer types:

 *  Adjacent bytes have addresses which differ by one.

 *  The macro NULL expands to the value 0.

 *  Casting between integers and pointers results in no change of 
    representation.

 *  The compiler warns of casts between pointers to functions and pointers to 
    data (but not in -pcc mode).


Pointer Subtraction
...................

When two pointers are subtracted, the difference is obtained as if by the 
expression:

    ((int)a - (int)b) / (int)sizeof(<type pointed to>)

If the pointers point to objects whose size is no greater than four bytes, the 
alignment of data ensures that the division will be exact in all cases. For 
longer types, such as doubles and structures, the division may not be exact 
unless both pointers are to elements of the same array. Moreover the quotient 
may be rounded up or down at different times, leading to potential 
inconsistencies.
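
The expression above can be written out explicitly for a particular type, as 
in the following sketch. The function name <element_distance> is hypothetical; 
the result is meaningful only when both pointers refer to elements of the same 
array, which is the case in which the division is exact.

```c
#include <stddef.h>

/* Distance in elements between two pointers into the same array of double,
   computed from the byte addresses as described in the text. */
long element_distance(const double *a, const double *b)
{
    ptrdiff_t bytes = (const char *)a - (const char *)b;

    return (long)(bytes / (ptrdiff_t)sizeof(double));
}
```

For pointers into the same array this gives the same value as the expression 
a - b.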


Arithmetic Operations
.....................

The compiler performs the usual arithmetic conversions set out in the ANSI C 
standard. The following points apply to operations on the integral types:

 *  All signed integer arithmetic uses a two's complement representation.

 *  Bitwise operations on signed integral types follow the rules which arise 
    naturally from two's complement representation.

 *  Right shifts on signed quantities are arithmetic.

 *  Any quantity which specifies the amount of a shift is treated as an 
    unsigned 8-bit value.

 *  Any value to be shifted is treated as a 32-bit value.

 *  Left shifts of more than 31 give a result of zero.

 *  Right shifts of more than 31 give a result of zero from a shift of an 
    unsigned or positive signed value; they yield -1 from a shift of a negative 
    signed value.

 *  The remainder on integer division has the same sign as the dividend.

 *  If a value of integral type is truncated to a shorter signed integral type, 
    the result is obtained as if by masking the original value to the length of 
    the destination, and then sign extending.

 *  A conversion between integral types never causes an exception to be raised.

 *  Integer overflow does not cause an exception to be raised.

 *  Integer division by zero does cause an exception to be raised.

The following points apply to operations on floating point types:

 *  When a double or long double is converted to a float, rounding is to the 
    nearest representable value.

 *  A conversion from a floating type to an integral type causes an exception 
    to be raised only if the value cannot be represented in a long int (or 
    unsigned long int in the case of conversion to an unsigned int).

 *  Floating point underflow is not detected; any operation which underflows 
    returns zero.

 *  Floating point overflow causes an exception to be raised.

 *  Floating point divide by zero causes an exception to be raised.
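
Two of the integer rules given in this section, truncation by masking and 
sign extension, and the zero result of a left shift by more than 31, can be 
expressed portably as in the sketch below. The function names are 
hypothetical. Code written this way gives the results described here even on 
compilers whose native behaviour differs.

```c
/* Truncate to a signed 8-bit value: mask to the length of the destination,
   then sign-extend, as described for conversions to shorter signed types. */
int truncate_to_signed_8(long value)
{
    int low = (int)(value & 0xff);

    return (low ^ 0x80) - 0x80;
}

/* Left shift of a 32-bit quantity in which a count of more than 31 is
   defined to give zero, matching the behaviour described for ARM C. */
unsigned long shift_left_32(unsigned long value, unsigned count)
{
    if (count > 31)
        return 0UL;
    return (value << count) & 0xffffffffUL;
}
```

The explicit test on the shift count matters for portability, because the 
ANSI C standard leaves a shift by the full width of the operand undefined.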


Expression Evaluation 
......................

The compiler performs the usual arithmetic conversions (<promotions>) set out 
in the ANSI C standard before evaluating an expression. The following should be 
noted:

 *  The compiler may re-order expressions involving only associative and 
    commutative operators of equal precedence, even in the presence of 
    parentheses (e.g. a + (b - c) may be evaluated as (a + b) - c).

 *  Between sequence points, the compiler may evaluate expressions in any 
    order, regardless of parentheses. Thus the side effects of expressions 
    between sequence points may occur in any order.

 *  Similarly, the compiler may evaluate function arguments in any order.

Any detail of order of evaluation not prescribed by the ANSI C standard may 
vary between releases of the ARM C compiler.
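
Where the order of side effects matters, it can be fixed by placing each 
operation in a separate statement, since the end of a full expression is a 
sequence point. The following sketch uses hypothetical names; an expression 
such as next_ticket() - next_ticket() could yield either 1 or -1, whereas the 
function shown always yields -1.

```c
static int ticket = 0;

/* A function with a side effect: each call returns the next number. */
int next_ticket(void)
{
    return ++ticket;
}

/* The two calls are in separate statements, so their order is fixed. */
int ordered_difference(void)
{
    int first = next_ticket();
    int second = next_ticket();

    return first - second;
}
```

The same technique applies to function arguments: values with side effects 
can be computed into temporaries before the call is made.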


Implementation Limits 
......................

The ANSI C standard sets out certain minimum limits which a conforming compiler 
must accept; a user should be aware of these when porting applications between 
compilers. A summary is given here. The <mem> limit indicates that no limit is 
imposed by the ARM C compiler other than that imposed by the availability of 
memory.

    Description                                 Requirement         ARM C

    Nesting levels of compound statements and
        iteration/selection control structures          15          mem
    Nesting levels of conditional compilation            8          mem
    Declarators modifying a basic type                  31          mem
    Expressions nested by parentheses                   32          mem
    Significant characters
        in internal identifiers and macro names         31          256
        in external identifiers                          6          256
    External identifiers in one source file            511          mem
    Identifiers with block scope in one block          127          mem
    Macro identifiers in one source file              1024          mem
    Parameters in one function definition/call          31          mem
    Parameters in one macro definition/invocation       31          mem
    Characters in one logical source line              509        no limit
    Characters in a string literal                     509          mem
    Bytes in a single object                         32767          mem/[*]
    Nesting levels for #included files                   8          mem
    Case labels in a switch statement                  257          mem
    Members in a single struct or union,
        enumeration constants in a single enum         127          mem
    Nesting of struct/union in a single declaration     15          mem


[*] When running on 16-bit hosts, the ARM C compiler may impose a limit on the 
size of an object. Generally, this limit will be 65535 bytes in a single 
<object file> rather than 32767 bytes in a single C-language object. 32-bit 
hosted versions do not have this limit.


Standard Implementation Definition 
-----------------------------------

This section discusses aspects of the ARM C compiler and ANSI C library not 
defined by the ANSI C standard, and which are implementation-defined.

Appendix A.6 of the ANSI C standard collects together information about 
portability issues; section A.6.3 lists those points which must be defined by 
each implementation. This section corresponds to appendix A.6.3, dealing with 
the points listed there, under the same headings and in the same order.


Translation 
............

Diagnostic messages produced by the compiler are of the form:

    "<source-file>", <line-number>: <severity>: <explanation>

where <severity> is one of:

 *  <Warning>: not a diagnostic in the ANSI sense, but an attempt by the 
    compiler to be helpful.

 *  <Error>: a violation of the ANSI specification from which the compiler was 
    able to recover by guessing the user's intentions.

 *  <Serious error>: a violation of the ANSI specification from which no 
    recovery was possible because the compiler could not reliably guess what 
    was intended.

 *  <Fatal> (for example, not enough memory): not really a diagnostic but an 
    indication that the compiler's limits have been exceeded or that the 
    compiler has detected a fault in itself.


Environment 
............

The mapping of a command line from the ARM-based environment into arguments to 
main() is implementation-specific. The generic ARM C library supports the 
following:

 *  The arguments given to <main()> are the words of the <command line> (not 
    including I/O redirections, covered below), delimited by white space, 
    except where the white space is contained in double quotes. A white space 
    character is any character of which <isspace()> is true. A double quote or 
    backslash character (\) inside double quotes must be preceded by a 
    backslash character. An I/O redirection will not be recognised inside 
    double quotes.
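
The splitting rule just described can be sketched as follows. This is an 
illustration only, not the code of the ARM C library: the function name 
<split_command_line> is hypothetical, I/O redirections are not handled, and 
the quote characters themselves are assumed to be removed from the resulting 
words.

```c
#include <ctype.h>
#include <stddef.h>

/* Split `line` in place into at most `max_args` words, storing a pointer to
   each word in `argv`. Returns the number of words found. */
int split_command_line(char *line, char **argv, int max_args)
{
    int argc = 0;
    char *src = line;

    while (argc < max_args) {
        char *dst;
        int in_quotes = 0;

        /* Skip the white space that delimits words. */
        while (*src != '\0' && isspace((unsigned char)*src))
            src++;
        if (*src == '\0')
            break;

        /* The word is compacted in place as quotes and escapes are removed. */
        argv[argc++] = dst = src;
        while (*src != '\0' && (in_quotes || !isspace((unsigned char)*src))) {
            if (*src == '"') {
                in_quotes = !in_quotes;
                src++;
            } else if (in_quotes && *src == '\\' &&
                       (src[1] == '"' || src[1] == '\\')) {
                *dst++ = src[1];        /* escaped quote or backslash */
                src += 2;
            } else {
                *dst++ = *src++;
            }
        }
        if (*src != '\0')
            src++;                      /* step over the delimiter */
        *dst = '\0';
    }
    return argc;
}
```

Because the line is modified in place, the caller must pass a writable copy 
of the command line rather than a string literal.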

In an unhosted implementation of the ARM C library, the term <interactive 
device> may be meaningless. The generic ARM C library supports a pair of 
devices, both called <:tt>, intended to handle a keyboard and a VDU screen. In 
the generic implementation:

 *  No buffering is done on any stream connected to :tt unless I/O redirection 
    has taken place. If I/O redirection other than to :tt has taken place, full 
    file buffering is used except where both stdout and stderr have been 
    redirected to the same file, in which case line buffering is used.

Using the generic ARM C library, the standard input, output and error streams, 
<stdin>, <stdout>, and <stderr>, can be redirected at run time as shown below. 
For example, if <mycopy> is a program which simply copies the standard input to 
the standard output, the following line: 

    mycopy < infile > outfile 2> errfile

runs the program, redirecting stdin to the file <infile>, stdout to the file 
<outfile>, and stderr to the file <errfile>.

The following shows the allowed redirections:

    0< filename     read stdin from filename
    < filename      read stdin from filename
    
    1> filename     write stdout to filename
    > filename      write stdout to filename
    
    2> filename     write stderr to filename
    2>&1            write stderr to same place as stdout
    
    >& filename     write both stdout and stderr to filename
    >> filename     append stdout to filename
    >>& filename    append both stdout and stderr to filename


Identifiers
...........

256 characters are significant in identifiers without external linkage. Allowed 
characters are letters, digits, and underscores.

256 characters are significant in identifiers with external linkage. Allowed 
characters are letters, digits, and underscores.

Case distinctions <are> significant in identifiers with external linkage.

In pcc mode (-pcc option) and "limited pcc" or "system programming" mode (-fc 
option), the character '$' is also valid in identifiers.


Characters
..........

The characters in the source character set are assumed to be ISO 8859-1 
(Latin-1 Alphabet), a superset of the ASCII character set. The printable 
characters are those in the range 32 to 126 and 160 to 255. Any printable 
character may appear in a string or character constant, and in a comment.

Other properties of the source character set are host specific, save that the 
ARM C compiler has no support for multi-byte character sets.

The properties of the execution character set are target specific. In its 
generic form, the ARM C library supports the ISO 8859-1 (Latin-1) character 
set, so the following points are expected to hold:

 *  The execution character set is identical to the source character set.

 *  There are four chars/bytes in an int. If the ARM processor is configured to 
    operate with a <little-endian> memory system, the bytes are ordered from 
    least significant at the lowest address to most significant at the highest 
    address. If the ARM is configured to operate with a <big-endian> memory 
    system the bytes are ordered from least significant at the highest address 
    to most significant at the lowest address.

 *  A character constant containing more than one character has the type int. 
    Up to four characters of the constant are represented in the integer value. 
    The first character in the constant occupies the lowest-addressed byte of 
    the integer value; up to three following characters are placed at ascending 
    addresses. Unused bytes are filled with the NUL (or '\0') character.

 *  There are eight bits in a character in the execution character set.

 *  All integer character constants that contain a single character or 
    character escape sequence are represented in both the source and execution 
    character sets (by an assumption which may be violated in any given 
    retargeting of the generic ARM C library).

 *  Characters of the source character set in string literals and character 
    constants map identically into the execution character set (by an 
    assumption which may be violated in any given retargeting of the generic 
    ARM C library).

 *  No locale is used to convert multibyte characters into the corresponding 
    wide characters (codes) for a wide character constant (not relevant to the 
    generic implementation).

 *  A plain char is treated as unsigned (but as signed in -pcc mode).

 *  The character escape codes are:
    Escape sequence           Char value        Description
    \a                               7          Attention (bell)
    \b                               8          Backspace
    \f                              12          Form feed
    \n                              10          Newline
    \r                              13          Carriage return
    \t                               9          Tab
    \v                              11          Vertical tab
    \xnn                          0xnn          ASCII code in hexadecimal
    \nnn                          0nnn          ASCII code in octal
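
The byte-ordering rule in the first point above can be observed at run time. The following is a minimal sketch, not part of the ARM C library: the function names are illustrative only, and it assumes that unsigned int occupies four bytes, as it does on the ARM.

```c
#include <string.h>

/* Return the byte held at the lowest address of an unsigned int. */
unsigned lowest_addressed_byte(unsigned value)
{
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);
    return bytes[0];
}

/* Return 1 if the least significant byte is at the lowest address
   (little-endian memory system), 0 otherwise. */
int is_little_endian(void)
{
    return lowest_addressed_byte(1u) == 1u;
}
```

On a little-endian system lowest_addressed_byte(0x01020304u) gives 0x04; on a big-endian system it gives 0x01.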


Integers
........

The representations and sets of values of the integral types are set out in 
"<Data Elements>". Note also:

 *  The result of converting an integer to a shorter signed integer (if the 
    value cannot be represented) is as if the bits in the original value which 
    cannot be represented in the final value are masked out, and the resulting 
    integer sign-extended. The same applies when an unsigned integer is 
    converted to a signed integer of equal length.

 *  Bitwise operations on signed integers yield the expected result given two's 
    complement representation. No sign extension takes place.

 *  The sign of the remainder on integer division is the same as defined for 
    the function <div()>.

 *  Right shift operations on signed integral types are arithmetic.
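
As an illustration of the narrowing rule and of the remainder rule above, the following sketch spells out the masking and sign extension explicitly, and obtains a remainder through div(). The function names are hypothetical, and a 16-bit short and two's complement representation are assumed.

```c
#include <stdlib.h>

/* Emulate conversion to a 16-bit signed integer: keep the low 16 bits,
   then sign-extend from bit 15. */
long narrow_to_16_bits(long value)
{
    long low = value & 0xFFFFL;

    if (low & 0x8000L)
        low -= 0x10000L;
    return low;
}

/* Remainder with the sign behaviour defined for div(). */
int remainder_like_div(int numer, int denom)
{
    return div(numer, denom).rem;
}
```

For example, narrow_to_16_bits(0x18000L) yields -32768, and remainder_like_div(-7, 2) yields -1, since div() truncates the quotient towards zero.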


Floating point
..............

The representations and ranges of values of the floating point types have been 
given in "<Data Elements>".  Note also that:

 *  When a floating point number is converted to a shorter floating point one, 
    it is rounded to the nearest representable number.

 *  The properties of floating point arithmetic accord with IEEE 754.


Arrays and pointers 
....................

The ANSI draft standard specifies three areas in which the behaviour of arrays 
and pointers must be documented. The points to note are:

 *  The type size_t is unsigned int (signed int in -pcc mode).

 *  Casting pointers to integers and vice versa involves no change of 
    representation.

 *  The type ptrdiff_t is defined as (signed int).


Registers 
..........

Using the ARM C compiler, you can declare any number of objects to have the 
storage class <register>. Depending on which variant of the ARM Procedure Call 
Standard is in use, there are between five and seven registers available. 
Declaring more than this number of objects with register storage class must 
result in at least one of them not being held in a register. In general, it is 
advisable to declare no more than four. The valid types are:

 *  any integer type;

 *  any pointer type;

 *  any integer-like structure (any one word struct or union in which all 
    addressable fields have the same address or any one word structure 
    containing only bitfields).

Note that other variables, not declared with the register storage class, may be 
held in registers for extended periods, and that register variables may be held 
in memory for some periods.

Note also that there is a #pragma which assigns a file-scope variable to a 
specified register everywhere within a compilation unit.


Structures, Unions, Enumerations and Bitfields 
...............................................

The ARM C compiler handles structures in the following way:

 *  When a member of a union is accessed using a member of a different type, 
    the resulting value can be predicted from the representation of the 
    original type. No error is given.

 *  Structures are aligned on word boundaries. Characters are aligned in bytes, 
    shorts on even numbered byte boundaries, and all other types, except 
    bitfields, are aligned on word boundaries. Bitfields are subfields of ints, 
    themselves aligned on word boundaries.

 *  A plain bitfield (declared as int) is treated as unsigned int (signed int 
    in -pcc mode).

 *  A bitfield which does not fit into the space remaining in the current int 
    is placed in the next int.

 *  The order of allocation of bitfields within ints is such that the first 
    field specified occupies the lowest-addressed bits of the word.

 *  Bitfields do not straddle storage unit (int) boundaries.

 *  The integer type chosen to represent the values of an enumeration type is 
    <int>.
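
The alignment rules above can be inspected with the offsetof macro. The structure and function names below are illustrative only. Under the rules given, one would expect offsets of 0, 2 and 4 and a size of 8 on the ARM; other compilers may lay the structure out differently.

```c
#include <stddef.h>

struct sample {
    char  c;    /* byte aligned */
    short s;    /* aligned on an even-numbered byte boundary */
    int   i;    /* word aligned */
};

size_t offset_of_c(void)    { return offsetof(struct sample, c); }
size_t offset_of_s(void)    { return offsetof(struct sample, s); }
size_t offset_of_i(void)    { return offsetof(struct sample, i); }
size_t size_of_sample(void) { return sizeof(struct sample); }
```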


Qualifiers
..........

An object that has volatile-qualified type is <accessed> if any word or byte of 
it is read or written. For volatile-qualified objects, reads and writes occur 
as directly implied by the source code, in the order implied by the source 
code.

The effect of accessing a volatile-qualified short is undefined.
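
A typical use of a volatile-qualified object is polling a status word. The sketch below is hypothetical (the function name, the mask and the poll limit are not part of any ARM interface) and uses a word-sized object, given the restriction on volatile shorts noted above.

```c
/* Poll a status word until any bit in ready_mask is set, giving up after
   max_polls reads. Returns 1 if a ready bit was seen, 0 otherwise.
   Because *status is volatile, every iteration performs a real read. */
int wait_until_ready(volatile unsigned *status, unsigned ready_mask,
                     unsigned long max_polls)
{
    unsigned long polls;

    for (polls = 0; polls < max_polls; polls++) {
        if ((*status & ready_mask) != 0)
            return 1;
    }
    return 0;
}
```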


Declarators
...........

The number of declarators that may modify an arithmetic, structure or union 
type is limited only by available memory.


Statements
..........

The number of case values in a switch statement is limited only by memory.


Preprocessing directives
........................

A single-character constant in a preprocessor directive cannot have a negative 
value.

The ANSI standard header files are contained within the compiler itself and may 
be referred to in the way described in the standard (using, for example, 
#include <stdio.h>, etc.).

Quoted names for includable source files are supported. The rules for directory 
searching are given in "<Included Files>". The compiler will 
accept host file names or Unix file names. In the latter case, on non-Unix 
hosts, the compiler does its best to translate the file name to a local 
equivalent.  See "<File Naming Conventions>" for more 
details.

The recognized #pragma directives and their meanings are described in "<Pragma 
Directives>".

The date and time of translation are always available, so __DATE__ and __TIME__ 
always give respectively the date and time.


Library functions
.................

The precise attributes of a C library are specific to a particular 
implementation of it. The generic ARM C library has or supports the following 
features:

 *  The macro NULL expands to the integer constant 0.

 *  If a program redefines a reserved external identifier, then an error may 
    occur when the program is linked with the standard libraries. If it is not 
    linked with standard libraries, then no error will be detected.

 *  The <assert()> function prints the following message and then calls the 
    <abort()> function:

       *** assertion failed: <expression>, file <file-name>, line <line-number>

 *  The functions <isalnum()>, <isalpha()>, <iscntrl()>, <islower()>, 
    <isprint()>, <isupper()> and <ispunct()> usually test only for characters 
    whose values are in the range 0 to 127 (inclusive). For characters with 
    values greater than 127, all of these functions return 0, except 
    <iscntrl()>, which returns non-zero for 0 to 31 and for 128 to 255.

After the call setlocale(LC_CTYPE, "ISO8859-1"), the following statements also 
apply to character codes and affect the results returned by the <ctype> 
functions:

 *  codes 128 to 159 are control characters;

 *  codes 192 to 223 except 215 are upper case;

 *  codes 224 to 255 except 247 are lower case;

 *  codes 160 to 191, 215 and 247 are punctuation.
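
The classification listed above can be written out as a small table-free function. This is only a sketch of the rules as stated here, with hypothetical names; it does not call the library's <ctype> functions.

```c
enum latin1_class {
    LATIN1_CONTROL,
    LATIN1_UPPER,
    LATIN1_LOWER,
    LATIN1_PUNCT,
    LATIN1_NOT_LISTED    /* codes outside 128 to 255 are not covered */
};

/* Classify a character code in the range 128 to 255. */
enum latin1_class classify_latin1_high(int code)
{
    if (code < 128 || code > 255)
        return LATIN1_NOT_LISTED;
    if (code <= 159)
        return LATIN1_CONTROL;
    if (code <= 191 || code == 215 || code == 247)
        return LATIN1_PUNCT;
    if (code <= 223)
        return LATIN1_UPPER;
    return LATIN1_LOWER;
}
```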

The mathematical functions return the following values on domain errors:

    Function          Condition         Returned value
    
    log(x)            x <= 0            -HUGE_VAL
    log10(x)          x <= 0            -HUGE_VAL
    sqrt(x)           x < 0             -HUGE_VAL
    atan2(x,y)        x = y = 0         -HUGE_VAL
    asin(x)           abs(x) > 1        -HUGE_VAL
    acos(x)           abs(x) > 1        -HUGE_VAL

Where <-HUGE_VAL> is written above, a number is returned which is defined in 
the header math.h. Consult the errno variable for the error number.

The mathematical functions set errno to ERANGE on underflow range errors.

A domain error occurs if the second argument of fmod is zero, and HUGE_VAL is 
returned.
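
Since errors are reported through errno, a caller may wish to clear errno before a call and inspect it afterwards. The sketch below shows one way to do so. Both function names are hypothetical, and stub_domain_error() is only a stand-in which mimics the documented behaviour for a negative argument; it is not a library routine.

```c
#include <errno.h>
#include <math.h>

/* Call a one-argument maths function and return the errno value it set
   (0 if none). The function result is stored in *result, and the
   caller's errno is preserved. */
int call_math_checked(double (*fn)(double), double x, double *result)
{
    int saved = errno;
    int err;

    errno = 0;
    *result = fn(x);
    err = errno;
    errno = saved;
    return err;
}

/* Stand-in for a maths function: reports a domain error for x < 0. */
double stub_domain_error(double x)
{
    if (x < 0.0) {
        errno = EDOM;
        return -HUGE_VAL;
    }
    return x;
}
```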

The set of signals for the generic <signal()> function is as follows: 

    SIGABRT     Abort
    SIGFPE      Arithmetic exception
    SIGILL      Illegal instruction
    SIGINT      Attention request from user
    SIGSEGV     Bad memory access
    SIGTERM     Termination request
    SIGSTAK     Stack overflow

The default handling of all recognised signals is to print a diagnostic message 
and call exit. This default behaviour applies at program start-up.

When a signal occurs, if func points to a function, the equivalent of 
signal(sig, SIG_DFL) is first executed.

If the SIGILL signal is received by a handler specified to the signal function, 
the default handling is reset.
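
Because the equivalent of signal(sig, SIG_DFL) is executed before a handler is entered, a handler which is to remain installed must re-install itself. A minimal sketch, with hypothetical names, follows.

```c
#include <signal.h>

volatile sig_atomic_t sigint_count = 0;

/* Re-install the handler first, since the default handling has been
   restored by the time the handler is entered, then count the signal. */
void count_sigint(int sig)
{
    signal(sig, count_sigint);
    sigint_count++;
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int install_sigint_counter(void)
{
    return signal(SIGINT, count_sigint) == SIG_ERR ? -1 : 0;
}
```

After install_sigint_counter() has succeeded, each raise(SIGINT) increments sigint_count.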

The generic ARM C library also has the following characteristics relating to 
I/O (although any particular retargeting of it may not):

 *  The last line of a text stream does not require a terminating newline 
    character.

 *  Space characters written out to a text stream immediately before a newline 
    character do appear when read back in.

 *  No null characters are appended to a binary output stream.

 *  The file position indicator of an append mode stream is initially placed at 
    the end of the file.

 *  A write to a text stream does not cause the associated file to be truncated 
    beyond that point (device dependent).

 *  The characteristics of file buffering are as intended by section 4.9.3 of 
    the ANSI C standard.

 *  A zero-length file (one to which no characters have been written by an 
    output stream) is intended to exist.

 *  The same file can be opened many times for reading, but only once for 
    writing or updating. A file cannot be open for reading on one stream and 
    for writing or updating on another.

 *  Local time zones and Daylight Saving Time are not implemented. The values 
    returned will always indicate that the information is not available.

 *  <fprintf()> prints %p arguments in hexadecimal format (lower case) as if a 
    precision of 8 had been specified. If the variant form (%#p) is used, the 
    number is preceded by the character '@'.

 *  <fscanf()> treats %p arguments identically to %x arguments.

 *  <fscanf()> always treats the character '-' in a %...[...] argument as a 
    literal character.

 *  <ftell()> and <fgetpos()> set errno to the value of EDOM on failure.

 *  <perror()> generates the following messages:
    
    Error         Message
    
    0             No error (errno = 0)
    EDOM          EDOM - function argument out of range
    ERANGE        ERANGE - function result not representable
    ESIGNUM       ESIGNUM - illegal signal number to signal() or raise()
    others        Error code number has no associated message

 *  <calloc()>, <malloc()> and <realloc()>, if the size of area requested is 
    zero, return NULL.

 *  <abort()> closes all open files, and deletes all temporary files.

 *  The status returned by <exit()> is the same value that was passed to it. 
    For definitions of EXIT_SUCCESS and EXIT_FAILURE, refer to the header file 
    stdlib.h.

 *  The error messages returned by the <strerror()> function are identical to 
    those given by the <perror()> function.
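
Since a zero-sized request returns NULL from this library but may return a non-NULL pointer from others, portable code cannot treat NULL as meaning failure unless the request is non-zero. One possible wrapper (the name is hypothetical) is sketched below.

```c
#include <stdlib.h>

/* Request at least one byte, so that a NULL return always means that the
   allocation failed, whatever the library does for a size of zero. */
void *alloc_at_least_one(size_t size)
{
    return malloc(size != 0 ? size : 1);
}
```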

The following characteristics, required to be specified in an ANSI-compliant 
implementation, are unspecified in the generic ARM C library:

 *  The validity of a file name.

 *  Whether <remove()> can remove an open file.

 *  The effect of calling the <rename()> function when the new name already 
    exists.

 *  The effect of calling <getenv()> (the default is to return NULL - no value 
    available).

 *  The effect of calling <system()>.

 *  The value returned by <clock()>.


Portability
-----------

The C programming language has gained a reputation for being portable across 
machines, while still providing machine-specific capabilities. However, the 
fact that a program is written in C gives little indication of the effort 
required to port it from one machine to another or, indeed, from one C system 
to another.

Obviously the most effort-consuming task is porting between two entirely 
different hardware environments, running different operating systems with 
different compilers. Because many users of the ARM C compiler will face just 
this situation, this section deals with the issues that the user should be 
aware of when porting software to or from the ARM C system environment. In 
outline:

 *  general portability considerations;

 *  the differences between ANSI C and the well-known K&R C as defined in the 
    book <The C Programming Language>, (first edition) by Kernighan and 
    Ritchie;

 *  using the ARM C compiler in -pcc compatibility mode;

 *  environmental aspects of portability.

In addition, the tool <topcc> is supplied as part of the ARM Software 
Development Toolkit. <topcc> translates ANSI C to PCC-style C. For details of 
the <topcc> tool refer to "<The ANSI C to PCC C Translator (topcc)>" starting 
on page 42 of the User Manual.

If code is to be used on a variety of different systems, there are certain 
issues that should be borne in mind to make porting an easy and relatively 
error-free process. It is essential to identify practices which may make 
software system-specific, and to avoid them. In the remainder of this section, 
we document the general portability issues for C programs.


Fundamental Data Types 
.......................

The size of fundamental data types such as char, int, long int, short int and 
float will depend mainly on the underlying architecture of the machine on which 
the C program is to run. Compiler writers usually implement these types in a 
way which is natural for the target. For example, Release 5 of the Microsoft C 
Compiler for DOS has int, short int and long int, occupying 2, 2 and 4 bytes 
respectively, while the ARM C Compiler uses 4, 2 and 4 bytes, respectively. 
Certain relationships are guaranteed by the ANSI C standard (such as 
sizeof(long int) >= sizeof(short int)), but code which makes any assumptions 
about whether int and long int have the same size, will not be portable.

A common non-portable assumption is embedded in the use of hexadecimal constant 
values. For example:

    int i = 0xffff;   /*    -1 if sizeof(int) == 2;
                            65535 if sizeof(int) == 4... */
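
Rather than assuming a particular size for int, code can consult the limits published in limits.h. A minimal sketch, with a hypothetical function name:

```c
#include <limits.h>

/* Return 1 if value can be held in an int without loss, 0 otherwise. */
int fits_in_int(long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

With a 2-byte int, fits_in_int(65535L) is 0; with a 4-byte int it is 1.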

In non-ANSI dialects of C there are pitfalls with argument passing. Consider, 
for example:

    int f(x)
    long int x;
    {...}

and the (careless) invocation of f():

    f(1); /*    f(1L) was intended/required */

If sizeof(int) == sizeof(long int), all will be well; otherwise there may be 
catastrophe.

A dual problem afflicts the format string of the printf() family, even in ANSI 
C. For example:

    long int l1, l2, l3;
    ...
    printf("L1 = %d, L2 = %d, L3 = %d\n", l1, l2, l3);
        /* "...%ld...%ld...%ld..." is intended/required */

Again, if sizeof(int) != sizeof(long) we have dangerous nonsense.
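
A corrected form, in which each long argument is matched by a %ld conversion, might look like the sketch below. The function name is illustrative, and sprintf() is used only so that the result can be examined.

```c
#include <stdio.h>

/* Format three long values using matching %ld conversions. buf must be
   large enough for the result; the character count is returned. */
int format_three_longs(char *buf, long l1, long l2, long l3)
{
    return sprintf(buf, "L1 = %ld, L2 = %ld, L3 = %ld\n", l1, l2, l3);
}
```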

Another common assumption is about the signedness of characters, especially if 
chars are expected to be 7-bit quantities rather than 8-bit ones. For example, 
consider:

    static char tr_tab[256] = {...};
    ...
    int i, ch;
    ...
        i = fgetc(f);   /* should be i = (unsigned char) fgetc(f) */
        ch = tr_tab[i]; /* WRONG if chars are signed... */

Note that declaring <i> to be unsigned int doesn't help (it merely causes ch = 
tr_tab[i] to index a very long way off the other end of the array!).

In non-ANSI dialects of C there is no way to explicitly declare a signed char, 
so plain chars tend to be signed by default (as with the ARM C compiler in -pcc 
mode). In ANSI C, a char may be plain, signed or unsigned, so a plain char 
tends to be whatever is most natural for the target (<unsigned char> on the 
ARM).
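
One way to make a table lookup independent of the signedness of char is to reduce both the index and the table entry to unsigned char explicitly. The following sketch assumes 8-bit chars; the function name is hypothetical.

```c
#include <stdio.h>

/* Look up ch in a 256-entry table. EOF is passed through unchanged; any
   other value is reduced to an unsigned char before it is used as an
   index, and the entry is returned as a non-negative value. */
int translate(const char table[256], int ch)
{
    if (ch == EOF)
        return EOF;
    return (unsigned char)table[(unsigned char)ch];
}
```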


Byte ordering
.............

A highly non-portable feature of many C programs is the implicit or explicit 
exploitation of byte ordering within a word of store. Such assumptions tend to 
arise when copying objects word by word (rather than byte by byte), when 
inputting and outputting binary values, and when extracting bytes from, or 
inserting bytes into, words using a mixture of shift-and-mask and byte 
addressing. A contrived example which illustrates the essential pitfalls is:

    unsigned a;
    char *p = (char *)&a;
    unsigned w = AN_ARBITRARY_VALUE;
    while (w != 0)          /* put w in a */
    {   *p++ = w;     /* or, maybe, w byte-reversed... */
        w >>= 8;
    }

This code will only work on a machine with 'little-endian' byte order.

The best solution to this class of problems is either to write code which does 
not rely on byte order, or to have separate code to deal appropriately with the 
different byte orders.
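
Code which does not rely on byte order can be written using shifts and masks alone. The sketch below fixes an external format of four bytes, least significant first; the function names are illustrative only.

```c
/* Store the low 32 bits of value in buf, least significant byte first,
   whatever the byte order of the host. */
void store_le32(unsigned char *buf, unsigned long value)
{
    int i;

    for (i = 0; i < 4; i++)
        buf[i] = (unsigned char)((value >> (8 * i)) & 0xFFUL);
}

/* Rebuild the value written by store_le32(). */
unsigned long load_le32(const unsigned char *buf)
{
    unsigned long value = 0;
    int i;

    for (i = 0; i < 4; i++)
        value |= (unsigned long)buf[i] << (8 * i);
    return value;
}
```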


Store alignment
...............

The only guarantee given in the ANSI C Standard regarding the alignment of 
members of a struct is that a hole (caused by padding) cannot occur at the 
beginning of the struct.

The values of holes created by alignment restrictions are undefined, and you 
should not make assumptions about these values. Strictly, two structures with 
identical members, each having identical values, will only be found to be equal 
if field-by-field comparison is used; a byte-by-byte, or word-by-word, 
comparison need not indicate equality.

In practice, this can be a real problem for both auto structs and structs 
allocated dynamically using malloc. If byte-by-byte comparability of such 
structures is required, they must be zeroed using <memset()> before assigning 
field values.
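
The zeroing technique described above might be sketched as follows. The type and function names are hypothetical. Note that this relies on member assignment leaving the padding bytes alone, which holds in practice on common compilers but is an assumption rather than a guarantee.

```c
#include <string.h>

typedef struct {
    char tag;      /* a hole is likely to follow this member */
    int  value;
} RECORD;

/* Zero the whole structure, holes included, then assign the fields. */
void record_init(RECORD *r, char tag, int value)
{
    memset(r, 0, sizeof *r);
    r->tag = tag;
    r->value = value;
}

/* Byte-by-byte comparison; meaningful only for records set up by
   record_init(). */
int records_same_bytes(const RECORD *a, const RECORD *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```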

Padding may also have implications for the space required by a large array of 
structs. For example:

    #define ARRSIZE 10000
    typedef struct
    {   int i;
        short s;
    } ELEM;
    ELEM arr[ARRSIZE];

may require 40KB, 60KB or 80KB depending on the size and alignment of ints and 
shorts (assume a short occupies 2 bytes, 2-byte aligned; then consider a 2-byte 
int, a 4-byte int 2-byte aligned, and a 4-byte int 4-byte aligned).


Pointers and Pointer Arithmetic
...............................

A deficiency of the original definition of C, and of its subsequent use, has 
been the relatively unrestrained conversion between pointers to different data 
types and integers or longs. Much existing code makes the assumption that a 
pointer can safely be held in either a long int or an int variable. While such 
an assumption may indeed be true in many implementations on many machines, it 
is a highly non-portable feature on which to rely. Furthermore, there is no 
single arithmetic type which is guaranteed to hold a pointer (long or unsigned 
long is probably a generally safer guess than int or unsigned int).

The problem is further compounded when taking the difference of two pointers by 
performing a subtraction. When the difference is large, this approach is full 
of potential errors. ANSI C defines a type <ptrdiff_t>, which is capable of 
reliably storing the result of subtracting two pointer values of the same type; 
a typical use of this mechanism would be to apply it to pointers into the same 
array.

Although the difference between any two pointers of similar type may be 
meaningful in a flat address space, only the difference between two pointers 
into the <same object> need be meaningful in a segmented address space.
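
A minimal sketch of the portable case, in which both pointers refer to the same array, is given below; the function name is illustrative.

```c
#include <stddef.h>

/* Return the index of elem within the array which starts at base. Both
   pointers must point into, or one past the end of, the same array. */
ptrdiff_t element_index(const int *base, const int *elem)
{
    return elem - base;
}
```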

Finally, there are problems of evaluation order with address arithmetic. 
Consider:

    long int base, offset;
    char *p1, *p2;
    ....
    offset = base + (p2 - p1);    /* intended effect */

Now suppose this latter expression were:

    offset = (base + p2) - p1;

In a flat address space without holes the expressions are equivalent. In a 
segmented address space, (p2 - p1) may well be a valid offset within a segment, 
whereas (base + p2) may be an invalid address. If, in the second case, the 
validity is checked before subtracting p1, then the expression will fault. This 
latter class of problem will be familiar to MS-DOS programmers, but alien to 
those whose main experience is of Unix.


Function-Argument Evaluation
............................

Whilst the evaluation of operands to operators such as ',' and || is defined 
to be strictly left-to-right (including all side-effects), the same does not 
apply to function-argument evaluation. For example, in the function call:

    i = 3;
    f(i, i++);

it is unclear whether the call is <f(3, 3)> or <f(4, 3)>.

Of course, it is in general unwise for argument expressions to have side 
effects, for many reasons.
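
The ambiguity can be removed by performing the side effect in a separate statement, so that the argument values are fixed before the call. In the sketch below pack_args() is a hypothetical callee, used only to make the argument values visible.

```c
/* Hypothetical callee: combines its arguments so they can be inspected. */
int pack_args(int first, int second)
{
    return first * 10 + second;
}

/* Read *i for both arguments, increment it, and only then make the
   call. Each declaration below is a complete statement, so the order
   of evaluation is fully defined. */
int call_with_defined_order(int *i)
{
    int first = *i;
    int second = (*i)++;

    return pack_args(first, second);
}
```

With *i equal to 3 on entry, the call made is always pack_args(3, 3), and *i is 4 on return.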


System-Specific Code
....................

The direct use of operating system calls is obviously non-portable, though 
often necessary. Isolating such code in target-specific modules, behind 
target-independent interfaces, helps.

File names and file-name processing are common sources of non-portability which 
are often surprisingly painful to deal with. Again, the best approach is to 
localise all such processing.

Binary data files are inherently non-portable. Often the only solution to this 
problem may be the use of some portable external representation.


ANSI C vs K&R C
---------------

The ANSI C Standard has tightened the definition of many of the vague areas of 
K&R C. This results in a much clearer definition of a correct C program. 
However, if programs have been written to exploit particular vague features of 
K&R C (perhaps accidentally), then their authors may be surprised when porting 
to an ANSI C environment. In the following sections, we present a list of what 
we consider to be the major language differences between ANSI and K&R C. We 
defer discussion of library differences until a later section. The order of 
presentation is approximately the order in which material is presented in the 
ANSI C standard.


Lexical elements 
.................

In ANSI C, the ordering of phases of translation is well defined. Of special 
note is the preprocessor which is conceptually token-based (which does not 
yield the same results as might naively be expected from pure text 
manipulation, because the boundaries between juxtaposed tokens are visible).

A number of new keywords have been introduced into ANSI C:

 *  The type qualifier <volatile> means that the qualified object may be 
    modified in ways unknown to the implementation, or that access to it may 
    have other unknown side effects. Examples of objects correctly described as 
    volatile include device registers, semaphores and data shared with 
    asynchronous signal handlers. In general, expressions involving volatile 
    objects cannot be optimised by the compiler.

 *  The type qualifier <const> indicates that an object's value will not be 
    changed by the executing program (and in some contexts permits a language 
    system to enforce this by allocating the object in read-only store).

 *  The type specifier <void> indicates a non-existent value for an expression.

 *  The type specifier <void *> describes a generic pointer to or from which 
    any pointer value can be assigned, without loss of information.

 *  The <signed> type specifier may be used wherever <unsigned> is valid (e.g. 
    to specify signed char explicitly).

 *  There is a new floating-point type: <long double>.

 *  The K&R C practice of using <long float> to denote <double> is outlawed in 
    ANSI C.

The following lexical changes have also been made:

 *  Each struct and union has its own distinct name space for member names.

 *  Suffixes U and L (or u and l), can be used to explicitly denote unsigned 
    and long constants (e.g. 32L, 64U, 1024UL etc.). The U suffix is new to 
    ANSI C.

 *  The use of the digits 8 and 9 in octal constants (previously defined to be 
    octal 10 and 11 respectively) is no longer supported.

 *  Literal strings are considered read-only, and identical strings may be 
    stored as one shared value (as indeed they are, by default, by the ARM C 
    Compiler). For example, given:

          char *p1 = "hello";
          char *p2 = "hello";

    p1 and p2 will point at the same store location, where the string "hello" 
    is held. Programs must not, therefore, modify literal strings (beware of 
    Unix's <tmpnam()> and similar functions, which do this).

 *  Variadic functions (those which take a variable number of actual arguments) 
    are declared explicitly using an ellipsis (...). For example:

          int printf(const char *fmt, ...);

 *  Empty comments /**/ are replaced by a single space (use the preprocessor 
    operator ## to do token-pasting if you previously used /**/ to do this). 


Arithmetic
..........

ANSI C uses value-preserving rules for arithmetic conversions (whereas K&R C 
implementations tend to use unsigned-preserving rules). Thus, for example:

    int f(int x, unsigned char y)
    {
        return (x+y)/2;
    }

does signed division, where unsigned-preserving implementations would do 
unsigned division.
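
The difference can be seen with a negative argument. The sketch below repeats the function above under a different, illustrative name. With value-preserving rules y is promoted to int and the sum stays negative; an unsigned-preserving implementation would instead produce a large positive result.

```c
/* y is promoted to int (value-preserving), so both the addition and the
   division are signed. */
int average_signed(int x, unsigned char y)
{
    return (x + y) / 2;
}
```

Under ANSI rules average_signed(-3, 1) is -1.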

Aside from value-preserving rules, arithmetic conversions follow those of K&R 
C, with additional rules for long double and unsigned long int. It is now also 
allowable to perform float arithmetic without widening to double (the ARM C 
system does not yet do this).

Floating-point values truncate towards zero when they are converted to integral 
types.

It is illegal to assign function pointers to data pointers and vice versa. An 
explicit cast must be used. The only exception to this is for the value 0, as 
in:

    int (*pfi)();
    pfi = 0;

Assignment compatibility between structs and unions is now stricter. For 
example, consider the following:

    struct {char a; int b;} v1;
    struct {char a; int b;} v2;
    v1 = v2;  /* illegal because v1 and v2 have different types */


Expressions
...........

Structs and unions may be passed by value as arguments to functions.

Given a pointer to a function declared as e.g. <int (*pfi)()>, the function to 
which it points can be called either by <pfi()>; or <(*pfi)()>.
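
Both call forms are shown in the sketch below; the function names are illustrative only.

```c
int twice(int x)
{
    return 2 * x;
}

/* Call twice() through a pointer in both styles. Returns the common
   result, or -1 if the two calls were to disagree. */
int call_both_ways(int x)
{
    int (*pfi)(int) = twice;
    int a = pfi(x);       /* call through the pointer directly */
    int b = (*pfi)(x);    /* dereference first, then call */

    return a == b ? a : -1;
}
```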

Because of the use of distinct name spaces for struct and union members, 
absolute machine addresses must be explicitly cast before being used as struct 
or union pointers. For example:

    ((struct io_space *)0x00ff)->io_buf;


Declarations
............

Perhaps the greatest impact on C of the ANSI Standard has been the adoption of 
function prototypes. A function prototype declares the return type and argument 
types of a function. For example:

    int f(int, float);

declares a function returning int with one int and one float argument.

This means that a function's argument types are part of the type of the 
function, giving the advantage of stricter type-checking, especially between 
separately-compiled source files.

A function definition (which is also a prototype) is similar except that 
identifiers must be given for the arguments, for example, <int f(int i, float 
f)>. It is still possible to use old-style function declarations and 
definitions, but it is advisable to convert to the new style. It is also 
possible to mix old and new styles of function declaration. If the function 
declaration which is in scope is an old style one, normal integral promotions 
are performed for integral arguments, and floats are converted to double. If 
the function declaration which is in scope is a new-style one, arguments are 
converted as in normal assignment statements.
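
The effect of a prototype on argument conversion can be seen in the following sketch (the function names are illustrative). Because the prototype is in scope, the int constant 3 is converted to double as if by assignment.

```c
/* New-style declaration: the argument type is part of the function type. */
double half(double x);

double half(double x)
{
    return x / 2.0;
}

double half_of_three(void)
{
    return half(3);    /* 3 is converted to 3.0 before the call */
}
```

Had half() been declared in the old style as double half(); the same call would have passed an int, with unpredictable results.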

Empty declarations are now illegal.

Arrays cannot be defined to have zero or negative size.


Statements
..........

ANSI has defined the minimum attributes of control statements (e.g. the minimum 
number of case limbs which must be supported by a compiler, the minimum nesting 
of control constructs, etc.). These minimum values are not particularly 
generous and may prove troublesome if ultra-portable code is required.

In general, the only limit imposed by the ARM C compiler is that of available 
memory. A future release may support an option to warn if any of the 
ANSI-guaranteed limits are violated.

A value returned from <main()> is guaranteed to be used as the program's exit 
code.

Values used in the controlling expression and case labels of a switch can be 
of any integral type.


Preprocessor 
.............

Preprocessor directives cannot be redefined.

There is a new ## operator for token-pasting.

There is a <stringise> operator # which produces a string literal from the 
macro argument which follows it. This is useful when you want to embed a macro 
argument in a string. 
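
A short sketch of both facilities follows. The macro, variable and function names are illustrative only.

```c
#define STRINGISE(arg)  #arg      /* turn the argument text into a string */
#define PASTE(a, b)     a##b      /* join two tokens into a single token */

int counter1 = 10;
int counter2 = 20;

/* Expands to the string literal "x + y". */
const char *expression_text(void)
{
    return STRINGISE(x + y);
}

/* PASTE(counter, 1) becomes the identifier counter1, and so on. */
int sum_of_counters(void)
{
    return PASTE(counter, 1) + PASTE(counter, 2);
}
```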

The order of phases of translation is well defined and is as follows for the 
preprocessing phases:

 *  Map source file characters to the source character set (this includes 
    replacing trigraphs).

 *  Delete all newline characters which are immediately preceded by \.

 *  Divide the source file into preprocessing tokens and sequences of white 
    space characters (comments are replaced by a single space).

 *  Execute preprocessing directives and expand macros.

Any #include files are passed through these four steps recursively.

The macro __STDC__ is predefined to 1 by ANSI-conforming compilers (and by the 
ARM C compiler).


PCC Compatibility Mode 
-----------------------

This section discusses the differences apparent when the compiler is used in 
PCC mode. When given the -pcc command line flag, the C compiler will accept 
(Berkeley) Unix-compatible C, as defined by the implementation of the Portable 
C Compiler and subject to the restrictions which are noted below.

In essence, PCC-style C is K&R C, as defined by B. Kernighan and D. Ritchie in 
their book <The C Programming Language>, together with a small number of 
extensions, and some clarifications of language features that the book leaves 
undefined.


Language and Preprocessor Compatibility 
........................................

In -pcc mode, the ARM C compiler accepts K&R C, but it does not accept many of 
pcc's old-style compatibility features, the use of which has been deprecated 
and warned against for years. The differences are:

 *  Compound assignment operators where the = sign comes first are accepted 
    (with a warning) by some PCCs. An example is =+ instead of +=. ARM C does 
    not allow this ordering of the characters in the token. 

 *  The = sign before a static initialiser was not required by some very old C 
    compilers. ARM C does not support this idiom.

 *  The following very peculiar usage is found in some Unix tools pre-dating 
    Unix Version 7:

          struct {int a, b;};
          double d;
          d.a = 0; d.b = 0x....; 

    This is accepted by some Unix PCCs and may cause problems when porting old 
    (and badly written) code.

 *  Enums are less strongly typed than is usual under PCCs. Enum is an 
    extension to K&R C which has been standardised by ANSI somewhat differently 
    from the BSD PCC implementation.

 *  Chars are signed by default in -pcc mode (unsigned in ANSI mode).

 *  In -pcc mode, the compiler permits the use of the ANSI ... notation which 
    signifies that a variable number of formal arguments follow.

 *  In order to cater for PCC-style use of variadic functions, a version of the 
    PCC header file varargs.h is supplied with the release.
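
For comparison with the PCC-style varargs.h, the ANSI form of a variadic function uses the ... notation together with the standard header stdarg.h. A minimal sketch, with an illustrative function name:

```c
#include <stdarg.h>

/* Add up count int arguments which follow the fixed argument. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```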

With the exception of enums, the compiler's type checking is generally stricter 
than PCC, much more akin to lint's, in fact. In writing the ARM C compiler, we 
have attempted to strike a balance between giving too many warnings when 
compiling known, working code, and warning of poor or non-portable programming 
practices. Many PCCs silently compile code which has no chance of executing in 
just a slightly different environment. We have tried to be helpful to those who 
need to port C among machines in which the following varies:

 *  the order of bytes within a word (e.g. little-endian ARM, VAX, Intel versus 
    big-endian Motorola, IBM370);

 *  the default size of int (four bytes versus two bytes in many PC 
    implementations);

 *  the default size of pointers (not always the same as int);

 *  whether values of type char default to signed or unsigned char;

 *  the default handling of undefined and implementation-defined aspects of the 
    C language.

The compiler's preprocessor is believed to be equivalent to a BSD Unix cpp 
except for the points listed below. Unfortunately, cpp is only defined by its 
implementation, and although equivalence has been tested over a large body of 
Unix source code, completely identical behaviour cannot be guaranteed. Some of 
the points listed below only apply when the -E option is used with the cc 
command:

 *  There is a different treatment of white space sequences (benign).

 *  Newline is processed by cc -E, but passed by cpp, making lines longer than 
    expected (cc -E only).

 *  Cpp breaks long lines at a token boundary; cc -E does not. This may break 
    line-size constraints when the source is later consumed by another program 
    (cc -E only).

 *  The handling of unrecognised directives is different (this is mostly 
    benign).


Standard Headers and Libraries
..............................

Use of the compiler in -pcc mode precludes neither the use of the standard ANSI 
headers built in to the compiler nor the use of the run-time library supplied 
with the C compiler. Of course, the ANSI library does not contain the whole of 
the Unix C library, but it does contain many commonly used functions. However, 
look out for functions with different names, or a slightly different 
definition, or those in different standard places. Unless the user directs 
otherwise using -j, the C compiler will attempt to satisfy references to, say, 
<stdio.h> from its built-in filing system.

Listed below are a number of differences between the ANSI C library and the 
BSD Unix library. They are placed under headings corresponding to the ANSI 
header files:


ctype.h

There are no <isascii()> and <toascii()> functions, since ANSI C is not 
character-set specific.


errno.h

On BSD systems <sys_nerr> and <sys_errlist> are defined to give error 
messages corresponding to error numbers. ANSI C does not have these, but 
provides similar functionality via <perror(const char *s)>, which displays the 
string pointed to by <s> followed by a system error message corresponding to 
the current value of <errno>.

There is also <char *strerror(int errnum)> which, when given a purported value 
of <errno>, returns its textual equivalent.
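
As an illustration, BSD code that indexes sys_errlist can usually be rewritten 
around strerror(). The helper below is a sketch only: format_error is an 
invented name, and the caller is assumed to supply a buffer large enough for 
the prefix and the message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "prefix: message" in buf, in the style of perror(), and return
   the number of characters written. format_error is a hypothetical
   helper; buf must be large enough for the prefix and the message. */
int format_error(char *buf, const char *prefix, int errnum)
{
    /* BSD code might have used sys_errlist[errnum] here. */
    return sprintf(buf, "%s: %s", prefix, strerror(errnum));
}
```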


math.h

The #defined value HUGE, found in BSD libraries, is called HUGE_VAL in ANSI C. 
ANSI C does not have <asinh()>, <acosh()> or <atanh()>.


signal.h

In ANSI C the <signal()> function's prototype is:

    extern void (*signal(int, void(*func)(int)))(int);

<signal()> therefore expects its second argument to be a pointer to a function 
returning void with one int argument. In BSD-style programs it is common to use 
a function returning int as a signal handler. The PCC-style function 
definitions shown below will therefore produce a compiler warning about an 
implicit cast between different function pointers (since <f()> defaults to <int 
f()>). This is just a warning, and correct code will be generated anyway.

    f(signo)
    int signo;
    ...
    main()
    {   extern f();
        signal(SIGINT, f);
    ...
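
The warning can be avoided by declaring the handler in the ANSI style. The 
following is a minimal sketch; the names on_signal, last_signal and 
catch_sigint are invented for illustration.

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is the ANSI type for this purpose. */
static volatile sig_atomic_t last_signal = 0;

/* ANSI-style handler: returns void and takes the signal number. */
static void on_signal(int signo)
{
    last_signal = signo;
}

/* Install the handler for SIGINT, raise the signal, and report which
   signal number the handler saw (or -1 if it could not be installed). */
int catch_sigint(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);
    return (int)last_signal;
}
```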


stdio.h

<sprintf()> returns the number of characters printed (following Unix System V), 
whereas BSD's <sprintf()> returns a pointer to the start of the character 
buffer.

The BSD functions <ecvt()>, <fcvt()> and <gcvt()> are not included in ANSI C, 
since their functionality is provided by <sprintf()>.
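
As a sketch of both points, the function below formats a double with sprintf() 
where BSD code might have called gcvt(), and passes on the ANSI return value. 
The name format_double is invented for illustration, and the output is only 
roughly what gcvt() would produce.

```c
#include <stdio.h>
#include <string.h>

/* Write `value` to buf using at most `ndigit` significant digits and
   return the number of characters written (the ANSI sprintf() result).
   format_double is a hypothetical helper; buf must be large enough. */
int format_double(char *buf, double value, int ndigit)
{
    return sprintf(buf, "%.*g", ndigit, value);
}
```

Code that relied on BSD's sprintf() returning the buffer should use the buffer 
pointer it passed in, not the return value.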


string.h

On BSD systems, string manipulation functions are found in <strings.h> whereas 
ANSI C places them in <string.h>. The ARM C Compiler also recognises 
<strings.h>, for PCC-compatibility.

The BSD functions <index()> and <rindex()> are replaced by the ANSI functions 
<strchr()> and <strrchr()> respectively.
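
The substitution is mechanical, as the following sketch suggests. The helper 
names base_name and value_part are invented for illustration.

```c
#include <string.h>

/* Return the final component of a '/'-separated path. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');   /* BSD: rindex(path, '/') */
    return slash ? slash + 1 : path;
}

/* Return the text after the first '=' in "name=value", or NULL. */
const char *value_part(const char *setting)
{
    const char *eq = strchr(setting, '=');    /* BSD: index(setting, '=') */
    return eq ? eq + 1 : NULL;
}
```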

Functions which refer to string lengths (and other sizes) now use the ANSI type 
<size_t>, which in our implementation is <unsigned int>.


stdlib.h

<malloc()> has type void *, rather than the char * of the BSD <malloc()>.


float.h

A new header added by ANSI giving details of floating point precision etc.


limits.h

A new header added by ANSI to give maximum and minimum limit values for integer 
data types.
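
As an illustration of how <limits.h> and <float.h> might be used in place of 
hard-wired constants, consider the sketch below. The function names and the 
tolerance of four units of DBL_EPSILON are arbitrary choices made for this 
example.

```c
#include <float.h>
#include <limits.h>

/* Returns 1 if a + b would overflow an int; the test itself cannot
   overflow because it only compares against INT_MAX and INT_MIN. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}

/* Returns 1 if x and y agree to within a few units of DBL_EPSILON,
   scaled by their magnitudes. The factor 4.0 is an arbitrary choice. */
int nearly_equal(double x, double y)
{
    double diff = x > y ? x - y : y - x;
    double scale = (x < 0 ? -x : x) + (y < 0 ? -y : y);
    return diff <= 4.0 * DBL_EPSILON * scale;
}
```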


locale.h

A new header added by ANSI to provide local environment-specific features.


Machine-Specific Features 
--------------------------


Pragma Directives 
..................

Pragmas are recognised by the compiler in two forms:

    #pragma -<Letter><Optional digit>
    #pragma [no]<feature-name>

A short-form pragma given without a digit resets that pragma to its default 
state; otherwise to the state specified.

For example:

    #pragma -s1
    #pragma nocheck_stack
    
    #pragma -p2
    #pragma profile_statements

The list of recognised pragmas is:

    Pragma Name                     Short Form  'no' Form
    warn_implicit_fn_decls          a1 *        a0
    check_memory_accesses           c1          c0 *
    warn_deprecated                 d1 *        d0
    continue_after_hash_error       e1          e0 *
    FP register variable            f1-f4       f0 *
    include_only_once               i1          i0 *
    optimise_crossjump              j1 *        j0
    optimise_multiple_loads         m1 *        m0
    profile                         p1          p0 *
    profile_statements              p2          p0 *
    integer register variable       r1-r7       r0 *
    check_stack                     s0 *        s1
    force_top_level                 t1          t0 *
    check_printf_formats            v1          v0 *
    check_scanf_formats             v2          v0 *
    side_effects                    y0 *        y1
    optimise_cse                    z1 *        z0

In each case, the default setting is starred.


Specifying Pragmas from the Command Line 
.........................................

Any pragma can be specified from the compiler's command line using:

    -zp<Letter><Digit>

Some pragmas give more local control over settings that can otherwise be 
controlled only per compilation unit, from the command line. For example: 

    Pragma Name                     Command Line Form

    nowarn_implicit_fn_decls        -Wf

    nowarn_deprecated               -Wd

    profile                         -p

    profile_statements              -px


Pragmas Controlling the Preprocessor
....................................

The pragma <continue_after_hash_error> in effect implements a #warning "..." 
preprocessor directive. Pragma <include_only_once> asserts that the containing 
#include file is to be included only once, and that if its name recurs in a 
subsequent #include directive then the directive is to be ignored.

Pragma <force_top_level> asserts that the containing #include file should only 
be included at the top level of a file. A syntax error will result if the file 
is included, say, within the body of a function.


Pragmas Controlling printf/scanf Argument Checking 
...................................................

Pragmas <check_printf_formats> and <check_scanf_formats> control whether the 
actual arguments to printf and scanf, respectively, are type-checked against 
the format designators in a literal format string. Of course, calls using 
non-literal format strings cannot be checked. By default, all calls involving 
literal format strings are checked.


Pragmas Controlling Optimisation 
.................................

Pragmas <optimise_crossjump>, <optimise_multiple_loads> and <optimise_cse> give 
fine control over where these optimisations are applied. For example, it is 
sometimes advantageous to disable cross-jumping (the 'common tail' 
optimisation) in the critical loop of an interpreter; and it may be helpful in 
a timing loop to disable common subexpression elimination and the opportunistic 
optimisation of multiple load instructions to load multiples. Note that correct 
use of the <volatile> qualifier should remove most of the more obvious needs 
for this degree of control (and <volatile> is also available in the ARM C 
compiler's -pcc mode unless -strict is specified).
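
For example, a timing loop can often be protected with <volatile> alone. The 
sketch below is plain ANSI C and uses no pragmas; the function name spin is 
invented for illustration.

```c
/* Busy-wait for `iterations` passes. The counter is volatile, so each
   increment is a real load and store that the optimiser must keep.
   spin is a hypothetical name used only for this sketch. */
unsigned long spin(unsigned long iterations)
{
    volatile unsigned long ticks = 0;
    unsigned long i;

    for (i = 0; i < iterations; i++)
        ticks++;
    return ticks;
}
```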

By default, functions are assumed to be impure, so function invocations are not 
candidates for common subexpression elimination. Pragma <noside_effects> 
asserts that the following function declarations (until the next #pragma 
<side_effects>) describe pure functions, invocations of which can be CSEs.  See 
also "<__pure>".


Pragmas Controlling Code Generation
...................................

The pragmas described in this section control:

 *  Stack-limit checking.

 *  Memory access checking.

 *  Global (program-wide) register variables.

If the compiler is configured to compile code for the explicit stack limit 
variant of the ARM Procedure Call Standard (documented in "<ARM Procedure Call 
Standard>" of the Technical Specifications), then #pragma 
<nocheck_stack> disables the generation of code at function entry which checks 
for stack limit violation. In reality there is little advantage to turning off 
this check: it typically costs only two instructions and two machine cycles per 
function call. The one circumstance in which <nocheck_stack> must be used is in 
writing a signal handler for the SIGSTAK event. When this occurs, stack 
overflow has already been detected, so checking for it again in the handler 
would result in a fatal circular recursion.

The pragma <check_memory_accesses> instructs the compiler to precede each 
access to memory by a call to the appropriate one of:

    __rt_rd?chk   (?=1,2,4 for byte, short, long reads, respectively) 
    __rt_wr?chk   (?=1,2,4 for byte, short, long writes, respectively)

It is up to your library implementation to check that the address given is 
reasonable.

The pragmas f0-f4 and r0-r7 have no long form counterparts. Each introduces or 
terminates a list of <extern>, file-scope variable declarations. Each such 
declaration declares a name for the <same> register variable. For example:

    #pragma r1              /* 1st global register */
    extern int *sp;
    #pragma r2              /* 2nd global register */
    extern int *fp, *ap;    /* synonyms */
    #pragma r0              /* end of global declaration */
    #pragma f1
    extern double pi;       /* 1st global FP register */
    #pragma f0

Any type that can be allocated to a register (see "<Registers>" starting on 
page 72) can be allocated to a global register. Similarly, any floating point 
type can be allocated to a floating point register variable.

Global register r1 is the same as register v1 in the ARM Procedure Call 
Standard (APCS); similarly r2 equates to v2, etc. Depending on the APCS 
variant, between 5 and 7 integer registers (v1-v7, machine registers R4-R10) 
and 4 floating point registers (F4-F7) are available as register variables. In 
practice it is probably unwise to use more than 3 global integer register 
variables and 2 global floating-point register variables.

Provided the same declarations are made in each separate compilation unit, a 
global register variable may exist program-wide.

Otherwise, because a global register variable maps to a callee-saved register, 
its value will be saved and restored across a call to a function in a 
compilation unit which does not use it as a global register variable, such as a 
library function.

A corollary of the safety of direct calls out of a global-register-using 
compilation unit is that calls back into it are dangerous. In particular, a 
global-register-using function called from a compilation unit which uses that 
register as a compiler-allocated register, will probably read the wrong values 
from its supposed global register variables.

Currently, there is no link-time check that direct calls are sensible. And even 
if there were, indirect calls via function arguments pose a hazard which is 
harder to detect. This facility must be used with care.  Preferably, the 
declaration of the global register variable should be made in each compilation 
unit of the program.  See also "<__global_reg(n)>".


Special Function Declaration Keywords
.....................................

Several special function declaration keywords are available to tell <armcc> to 
treat a function in a special way.  None of these are portable to other C 
compilers.


__value_in_regs

This allows the compiler to return a structure in registers rather than 
returning a pointer to the structure. For example:

    typedef struct int64_struct {
      unsigned int lo;
      unsigned int hi;
    } int64;
    
    __value_in_regs extern int64 mul64(unsigned a, unsigned b);

See "<ARM Procedure Call Standard>" of the Technical 
Specifications for details of the default way in which structures are passed 
and returned.
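
The following sketch shows one way such a function might be written in 
portable C. Several things here are assumptions, not part of the compiler's 
documentation: the macros USE_ARM_KEYWORDS and VALUE_IN_REGS are invented so 
that the same source also compiles elsewhere, the use of the keyword on the 
function definition as well as the declaration is assumed, and unsigned int is 
assumed to be 32 bits wide.

```c
/* Hypothetical wrapper: expands to the ARM keyword only when the
   (invented) macro USE_ARM_KEYWORDS is defined by the build. */
#ifdef USE_ARM_KEYWORDS
#define VALUE_IN_REGS __value_in_regs
#else
#define VALUE_IN_REGS
#endif

typedef struct int64_struct {
    unsigned int lo;
    unsigned int hi;
} int64;

VALUE_IN_REGS extern int64 mul64(unsigned a, unsigned b);

/* 32 x 32 -> 64 bit multiply built from four 16 x 16 partial products.
   Assumes unsigned int is exactly 32 bits wide. */
VALUE_IN_REGS int64 mul64(unsigned a, unsigned b)
{
    unsigned a_lo = a & 0xFFFFu, a_hi = a >> 16;
    unsigned b_lo = b & 0xFFFFu, b_hi = b >> 16;
    unsigned p0 = a_lo * b_lo;
    unsigned p1 = a_lo * b_hi;
    unsigned p2 = a_hi * b_lo;
    unsigned p3 = a_hi * b_hi;
    /* Sum of the three contributions to bits 16..31, with carries. */
    unsigned mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);
    int64 r;

    r.lo = (p0 & 0xFFFFu) | (mid << 16);
    r.hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
    return r;
}
```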


__swi and __swi_indirect

A SWI taking up to four arguments (in registers 0 to argcount-1) and returning 
up to four results (in registers 0 to resultcount-1) can be described by a C 
function declaration, which causes uses of the function to be compiled inline 
as a SWI. For a SWI returning 0 results use:

    void __swi(swi_number) swi_name(int arg1, ..., int argn);

for example

    void __swi(42) terminate_process(int arg1, ..., int argn);

For a SWI returning 1 result, use:

    int __swi(swi_number) swi_name(int arg1, ..., int argn);

For a SWI returning more than 1 result, use:

    struct { int res1, ... resn }
      __value_in_regs
        __swi(swi_number) swi_name(int arg1, ... int argn);

Note that __value_in_regs is needed to specify that a (short) structure value 
is returned in registers, rather than by the usual indirection mechanism 
specified in the ARM Procedure Call Standard.

If there is an indirect SWI (taking the number of a SWI to call as an argument 
in r12), calls through this SWI can similarly be described by a C function 
declaration such as:

    int __swi_indirect(swi_indirect_number)
        swi_name(int real_swi_number, int arg1, ... argn);

For example,

    int __swi_indirect(0) ioctl(int swino, int fn, void *argp);

This might be called as:

    ioctl(IOCTL+4, RESET, NULL);


__irq

This allows a C function to be used as an interrupt routine.  All registers 
(excluding floating point registers) are preserved (not just those normally 
preserved under the APCS).  Also, the function is exited by setting the pc to 
lr-4, and the psr is restored to its original value.


__pure

By default, functions are assumed to be <impure> (i.e. they have side effects), 
so function invocations are not candidates for common subexpression 
elimination.  __pure has the same effect as pragma <noside_effects>, and 
asserts that the function declared is a <pure> function, invocations of which 
can be CSEs.
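
The distinction can be illustrated in portable C. In the sketch below, the 
macros USE_ARM_KEYWORDS and PURE_FN are invented so that the source compiles 
with other compilers, and the placement of __pure before the return type is an 
assumption. Evaluating square(x) once and reusing the result gives the same 
answer as calling it twice, whereas merging the two calls to next_id() would 
change the program's behaviour.

```c
/* Hypothetical wrapper: expands to the ARM keyword only when the
   (invented) macro USE_ARM_KEYWORDS is defined by the build. */
#ifdef USE_ARM_KEYWORDS
#define PURE_FN __pure
#else
#define PURE_FN
#endif

static int calls = 0;

/* Impure: every call updates the file-scope counter. */
int next_id(void)
{
    return ++calls;
}

/* Pure: the result depends only on the argument. */
PURE_FN int square(int x)
{
    return x * x;
}

/* The two calls may safely be treated as one common subexpression. */
int twice_square(int x)
{
    return square(x) + square(x);
}

/* The two calls must both happen; each returns a different value. */
int sum_of_two_ids(void)
{
    return next_id() + next_id();
}
```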


Special Variable Declaration Keywords
.....................................


__global_reg(n)

Allocates the declared variable to a global integer register variable, in the 
same way as #pragma r<n>.  The variable must have an integral or pointer type.  
See also "<Pragmas Controlling Code Generation>".


__global_freg(n)

Allocates the declared variable to a global floating point register variable, 
in the same way as #pragma f<n>.  The variable must have type float or double.  
See also "<Pragmas Controlling Code Generation>".

Note that the global register, whether specified by keyword or pragmas, must be 
specified in all declarations of the same variable.  Thus

    int x;
    __global_reg(1) x;

is an error.


Floating Point Support 
-----------------------

The ARM C Compiler generates ARM Floating Point instructions to perform 
floating point operations.

The ARM's floating point instruction set is supported either by an attached 
floating-point coprocessor (hardware coprocessors 1 and 2) or by an instruction 
emulator entered from the undefined instruction trap.

Normally the floating point instruction emulator is installed by the 
environment in which the program is executing.  However, for a completely 
standalone application the program can install the floating point emulator 
itself.  This is described in the following paragraphs.

The ARM Floating Point instruction set Emulator (FPE) is supplied with the ARM 
C system as a linkable object file. Its environmental dependencies are all via 
a <stub>, supplied as an assembly language source. This stub file, <fpestub>, 
documents how to attach an FPE to the invalid instruction trap location 
(address 0x4).

It is intended that the FPE and the <fpestub> be linked together with whatever 
else is required to make a standalone module on the target hardware. The 
<fpestub> contains two entries for initialisation (attachment to the invalid 
instruction trap vector) and finalisation (removal from the invalid instruction 
trap vector). These should be called on activation and deactivation, 
respectively, of the standalone module.

For testing purposes, the FPE, <fpestub> and a test application can be linked 
together to make a single, standalone application. The application must call 
__fp_initialise before using any floating point instructions, and __fp_finalise 
before exiting.

