Showing posts with label C++. Show all posts

Tuesday, August 14, 2007

L/R Values in C

Every expression in C and C++ is either an l-value or an r-value. An l-value is an expression that designates (refers to) an object. Every l-value is, in turn, either modifiable or non-modifiable. An r-value is any expression that isn't an l-value. Operationally, the difference among these kinds of expressions is this:
A modifiable l-value is addressable (it can be the operand of unary &) and assignable (it can be the left operand of =). When the object has static storage, its address is fixed before the program runs, so if the compiler needs to do something with that address (add an offset to it, perhaps), it can do so directly and does not need to plant code to retrieve the address first. In contrast, an r-value is neither addressable nor assignable. For example, the following are typical r-values in C and cannot be modified:
- function return value
- the result of ? :

A non-modifiable l-value is addressable but not assignable, such as a const-qualified object or an array name.

Endianness

* Memory Allocation Rule
In general, the bytes of an object are laid out from its lowest address upward, whether the object lives on the stack (which itself usually grows from high addresses to low addresses) or on the heap.

* Endianness
Most modern computer processors agree on bit ordering "inside" individual bytes (this was not always the case). This means that any single-byte value will be read the same on almost any computer one may send it to.

Integers are usually stored as sequences of bytes, so that the encoded value can be obtained by simple concatenation. The two most common of them are:
- increasing numeric significance with increasing memory addresses, known as little-endian, and
- its opposite, called big-endian.
Registers have no byte addresses, so the bit order inside a CPU register is the same for both conventions. Endianness only matters when a value is stored to or loaded from memory, and the hardware takes care of that mapping.

* Endianness in Networking
Data is generally transferred byte by byte, starting from the low address of the buffer (char *buffer). Network protocols generally transmit the BYTES of a multi-byte value in big-endian order (whatever the order in host memory), which is why big-endian is called network order. The sockets API provides htons, htonl, ntohs and ntohl to convert between host order and network order.

* Bit-Level Endianness
Bit endianness is used to refer to the transmission order of bits over a serial medium. Most often that order is transparently managed by the hardware and is the bit-level analogue of little-endian (low-bit first), although protocols exist which require the opposite ordering (e.g. I2C). In networking, the decision about the order of transmission of bits is made in the very bottom of the data link layer of the OSI model.

* Bit-Shift and Endianness
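Bit shifts have nothing to do with endianness. They operate on values, not on memory layout: their results are defined by the language standard and implemented by the compiler, so (x >> 8) & 0xFF yields the second-lowest byte of x on any machine.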
Reference: Endianness in Wikipedia
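
The points above can be sketched in code. This is a minimal illustration, not a library API: the names put_be32, get_be32 and host_is_little_endian are made up for this example. Because shifts work on values, the first two functions produce network order on any host without asking which endianness it has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value into buf in network (big-endian) byte order. */
void put_be32(unsigned char *buf, uint32_t value)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)((value >> 24) & 0xFF); /* most significant byte first */
    buf[1] = (unsigned char)((value >> 16) & 0xFF);
    buf[2] = (unsigned char)((value >> 8) & 0xFF);
    buf[3] = (unsigned char)(value & 0xFF);
}

/* Rebuild the value from four big-endian bytes. */
uint32_t get_be32(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Returns 1 when the lowest-addressed byte holds the least significant bits. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

Only host_is_little_endian looks at memory layout; the other two never need to know the answer.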

Optimization in C

* Better Algorithm or Data Structure

* Mathematics Solution
- Find a closed-form formula for the problem. For example, compute 1 + 2 + ... + N as N(N+1)/2 instead of looping.

* More Space Less Time
- Using macros for short functions
- Look-up Table

* Use Bit Operations (Be cautious)
- Shift and bit mask for division and modulo by powers of two (unsigned operands only)

* Language and compiler features
- Use pointer to operate an array
- No need to define useless return value
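- Ask for a variable to be kept in a register instead of on the stack by using `register' (only a hint; modern compilers usually decide better on their own)
- Prefer prefix ++/-- to postfix ++/-- when the result is not used (no difference for built-in types in C; in C++ the postfix form may create a temporary copy)
- Organize the order of cases in switch: put the cases that occur most often first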
- In some cases, an array of pointer to functions might be more efficient than switch statement. For example,
int handleMsg1(void);
int handleMsg2(void);
int handleMsg3(void);
int (*MsgFunction[])(void) = {handleMsg1, handleMsg2, handleMsg3};
status = MsgFunction[ReceiveMessage()](); /* ReceiveMessage() must return 0, 1 or 2 */
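
A slightly fuller sketch of the table-driven dispatch follows. The handler bodies, the MsgHandler typedef and the dispatchMessage wrapper are illustrative assumptions; the point is that the index is range-checked before the call, since an out-of-range index would jump through a garbage pointer.

```c
#include <stddef.h>

typedef int (*MsgHandler)(void);

/* Placeholder handlers; each returns a status code. */
static int handleMsg1(void) { return 1; }
static int handleMsg2(void) { return 2; }
static int handleMsg3(void) { return 3; }

static const MsgHandler MsgFunction[] = { handleMsg1, handleMsg2, handleMsg3 };

#define MSG_COUNT (sizeof MsgFunction / sizeof MsgFunction[0])

/* Calls the handler for msgId, or returns -1 for an unknown message id. */
int dispatchMessage(size_t msgId)
{
    if (msgId >= MSG_COUNT)
        return -1;
    return MsgFunction[msgId]();
}
```

The lookup costs one bounds check and one indexed load no matter how many message types exist, whereas a switch may compile to a chain of comparisons.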

* Hardware features
- Copy codes and data from FLASH to RAM for running
- Fully use the UART buffer for data transfer
- Use DMA

* Embed Assembly
- However, it is non-portable

Std Library Functions

* Library Calls And System Calls
- Library calls are part of the language or application, while system calls are part of the OS. Library calls are calls to routines in a library that is linked with the user program. The C library has the same interface on every ANSI C implementation. Library routines execute in the user address space with low calling overhead and count as part of user time. System calls, on the other hand, differ from OS to OS. They are calls to the kernel for a service, i.e., entry points into the OS, and are usually triggered by a trap or software interrupt. They execute in the kernel address space with high calling overhead and count as part of system time.

In practice, many C library functions do their jobs by making system calls.

* getchar
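- Its return value is int instead of char, so that EOF can be distinguished from every valid character. Store the result in an int before comparing it with EOF.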

* strcpy, strcat, strncpy and strncat
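- strcpy and strcat do not check the size of the destination buffer; they stop only at '\0', so they can overflow the buffer. Prefer strncpy and strncat, but note that strncpy does not write a terminating '\0' when the source is longer than the limit, so add one explicitly.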
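
A minimal sketch of a bounded copy, assuming the caller knows the destination capacity. The helper name copy_string is hypothetical. It always terminates the destination, which strncpy alone does not do when the source is too long.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dst (capacity dst_size bytes), always NUL-terminating.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1); /* copy at most dst_size - 1 characters */
    dst[dst_size - 1] = '\0';        /* strncpy does not terminate on truncation */
    return strlen(src) < dst_size;
}
```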

* strlen
- Its return value does not include the character '\0'

* scanf
- To read an integer with %d, it expects a pointer to int, not a pointer to char; passing a char * makes scanf write past the char. The formats for float and double are different: %f for float and %lf for double. In printf, %f is used for both.

* memcpy and memmove
- memcpy must not be used when src and dst overlap (the behavior is undefined); memmove handles overlapping blocks correctly, possibly at some cost in performance.
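
As a small illustration of an overlapping copy, the hypothetical helper below shifts the first n bytes of a buffer one position to the right. Source and destination overlap, so memmove is the appropriate call here.

```c
#include <string.h>

/* Shifts the first n bytes of buf right by one position.
 * buf must have room for at least n + 1 bytes. */
void shift_right(char *buf, size_t n)
{
    memmove(buf + 1, buf, n); /* regions overlap: memcpy would be undefined */
}
```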

* setbuf
- After the main function returns, the library still flushes the buffer passed to setbuf while closing the stream. Therefore this buffer must not be an automatic (stack) array in main; it should be heap memory or a static/global array.

* fread, fseek and fwrite
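- For a file opened for update (e.g. "r+"), fseek (or another positioning call) needs to be called when switching between fread and fwrite. The same stream cannot be read and then written, or written and then read, without this state change.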

* errno
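- This global variable holds the last error number that was set. Library functions do not reset it to zero on success, so check it only after a function has reported failure. The general usage should be:
if (error return value)
{
     check errno
}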
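
A concrete sketch of this pattern with strtol, which reports a range error by returning LONG_MAX or LONG_MIN and setting errno to ERANGE. Since those are also valid results, errno is cleared before the call. The wrapper name parse_long is an assumption for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses a decimal long from text. Returns 0 on success, -1 on failure. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0; /* strtol never clears errno itself */
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')
        return -1; /* no digits at all, or trailing characters */
    if ((value == LONG_MAX || value == LONG_MIN) && errno == ERANGE)
        return -1; /* error return value first, then errno */

    *out = value;
    return 0;
}
```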

Monday, August 13, 2007

Rules of Thumb

Function Interface Design
* Write function comment in the head of each function
* Give a good name to each function and avoid using undefined verbs
* Return a status and pass the desired return values by pointers
* Keep the number of input parameters to no more than 7. Otherwise use a struct
* Write function prototypes, and keep the data types of arguments and return values matching their declarations without casting
* Const-qualify the data pointed to by input pointers when it is read-only
* Follow some order of input parameters, like (dst, src, num)
* Check the legitimacy of input parameters and input global variables by using `assert'
* Avoid using input parameters, esp. pointers, as working variables. Use their local copies
* Check the legitimacy of function return value, esp. the return value of std library functions
* Use macros to replace variables of multiple references (i.e., of very long names)

Function Design
* Divide large pieces of code into multiple levels of function calls.
* Design functions with high fan-in and low fan-out
* Write re-entrant functions as much as possible
* Define only one logical task for each function as much as possible
* Keep the length of each function within 200~300 lines
* Separate implementation code from control code

Physical Structure of Code
* The basic code unit is a set of files: a c/cpp file and its associated h files, of which there might be more than one, e.g. one public and one private. The c/cpp file in a unit is supposed to include its own h files.
* The h files should include as few non-std h files as possible. These non-std h files should be included in the c/cpp file instead. An h file can forward-declare the types it needs at its beginning to avoid including other h files.
* Each h file should be guarded with #ifndef/#define and a unique tag.
* Avoid declaring `extern' functions directly in a c/cpp file; include the h files that declare them instead.
* Use `static' to restrict the scope of functions and variables. Global variables should be declared `extern' in their h file and defined & initialized in the c/cpp file. Include the h file of a global variable before using it.

Misc
* Do not do too much in ONE single statement.
* malloc and free
- Check the return pointer of malloc, reset the memory allocated to zeros with memset.
- Free the memory allocated by malloc at the same code level, assign NULL to the pointer finally.
* fopen and fclose
- Check the return handle of fopen
- Close the file handle in the end
* sizeof and size_t
- size_t is an unsigned integer type chosen by the implementation (often, but not always, unsigned long int)
- sizeof data type instead of variable name
- Prefer (num * sizeof data type) to (sizeof data type * num)
* Condition Check with Boolean, Int, Float and Pointer
- Boolean: if (!Flag)
- Int: if (0 == Flag)
- Float: if (Flag >= -EPSILON && Flag <= EPSILON)
- Pointer: if (NULL == p)
* Infinite Loop - while (1) {...}

Re-entrant Code

* Re-entrant Code
Re-entrant code can be used by multiple threads, and its result depends only on its input parameters. Re-entrant code is generally thread-safe, but thread-safe code might not be re-entrant, since it may still depend on global state (for example, state protected by a lock). Re-entrant functions are similar to the pure functions of functional programming.

* Non Re-entrant Code
- Uses non-const global variables
- Uses static global/local variables
- Calls non re-entrant functions, like malloc, free, printf, fopen, and other I/O std functions
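
The difference can be sketched with two tiny counters. Both function names are made up for this example. The first keeps its state in a static variable, so every caller (and every interrupted invocation) shares it; the second takes its state from the caller and touches nothing else.

```c
/* Not re-entrant: all callers share the hidden static counter. */
int next_id_shared(void)
{
    static int counter = 0;
    return ++counter;
}

/* Re-entrant: the state belongs to the caller and is passed in explicitly. */
int next_id(int *counter)
{
    return ++*counter;
}
```

With next_id, two threads or an ISR and the main loop can each keep their own counter without interfering.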

Interrupt Service Routine (ISR)

* ISR Issues
Find out what is wrong in the following ISR:
__interrupt double compute_area(double radius)
{
     double area = PI * radius * radius;
     printf("\n area = %f\n", area);
     return area;
}
An ISR is supposed to
- Take no input parameters
- Return nothing
- Be compact and simple. Note: floating-point arithmetic and printf are complex and often not re-entrant.
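
One common way to follow these rules is to let the ISR only record that something happened and do the real work in the main loop. The sketch below is an assumption about how that could look, not a fix prescribed by the original: timer_isr, poll_events and g_tick_pending are hypothetical names, and the compiler-specific interrupt keyword is left out so the code compiles anywhere.

```c
/* Set by the ISR, cleared by the main loop; volatile because it changes
 * outside the normal flow of control. */
static volatile unsigned g_tick_pending = 0;

/* On a real target this would carry the compiler's interrupt keyword
 * (for example __interrupt). No parameters, no return value, no printf. */
void timer_isr(void)
{
    g_tick_pending = 1;
}

/* Called from the main loop: the heavy work happens here, outside the ISR.
 * Returns 1 if an event was handled and *area_out was written, else 0. */
int poll_events(double radius, double *area_out)
{
    if (!g_tick_pending)
        return 0;
    g_tick_pending = 0;
    *area_out = 3.14159265358979 * radius * radius;
    return 1;
}
```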

Const and Volatile Qualifier

* Const
- Read only
- Initialization
A const object should be initialized when it is declared, since it cannot be assigned afterwards.
     const int j; /* error in C++; legal in C, but j can never be given a value */
- Specify the exact data type
     #define i 10
     const long j = 10;
     char h = i;
     char k = j; /* j has a known type (long), so the compiler can warn about truncation */
- Save memory
A const object is stored once, while each expansion of a macro may create a new copy of the literal.
     #define STRING "abcdefg"
     const char string[] = "abcdefg";
     printf(STRING); /* first copy of the literal */
     printf(string);
     printf(STRING); /* possibly a second copy */
     printf(string);
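- Change const
Casting away const compiles, but writing to an object defined const is undefined behavior.
     const int i = 0;
     int *p = (int *)&i;
     *p = 10; /* undefined behavior */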

* Volatile
- Usage
-- Hardware registers or ports which might be changed by I/O
-- Non-automatic variables shared with an ISR, which might be changed by the ISR
-- Global variables in multi-threading environment which might be changed by other thread.
- Effect
The compiler re-reads the variable from memory on every access instead of reusing a copy kept in a register.

* Qualifier Meaning
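- int const/volatile *p = &i; (p is a pointer to a const/volatile int)
- int * const/volatile p = &i; (p is a const/volatile pointer to an int)
- int volatile * const p = &i; (p is a const pointer to a volatile int)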

* Declaration of Const and Volatile
- Two types of declaration
     const/volatile int i; <=> int const/volatile i;
The latter one might be better. Think about this one:
     typedef char * pchar;
     const pchar p;
It might be read as "const char *" but it is actually "char * const". The declaration "pchar const p" causes no such confusion.
- Declaring an entire object to be volatile and/or const effectively declares each member of that object as volatile and/or const.
- Defining a data type to be const and/or volatile might be more useful than just defining an object of this type to be const and/or volatile. The reason is that there is then no need to consider the type match (as explained below) when pointer assignments happen, as in parameter passing in function calls. Otherwise, all these occurrences have to be declared as const and/or volatile.
Note: typedef struct A const AA; =>
AA is const but struct A is not. So use this format:
typedef struct A
{
...
} const B;

* Const/Volatile Pointer Assignment
     int *p = &i;
     int const *cp; /* No initializer is required: cp is a pointer to const, not a const pointer */
     p = cp; /* error */
     cp = p;
In the above example, cp is a pointer to a qualified data type while p is a pointer to an unqualified data type. In an assignment, the type pointed to by the left pointer must have all the qualifiers of the type pointed to by the right one. So cp cannot be assigned to p.
     int **pp = &p;
     int const **cpp;
     cpp = pp; /* error */
This might happen in parameter passing when calling a function. Why?
pp => a pointer to a pointer to int
cpp => a pointer to a pointer to const int
"A pointer to int" is not the same data type as "a pointer to const int". Therefore pp and cpp point to different types, and a cast is needed here. If
pp => a pointer to a pointer to int (int **)
cpp => a pointer to a const pointer to int (int * const *) or
           a const pointer to a const pointer to int (int * const * const) or
           a const pointer to a pointer to int (int ** const)
then cpp = pp; is legal.

Sunday, August 12, 2007

Casting in C

* The cast operator forces the conversion of its SCALAR operand to a specified SCALAR data type, or to void. The operator consists of a type-name, in parentheses, that precedes an expression, as follows:
( type-name ) expression

The type-name can also be an enum specifier, or a typedef name. The type-name can be a structure or union only if it is a pointer. That is, the type-name can be a pointer to a structure or union, but cannot be a structure or union because structures and unions are not scalar types. For example:
     (struct abc *)x /* allowed */
     (struct abc)x /* not allowed */

Cast operations cannot force the conversion of any expression to an array, function, structure, or union. The following example casts the identifier p to pointer to array of int:
     (int (*)[10]) p;
This kind of cast operation does not change the contents of p; it only causes the compiler to treat the value of p as a pointer to such an array.
     ((int (*)[10]) p) + 1; /* Advances by 10*sizeof(int) bytes */

* Cast operators can be used in the following conversions that involve pointers:
- A pointer can be converted to an integral type. On many implementations a pointer occupies the same amount of storage as an object of type int or long (or their unsigned equivalents), but this is implementation-defined; where available, uintptr_t from <stdint.h> is the portable choice. On such implementations a pointer can be converted to one of these integer types and back again without changing its value. No scaling takes place, and the representation of the value does not change. Converting from a pointer to a shorter integer type is similar to converting from an unsigned long type to a shorter integer type; that is, the high-order bits of the pointer are discarded. Converting from a shorter signed integer type to a pointer is similar to converting that type to unsigned long; that is, the high-order bits of the pointer are filled with copies of the sign bit.
- A pointer to an object or incomplete type can be converted to a pointer to a different object or a different incomplete type. The resulting pointer might not be valid if it is improperly aligned for the type pointed to. For example,
     char c[10];
     int *p = (int *)&c[1]; /* misalignment */
It is guaranteed, however, that a pointer to an object of a given alignment can be converted to a pointer to an object of the same alignment or less strict alignment, and back again. The result is equal to the original pointer. (An object of character type has the least strict alignment.) For example,
struct A
{
     struct B b;   /* first member; struct B is defined earlier */
     int C;
} a, *pa, *pa2;
struct B *pb;
pa = &a;
pb = (struct B *)pa;      /* Allowed and safe: a pointer to a struct also points to its first member */
pa2 = (struct A *)pb;     /* Now pa == pa2 */
- A pointer to a function of one type can be converted to a pointer to a function of another type and back again; the result is equal to the original pointer. If a converted pointer is used to call a function that has a type not compatible with the type of the called function, the behavior is undefined.

Reference: EETime Embedded

Saturday, August 11, 2007

void and void *

* void
- There is NO INSTANCE of the void data type, like "void a". The void type is abstract, like an abstract class in C++.
- void usage: a function that returns void, or a function that takes no arguments (void)

* "void *"
- A void * pointer can be assigned from any type of object pointer. No casting is needed. In C the reverse assignment, from void * to another object pointer type, also needs no cast; in C++ a cast is required.
- A void * pointer cannot be used in arithmetic operations, like
void *p;
p++;    /* No increment is allowed on p */
- A void * pointer can be used as an argument or return value of functions which accept or return a pointer to any data type, respectively. For example,
void *memcpy(void *dst, const void *src, size_t len);

Struct and Union

* Typical Usage of Struct and Union
struct A
{
     int a;
     char b;
};
struct B
{
     short c;
     char d[2];
};
struct C
{
     int e;
     long f;
     short g;
};

struct CommonPacket
{
     int PacketType;
     union          /* anonymous union, C11 */
     {
          struct A PacketA;
          struct B PacketB;
          struct C PacketC;
     };
};

In general, a struct can be used effectively to describe a section of contiguous memory slots or registers, accessed through a pointer to the head of this section.

* Alignment in Struct (How to estimate the size of struct?)
- "Although you can never be absolutely sure how your compiler will pad the members within a structure, the Standard guarantees there will be no padding before the first member. The Standard also mandates that each member in a structure must be allocated in the order in which it's declared."

- Natural Alignment:
By default, each data member is aligned based on the size of its data type. The padding after a member is whatever makes the next data member aligned on its boundary. In the end, the size of the struct must be a multiple of the largest alignment among its data members. This largest value is the alignment size of the struct itself. For example,
struct A
{
     char a;
     long b;
};
struct B
{
     short c;
     struct A d;
};
Then for struct A, a is aligned to 1 byte (sizeof(char)). Assuming a 4-byte long, the alignment size of b is 4 bytes, so three bytes are padded after a in order to place b at an address that is a multiple of 4. The final size of struct A is 8, which is a multiple of 4. Therefore the alignment of struct A is 4. (Think about what happens if the positions of a and b are switched in struct A.) struct B contains a member of type struct A, so the alignment of this member must be considered first. It is 4, as explained above. Two bytes are padded after c, and the final size of struct B is 12, which satisfies the requirement.
Note: 1), The final size of struct is not equal to (N x Len), where N is the number of data members and Len is the maximal size of data members. The objective is to save memory allocation as much as possible. Look at this example:
struct C
{
char x1;
short x2;
int x3;
char x4;
};
The size of struct C is not 16. With a 2-byte short and a 4-byte int it is 12 (1 + 1 padding + 2 + 4 + 1 + 3 padding).
2), For arrays, the alignment size is the size of data type but not the size of the array. For example,
long int c[20];
The alignment size for c is sizeof(long) but not sizeof(c).
- Alignment with #pragma
Force structs to be packed to n-byte alignment: #pragma pack(n)
Restore the default alignment: #pragma pack()
The alignment size of each data member is then the smaller of its natural alignment size and n.
In summary: 1) determine the alignment size of each data member (compound members first), 2) lay out the data members in sequence, 3) pad as little as possible.
- Offset Calculation
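(size_t)((char *)&((struct A *)0)->f - (char *)((struct A *)0)) /* offset of a member f in struct A; prefer the standard macro offsetof(struct A, f) from <stddef.h> */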
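
Since the exact padding depends on the compiler and target, the safest way to learn a layout is to ask the compiler. The sketch below reuses struct A and struct B from the alignment example; the helper function names are made up for illustration.

```c
#include <stddef.h>

struct A
{
    char a;
    long b;
};

struct B
{
    short c;
    struct A d;
};

/* Bytes of struct A that belong to no member (padding). */
size_t padding_in_A(void)
{
    return sizeof(struct A) - sizeof(char) - sizeof(long);
}

/* Offset of member b inside struct A, computed portably. */
size_t offset_of_b(void)
{
    return offsetof(struct A, b);
}

/* Offset of the nested struct inside struct B. */
size_t offset_of_d(void)
{
    return offsetof(struct B, d);
}
```

Printing these values with printf("%lu", (unsigned long)value) on the target shows the actual numbers, which can then be compared with the hand calculation.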

* Initialization of struct
- struct A a = {'t', 'c', 8, 0.99, "example"}; or
struct A a = {0}; /* Every member is 0 now, no matter which type.*/

* Assignment of struct
struct A
{
     char *p;
     char c;
} a, b;
char cc = 'c';
a.p = &cc;
a.c = 15;
b = a;
*b.p = 30;      /* cc now is changed */
If a struct contains a pointer, then after assignment between two variables both pointers point to the same memory (a shallow copy).
Although arrays cannot be assigned to each other directly, they can be when they are members of a struct, like this:
struct A
{
     char array[10];
} a, b;
for (int i = 0; i < 10; ++i)
     a.array[i] = i;
b = a;     /* b.array now is the same with a.array */

* Struct For Bit Map
Under some circumstances, struct could be used to do bit map for a block of memory, like this
struct A
{
     int a:1;
     int b:7;
} t;
t.a = 1;
t.b = 0x7f;
- The total number of bits should be reasonable. It might be the size of one of basic data types, like char, short, int, long int, etc.
- Be cautious that t.a and t.b are declared as int, which may be treated as SIGNED (for bit-fields this is implementation-defined). In that case one bit is the sign and their value ranges are [-1, 0] and [-64, 63], so t.a = 1 reads back as -1. An unsigned data type is more useful for this kind of struct usage since each bit in the struct should be meaningful.
- Almost everything about bit-fields is implementation-dependent. Check everything, such as which end the bit order starts from and whether a field may cross a byte boundary, before using this data structure.
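
A sketch of the unsigned variant suggested above. The struct name Flags, its field names and the make_flags helper are illustrative. With unsigned int fields, a 1-bit field holds 0..1 and a 7-bit field holds 0..127, so the values stored are the values read back.

```c
struct Flags
{
    unsigned int ready : 1; /* range 0..1 */
    unsigned int level : 7; /* range 0..127 */
};

/* Builds a Flags value, masking the inputs to the width of each field. */
struct Flags make_flags(unsigned int ready, unsigned int level)
{
    struct Flags f;
    f.ready = ready & 0x1u;
    f.level = level & 0x7Fu;
    return f;
}
```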

* Union in Memory
In general, all data members of a union start at the same low memory address. This property is exploited in some special usages of union.
union bits32
{
     char bytes[4];
     int whole;
} t;
t.whole = 0x12345678;
t.bytes[0] = 0x90; /* Now t.whole becomes 0x12345690 on a little-endian system */
Another classic example uses a union to check the system endianness:
t.whole = 1;
return (t.bytes[0] == 1); /* True is little endian and false is big endian */
Or:
union bits32 endian_test = { { 'l', '?', '?', 'b' } };
#define ENDIANNESS ((char)endian_test.whole)

Unsigned and Signed Integer

* Two attributes for char, short, int, long and long long:
bitwidth (8, 16, 32, 64 bits) and sign (unsigned, signed).

* Conversion Rule
- When an expression operates on (signed/unsigned) char, (signed/unsigned) short, bit-field or enum operands, these are first promoted to int (or to unsigned int if int cannot represent all their values). This is called integer promotion. A float argument is promoted to double only when it is passed to a variadic or unprototyped function.
- When the operands of an expression have different bitwidths, the narrower one is converted to the wider data type (signed or unsigned) before the operation. This is called the usual arithmetic conversions. Widening preserves the value: signed values are sign-extended and unsigned values are zero-extended. In the opposite direction, converting to a narrower unsigned type discards the high-order bits, and converting to a narrower signed type is implementation-defined when the value does not fit. These rules are defined on values, not on byte addresses, so the result does not depend on whether the system is big or little endian.

* Arithmetic Operation of Unsigned
- When an expression contains variables or numbers that are with the SAME bitwidth but different sign, the signed data is converted to the unsigned version. This might bring some trouble when it happens in the condition check, like this:
unsigned int a = 6;
int b = -20;
int c = (a+b>6) ? a : b;
c is always equal to a, because b is converted to a huge unsigned value before the addition.
- The general arithmetic operations on unsigned:
c = (a +/- b) mod 2^n
where n is the bitwidth of the data type. Therefore unsigned arithmetic never overflows or underflows; it wraps around. This might not be expected in some cases.

If both operands are signed, the result of overflow/underflow is UNDEFINED. In general, it is hard to test for overflow/underflow of SIGNED integer operations after the fact. It can be done by checking the flags of a status register in assembly. However, if x and y are two integers known to be non-negative, it can be done this way (on implementations where converting an out-of-range unsigned value to int wraps around):
if ((int)((unsigned)x + (unsigned)y) < 0)
    complain();
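
An alternative that relies on no implementation-defined conversion is to test against the limits before adding. This is a sketch; the function name checked_add is an assumption, not something from the original post.

```c
#include <limits.h>

/* Adds x and y if the result fits in int.
 * Returns 1 and stores the sum on success, 0 if it would overflow. */
int checked_add(int x, int y, int *sum)
{
    if ((y > 0 && x > INT_MAX - y) || /* would exceed INT_MAX */
        (y < 0 && x < INT_MIN - y))   /* would fall below INT_MIN */
        return 0;
    *sum = x + y;
    return 1;
}
```

The comparisons themselves cannot overflow, because INT_MAX - y is only evaluated for positive y and INT_MIN - y only for negative y.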

* Shift Operations
- If the item is left shifted, zeros are padded on the right. Do not left shift negative signed data; the result is undefined.
- "If the item being right shifted is unsigned, zeroes are shifted in. If the item is signed, the implementation is permitted to fill vacated bit positions either with zeroes or with copies of the sign bit. If you care about vacated bits in a right shift, declare the variable in question as unsigned. You are then entitled to assume that vacated bits will be set to zero."
- "if the item being right or left shifted is n bits long, then the shift count must be greater than or equal to zero and strictly less than n. Thus, it is not possible to shift all the bits out of a value in a single operation."
- Shifts implement multiplication and division by powers of two correctly for unsigned values, and multiplication for non-negative signed values that do not overflow. "Note that a right shift of a signed integer is generally not equivalent to division by a power of two, even if the implementation copies the sign into vacated bits. To prove this, consider that the value of (-1)>>1 cannot possibly be zero."

* size_t
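- size_t is an unsigned integer type chosen by the implementation (often unsigned long int)
- The return data type of sizeof is size_t. Keep in mind the rules of conversion and unsigned arithmetic operations. For example,
#define TOTAL (sizeof(array)/sizeof(array[0]))
{
     int d = -1;
     if (d <= TOTAL-2)     /* d is converted to size_t, so this is false */
         x = array[d+1];   /* never executed */
}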
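
The pitfall and one possible fix can be sketched side by side. The array contents and both function names are made up for this example; the explicit (size_t) cast in naive_check spells out the conversion that the compiler otherwise applies silently.

```c
#include <stddef.h>

static const int array[] = { 10, 20, 30, 40 };

#define TOTAL (sizeof(array) / sizeof(array[0]))

/* The comparison from the snippet above, with the implicit conversion
 * written out: a negative d becomes a huge unsigned value. */
int naive_check(int d)
{
    return (size_t)d <= TOTAL - 2;
}

/* Compares in the signed domain instead.
 * Returns 1 when array[d + 1] is a valid element. */
int next_index_is_valid(int d)
{
    return d >= -1 && d < (int)TOTAL - 1;
}
```

Casting TOTAL to int is safe here only because the array is small; for large objects a signed type wide enough for the element count is needed.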

* Suffixes UL and L
If an intermediate expression may overflow, consider putting these suffixes on integer constants: U, L, and UL. For example, write a routine to calculate 1 + 2 + ... + n, assuming the result does not overflow long int.
long foo(int n)
{
     return ((n+1L) * n / 2);
}

* Usage of Unsigned Data in C
- Use unsigned types ONLY for BIT OPERATIONS (&, |, ~, >>, <<). Otherwise, CAST to the signed version.

* Identify Whether a Data Type or Variable Is Unsigned
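- For variables: #define ISUNSIGNED(a) ((a) >= (char)0 && ~(a) >= (char)0)
- For data type: #define ISUNSIGNED(type) ((type)0 - (char)1 > (char)0)
Note: both macros compare after integer promotion, so they give the wrong answer for types narrower than int, such as unsigned char or unsigned short.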

Saturday, July 21, 2007

Functions

* Memory Footprint of a Process
- "To understand what stack buffers are we must first understand how a process is organized in memory. Processes are divided into three regions:
Text, Data, and Stack.

The text region is fixed by the program and includes code (instructions) and read-only data. This region corresponds to the text section of the executable file. This region is normally marked read-only and any attempt to write to it will result in a segmentation violation.

The data region contains initialized and uninitialized data. Static variables are stored in this region. The data region corresponds to the data-bss sections of the executable file. Its size can be changed with the brk system call. If the expansion of the bss data or the user stack exhausts available memory, the process is blocked and is rescheduled to run again with a larger memory space. New memory is added between the data and stack segments."

Therefore, the memory footprint of one process looks like this (from high address to low address): stack (growing from high to low), heap (growing upward from the end of the data region), uninitialized data region, initialized data region and text region.

* Stack and Frame
For single-threaded programs, all subroutines share the same stack, which is allocated when the program starts executing. In general three addresses are needed for proper stack operations, and they are stored in processor registers: the stack pointer (SP), the stack base and the stack limit.

One more register, namely the frame pointer (FP), is needed for normal function calls. When a procedure is called, the input parameters are pushed first (from right to left), followed by the return address (the current PC, after which PC is set to the address of the procedure). Then the old frame pointer is pushed onto the stack and FP = SP (here SP points to the last occupied address instead of the first free location of the stack). Finally the local variables are allocated. All of these elements make up a stack frame.

The frame pointer is used to support a variable-length stack. On return, the function assigns SP = FP, which now points to the saved previous FP, and pops it into FP. Thus the context goes back to that of the caller. One more pop yields the return PC, and execution continues. Meanwhile, during the execution of the callee, FP is the base address for accessing the local variables and input parameters, i.e., the offsets of local variables are negative and the offsets of input parameters are positive. For example,
void function(int a, int b, int c)
{
     char buffer1[5];
     char buffer2[10];
}
int main(void)
{
     function(1,2,3);
}
The assembly of codes by using "gcc -S -o example1.s example1.c" is
     pushl $3
     pushl $2
     pushl $1
     call function
This pushes the 3 arguments to function backwards onto the stack, and calls function(). The instruction 'call' pushes the return address (the PC) onto the stack. The first thing done in function is the procedure prolog:
     pushl %ebp /* EBP is FP */
     movl %esp,%ebp /* ESP is SP */
     subl $20,%esp /* allocate buffer1 (8 bytes) and buffer2 (12 bytes), rounded up to words */
The stack now looks like:
     buffer2 buffer1 FP RET a b c (Bottom of stack)
    [      ][      ] [] [] [][][]
Note: the head of buffer1 is located at the low address (left side). So once an overflow happens on buffer1, FP and RET are overwritten, and typically a segmentation fault occurs.

* Function Name
The function name is the address of the function's code in the text region. It can be assigned directly to a pointer to function. It is also possible to assign an arbitrary address in the text region to a pointer to function (with no function body at all) and call through it. This is typically used to reset a system.

* Function Parameters Casting
"It is the programmer's responsibility to ensure that the argument to a function are of the right type."

* Function Parameters
Function parameters are always passed by value onto the function's stack, never by reference, in C. That means copies of the arguments, instead of the arguments themselves, are used within the function. Therefore any changes to the copies cannot be seen outside the function. The only way to change the original variables is to use pointers as parameters and dereference them inside. Keep in mind that the key here is to change the variables pointed to by the pointers. The pointers themselves within the function are still copies of the original ones, so changes to the pointers themselves are not visible to the caller. For example,
void GetMem(char *p, int num)
{
        p = (char *)malloc(num * sizeof(char)); /* only the local copy changes; the block leaks */
}
void FreeMem(char *p)
{
        free(p);
        p = NULL; /* the caller's pointer is still dangling */
}
Two solutions for this: use char **p, or return the local pointer.
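
The first of the two solutions could look like the sketch below. The int status return and the NULL checks are additions made for this example; the essential change is that the functions receive the address of the caller's pointer, so the write goes through to the original.

```c
#include <stdlib.h>

/* Allocates num bytes and stores the address in *p.
 * Returns 0 on success, -1 on failure. */
int GetMem(char **p, size_t num)
{
    if (p == NULL)
        return -1;
    *p = malloc(num); /* writes through to the caller's pointer */
    return (*p != NULL) ? 0 : -1;
}

/* Frees the block and clears the caller's pointer. */
void FreeMem(char **p)
{
    if (p != NULL)
    {
        free(*p);
        *p = NULL; /* the caller no longer holds a dangling pointer */
    }
}
```

A caller writes char *buf = NULL; GetMem(&buf, 16); and later FreeMem(&buf);.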

* Local Variables
Local variables are allocated on the function's stack and are freed automatically when the function returns. Generally local variables, like pointers (to non-local memory, such as a static variable), structs, unions, etc., can be returned by value, not by reference. The exception is arrays. For one thing, arrays cannot be treated as a whole unit in C and therefore cannot be returned by value. For another, returning a pointer to a local array does not help: the memory of the array is released once the function returns, so it must not be dereferenced outside the function.

One typical example of this case is the library function setbuf(stdout, buf). Before handing control back to the OS, the library flushes (not frees) whatever remains in buf. This generally happens just after the main function returns. If buf is an array allocated on the stack, this last flush goes wrong since the memory has already been released. The solution is to use a global array or a static array. What if buf is in the heap?

Tuesday, July 17, 2007

Pointer

* Data Type For Pointer
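A pointer is often the same size as unsigned long int, but this is implementation-defined (for example, 64-bit Windows has 8-byte pointers and a 4-byte long). Use uintptr_t from <stdint.h> when a pointer must be held in an integer.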

* Initialize Pointers First
Pointers should be initialized at declaration, with &, malloc or NULL. Keep this rule in mind when dealing with multi-level pointers. The following is a typical mistake:
char *p = "Hello World";
char **pp; /* No initialization here */
*pp = p; /* undefined: pp does not point to a valid char * yet */

The usage of **p in this example is a little bit special:
void GetMem(char **p, size_t num)
{
     *p = (char *)malloc(num * sizeof(char));
}
The caller must pass the address of an existing pointer variable, as in GetMem(&buf, n). The subroutine has no way to verify that p itself is valid; if it is not, the result of dereferencing p is undefined.

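Another way to initialize a pointer is from a fixed address, which is typical for memory-mapped hardware registers. The result of such a conversion is implementation-defined:
int *p = (int *)0x12345678;
*p = 1; /* same as *((int * const)0x12345678) = 1; */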

* Check Pointers Before Using Them
Dereferencing a NULL pointer, or an invalid pointer that is not NULL, is undefined. Therefore a NULL check should be the first thing done with a pointer received as a function parameter or as a return value. Note that a NULL check cannot detect an invalid non-NULL pointer.

* The Precedence of Symbol `*'
The symbol `*' has lower precedence than the postfix operators `.', ->, (), [], ++ and --. Keep this in mind to read expressions such as these correctly:
*p.f, *a[0], *fp(), *p++
They mean *(p.f), *(a[0]), *(fp()) and *(p++) respectively.
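A minimal sketch of the difference between *p++ and (*p)++. The function names are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>

/* *p++ parses as *(p++): read the current element, then advance p.
   Counts the elements that precede the first 0 in the array. */
size_t count_nonzero_prefix(const int *p)
{
    size_t n = 0;
    assert(p != NULL);
    while (*p++ != 0) {
        n++;
    }
    return n;
}

/* (*p)++ increments the pointed-to object; p itself does not move. */
void bump_first(int *p)
{
    assert(p != NULL);
    (*p)++;
}
```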

* Pointer Operations
- Subtracting two pointers is meaningful when both point into the same memory region, like an array or a buffer. Adding two pointers is never meaningful.
- When a constant is added to or subtracted from a pointer, the resulting address is not simply the numeric sum or difference. The compiler scales the constant by the size of the pointed-to type.
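The scaling can be observed by measuring the same distance twice, once in elements and once in bytes. This is a small sketch; both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Pointer subtraction yields a count of elements, not bytes. Both
   pointers must point into (or one past the end of) the same array. */
ptrdiff_t elements_between(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return last - first;
}

/* The same distance in bytes: measure it through char pointers. */
ptrdiff_t bytes_between(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return (const char *)last - (const char *)first;
}
```

For an int array a, elements_between(a, a + 3) is 3 while bytes_between(a, a + 3) is 3 * sizeof(int).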

* Operator []
When a pointer refers to an array or to a memory block allocated with malloc, p[i] and p+i are different things. p+i is still a pointer, whereas p[i] is *(p+i), the element itself.

* Exchangeable Pointer and Array Name
As mentioned before, in most expressions the array name is converted to a pointer to the array's first element. So a pointer may point to a single variable, to an element of an array, or into a memory region.
    int a[3][5];
    int (*p)[5];
    p = a; =>
    p[2][4] == a[2][4]
On the other hand, be careful about this:
    int a[3];
    int (*p)[3];
    p = &a; =>
    (*p)[2] == a[2]
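Both pointer-to-array forms can be exercised in a short sketch. COLS, sum_matrix and last_of_three are hypothetical names used only here:

```c
#include <assert.h>
#include <stddef.h>

#define COLS 5

/* `rows` has type int (*)[COLS]: a pointer to a whole row of COLS ints.
   A two-dimensional array int a[R][COLS] is converted to this type. */
int sum_matrix(int (*rows)[COLS], size_t nrows)
{
    int total = 0;
    assert(rows != NULL);
    for (size_t r = 0; r < nrows; ++r)
        for (size_t c = 0; c < COLS; ++c)
            total += rows[r][c];
    return total;
}

/* `p` points to one complete array of 3 ints, hence (*p)[i]. */
int last_of_three(int (*p)[3])
{
    assert(p != NULL);
    return (*p)[2];
}
```

With int a[3][5] the call is sum_matrix(a, 3); with int b[3] the call is last_of_three(&b).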

* Print Address
    printf("The value of the pointer is %p\n", (void *)p);

Sunday, July 15, 2007

Buffer Pointers

* The Indexes In Array
As we know, the first index of an array in C is 0 and the last one is N-1. This may look awkward, but it makes the calculation of array lengths convenient and avoids off-by-one errors in programming. The basic idea is that 0 is the first valid index of the array while N is the first invalid index past the array, so the length of the array equals (N-0). A loop over the array therefore always looks like:
  for (int i=0; i< N; ++i)

* Buffer Pointers
We can apply the same idea to buffer pointers. Let the size of the buffer be N and the pointer to the buffer be buf. We also need one more pointer, say bufptr, to track how much of the buffer is in use. It is best to let bufptr point to the first unused position. Then the length of the used region is (bufptr-buf) and that of the unused region is (N-(bufptr-buf)). The buffer is filled simply with *bufptr++ = c, and the full condition is ((bufptr-buf) == N).

Here is a typical example of this technique. Pay attention to how the pointers are updated; flushbuffer() is assumed to write out the buffer and reset bufptr to buf:

void bufwrite(char *p, int n)
{
  while (n > 0)
  {
        int k, rem;
        if (bufptr-buf == N)
        {
              flushbuffer();
        }
        rem = N - (bufptr - buf);
        k = n > rem ? rem : n;
        memcpy(bufptr, p, k);
        bufptr += k;
        p += k;
        n -= k;
  }
}
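To try the routine out, the surrounding declarations have to be supplied. The sketch below is one possible setup under stated assumptions: N is 8, and flushbuffer() appends the used region to a hypothetical in-memory sink and then resets bufptr. A real program would write to a file or device instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 8                       /* buffer capacity (assumed value) */

static char buf[N];               /* the buffer itself */
static char *bufptr = buf;        /* first unused byte in buf */

static char sink[64];             /* hypothetical destination of flushes */
static size_t sink_len = 0;

/* Move the used region to the sink and mark the buffer empty again. */
void flushbuffer(void)
{
    size_t used = (size_t)(bufptr - buf);
    assert(sink_len + used <= sizeof sink);
    memcpy(sink + sink_len, buf, used);
    sink_len += used;
    bufptr = buf;
}

void bufwrite(const char *p, size_t n)
{
    while (n > 0) {
        if (bufptr - buf == N)
            flushbuffer();
        size_t rem = (size_t)(N - (bufptr - buf));   /* free space left */
        size_t k = n > rem ? rem : n;                /* bytes to copy now */
        memcpy(bufptr, p, k);
        bufptr += k;
        p += k;
        n -= k;
    }
}
```

Writing 12 bytes into the 8-byte buffer fills it once, flushes 8 bytes, and leaves the remaining 4 bytes buffered until the next flush.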

Arrays

* Array in C
- No way to process an array as a whole unit in C.
- Only one-dimensional arrays exist, and the size of an array must be known at compile time (C99 variable-length arrays are the exception).
- Arrays cannot be copied or compared with each other directly.
- Arrays cannot be returned from functions.
- Only two things can be done to an array: get its size, and get a pointer to its first element. All other operations on arrays are done through this pointer.
- Two operators act on the whole array when applied to the array name: & and sizeof. & yields the address of the array, which has a pointer-to-array type. sizeof yields the size of the array in bytes. In all other cases, including passing the name to printf with `%s' (NO *), the array name is converted to a pointer to the first element of the array, and this pointer cannot be assigned to. But when an array is declared as a function parameter, the parameter is really an ordinary pointer, so changes to it are allowed inside the function.
- The element array[N] does not exist, but its address may be computed and used for comparisons, like
    if (p != &array[N]) p++;
- Multi-dimensional arrays
Note that C has only one-dimensional arrays; a multi-dimensional array is an array of arrays laid out linearly in memory. Another way to get a multi-dimensional array is to use a multi-level pointer and dynamic allocation. In that case the memory layout may not be linear unless the whole array is allocated as one block at the beginning.
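One way to keep a dynamically allocated matrix linear is to allocate all elements in a single block and add a table of row pointers. This is only a sketch; alloc_matrix and free_matrix are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a rows x cols matrix of int as ONE contiguous block, plus a
   table of row pointers so that m[r][c] works. Returns NULL on failure. */
int **alloc_matrix(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    int **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    m[0] = calloc(rows * cols, sizeof **m);   /* all elements, zeroed */
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (size_t r = 1; r < rows; ++r)
        m[r] = m[0] + r * cols;               /* row r starts r*cols ints in */
    return m;
}

/* Release the element block first, then the row table. */
void free_matrix(int **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

Because the elements form one block, m[r][c] and m[0][r * cols + c] name the same int, just as with a built-in two-dimensional array.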

Preprocessing Notes

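* Scope of Macros
- From its #define until #undef or the end of the translation unit
- A macro defined in a .c file is visible only in that file
- A macro defined in a header is visible in every file that includes the header

* No Semicolon After Preprocessor Directives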

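* Be Cautious of Spaces in Macros
This example might not work as you expect:
#define max_(a, b) ((a) > (b) ? (a) : (b)) /* _ represents one space here */
Because of the space after the name, this defines an object-like macro max whose replacement text begins with (a, b), not a function-like macro with two parameters.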

* Arithmetic Calculation Is Allowed in Macro
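#define SEC_PER_YEAR (60UL * 60UL * 24UL * 365UL)
The UL suffix must be attached to the constants themselves; writing it after the closing parenthesis is a syntax error. The suffix also keeps the product from overflowing a 16-bit int.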

* Macros Are Not Functions
- Parenthesize the whole macro body and each of its operands.
- Operands may be evaluated more than once, so avoid arguments with side effects, like ++ and --.

* Macros Are Not Type Definitions
When declaring a new name for pointers, it is better off to use typedef instead of macros.

It is ok to
#define FOOTYPE struct foo
FOOTYPE a, b;

But this is not, because b is declared as struct foo rather than struct foo *:
#define FOOTYPE struct foo *
FOOTYPE a, b;

Meanwhile, this is allowed for macro but not for typedef:
#define FOOTYPE int
typedef int DOOTYPE;
unsigned FOOTYPE a;
unsigned DOOTYPE d; /* error */

* How to Define a Macro as a Statement?
It is desirable that a macro can be used as a single statement with a semicolon after it. A general solution for this is to use this structure:
do {\
    ...;\
    ...; /* No `break;' inside */ \
    ...;\
} while(0)    /* No semicolon here */
Note: the newline must come immediately after the \ symbol, so a comment has to be placed before it.
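A concrete instance of this structure is sketched below. SWAP_INT and sort_pair are hypothetical names; the unbraced if/else is deliberate, since that is exactly where a plain { ... } macro followed by a semicolon would break:

```c
#include <assert.h>
#include <stddef.h>

/* Swap two ints. The do { ... } while (0) wrapper turns the body into a
   single statement that still needs the caller's semicolon. */
#define SWAP_INT(a, b)      \
    do {                    \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } while (0)

/* Put *lo and *hi in ascending order; returns 1 if they were swapped. */
int sort_pair(int *lo, int *hi)
{
    int swapped = 1;
    assert(lo != NULL && hi != NULL);
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else
        swapped = 0;
    return swapped;
}
```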

A special case is if-else. The ?: and || operators can be used for simplicity because of their defined evaluation order of operands, as in this definition of an assert macro:
#define assert(exp) ((exp) || _assert_())
Here _assert_() is used to report errors.

* String in Macros
- #define str(s) (b = #s) =>
    str(Hello); == b = "Hello";

* Function Parameters in Macro
- #define Print(x) (printf x) =>
    Print(("hello")); == (printf ("hello"));

* ##
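The ## operator pastes two tokens into a single token. In the example below, bw ## __bf becomes bwMCDR2_ADDRESS (the field width) and bs ## __bf becomes bsMCDR2_ADDRESS (the field shift):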
#define bwMCDR2_ADDRESS 4
#define bsMCDR2_ADDRESS 17
#define bmMCDR2_ADDRESS BIT_MASK(MCDR2_ADDRESS)
#define BIT_MASK(__bf) (((1U << (bw ## __bf)) - 1) << (bs ## __bf))
#define SET_BITS(__dst, __bf, __val) \
((__dst) = ((__dst) & ~(BIT_MASK(__bf))) | \
(((__val) << (bs ## __bf)) & (BIT_MASK(__bf))))

SET_BITS(MCDR2, MCDR2_ADDRESS, RegisterNumber);

* Misc
- #if, #elif, #else, #ifdef, #ifndef, #endif, #define, #undef
- #ifdef A; #if defined(A)||!defined(B)
- __LINE__(int), __FILE__(char *), __DATE__(char *), __TIME__(char *)

Reference:
Andrew Koenig, "C Traps and Pitfalls", AT&T Bell Lab

Saturday, July 14, 2007

Semantic Notes

* The Interpretation of Numbers
When numeric constants appear in C code, note their default data types, since these affect the type conversions in the expression.
- An unsuffixed decimal integer constant is of type int (or a wider signed type if it does not fit in int).
- An unsuffixed floating-point constant is of type double, not float.

* Expression Evaluation Sequence
Only four operators in C, &&, ||, ?: and the comma operator, specify an order of evaluation. All other operators evaluate their operands in an unspecified order. The assignment operator does not guarantee an evaluation order either, so an expression like y[i] = x[i++]; has undefined behavior.

* The meaning of 0
- It is a null pointer constant; NULL is often defined as ((void *)0). A null pointer must not be dereferenced.
- It is the value of the terminating character of a string, '\0'.
- It can be cast to a pointer of any data type and used in calculations (the standard offsetof macro in <stddef.h> does this portably), like
    (size_t)((char *)&(((Type *)0)->partx) - (char *)((Type *)0))
- Complement of 0: unsigned int compzero = ~0;

Reference:
Andrew Koenig, "C Traps and Pitfalls", AT&T Bell Lab

Syntactic Notes

* Precedence
Postfix operators such as ++, --, ->, (), [] and . have higher precedence than all others and are left-associative. Note the difference between postfix and prefix ++ and --. Both increment or decrement the operand itself, but the prefix form yields the new value while the postfix form yields the previous value. If the result is not used further, as in "i++;" and "++i;", the prefix form is sometimes preferred for efficiency since it does not need a temporary to hold the previous value; for built-in types, compilers normally generate the same code for both.

Adding parentheses to expressions is the best remedy for precedence confusion.

* Typedef/Define Complex Declarations
The key point is to understand the relative precedence of *, () and [] when reading compound data types.
There are two steps to declare a compound data type with typedef or #define: first write a correct declaration of a variable of that type; then remove the variable name (and the semicolon for #define) and combine what is left with typedef/#define.
e.g. (*(void (*)())0)() and we could rewrite this as
typedef void (*fp)();
(*(fp)0)();
Try this now:
void (*((*p)[10]))(void);
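One way to read that declaration is to build it up from typedefs: p is a pointer to an array of 10 pointers to functions that take no arguments and return void. The sketch below uses hypothetical names (handler_fn, handler_table, count_call, run_all):

```c
#include <assert.h>
#include <stddef.h>

typedef void (*handler_fn)(void);        /* pointer to function: void f(void) */
typedef handler_fn handler_table[10];    /* array of 10 such pointers */

static int calls = 0;

/* A trivial handler that records that it was called. */
void count_call(void)
{
    calls++;
}

/* Invoke every entry of the table that p points to and return the
   number of calls made through count_call. */
int run_all(handler_table *p)            /* same type as void (*((*p)[10]))(void) */
{
    assert(p != NULL);
    calls = 0;
    for (size_t i = 0; i < 10; ++i)
        (*p)[i]();
    return calls;
}
```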

* Semicolon
A single semicolon is a null statement in C. Two cases deserve special attention.
One is an extra semicolon right after an if or while condition, where it silently becomes the whole body of the statement. The other is a missing semicolon at the end of a declaration just before a function definition, where the declaration may be taken as the return type of the function.

* The Switch Statement
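Keep in mind both the weakness and the strength of "break" in switch: a forgotten break silently falls through into the next case, while an intentional fall-through lets several cases share code.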

* The else Problem
An "else" is always associated with the closest unmatched if. The solution is to always use curly brackets, even when only one statement follows.

* Static
- static local variables: visible only inside one function, but they keep their value between calls
- static global variables: visible only in one file
- static functions: visible only in one file

* +=, -=, *= and /=
The advantage of these operators is that the left operand is evaluated only once, which matters when the l-value is a compound expression. For example,
array[i++] += y; =>
array[i] = array[i] + y; i++;
instead of
array[i++] = array[i++] + y;
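A short sketch of the single evaluation; add_at_cursor is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Add y to the element at the cursor, then advance the cursor. The
   subscript expression (*i)++ is evaluated exactly once. */
void add_at_cursor(int *array, size_t *i, int y)
{
    assert(array != NULL && i != NULL);
    array[(*i)++] += y;
}
```

After the call only one element has changed and the cursor has moved by exactly one position.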

Reference:
Andrew Koenig, "C Traps and Pitfalls", AT&T Bell Lab

Lexical Notes

* Assignment and Logic Comparison (i.e. = is not ==)
In most cases, the solution is to put the constant on the left side of the comparison, like if (0 == x) instead of if (x == 0), so that a mistyped = fails to compile. This does not help if both operands are variables.

* Bit Operations and Logic Operations (i.e. & is not &&, | is not ||)
& and | treat their operands as sequences of bits, while && and || treat them as 'true' or 'false'.
For bit operations, a portable form is:
#define BIT3 (0x1u << 3)
unsigned int a = 0x1;
a |= BIT3; /* Set */
a &= ~BIT3; /* Clear */
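The same set and clear pattern, generalized to any bit number and completed with a test operation. BIT, set_bit, clear_bit and test_bit are hypothetical names; the bit number is limited to 0..15 because unsigned int is only guaranteed to have 16 bits:

```c
#include <assert.h>

#define BIT(n) (1u << (n))   /* unsigned, so the shift never touches a sign bit */

/* Return v with bit n set. */
unsigned set_bit(unsigned v, unsigned n)
{
    assert(n < 16);
    return v | BIT(n);
}

/* Return v with bit n cleared. */
unsigned clear_bit(unsigned v, unsigned n)
{
    assert(n < 16);
    return v & ~BIT(n);
}

/* Return 1 if bit n of v is set, 0 otherwise. */
int test_bit(unsigned v, unsigned n)
{
    assert(n < 16);
    return (v & BIT(n)) != 0;
}
```

Note how (1 & 2) is 0 while (1 && 2) is 1: the first compares bits, the second compares truth values.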

* Multi-character Tokens
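How are these cases parsed?
y = x/*p; z = y+++x; (assuming p is a pointer)
The greedy interpretation rule! The lexer always takes the longest possible token, so /* opens a comment and y+++x is read as y++ + x.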
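Both cases can be checked with a small sketch. The function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* The three plus signs are tokenized as ++ followed by +, so the
   expression means ((*y)++) + x: add the old value, then increment. */
int post_increment_sum(int *y, int x)
{
    assert(y != NULL);
    return (*y)+++x;
}

/* The space between the slash and the star is essential; without it
   the two characters would start a comment instead of dividing. */
int divide_by_pointee(int x, const int *p)
{
    assert(p != NULL && *p != 0);
    return x / *p;
}
```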

Reference:
Andrew Koenig, "C Traps and Pitfalls", AT&T Bell Lab