C Language / Lesson 10

 

 

Arrays of pointers vs. Arrays of Strings

Let's first recall the syntax rules for initializing an array of strings (officially: a two-dimensional array of char).

Here is an example of an array containing the names of the weekdays :
char weekday[7][10] = {

"Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday"

} ;

( Note that the [7] could be written as [] since the array is initialized and the number of strings in the initialization section can be counted by the compiler. )

Now let's see how memory is allocated by the above definition :
S u n d a y \0      
M o n d a y \0      
T u e s d a y \0    
W e d n e s d a y \0
T h u r s d a y \0  
F r i d a y \0      
S a t u r d a y \0  

This is a 7x10 frame, i.e. a total of 70 bytes. We observe that some of these bytes are not used (C fills the remainder of each row with '\0' characters), but in the above example that doesn't really matter. The table is relatively small and only 13 bytes are unused. That is small as an absolute number, and the proportion of unused bytes to the total size of the allocated memory block (13/70) is also small. There are two factors that can increase the memory wasted by the string array implementation:
the diverse length of the string values in the array
the number of elements of the array

In the following example the waste of memory is clearly significant:
char message[10][47] = {

"Bye",
"Do you want to permanently delete these files?" ,
"Illegal file name",
"OK",
"Please wait...",
"Press a key to continue",
"Quit ?",
"Save file?",
"Try again",
"Variable not declared"

} ;

Only 161 of the 470 bytes (10x47) are used in the above array definition. The remaining 470-161 = 309 bytes are wasted, which is almost twice the memory actually used.

Anyway, let's use the array

Let's write a few lines of code to display one of the strings in the array

void display_message(int msg_no)
{ fprintf( stderr, "%s\n", message[msg_no] );
}

message[msg_no] is the address of the respective string because the defined variable is a two-dimensional array. Keep in mind that, given a multi-dimensional array, an expression with fewer pairs of square brackets than the number of dimensions yields an address and not the value of an element in the array. (Strictly speaking, message[msg_no] designates a whole row of 47 chars, which C converts to the address of its first character when it is passed to a function.) In our example message[msg_no] is an expression with one pair of brackets, while the definition declares a two-dimensional variable.

We will revisit the above example with a different approach after one or two paragraphs.

 

Using a pointer array

A pointer array contains pointers, of course. In other words it contains a list of addresses, and in the case of a char * array the addresses point to strings. If the pointer array is initialized with string constants, then the elements of the array hold the addresses of those constants. The strings themselves do not belong to the array; they are stored elsewhere (typically in a read-only constants segment), whereas the array contains only the pointers. Assuming that a pointer is a 32-bit value, a pointer's size is 4 bytes. Therefore the size of an array of 7 pointers is 4x7=28 bytes.

Here is the implementation of the weekday definition, this time using pointers instead of strings.

char *weekday[7] = {

"Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday"

} ;

Doesn't it look like the definition of the array of strings at the top of this page?

We have not finished yet. While the table is a 28-byte memory block, this is not the only memory allocated. There is an additional allocation for the string constants. This time we count the characters (and the terminators) of each day name in the initialization section: total = 57. This is less than the 7x10 = 70 bytes of the two-dimensional definition, but the 28 bytes of the array must be added: 28+57 = 85. Conclusion: we have allocated more memory with this second approach using pointer arrays.

If we repeat this last process for the message[ ] array , the results will be quite different.

char *message[10] allocates 4x10=40 bytes
the total size of the string constants in the definition is 161
  Result : Total memory used = 161+40 = 201 only !

We have come to the conclusion that the more uniform the string values are in size, the more likely it is that an array of strings is the best choice in terms of space optimization; in the opposite case an array of pointers is the better choice.

But do we have to modify our code because we have chosen the latter ? Thank God , No !

The reason is that the notation we used before in the display_message(...) example does the same job this time, although it means different things. Here message[msg_no] is not the address of the element but its value (according to the rule about brackets presented above). This time, however, the value of the element is a pointer: the address of the required string. So the result is the same. This means that if we have implemented a big project using a two-dimensional array definition, with a significant number of references to it in our code, and later we change our mind and wish to turn it into a pointer array, there will be no problem; all references that use message[msg_no] as a string will remain correct.

 

Terminating Arrays

In the weekday[] example above the size of the array (regardless of the implementation method that one would choose) is fixed and known in advance, because the days of the week are (and will always be) seven. If a function is to display these daynames on the screen it would look like this :
void display_days()
{ int i ;
   char *weekday[7] = {

"Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday" } ;

for (i=0 ; i<7 ; ++i )
   printf("%s\n",weekday[i] ) ;

}

As we can see, the size is passed in the code as a constant (or it could be a symbolic constant if you prefer) and the program 'knows' where to stop.

But what would happen if an array contained a number of elements that changes over time, as the project expands and new versions are produced? For instance, let an array contain a list of imperative words, i.e. commands which make up the instruction set of a special protocol :
char *cmd[] = {

"BEGIN", "CLOSE", "ERASE", "FIND",
"LOAD", "OPEN", "PRINT" , "RESET",
// etc. etc.
"WAIT", "WRITE"

} ;

Here the advantages of leaving the square brackets empty are that (a) we don't have to count the number of words in the array and (b) whenever we add a new word no change is needed in the declaration of the array, because the size of the dimension is omitted and calculated automatically at compilation time.
But can we display the contents of the array without counting the number of elements in it ? If we used the style of the previous for (. . .) loop, we would have to state explicitly the number of iterations that the loop must execute. That already forfeits the advantage of not counting the elements and, what is worse, we must change the constant whenever we add a new element to the array.
All this discussion leads to the conclusion that another way of determining the end of the array is needed. Let's add then a terminator to the array exactly as C does with strings (arrays of characters). When a string is processed by a string function the characters in it are accessed one by one until a Null character ( '\0' ) is found.

So, if character arrays are terminated with a character (the null character), then what can terminate an array of pointers is obviously a pointer : the NULL pointer. Remember that NULL is a preprocessor macro defined in several standard headers (stdio.h and stddef.h among them). It represents the null pointer, written as address 0, which never holds anything in a C program and thus can represent an unsuccessful access or a negative response. For example, if the program asks a function 'where is that string stored?' and the function answers 'it is stored at NULL', then it means 'it was not found'. In a similar way, when the NULL pointer is included in a list of string addresses it can never be confused with the address of an existing string. Let's rewrite the above definition :
char *cmd[] = {

"BEGIN", "CLOSE", "ERASE", "FIND",
"LOAD", "OPEN", "PRINT" , "RESET",
// etc. etc.
"WAIT", "WRITE" , NULL

} ;

The following function will search in the array and return the order number in the list of a word we are looking for :
int
cmd_index( char *word)
{ int i ;

for (i=0 ; cmd[i] ; ++i )
   if ( strcmp(word,cmd[i])==0)
      return i ;

return -1 ; // Not found
}

The cmd[i] condition is equivalent to cmd[i]!=NULL in other words the loop keeps executing while the current pointer - element in the cmd array is not NULL and when the NULL is reached the loop terminates.

How nice but what if I suddenly decide that because of the uniform size of the strings in the array I wish to change my definition from
char *cmd[] = { . . . } ;
to
char cmd[][6] = { . . . } ;

In everyday practice some compilers would accept this without any modification. But if we want to call things by their real names, NULL has no place in the list of initial values of the array cmd[ ] in the latter definition, because it is not a string but a pointer, and this array is an array of strings and not an array of pointers as it was in the former implementation. So, the question is : which string shall I choose to terminate a list of strings ? There could be many different answers to this, but in my opinion only one answer is reasonable : the empty string "" .

Is there anything else to be changed ? Yes !

The condition cmd[i] mentioned before is no longer appropriate, because this time cmd[i] is the address of a string in the array of strings and therefore it will never be NULL. Below is the final implementation using an array of strings, after all the required modifications have been made.

char cmd[][6] = {

"BEGIN", "CLOSE", "ERASE", "FIND",
"LOAD", "OPEN", "PRINT" , "RESET",
// etc. etc.
"WAIT", "WRITE" , ""

} ;
int
cmd_index( char *word)
{ int i ;

for (i=0 ; cmd[i][0] ; ++i )
   if ( strcmp(word,cmd[i])==0)
      return i ;

return -1 ; // Not found
}


Now the condition cmd[i][0] 'says' : 'while the first character of the current string is not the Null character '\0' ...' or in other words 'while the current string is not the empty string "" ...'.
