C Module 3: Data Structures
Create your repository: https://classroom.github.com/a/QZaPmAGe
Then clone the repository into your CodeAnywhere container using git clone YOUR_URL
(Click the green "Clone or Download" button to find your repository's URL).
Create all the exercise files inside your repository directory and then add, commit, and push them to GitHub when you are finished.
Due: Thursday 9/27 at 11:59PM, see submission instructions.
Objectives
The goal of this module is:
- Understand how data structures (array form, linked) work in C.
As usual, to reduce clutter in the modules, our code examples will not show the "include's". Be sure to include the four in every program you write:
// Put these at the top of every C file: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <math.h> int main () { // ... }
Arrays and Strings
Consider this example, which is also available on Repl.
#include <stdio.h> #include <stdbool.h> bool compare(char* s1, char* s2){ // write code here for 3.2 return false; } int main () { char C[4]; char D[4]; char E[4]; char* str = "hello"; char* str2 = "hello"; char* str3 = "oy"; int i; C[0] = 'o'; C[1] = 'y'; C[2] = '!'; C[3] = '\0'; printf("C as chars: "); for (i=0; i<4; i++) { // The usual way. printf ("%c", C[i]); } printf ("\n"); printf ("C as string: %s\n", C); // This works if the string terminator is used. D[0] = 'o'; D[1] = 'y'; D[2] = '!'; D[3] = '\0'; printf ("D as string: %s\n", D); E[0] = 'o'; E[1] = 'h'; E[2] = '?'; E[3] = '\0'; printf ("E as string: %s\n", E); printf ("str as string: %s\n", str); // The usual way, since this is a string. printf("str as chars: "); for (i=0; i<5; i++) { // An alternative (unnecessarily laborious). printf ("%c", str[i]); } printf ("\n"); // Test equality with == (3.1) if(C == D) { printf("C and D are ==\n"); } if(C == E) { printf("C and E are ==\n"); } if(C == str) { printf("C and str are ==\n"); } if(str == str2) { printf("str and str2 are ==\n"); } if(C == str3) { printf("C and str3 are ==\n"); } // Test equality with compare function (3.2) if(compare(C, D) == true) { printf("C and D are == with compare()\n"); } if(compare(C, E) == true) { printf("C and E are == with compare()\n"); } if(compare(C, str) == true) { printf("C and str are == with compare()\n"); } if(compare(str, str2) == true) { printf("str and str2 are == with compare()\n"); } if(compare(C, str3) == true) { printf("C and str3 are == with compare()\n"); } }
Exercise 3.1: Answer these questions in your README. (a) Does the == operator correctly test for equality between strings? (b) Since arrays of chars and char* pointers act similarly, can we test equality by using a line like if(*C == *D)? Test in CodeAnywhere/Repl to verify your answers.
Note:
- C does not conveniently retain the length of an array for you. You need to keep track of it. For strings (char arrays), we denote the end of the string with the '\0' character).
- Char arrays and strings are interchangeable as long as the string terminator symbol '\0' is at the end.
- A char array without brackets is the same as a pointer to a char. Thus think of C as the address for the data held in C[0].
- As we know with any other type of pointer, the "value" of a pointer is the address it points to. Derefencing a pointer follows the address and resolves to a variable based on the pointer's type (in this case a char).
Exercise 3.2: Make a compare.c file in your repository and complete a compare() function to evaluate whether two strings are equal. To do this, you will need to step through each string comparing characters, until you reach the string termination character. Be sure your code works for all the tested cases above, plus any others that might be important.
Arrays of structs
Because structs are intended for related data to be grouped together, it is only natural that we'll want to store a collection of these.
Let's look at an array example first:
struct person { char *name; int age; }; void print (struct person p) { printf ("Name=%s age=%d\n", p.name, p.age); } int main () { struct person list[3]; int i; list[0].name = "R2-D2"; list[0].age = 609; list[1].name = "Optimus Prime"; list[1].age = 2700; list[2].name = "Wall-E"; list[2].age = 210; for (i=0; i < 3; i++) { print (list[i]); } }
HW Exercise 3.3: Implement the above in arrayExample.c.
Note:
- Arrays that are pre-sized (whose size can be determined at compile time) have space set aside for them on the stack.
- Thus, at the end of
main()
the memory picture partly looks like this:
Exercise 3.4: Describe where in the memory diagrams the strings will be stored (you may need to look back to a prior module). Why can't the string data be stored on the stack?
struct person { char *name; int age; }; void print (struct person p) { printf ("Name=%s age=%d\n", p.name, p.age); } int main () { struct person *list; int i; list = (struct person*) malloc (sizeof(struct person) * 3); list[0].name = "R2-D2"; list[0].age = 609; list[1].name = "Optimus Prime"; list[1].age = 2700; list[2].name = "Wall-E"; list[2].age = 210; for (i=0; i<3; i++) { print (list[i]); } }
HW Exercise 3.5: Implement the above in arrayExample2.c.
Note:
- Examine the call to malloc:
list = (struct person*) malloc (sizeof(struct person) * 3);
- On the right side, we're asking malloc for 3 contiguous slots, each of space sizeof(struct person).
- The left side is the cast required to a pointer to the start of the array.
- The variable list. is a pointer to the start address.
- Because malloc allocates memory on the heap, the memory
diagram (partly complete) is:
Observe:
- We have drawn the heap block growing downwards (unlike the stack, which we drew growing up).
- Why do we draw this way? The intention is to make the the stack look like a conventional stack (of plates, say).
- Remember, this is only a conceptual representation of memory.
- Real memory is just one long sequence of bytes.
- Once the space is allocated, standard array notation
with square brackets can be used:
list[0].name = "R2-D2"; list[0].age = 609;
Notice:- Since each array element is a struct, the dot operator is used for a part of the struct.
- The following would be wrong since
list[0]
is not a pointer:
list[0]->name = "R2-D2"; // Will not compile. list[0]->age = 609;
Let's now rewrite, just for illustration, so that the list is a list of pointers:
int main () { struct person **list; int i; list = (struct person**) malloc (sizeof(struct person*) * 3); for (i=0; i<3; i++) { list[i] = (struct person*) malloc (sizeof(struct person)); } list[0]->name = "R2-D2"; list[0]->age = 609; // ... as before ... }
HW Exercise 3.6: Implement the above in arrayExample3.c, completing the code in main(), and including the struct definition and the print() method.
Note:
- This time, we are first asking malloc to allocate
a block of 3 contiguous slots, each large enough to
hold a pointer.
list = (struct person**) malloc (sizeof(struct person*) * 3);
At this moment, memory looks like: - Since each of those pointers need to point to a struct,
we loop through and make each of those point to enough space
(by asking malloc):
for (i=0; i<3; i++) { list[i] = (struct person*) malloc (sizeof(struct person)); // list[i] is a pointer }
After which memory looks like:(This is before the data is assigned into the structs).
Note:- We are calling malloc three times in the loop.
- This means the three allocations can be anywhere on the heap, not necessarily contiguous.
HW Exercise 3.7: Add code to your arrayExample3.c to print the start address of each heap block, and make a table in your README to show a memory diagram. You do not need to list addresses on the stack or globals area.
Linked lists
Let's start with an example (also available on Repl):
struct person { char *name; int age; struct person * next; // We've added a "next" pointer. }; struct person *front=NULL, *rear=NULL; // The usual front,rear pointers. void print (struct person *p) { printf ("Name=%s age=%d\n", p->name, p->age); } struct person* makeNode (char* name, int age) { struct person *p; p = (struct person*) malloc (sizeof(struct person)); p->name = name; p->age = age; p->next = NULL; return p; } void addToList (char* name, int age) { if (front == NULL) { front = makeNode (name, age); rear = front; } else { rear->next = makeNode (name, age); rear = rear->next; } } void printList () { struct person *p; // WRITE YOUR CODE HERE: use the pointer p // to walk down the list. } int main () { addToList ("R2-D2", 609); addToList ("Optimus Prime", 2700); addToList ("Wall-E", 210); printList (); }
Exercise 3.8: Copy/paste the above in linkedlistExample.c, including a complete printList() function, and fill in a complete memory diagram right after the three nodes are added to the list (only include heap addresses - use the debugger to find them).
Note:
- In this example, the standard linked list node is defined as:
struct person { // ... data ... struct person * next; // The "next" pointer. };
- To track the elements of the list, front and rear pointers are declared as global variables so that they are accessible both in addToList() and printList().
Linked list: example 2
struct person { // ... same ... }; struct linkedlist { // We're now packaging the two pointers into a struct. struct person *front; struct person *rear; }; void print (struct person *p) { // ... same ... } struct person* makeNode (char* name, int age) { // ... same ... } struct linkedlist* initList () { struct linkedlist* L; L = (struct linkedlist*) malloc (sizeof(struct linkedlist)); L->front = NULL; L->rear = NULL; return L; } void addToList (struct linkedlist* L, char* name, int age) { if (L->front == NULL) { L->front = makeNode (name, age); L->rear = L->front; } else { L->rear->next = makeNode (name, age); L->rear = L->rear->next; } } void printList (struct linkedlist* L) { struct person *p; // WRITE YOUR CODE HERE } struct linkedlist* copy (struct linkedlist* L) { struct person *p; struct linkedlist *L2; // WRITE YOUR CODE HERE } int main () { struct linkedlist *L1, *L2; L1 = initList (); addToList (L1, "R2-D2", 609); addToList (L1, "Optimus Prime", 2700); addToList (L1, "Wall-E", 210); printList (L1); // Wrong way to copy: L2 = L1; addToList (L2, "Hal-9000", 2); printList (L1); /* // Better: L2 = copy (L1); addToList (L2, "T-1000", 30); printList (L2); */ }
HW Exercise 3.9: Implement the above in linkedlistExample2.c, completing the required methods. For the moment, the copy() method can remain not implemented.
The reason for the variation is now clear: we'd like to make it easy to copy a linked list:
- The wrong way, of course, is just to copy a pointer:
L2 = L1;
- The right way is to create an entire new list.
HW Exercise 3.10: Implement the copy() method, ensuring that a deep copy is made, so that every person struct is duplicated. Un-comment the block comments in main() Add all this code to linkedlistExample2.c.
An array of linked lists
Consider creating an array of linked lists to store structs:
- We'll call the array a "table".
- And we'll think of the structs as the data that needs to be
stored, for example:
struct person { char *name; int age; struct person * next; };
- The table will be of fixed size, for example:
#define TABLE_SIZE 5 // This defines a constant in C struct linkedlist *table[TABLE_SIZE]; // 5 linked lists.
- When a struct is to be stored in a linked-list, we already know how to do that from earlier exercises in this module.
- Thus, the only remaining question is: which of the linked lists (one per entry in the table) do we store a particular item?
- To compute which entry, we'll use a calculation that
returns an integer, as in:
int computeCode (char *name, int age) { int s = 0; int i = 0; while (name[i] != '\0') { // Add the letters, treating them as numbers. s += (int) name[i]; i++; } return (s+age); // Combine with age. }
- Then, to decide which table entry, we'll use the above:
tableEntry = computeCode(name,age) % TABLE_SIZE;
Note:- The computeCode() function (method) returns some integer.
- This integer cannot directly be used as the index into the table because it could be too large.
- So, we compute the remainder when divided by the table size.
- This will always produce a number between 0 and TABLE_SIZE.
- Once we know which table entry (that is, which linked list),
we can use our earlier linked-list methods to insert into that list:
addToList (table[tableEntry], name, age);
Exercise 3.11: Implement the needed code in various methods in linkedlistTable.c to make this idea work. You have seen this data structure before. What is it called? Draw a memory diagram after the first three insertions. You can either use a markdown table, or draw on paper and add a photo to your README file.
Meta
Building data structures in any language requires knowledge of memory, but this is especially important in C:
- Care must be taken about what is on the heap versus the stack. What would happen if the makeNode() function in our Linked List simply created a node on the stack and returned a pointer to it?
- Pointers are frequently used to manipulate data structures since they provide a way for a function to modify the data structure directly (not a copy of the values passed as parameters).
Submission Instructions
Do your work in a repository created using this link.
- Answer all of the exercise questions above in your README.md file.
- Add all of your source code files to your git repository. You can do this by running git add *.c when in the same directory to add all files ending in .c. You will then need to use git commit -m "some message" and git push origin master to commit and push your files. Be sure to run git status to verify that all modified files have been committed and pushed.
- You should check the contents of the README.md file and your other C files through the GitHub website to ensure that it has all of your correct content!