Notes N20

Math 10 -- D. C. Smolarski, S.J.
Santa Clara University, Department of Mathematics and Computer Science

[Return to Math 10 Homepage | Return to Notes Listing Page]

Contents


Introduction to Abstract Data Types: Pointers

"Abstract Data Types" often refer to data structures which are not necessarily "native" to a computer language (such as arrays or structs) and have multiple components. Often such new data structures have to be "built" from the capabilities of a language and many books refer to them as "ADTs."

For some types of information, simple variables, arrays or even records are not quite good enough.

For example, how would you (efficiently) store a genealogy tree?

How would you store a queue (i.e., a "list" where insertions are made at one end and deletions at the other -- like a "waiting" line for service at a bank or for food at Benson)?

The concept of pointers has been developed to handle this.

Unfortunately, dealing with this type of concept and variable demands that we use a lot of pictures and lines to see how things are related.

NOTE: pointers will be introduced in Math 10, but explained only very generally. They will be covered in more depth in Math 61, "Data Structures."

Concepts and Definitions: Part 1 -- Lists

EXAMPLE:
       INFO NEXT    INFO NEXT    INFO NEXT
       _______      _______      _______
----->| 25 | -|--->| 32 | -|--->| 83 | /|
      |____|__|    |____|__|    |____|/_|

  ^       ^     ^     ^      ^     ^
  P      N1     Q     N2     R     N3
SOME TERMINOLOGY: node: several memory locations (i.e., a record) that contains information as well as a "link" to another node. Each field (separate section of the record) is given a name, such as INFO and NEXT in the example. A node is often depicted as a box with several sections. In the example, N1, N2, N3 are the nodes.

pointer: a way to reference a node. Sometimes called the "link" or the "address." The most correct way to think of a pointer is as the memory address (location) of the first section of the node. A pointer is often depicted as an arrow. In the example, P, Q, and R are the pointers.

nil or NULL: an "empty" pointer, that is, a pointer that does not point to any node, usually depicted as a cell with a slash through it.

[C++] notation: The node N1 is actually the node that the pointer P points to. Instead of refering to N1, we usually write *P as referring to "the node P points to." (The * used this way is often called the "dereferencing operator.")

To indicate the INFO section of node N1, we can write (*P).INFO since N1 is actually a record and thus we can use the standard record notation to specific one particular field. However, C/C++ has another operator to simplify the notation (and visually suggest that we are talking about "pointer" variables), and it is the "arrow" operator, ->. Thus (*P).INFO can be equivalently written as P->INFO when needed.

To indicate the NEXT section of node N1, we write P->NEXT and this is the pointer that points to node N2, and we have also called this pointer Q.

linked list: a collection of nodes, with a special head node and a special tail node, such that every other node has some node pointing to it and it points to another node.

The head node has no node pointing to it (e.g., N1 above).

The tail node does not point to any other node (e.g., N3 above).

Concepts and Definitions: Part 2 -- Trees

A linked list is basically a linear structure --- the data can be considered to be stored in a one-dimensional structure.

If we extend the concepts of nodes and pointers slightly, we can develop a two or higher dimensional linked structure.

tree: a collection of nodes, with a root node (i.e., a head node), such that each node points to one or more other nodes. The tail nodes of a tree (that do not point to any other nodes) are called leaves.

binary tree: a tree such that each node point to at most two other nodes. (A node may point to only one other node or no other nodes, but it cannot point to 3 or more nodes in a binary tree.)

FOR CONVENIENCE, we depict a tree "upside down," that is with the root uppermost and the leaves at the bottom.

                 ____|____
                |  | x |  |
                |__|___|__|
               /           \
       ______ /_            _\_______
      |  | y |  |          |  | z |  |
      |__|___|__|          |__|___|__|
     /           \        /           \
The node(s) that a node in a tree point to (on the next level down) are called the "children" (or "sons" or "daughters") of that node. The node that immediately points to a node is the "parent" (or "father" or "mother") node. All other terminology is based on the use of such "trees" for genealogical records. Hence we speak of nodes that are "first-cousins" of other nodes, or nodes that are "grandchildren" of other nodes, or nodes that are "uncles" of other nodes.

For example, given

                     |
                     a
                   /   \
                  /     \
                 b       c
                / \     / \
               d   e   f   g
Node a is the grandparent of nodes d, e, f, and g. Node b is the parent of nodes d and e. Nodes b and c are brother and sister. Nodes e and f are first cousins. Node b is the aunt of node f. Similarly for more nodes and more relationships.

EXAMPLE: Trees can be useful to decipher arithmetic expression for translation by compilers into appropriate code, thereby eliminating the need for parentheses.

E.g., (3 + 2)(5 - 2) + 4

                      |
                      +
                    /   \
                   /     \
                  x       4
                /   \
               /     \
              +       -
             / \     / \
            3   2   5   2
NOTE: that the leaves are all the numbers and the other ("internal") nodes are all operators. To use this tree for arithmetic computations, one begins at the (leftmost) lowest leaves and picks a set of one parent and its children (e.g., + with 3 and 2). One computes the answer and "prunes" and "reduces" the tree (i.e., "snips" the tree at that "branch," i.e. right at the + and replaces it with 3+2=5).

This is repeated until the root node is replaced with an answer.

NOTE: Searching and sorting can also be done on trees. This is a major section of a Data Structures course.

Defining Pointer Types in C/C++

C/C++ allows the use of pointer variables. To make use of pointers in a program, one normally first defines a new related type: the node that will be pointed to by pointer variables. (Technically, pointer variables can be used in other contexts -- and don't always have to point to a structured node.)

Before explaining the legal syntax for defining such pointer variable and nodes, it is helpful first to understand certain basic concepts and how C/C++ relates to those concepts.

When variable identifiers such as i and j are used in the "standard" way, for example, in i = j+2, we understand that i and j refer not to the actual memory location associated with these symbolic identifiers, but rather to the values stored in these memory locations.

Thus i = j+2 means "take the value in the memory location correspoding to j, add 2 to it, and store that new value in the memory location corresponding to i."

There are times when we don't want to concern ourselves with the value corresponding to an identifier, but keep track of where the identifier is located. We do this by using the ampersand, &, as when we want a function parameter to be a variable parameter. Using the ampersand makes the code pass the address of the memory location rather than the contents (value) stored there.

Thus, if i corresponds to memory location 2307, using i by itself in an expression means that the computer looks at the contents of location 2307 (i.e., the value stored there). On the other hand, &i means that the computer looks at 2307, which is the exact memory location, rather than at what is stored in 2307.

"Pointer" variables are variables that store addresses, in other words, the values of pointer variables are the addresses in which the information is stored rather than the contents of that address itself. A pointer variable is like a sign that "points" to the real memory location. It is like a "for sale" sign that stands on a corner and tells you what the address of the house is that is up for sale. The sign is not the house. The sign is not at the address of the house. But the sign tells you where to go to find the house -- it is a "reference" for the house. To go from the "reference" (pointer) variable to the actual location where the data is stored, we need to "dereference" the identifier, and this is done by using the asterisk -- usually <shift>8.

Thus if we declare p as a pointer variable, we must remember that what is stored in that variable is the address of where we eventually look to find values. To get to the actual memory location, we write *p.

When constructing a node, as illustrated earlier, we usually have a section of the node to store some sort of data, and another section which stores the address of the next node. Thus one section of a typical node is itself a pointer variable.

The following example is a typical way to define a node structure datatype (i.e., a class):

  class node {
	public:
	   int info;
	   node *temp;
	};
NOTES:
  1. The last declaration line, node *temp; is the standard way in which a pointer variable is declared. What this format means is that temp is being declared NOT as a variable of type node (in the way that info is being declared to be of type int), but rather as a POINTER variable which points to (i.e., stores the address of) a set of memory locations which we consider to be of data type node.
  2. It may be necessary to define two node types in a specific program (e.g., one a list node, another a tree node). Every pointer should be associated and declared relative to its own node type.

Using Pointer Variables

We can declare pointer variables (after first defining a type), as we do for any variable (with intrinsic or user-defined types), and as was shown above. A variable that points to memory locations of a predefined data type is listed with an asterisk prepended to it and after the data type.

E.g.,

       node *list;
This is interpreted as saying that list is a pointer variable (indicated because of the presence of the *) and that the memory locations that list points to is of type node. Then, when using such a variable, we reference the object pointed to by using an asterisk before the identifer as in the declaration section, for example,
       *list
Often this is read and interpreted as "what [i.e., the node that] list points to."

To reference the info section of this node, we can use the "dot" notation as in a typical record variable, for example,

        (*list).info
or, we can use the "arrow" operator instead, for example,
	list->info
Both notations have the identical same meaning.

Allocating Space for Nodes

In normal programming practice, we don't allocate all the space for nodes referenced by pointer variables at the beginning of a program, that is, when declaring variables. Also, in standard practice, we do not give each node a separate name.

Instead, we allocate space dynamically, i.e., as needed and while the program itself is running, and usually we only have a reference to the head node of a list, or the root of a tree.

To allocate space for a node, we use the standard (intrinsic) C++ operator new, which takes one argument. (C allocated space for new nodes using a different library function alloc or malloc with another library function sizeof.)

The argument of new is of the datatype desired and follows the keyword new but is not enclosed in parentheses. new is used in a simple assignment statement with a pointer variable on the left side and allocates space in memory for a new node of the proper type pointed to by the variable named and associated that space with the variable listed on the left side of the assignment statement. In other words, if p were declared to be a pointer variable (pointing to a node), then

	p = new node;
would actually create the node and store the address of the new node in p.

CLARIFYING NOTE: One declares variables to be of pointer types at the beginning of a program. But these variable only set aside space for "pointers," that is, for the addresses of nodes. Declaring such a space for an address does not itself provide any address or any "structure" at an address. It is like a post office providing space for new street addresses, if and when open territory is subdivided and homes are actually built. Just because the post office has a slot allocated and available for putting a new address, doesn't mean that there actually is open property available and a house built on it.

Example

       info next    info next    info next
       _______      _______      _______
----->|    | -|-   |    | -|-   |    | -|-
      |____|__|    |____|__|    |____|__|
 ^
 P

  class node {
	public:
	   int info;
	   node *next;
	};
   node *p, *q, *r;
	// these variables CAN hold addresses
        // of nodes, but they initially do NOT,
        // and there are no nodes
        // corresponding to them 
 ...
   p = new node;
       // this allocates a node corresponding 
       // to  p  and puts
       // the proper address in  p
   p->info = 2;
       // the  info  section of the node 
       // that  p  points to 
       // gets the value of 2 
   q = new node;
   p->next = q;
   q->info = 5;
   r = new node;
   q->next = r;
   r->info = 8;
   r->next = NULL;
   q = new node;
   r->next = q;
   q->info = 12;
   q->next = NULL;
   ...
   }
It may be helpful to recall what "really" happens in memory when trying to understand this example. The identifiers p, q, and r correspond to locations in memory which can store units of information. Unlike variables we have seen thus far (which have stored numbers or characters), these "pointer" variables store addresses of other memory locations.

Remember that all memory is merely a sequentially numbered electronic storage "warehouse" and that the "symbol table" enables a program to make a correspondence between an identifier and the memory location it actually corresponds to. Imagine the following as part of a symbol table:

	identifier |	memory
		   | 	location
		   |
	   p	   |	2156
	   q	   |	2157
	   r	   |	2158
Imagine the following as depicting sections of memory.
     location   |     contents
	2156	|	____
	2157	|	____
	2158	|	____
	2159	|	____
	2160	|	____
	...
	3214	|	____
	3215	|	____
	...
	3387	|	____
	3388	|	____
	3389	|	____
	3390	|	____
When the program segment above starts "running," space is allocated for pointer variables p, q, and r but NO nodes are created for them to point to. Only when the new operator is invoked on datatype node is space actually allocated and the corresponding pointer variable gets a value, which is the address of the first item in the node. For example, when
	p = new node;
is executed, then we can imagine that two memory spaces are allocated for the node p points to, for example locations 3214 and 3215. (Where the "computer" goes to get space is beyond the scope of this course. The point of importance is that the "computer" can get space to allocate and that where the space is, is not under the control of the programmer.) Under this assumption, the memory location that corresponds to p, i.e., 2156, now receives the value of 3214, which is the first part of the new node allocated. When the next line in the code
	p-> = 2;
is executed, then memory location 3214 receives the value of 2.

The following is a hypothetical final version of the list created by the program code above, with the numbers above the boxes indicating the corresponding memory locations which are also depicted in a "filled-in" version of computer memory corresponding to what was depicted above. (Once again, note that the memory locations corresponding to the nodes were arbitrarily chosen for the sake of this example.)

       3214 3215    3389 3390    3387 3388   2159 2160
p      _________    _________    _________    _______
----->|    |   -|->|    |   -|->|    |   -|->|    |  |
      |  2 |3389|  |  5 |3387|  |  8 |2159|  | 12 | 0|
      |____|____|  |____|____|  |____|____|  |____|__|
This is the revised depiction of computer "memory" with integer and pointer values inserted.
     location   |     contents
	2156	|	3214
	2157	|	2159
	2158	|	3387
	2159	|	_12_
	2160	|	__0_
	...
	3214	|	__2_
	3215	|	3389
	...
	3387	|	__8_
	3388	|	2159
	3389	|	__5_
	3390	|	3387

Recursion and List Generation

As concrete examples of recursive programs, we now look at a sample program to generate a list and then print it out.

Those in their right minds would never write a program merely to make a list of decreasing numbers -- lists are usually used for much more significant (and constantly changing) data than that. But this code does exemplify certain programming methods.

In writing programs of this sort, one needs to think in terms of the following general rule: How do I write the code for a case of size n assuming that I let the code automatically take care of the case of size n-1? A simple algebraic example is that we can write an as a(an-1). Here we assume we can find an-1. We use this assumption to compute an.

This approach is sometimes called "Divide and Conquer."

A similar dynamic is true with other recursive programs, in particular, the one below. We assume we can let the program create a list of size n-1. We use this fact to build a list of size n.

       _______      _______      _______
----->|    | -|--->|    | -|--->|    | -|
      |____|__|    |____|__|    |____|__|


#include<iostream.h>
//#include <stdlib.h>  for exit,NULL,
//                     if iostream not included
class node {
 public:
  int info;
  node * next;
  };

//* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
//*       F U N C T I O N   G E N L I S T                   *
//  This generates a list.  n is a small positive integer
//  specifying an upper bound on the number of nodes
//  of the list.  The information field is 
//*  given a sequential node number.                        *
//* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
node * genlist(int n, int & numnode)
{	node *temp;
	temp = new node;
        if (n==1)
	   temp->next = NULL;
	else
	   temp->next = genlist(n-1,numnode);
	temp->info = ++numnode; //numnode is the unique number
		     //stored in each node after its creation
	return(temp);
}

//* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
//*       F U N C T I O N   L I S T P R I N T
//  This prints a list by iteratively moving a temporary
//  pointer through all the nodes of the list and printing
//*  out the value of the info section of each node.
//* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
void listprint(node * list)
{	node * temp;
	temp = list;
	do
	{	cout << temp->info << " " ;
		temp = temp->next;}
	while (!(temp == NULL));
}

//***************************************************
//  MAIN PROGRAM
//***************************************************
int main ()
  {
	node * list;
	int numnode; // numnode is a sequential counter
		//used to put data in the info 
		//section of each node.
	numnode = 0;
	list = genlist(5,numnode);
	listprint(list);
	return 0;
}

Trees and Traversals, i.e., "Visiting" Nodes

A tree is a two-dimensional structure, unlike a linked list. As with 2 dimensional arrays, there is a possibility of "listing" or "visiting" each storage cell (node) in more than one way.
                     |
                     a
                   /   \
                  /     \
                 b       c
                / \     / \
               d   e   f   g
There are THREE standard ways to "traverse" a binary tree. The definition is recursive, and is defined for a parent and its two children.
INORDER      left child, Parent, right child: L P R
PREORDER     P L R
POSTORDER    L R P
NOTE THAT THE CHILDREN ARE ALWAYS LISTED LEFT TO RIGHT.

For the example tree above, the traversals are

     INORDER   __ __ __ __ __ __ __

     PREORDER  __ __ __ __ __ __ __

     POSTORDER __ __ __ __ __ __ __
AS A QUICK CHECK, note where the ROOT node is listed in each traversal (it should correspond to the prefix: IN PRE POST relative to the other nodes).

Algebraic expression example:

                     |
                     +
                   /   \
                  /     \
                 /       *
                / \     / \
               a   b   c   d

   INORDER:  a / b + c * d ==>  algebraic notation (w/o parentheses)
   PREORDER: + / a b * c d ==>  functional notation (w/o parentheses)
   POSTORDER: a b / c d * + ==> Reverse Polish Notation (RPN)

Preorder would look better if we wrote: +(/(a,b),*(c,d)) or
                                        f(g(a,b),h(c,d))
Postorder is the order used in "RPN" calculators.

Traversing Unbalanced Trees

Suppose some "parent" nodes in a binary tree are missing one of its offspring. The traversal rules apply as usual, except that one merely omits listing the missing node.
              |          INORDER:   __ __ __ __ __
              a
            /   \
           /     \       PREORDER:  __ __ __ __ __
          b       c
         /         \
        d           e    POSTORDER: __ __ __ __ __


Sample Program -- MP10A

The sample program mp10a.cxx (available by clicking on the link or via my MA10 directory) has two major routines:

GENTREE and TREEPRINT.

GENTREE is a function that GENerates a TREE of less than n nodes where n is the input parameter. (If n is a power of 2, then the tree will have n-1 nodes in it. If n is not a power of 2, then the tree will have m-1 nodes in it, where m is the greatest number that is a power of 2 that is less than n.)

GENTREE constructs a tree by building nodes on the left side first (until it reaches the maximum level possible), and then filling in nodes on the right sides.

TREEPRINT is a function that prints out a tree in two dimensions (rotated 90 degrees counter-clockwise). It prints out the nodes with proper spacing to indicate the depth.

In reality, TREEPRINT is printing the tree out in a non-standard order.

(Which? --> ______________________________)


This page is maintained by Dennis C. Smolarski, S.J. dsmolarski@math.scu.edu
© Copyright 1997, 1998, Dennis C. Smolarski, SJ, All rights reserved. Last changed: 5 December 1998.