structs) and have multiple components. Often such new
data structures have to be "built" from the capabilities of a
language and many books refer to them as "ADTs."
For some types of information, simple variables, arrays or even records are not quite good enough.
For example, how would you (efficiently) store a genealogy tree?
How would you store a queue (i.e., a "list" where insertions are made at one end and deletions at the other -- like a "waiting" line for service at a bank or for food at Benson)?
The concept of pointers has been developed to handle this.
Unfortunately, dealing with this type of concept and variable demands that we use a lot of pictures and lines to see how things are related.
NOTE: pointers will be introduced in Math 10, but explained only very generally. They will be covered in more depth in Math 61, "Data Structures."
INFO NEXT INFO NEXT INFO NEXT
_______ _______ _______
----->| 25 | -|--->| 32 | -|--->| 83 | /|
|____|__| |____|__| |____|/_|
^ ^ ^ ^ ^ ^
P N1 Q N2 R N3
SOME TERMINOLOGY:
node: several memory locations (i.e., a record) that
contains information as well as a "link" to another node. Each
field (separate section of the record) is given a name, such as
INFO and NEXT in the example. A node
is often depicted as a box with several sections. In the example,
N1, N2, N3 are the nodes.
pointer: a way to reference a node. Sometimes called the "link" or the "address." The most correct way to think of a pointer is as the memory address (location) of the first section of the node. A pointer is often depicted as an arrow. In the example, P, Q, and R are the pointers.
nil or NULL: an "empty" pointer, that is, a pointer that does not point to any node, usually depicted as a cell with a slash through it.
[C++] notation: The node N1 is actually the node that the pointer P points to. Instead of refering to N1, we usually write *P as referring to "the node P points to." (The * used this way is often called the "dereferencing operator.")
To indicate the INFO section of node N1, we can write (*P).INFO since N1 is actually a record and thus we can use the standard record notation to specific one particular field. However, C/C++ has another operator to simplify the notation (and visually suggest that we are talking about "pointer" variables), and it is the "arrow" operator, ->. Thus (*P).INFO can be equivalently written as P->INFO when needed.
To indicate the NEXT section of node N1, we write P->NEXT and this is the pointer that points to node N2, and we have also called this pointer Q.
linked list: a collection of nodes, with a special head node and a special tail node, such that every other node has some node pointing to it and it points to another node.
The head node has no node pointing to it (e.g., N1 above).
The tail node does not point to any other node (e.g., N3 above).
If we extend the concepts of nodes and pointers slightly, we can develop a two or higher dimensional linked structure.
tree: a collection of nodes, with a root node (i.e., a head node), such that each node points to one or more other nodes. The tail nodes of a tree (that do not point to any other nodes) are called leaves.
binary tree: a tree such that each node point to at most two other nodes. (A node may point to only one other node or no other nodes, but it cannot point to 3 or more nodes in a binary tree.)
FOR CONVENIENCE, we depict a tree "upside down," that is with the root uppermost and the leaves at the bottom.
____|____
| | x | |
|__|___|__|
/ \
______ /_ _\_______
| | y | | | | z | |
|__|___|__| |__|___|__|
/ \ / \
The node(s) that a node in a tree point to (on the next level
down) are called the "children" (or "sons" or "daughters") of
that node. The node that immediately points to a node is the
"parent" (or "father" or "mother") node. All other terminology
is based on the use of such "trees" for genealogical records.
Hence we speak of nodes that are "first-cousins" of other
nodes, or nodes that are "grandchildren" of other nodes, or nodes
that are "uncles" of other nodes.
For example, given
|
a
/ \
/ \
b c
/ \ / \
d e f g
Node a is the grandparent of nodes d, e, f, and g. Node b is the
parent of nodes d and e. Nodes b and c are brother and sister.
Nodes e and f are first cousins. Node b is the aunt of node f.
Similarly for more nodes and more relationships.
EXAMPLE: Trees can be useful to decipher arithmetic expression for translation by compilers into appropriate code, thereby eliminating the need for parentheses.
E.g., (3 + 2)(5 - 2) + 4
|
+
/ \
/ \
x 4
/ \
/ \
+ -
/ \ / \
3 2 5 2
NOTE: that the leaves are all the numbers and the other
("internal") nodes are all operators. To use this tree for
arithmetic computations, one begins at the (leftmost) lowest
leaves and picks a set of one parent and its children (e.g., +
with 3 and 2). One computes the answer and "prunes" and "reduces"
the tree (i.e., "snips" the tree at that "branch," i.e. right at
the + and replaces it with 3+2=5).
This is repeated until the root node is replaced with an answer.
NOTE: Searching and sorting can also be done on trees. This is a major section of a Data Structures course.
Before explaining the legal syntax for defining such pointer variable and nodes, it is helpful first to understand certain basic concepts and how C/C++ relates to those concepts.
When variable identifiers such as i and j are used in the "standard" way, for example, in i = j+2, we understand that i and j refer not to the actual memory location associated with these symbolic identifiers, but rather to the values stored in these memory locations.
Thus i = j+2 means "take the value in the memory location correspoding to j, add 2 to it, and store that new value in the memory location corresponding to i."
There are times when we don't want to concern ourselves with the value corresponding to an identifier, but keep track of where the identifier is located. We do this by using the ampersand, &, as when we want a function parameter to be a variable parameter. Using the ampersand makes the code pass the address of the memory location rather than the contents (value) stored there.
Thus, if i corresponds to memory location 2307, using i by itself in an expression means that the computer looks at the contents of location 2307 (i.e., the value stored there). On the other hand, &i means that the computer looks at 2307, which is the exact memory location, rather than at what is stored in 2307.
"Pointer" variables are variables that store addresses, in other words, the values of pointer variables are the addresses in which the information is stored rather than the contents of that address itself. A pointer variable is like a sign that "points" to the real memory location. It is like a "for sale" sign that stands on a corner and tells you what the address of the house is that is up for sale. The sign is not the house. The sign is not at the address of the house. But the sign tells you where to go to find the house -- it is a "reference" for the house. To go from the "reference" (pointer) variable to the actual location where the data is stored, we need to "dereference" the identifier, and this is done by using the asterisk -- usually <shift>8.
Thus if we declare p as a pointer variable, we must remember that what is stored in that variable is the address of where we eventually look to find values. To get to the actual memory location, we write *p.
When constructing a node, as illustrated earlier, we usually have a section of the node to store some sort of data, and another section which stores the address of the next node. Thus one section of a typical node is itself a pointer variable.
The following example is a typical way to define a node structure datatype (i.e., a class):
class node {
public:
int info;
node *temp;
};
NOTES:
E.g.,
node *list;
This is interpreted as saying that list is a pointer variable
(indicated because of the presence of the *) and that the memory
locations that list points to is of type node.
Then, when using such a variable, we reference the object pointed to
by using an asterisk before the identifer as in the declaration
section, for example,
*list
Often this is read and interpreted as "what
[i.e., the node that] list points to."
To reference the info section of this node, we can use the "dot" notation as in a typical record variable, for example,
(*list).info
or, we can use the "arrow" operator instead, for example,
list->infoBoth notations have the identical same meaning.
Instead, we allocate space dynamically, i.e., as needed and while the program itself is running, and usually we only have a reference to the head node of a list, or the root of a tree.
To allocate space for a node, we use the standard (intrinsic) C++ operator new, which takes one argument. (C allocated space for new nodes using a different library function alloc or malloc with another library function sizeof.)
The argument of new is of the datatype desired and follows the keyword new but is not enclosed in parentheses. new is used in a simple assignment statement with a pointer variable on the left side and allocates space in memory for a new node of the proper type pointed to by the variable named and associated that space with the variable listed on the left side of the assignment statement. In other words, if p were declared to be a pointer variable (pointing to a node), then
p = new node;would actually create the node and store the address of the new node in p.
CLARIFYING NOTE: One declares variables to be of pointer types at the beginning of a program. But these variable only set aside space for "pointers," that is, for the addresses of nodes. Declaring such a space for an address does not itself provide any address or any "structure" at an address. It is like a post office providing space for new street addresses, if and when open territory is subdivided and homes are actually built. Just because the post office has a slot allocated and available for putting a new address, doesn't mean that there actually is open property available and a house built on it.
info next info next info next
_______ _______ _______
----->| | -|- | | -|- | | -|-
|____|__| |____|__| |____|__|
^
P
class node {
public:
int info;
node *next;
};
node *p, *q, *r;
// these variables CAN hold addresses
// of nodes, but they initially do NOT,
// and there are no nodes
// corresponding to them
...
p = new node;
// this allocates a node corresponding
// to p and puts
// the proper address in p
p->info = 2;
// the info section of the node
// that p points to
// gets the value of 2
q = new node;
p->next = q;
q->info = 5;
r = new node;
q->next = r;
r->info = 8;
r->next = NULL;
q = new node;
r->next = q;
q->info = 12;
q->next = NULL;
...
}
It may be helpful to recall what "really" happens in memory when
trying to understand this example. The identifiers p, q, and
r correspond to locations in memory which can store units
of information. Unlike variables we have seen thus far (which have
stored numbers or characters), these "pointer" variables store
addresses of other memory locations.
Remember that all memory is merely a sequentially numbered electronic storage "warehouse" and that the "symbol table" enables a program to make a correspondence between an identifier and the memory location it actually corresponds to. Imagine the following as part of a symbol table:
identifier | memory | location | p | 2156 q | 2157 r | 2158Imagine the following as depicting sections of memory.
location | contents
2156 | ____
2157 | ____
2158 | ____
2159 | ____
2160 | ____
...
3214 | ____
3215 | ____
...
3387 | ____
3388 | ____
3389 | ____
3390 | ____
When the program segment above starts "running," space is allocated
for pointer variables p, q, and r but NO nodes
are created for them to point to. Only when the new
operator is invoked on datatype node is space actually
allocated and the corresponding pointer variable gets a value, which
is the address of the first item in the node. For example,
when
p = new node;is executed, then we can imagine that two memory spaces are allocated for the node p points to, for example locations 3214 and 3215. (Where the "computer" goes to get space is beyond the scope of this course. The point of importance is that the "computer" can get space to allocate and that where the space is, is not under the control of the programmer.) Under this assumption, the memory location that corresponds to p, i.e., 2156, now receives the value of 3214, which is the first part of the new node allocated. When the next line in the code
p-> = 2;is executed, then memory location 3214 receives the value of 2.
The following is a hypothetical final version of the list created by the program code above, with the numbers above the boxes indicating the corresponding memory locations which are also depicted in a "filled-in" version of computer memory corresponding to what was depicted above. (Once again, note that the memory locations corresponding to the nodes were arbitrarily chosen for the sake of this example.)
3214 3215 3389 3390 3387 3388 2159 2160
p _________ _________ _________ _______
----->| | -|->| | -|->| | -|->| | |
| 2 |3389| | 5 |3387| | 8 |2159| | 12 | 0|
|____|____| |____|____| |____|____| |____|__|
This is the revised depiction of computer "memory"
with integer and pointer values inserted.
location | contents
2156 | 3214
2157 | 2159
2158 | 3387
2159 | _12_
2160 | __0_
...
3214 | __2_
3215 | 3389
...
3387 | __8_
3388 | 2159
3389 | __5_
3390 | 3387
Those in their right minds would never write a program merely to make a list of decreasing numbers -- lists are usually used for much more significant (and constantly changing) data than that. But this code does exemplify certain programming methods.
In writing programs of this sort, one needs to think in terms of the following general rule: How do I write the code for a case of size n assuming that I let the code automatically take care of the case of size n-1? A simple algebraic example is that we can write an as a(an-1). Here we assume we can find an-1. We use this assumption to compute an.
This approach is sometimes called "Divide and Conquer."
A similar dynamic is true with other recursive programs, in particular, the one below. We assume we can let the program create a list of size n-1. We use this fact to build a list of size n.
_______ _______ _______
----->| | -|--->| | -|--->| | -|
|____|__| |____|__| |____|__|
#include<iostream.h>
//#include <stdlib.h> for exit,NULL,
// if iostream not included
class node {
public:
int info;
node * next;
};
//* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
//* F U N C T I O N G E N L I S T *
// This generates a list. n is a small positive integer
// specifying an upper bound on the number of nodes
// of the list. The information field is
//* given a sequential node number. *
//* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
node * genlist(int n, int & numnode)
{ node *temp;
temp = new node;
if (n==1)
temp->next = NULL;
else
temp->next = genlist(n-1,numnode);
temp->info = ++numnode; //numnode is the unique number
//stored in each node after its creation
return(temp);
}
//* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
//* F U N C T I O N L I S T P R I N T
// This prints a list by iteratively moving a temporary
// pointer through all the nodes of the list and printing
//* out the value of the info section of each node.
//* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
void listprint(node * list)
{ node * temp;
temp = list;
do
{ cout << temp->info << " " ;
temp = temp->next;}
while (!(temp == NULL));
}
//***************************************************
// MAIN PROGRAM
//***************************************************
int main ()
{
node * list;
int numnode; // numnode is a sequential counter
//used to put data in the info
//section of each node.
numnode = 0;
list = genlist(5,numnode);
listprint(list);
return 0;
}
|
a
/ \
/ \
b c
/ \ / \
d e f g
There are THREE standard ways to "traverse" a binary tree.
The definition is recursive, and is defined for a parent and its
two children.
INORDER left child, Parent, right child: L P R PREORDER P L R POSTORDER L R PNOTE THAT THE CHILDREN ARE ALWAYS LISTED LEFT TO RIGHT.
For the example tree above, the traversals are
INORDER __ __ __ __ __ __ __
PREORDER __ __ __ __ __ __ __
POSTORDER __ __ __ __ __ __ __
AS A QUICK CHECK, note where the ROOT node is listed in each
traversal (it should correspond to the prefix: IN PRE POST
relative to the other nodes).
Algebraic expression example:
|
+
/ \
/ \
/ *
/ \ / \
a b c d
INORDER: a / b + c * d ==> algebraic notation (w/o parentheses)
PREORDER: + / a b * c d ==> functional notation (w/o parentheses)
POSTORDER: a b / c d * + ==> Reverse Polish Notation (RPN)
Preorder would look better if we wrote: +(/(a,b),*(c,d)) or
f(g(a,b),h(c,d))
Postorder is the order used in "RPN" calculators.
| INORDER: __ __ __ __ __
a
/ \
/ \ PREORDER: __ __ __ __ __
b c
/ \
d e POSTORDER: __ __ __ __ __
GENTREE and TREEPRINT.
GENTREE is a function that GENerates a TREE of less than n nodes where n is the input parameter. (If n is a power of 2, then the tree will have n-1 nodes in it. If n is not a power of 2, then the tree will have m-1 nodes in it, where m is the greatest number that is a power of 2 that is less than n.)
GENTREE constructs a tree by building nodes on the left side first (until it reaches the maximum level possible), and then filling in nodes on the right sides.
TREEPRINT is a function that prints out a tree in two dimensions (rotated 90 degrees counter-clockwise). It prints out the nodes with proper spacing to indicate the depth.
In reality, TREEPRINT is printing the tree out in a non-standard order.
(Which? --> ______________________________)
This page is maintained by Dennis C. Smolarski, S.J.
dsmolarski@math.scu.edu
© Copyright 1997, 1998, Dennis C. Smolarski, SJ, All rights reserved.
Last changed: 5 December 1998.