Previous   Next

Compound Data Structures

Arrays, linked lists, and strings all provide simple ways to structure data sequentially. They provide a first level of abstraction that we can use to group objects in ways amenable to processing the objects efficiently. Having settled on these abstractions, we can use them in a hierarchical fashion to build up more complex structures. We can contemplate arrays of arrays, arrays of lists, arrays of strings, and so forth. In this section, we consider examples of such structures.

In the same way that one-dimensional arrays correspond to vectors, two-dimensional arrays, with two indices, correspond to matrices, and are widely used in mathematical computations. For example, we would use the Java code double[][] c = new double[N][N]; to declare that we are going to use a two-dimensional array c to represent an N-by-N matrix. Then we might use the following code to set all of c's entries to 0.0:

for (i = 0; i < N; i++) for (j = 0; j < N; j++) c[i][j] = 0.0; 


With similar representations for two other matrices, we then might use the following code to multiply a and b, leaving the result in c.

for (i = 0; i < N; i++) for (j = 0; j < N; j++) for (k = 0; k < N; k++) c[i][j] += a[i][k]*b[k][j]; 


We frequently encounter mathematical computations like this one that are naturally expressed in terms of multidimensional arrays.

Beyond mathematical apps, a familiar way to structure information is to use a table of numbers organized into rows and columns. A table of students' grades in a course might have one row for each student and one column for each assignment. Such a table would be represented as a two-dimensional array with one index for the row and one for the column. If we were to have 100 students and 10 assignments, we would write grades[100][10] to declare the array, and then refer to the ith student's grade on the jth assignment as grade[i][j]. To compute the average grade on an assignment, we sum together the elements in a column and divide by the number of rows; to compute a particular student's average grade in the course, we sum together the elements in a row and divide by the number of columns, and so forth. Two-dimensional arrays are widely used in apps of this type. On a computer, it is often convenient and straightforward to use more than two dimensions. For example, an instructor might use a third index to keep student-grade tables for a sequence of years.

Two-dimensional arrays are a notational convenience, as the numbers are ultimately stored in the computer memory, which is essentially a one-dimensional array. In many coding environments, two-dimensional arrays are stored in row-major order in a one-dimensional array: In an array a[M][N], the first N positions would be occupied by the first row (elements a[0][0] through a[0][N-1]), the second N positions by the second row (elements a[1][0] through a[1][N-1]), and so forth. With row-major order, the final line in the matrix-multiplication code in the previous paragraph is precisely equivalent to

c[N*i+j] = a[N*i+k]*b[N*k+j] 


The same scheme generalizes to provide a facility for arrays with more dimensions. In Java, multidimensional arrays may be implemented in a more general manner: we can define them to be compound data structures (arrays of arrays). This provides the flexibility, for example, to have an array of arrays that differ in size.

The basic facilities of Java make it easy to create and manipulate compound structures. For example, an array of strings in Java is an array of references to strings. When the strings are represented as arrays of characters, this representation is essentially an array of arrays that differ in size. By manipulating the references, we get the effect of manipulating the strings. For example, as illustrated in Screenshot, we then can get the effect of rearranging strings simply by rearranging the references in the array. To accomplish this sort in Java, we can use the Arrays.sort method in the Java utilities package or, as described in , any one of numerous algorithms from Part II of this tutorial.

Screenshot String sort

When processing strings, we work with references into a system buffer that contains the string objects (top). The string representation is hidden in Java, so here we show a stylized representation that consists of the length followed by a character array that contains the strings' characters. To manipulate the strings, programs move their references but do not change the string characters. For example, a sort method would rearrange the references such that accessing them in order would give the strings in alphabetical (lexicographic) order.

Java graphics 03fig14.gif


We have already encountered another use of arrays of strings: the args array that is used to pass parameter strings to main in Java programs. The system creates a string for each string in the command line typed by the user and passes to main an array of references to those strings. We use conversion methods to calculate numbers corresponding to some parameters; we use other parameters as strings, directly.

We can also build compound data structures exclusively with links. Screenshot shows an example of a multilist, where nodes have multiple link fields and belong to independently maintained linked lists. In algorithm design, we often use more than one link to build up complex data structures, but in such a way that they are used to allow us to process them efficiently. For example, a doubly linked list is a multilist that satisfies the constraint that x.l.r and x.r.l are both equal to x. We shall examine a much more important data structure with two links per node in .

Screenshot A multilist

We can link together nodes with two link fields in two independent lists, one using one link field, the other using the other link field. Here, the right link field links together nodes in one order (for example, this order could be the order in which the nodes were created) and the left link field links together nodes in a different order (for example, in this case, sorted order, perhaps the result of insertion sort using the left link field only). Following right links from a, we visit the nodes in the order created; following left links from b, we visit the nodes in sorted order.

Java graphics 03fig15.gif


If a multidimensional matrix is sparse (relatively few of the entries are nonzero), then we might use a multilist rather than a multidimensional array to represent it. We could use one node for each value in the matrix and one link for each dimension, with the link pointing to the next item in that dimension. This arrangement reduces the storage required from the product of the maximum indices in the dimensions to be proportional to the number of nonzero entries, but increases the time required for many algorithms, because they have to traverse links to access individual elements.

To see more examples of compound data structures and to highlight the distinction between indexed and linked data structures, we next consider data structures for representing graphs. A graph is a fundamental combinatorial object that is defined simply as a set of objects (called vertices) and a set of connections among the vertices (called edges). We have already encountered graphs, in the connectivity problem of .

We assume that a graph with V vertices and E edges is defined by a set of E pairs of integers between 0 and V-1. That is, we assume that the vertices are labeled with the integers 0, 1, ..., V-1, and that the edges are specified as pairs of vertices. As in we take the pair i-j as defining a connection between i and j and thus having the same meaning as the pair j-i. Graphs that comprise such edges are called undirected graphs. We shall consider other types of graphs in Part 7.

One straightforward method for representing a graph is to use a two-dimensional array, called an adjacency matrix. With an adjacency matrix, we can determine immediately whether or not there is an edge from vertex i to vertex j, just by checking whether the entry at row i and column j of the matrix is true. For the undirected graphs that we are considering, if the entry in row i and column j is true, then so must be the entry in row j and column i—the matrix is symmetric. Screenshot shows an example of an adjacency matrix for an undirected graph; Program 3.16 shows how we can create an adjacency matrix, given a sequence of edges as input.

Screenshot Graph with adjacency-matrix representation

A graph is a set of vertices and a set of edges connecting the vertices. For simplicity, we assign indices (nonnegative integers, consecutively, starting at 0) to the vertices. An adjacency matrix is a two-dimensional array of boolean values where we represent a graph by putting in row i and column j true if there is an edge from vertex i to vertex j and false otherwise (in the diagram, we represent true with 1 and false with 0). The array is symmetric about the diagonal. By convention, we assign true values on the diagonal (each vertex is connected to itself). For example, the sixth row (and the sixth column) says that vertex 6 is connected to vertices 0, 4, and 6.

Java graphics 03fig16.gif


Another straightforward method for representing a graph is to use an array of linked lists, called adjacency lists. We keep a linked list for each vertex, with a node for each vertex connected to that vertex. For the undirected graphs that we are considering, if there is a node for j in i's list, then there must be a node for i in j's list. Screenshot shows an example of the adjacency-lists representation of an undirected graph; Program 3.17 shows how we can create an adjacency-lists representation of a graph, given a sequence of edges as input.

Screenshot Adjacency-lists representation of a graph

This representation of the graph in Screenshot uses an array of lists. The space required is proportional to the number of nodes plus the number of edges. To find the indices of the vertices connected to a given vertex i, we look at the ith position in an array, which contains a reference to a linked list containing one node for each vertex connected to i.

Java graphics 03fig17.gif


Adjacency-matrix graph representation

This program reads a set of edges that define an undirected graph and builds an adjacency-matrix representation for the graph, settinga[i][j] and a[j][i] to true if there is an edge from i to j or j to i in the graph, or to false if there is no such edge.

class AdjacencyMatrix { public static void main(String[] args) { int V = Integer.parseInt(args[0]); int E = Integer.parseInt(args[1]); boolean adj[][] = new boolean[V][V]; for (int i = 0; i < V; i++) for (int j = 0; j < V; j++) adj[i][j] = false; for (int i = 0; i < V; i++) adj[i][i] = true; for (In.init(); !In.empty() ;) { int i = In.getInt(), j = In.getInt(); adj[i][j] = true; adj[j][i] = true; } } } 


Both graph representations are arrays of simpler data structures— one for each vertex describing the edges incident on that vertex. For an adjacency matrix, the simpler data structure is implemented as an indexed array; for an adjacency list, it is implemented as a linked list.

Thus, we face straightforward space tradeoffs when we represent a graph. The adjacency matrix uses space proportional to V 2; the adjacency lists use space proportional to V + E. If there are few edges (such a graph is said to be sparse), then the adjacency-lists representation uses far less space; if most pairs of vertices are connected by edges (such a graph is said to be dense), the adjacency-matrix representation might be preferable, because it involves no links. Some algorithms will be more efficient with the adjacency-matrix representation, because it allows the question "is there an edge between vertex i and vertex j?" to be answered in constant time; other algorithms will be more efficient with the adjacency-lists representation, because it allows us to process all the edges in a graph in time proportional to V + E, rather than to V2. We see a specific example of this tradeoff in .

Both the adjacency-matrix and the adjacency-lists graph representations can be extended straightforwardly to handle other types of graphs (see, for example, Exercise 3.74). They serve as the basis for most of the graph-processing algorithms that we shall consider in Part 7.

To conclude this chapter, we consider an example that shows the use of compound data structures to provide an efficient solution to the simple geometric problem that we considered in . Given d, we want to know how many pairs from a set of N points in the unit square can be connected by a straight line of length less than d. Program 3.18 uses a two-dimensional array of linked lists to improve the running time of Program 3.7 by a factor of about 1/d2 when N is sufficiently large. It divides the unit square up into a grid of equal-sized smaller squares. Then, for each square, it builds a linked list of all the points that fall into that square. The two-dimensional array provides the capability to access immediately the set of points close to a given point; the linked lists provide the flexibility to store the points where they may fall without our having to know ahead of time how many points fall into each grid square.

Adjacency-lists graph representation

This program reads a set of edges that define a graph and builds an adjacency-matrix representation for the graph. An adjacency list for a graph is an array of lists, one for each vertex, where the jth list contains a linked list of the nodes connected to the jth vertex.

class AdjacencyLists { static class Node { int v; Node next; Node (int v, Node t) { this.v = v; next = t; } } public static void main(String[] args) { int V = Integer.parseInt(args[0]); int E = Integer.parseInt(args[1]); Node adj[] = new Node[V]; for (int i = 0; i < V; i++) adj[i] = null; for (In.init(); !In.empty() ;) { int i = In.getInt(), j = In.getInt(); adj[j] = new Node(i, adj[j]); adj[i] = new Node(j, adj[i]); } } } 


The space used by Program 3.18 is proportional to 1/d2 + N, but the running time is O(d2N2), which is a substantial improvement over the brute-force algorithm of Program 3.7 for small d. For example, with N = 106 and d = 0.001, we can solve the problem in time and space that is effectively linear, whereas the brute-force algorithm would require a prohibitive amount of time. We can use this data structure as the basis for solving many other geometric problems, as well. For example, combined with a union-find algorithm from , it gives a near-linear algorithm for determining whether a set of N random points in the plane can be connected together with lines of length d—a fundamental problem of interest in networking and circuit design.

A two-dimensional array of lists

This program illustrates the effectiveness of proper data-structure choice, for the geometric computation of Program 3.7. It divides the unit square into a grid and maintains a two-dimensional array of linked lists, with one list corresponding to each grid square. The grid is chosen to be sufficiently fine that all points within distance d of any given point are either in the same grid square or an adjacent one.

class ClosePoints { static class Node { Point p; Node next; Node(Point x, Node t){p=x;next = t; } } static int G, cnt = 0; static double d; static Node[][] g; static void gridInsert(Point p) { int X = (int)(p.x*G)+1, Y = (int)(p.y*G)+1; Node s, t = new Node(p, g[X][Y]); for (int i = X-1; i <= X+1; i++) for (int j = Y-1; j <= Y+1; j++) for (s = g[i][j]; s != null; s = s.next) if (s.p.distance(t.p) < d) cnt++; g[X][Y] = t; } public static void main(String[] args) { int i, N = Integer.parseInt(args[0]); d = Double.parseDouble(args[1]); G = (int) (1.0/d); g = new Node[G+2][G+2]; for (i = 0; i < G+2; i++) for (int j = 0; j < G+2; j++) g[i][j] = null; for (i = 0; i < N; i++) gridInsert(new Point()); Out.print(cnt + " pairs "); Out.println("closer than " + d); } } 


As suggested by the examples that we have seen in this section, there is no end to the level of complexity that we can build up from the basic abstract constructs that we can use to structure data of differing types into objects and sequence the objects into compound objects, either implicitly or with explicit links. These examples still leave us one step away from full generality in structuring data, as we shall see in . Before taking that step, however, we shall consider the important abstract data structures that we can build with linked lists and arrays—basic tools that will help us in developing the next level of generality.

Exercises

Extend Program 3.18 (and Program 3.2) to three dimensions so that it finds the number of pairs of N randomly generated points in the unit cube that can be connected by a straight line of length less than d.

Write a program that reads strings from standard input and prints them out in sorted order, taking the number of strings to be sorted from the command line.

Write a program to fill in a two-dimensional array of boolean values by setting a[i][j] to true if the greatest common divisor of i and j is 1, and to false otherwise.

Use Program 3.18 in conjunction with Program 1.4 to develop an efficient program that can determine whether a set of N points can be connected with edges of length less than d.

Write a program to convert a sparse matrix from a two-dimensional array to a multilist with nodes for only nonzero values.

Java graphics roundbullet.gif 3.71 Implement matrix multiplication for matrices represented with multilists.

Java graphics icon01.gif 3.72 Show the adjacency matrix that is built by Program 3.16 given the input pairs 0-2, 1-4, 2-5, 3-6, 0-4, 6-0, and 1-3.

Java graphics icon01.gif 3.73 Show the adjacency lists that are built by Program 3.17 given the input pairs 0-2, 1-4, 2-5, 3-6, 0-4, 6-0, and 1-3.

ScreenshotA directed graph is one where vertex connections have orientations: edges go from one vertex to another. Do Exercises 3.72 and 3.73 under the assumption that the input pairs represent a directed graph, with i-j signifying that there is an edge from i to j. Also, draw the graph, using arrows to indicate edge orientations.

ScreenshotWrite a method that uses the adjacency matrix of a graph to calculate, given vertices a and b, the number of vertices c with the property that there is an edge from a to c and from c to b.

ScreenshotAnswer Exercise 3.75, but use adjacency lists.


Previous   Next
Comments