
Gratuitous Pluggery

The Author is currently on health sabbatical, but is interested in the odd bit of pro-bono work by way of therapeutic recovery. So if you've any odd bits of work that he can tackle on a non-commercial basis from his base in Cockermouth, please let him know.

Scaling

Scaling is not simply about starting small and growing big; it is also about the ability to start big and grow small. If a system has the property of scalability, it can be easily and economically sized to fit its workload.

In the modern business world the ability to scale is requisite for several functions: information storage and retrieval, information manipulation, and the gathering and dissemination of this information amongst the user population. The ability to scale in one area should be matched by a comparable ability in the other areas. The ideal in all these areas is to allow growth by the addition of hardware without a practical upper or lower limit.

Scaling in Gathering and Dissemination

Since most computers can easily accommodate a single user, the lower limit is fairly safe. The upper limit is not so clear. Traditional multi-user computers (such as mini-computers running a multi-tasking operating system like UNIX) have relied on interfaces which are both simple and flexible, such as command line interpreters and full duplex input/output. Whilst this means that simple programs can easily be written (using a variety of tools) to build user interfaces, such interfaces do not scale readily. Until recently the typical number of users that could be accommodated by such computers was in the tens. A more sophisticated approach is required if a large number of users (typically thousands) is to be accommodated.

Certain properties of business users of computers can be relied upon to allow the necessary scaling, something that mainframe designers have known about for some considerable time. Individually, business users generally have little need for the entirely flexible I/O systems discussed above. Their interactions can be summarized as a series of messages passing to and fro. These messages are characteristic of the particular business function that is being carried out at the time. It is thus possible to arrange hierarchical networks in which large numbers of relatively simple I/O messages are funnelled towards a central computer which is suitably equipped to track and process them.
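
By way of illustration, the funnelling arrangement can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the names Message, Concentrator and CentralComputer, and the "deposit" and "withdraw" message kinds, are invented for the example and do not describe any particular product.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    """A fixed-format business message (all field names are hypothetical)."""
    terminal_id: str   # which user terminal sent it
    kind: str          # the business function, e.g. "deposit"
    amount: int


class CentralComputer:
    """Tracks and processes every message that reaches the top of the tree."""

    def __init__(self):
        self.balance = 0
        self.log = []  # record of (terminal, kind) pairs, in arrival order
        self.handlers = {"deposit": self._deposit, "withdraw": self._withdraw}

    def receive(self, message):
        # Dispatch on the message kind; an unknown kind raises KeyError.
        self.handlers[message.kind](message)
        self.log.append((message.terminal_id, message.kind))

    def _deposit(self, message):
        self.balance += message.amount

    def _withdraw(self, message):
        self.balance -= message.amount


class Concentrator:
    """Intermediate node: gathers messages from many terminals, forwards them upward."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.pending = deque()

    def receive(self, message):
        self.pending.append(message)

    def flush(self):
        # Forward everything collected so far, oldest first.
        while self.pending:
            self.upstream.receive(self.pending.popleft())


# Two concentrators funnel messages from their terminals to one central computer.
central = CentralComputer()
branches = [Concentrator(central), Concentrator(central)]
branches[0].receive(Message("t1", "deposit", 100))
branches[1].receive(Message("t2", "withdraw", 30))
for branch in branches:
    branch.flush()
```

Because a Concentrator offers the same receive method as the CentralComputer, concentrators can be stacked to any depth, which is what allows the network to grow by adding tiers rather than by enlarging the central machine's terminal handling.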

Scaling in Manipulation

Business needs dictate only the most rudimentary transformations of data. They do, however, require large numbers of these transformations in the largest systems, more than could easily be accommodated by a single program running on a single CPU. Elsewhere on this site, in the page on big computers, it has been seen that groups of CPUs can be aggregated to share memory in a computer. To exploit this increase in CPU power it is necessary to have the data manipulation programs work on several users' requirements on several CPUs simultaneously. This can be accomplished by having either several threads of control within one program or several co-operating programs. The multiple-CPU, common-memory model becomes limiting as more and more CPUs compete to update memory. Such computers are therefore aggregated into clusters, all performing the same functions. Here the several co-operating programs need to be able to communicate in a shared-nothing environment.
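
The "several threads of control within one program" option can be sketched as follows, using Python's standard concurrent.futures module. The apply_interest function and the sample requests are invented for the example. Note also that in standard CPython, threads do not run pure Python code on several CPUs at once; substituting ProcessPoolExecutor for ThreadPoolExecutor gives the "several co-operating programs" variant instead.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_interest(request):
    """A deliberately rudimentary transformation: add 5% to a balance in pence."""
    user, balance_pence = request
    return user, balance_pence + balance_pence * 5 // 100


# Each tuple stands for one user's requirement (hypothetical sample data).
requests = [("alice", 1000), ("bob", 2000), ("carol", 400)]

# Three threads of control share the work; map() returns results in request order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(apply_interest, requests))
```

Each request is independent of the others, which is the property that lets the work be spread over however many threads or processes are available.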

Scaling in Storage

There are two prevalent database models today: the older hierarchical methodology and the more modern relational model expounded by Date amongst others. Both lend themselves to scaling, the hierarchical method more easily than the relational.

In the hierarchical model data is grouped together in a large record structure and is stored in a location based on some simple primary key. Further secondary indexes cross-index on the primary key. Because each record can be located from its primary key alone, it is relatively simple to split a database not only across several disks, but across several disk controllers and several computers. Some of the largest databases in the world are organized in such a manner.
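
A minimal sketch of such a split, assuming each whole record is placed by a hash of its primary key. The PartitionedStore class is hypothetical, each "node" is just an in-memory dictionary standing in for a disk or computer, and CRC-32 is used only because it is a convenient stable hash in the Python standard library.

```python
import zlib


class PartitionedStore:
    """Spreads whole records over several nodes using only the primary key."""

    def __init__(self, node_count):
        # One dictionary per node; in a real system each would be a separate
        # disk, controller or computer.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, primary_key):
        # A stable hash of the key decides which node holds the record.
        return zlib.crc32(primary_key.encode("utf-8")) % len(self.nodes)

    def put(self, primary_key, record):
        self.nodes[self._node_for(primary_key)][primary_key] = record

    def get(self, primary_key):
        # Only one node is ever consulted for a primary-key lookup.
        return self.nodes[self._node_for(primary_key)].get(primary_key)


store = PartitionedStore(node_count=4)
store.put("cust-0001", {"name": "Smith", "orders": [17, 42]})
store.put("cust-0002", {"name": "Jones", "orders": []})
found = store.get("cust-0001")
```

A lookup touches exactly one node, so adding nodes adds capacity without adding co-ordination. Changing node_count in this simple scheme would move most keys to different nodes, so a production design would need a more careful placement rule.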

The relational model, on the other hand, organizes data into tables which have discrete entity relationships. Entries in one table can form a relationship with entries in other tables. It is of course easy to split an individual table based on a primary index so that it exists across several disks and several machines, just like a hierarchical database. It is not, however, easy to maintain and co-ordinate queries, because a large number of disparate interactions must take place. Further, an update may involve several different fragments of several tables. For this reason a relational database that is to scale over several disks and several computers using a shared-nothing model must have an efficient way of decomposing a query for parallel execution, and a method of locking several widely dispersed fragments of the database at once. In all these interactions large amounts of data can flow between the individual computers in the cluster, often requiring novel high-speed interconnections between them.

Much research has gone into providing the functionality necessary to make the relational model scale, since it is widely considered to be the more modern and the better approach to database management.
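
Both requirements can be illustrated with a small sketch. It assumes one table of account balances split into three fragments, each guarded by its own lock. The Fragment class and the total_at_least and transfer functions are invented for the example. Acquiring locks in a fixed order, as done here, is one common way of avoiding deadlock and is not necessarily what any given database product does.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Fragment:
    """One horizontal slice of an accounts table, with its own lock."""

    def __init__(self, rows):
        self.rows = dict(rows)        # account name -> balance
        self.lock = threading.Lock()

    def partial_sum(self, minimum):
        # The sub-query each fragment answers on its own.
        with self.lock:
            return sum(b for b in self.rows.values() if b >= minimum)


fragments = [
    Fragment({"a": 50, "b": 200}),
    Fragment({"c": 300}),
    Fragment({"d": 10, "e": 120}),
]


def total_at_least(minimum):
    """Decompose one query into per-fragment sub-queries, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = pool.map(lambda f: f.partial_sum(minimum), fragments)
        return sum(partials)


def transfer(src_frag, src_key, dst_frag, dst_key, amount):
    """Update rows in up to two fragments while holding both fragment locks."""
    # Always lock in ascending fragment order so that two concurrent
    # transfers can never each hold the lock the other is waiting for.
    indexes = sorted({src_frag, dst_frag})
    locks = [fragments[i].lock for i in indexes]
    for lock in locks:
        lock.acquire()
    try:
        fragments[src_frag].rows[src_key] -= amount
        fragments[dst_frag].rows[dst_key] += amount
    finally:
        for lock in reversed(locks):
            lock.release()


before = total_at_least(100)    # only b, c and e qualify
transfer(1, "c", 0, "a", 100)   # move 100 from account c to account a
after = total_at_least(100)     # a now qualifies as well
```

Even in this toy the query touches every fragment and the update must hold two locks at once, whereas the primary-key lookup of the hierarchical case touches only one location. That difference is the source of the extra traffic between machines described above.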

TP Monitors

At the high end of the scale for all three of these activities Transaction Processing Monitors are essential. It can be argued that scalability is the most important function of a TP Monitor. A TP Monitor can provide the necessary infrastructure to allow messages to flow and be tracked between thousands of users and several hundred identical processes handling hundreds of business requirements. These processes can be running on large numbers of clustered multi-CPU computers sharing large arrays of disk storage.
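
The routing and tracking role can be suggested by a very small sketch. TinyMonitor is a hypothetical class that bears no relation to the interface of any real TP Monitor. Threads stand in for the identical server processes, and the pricing function is an invented business requirement.

```python
import queue
import threading


class TinyMonitor:
    """Queues transactions, hands them to identical servers, tracks each one."""

    def __init__(self, server_count, service):
        self.requests = queue.Queue()
        self.status = {}    # transaction id -> "queued", "done" or "failed"
        self.results = {}   # transaction id -> result or the exception raised
        self._lock = threading.Lock()
        self._service = service
        # Every server runs the same code; more servers means more capacity.
        for _ in range(server_count):
            threading.Thread(target=self._serve, daemon=True).start()

    def submit(self, txn_id, payload):
        with self._lock:
            self.status[txn_id] = "queued"
        self.requests.put((txn_id, payload))

    def _serve(self):
        while True:
            txn_id, payload = self.requests.get()
            try:
                result, outcome = self._service(payload), "done"
            except Exception as exc:
                result, outcome = exc, "failed"
            with self._lock:
                self.results[txn_id] = result
                self.status[txn_id] = outcome
            self.requests.task_done()

    def wait(self):
        # Block until every submitted transaction has been processed.
        self.requests.join()


def price_order(order):
    return order["qty"] * order["price"]


monitor = TinyMonitor(server_count=4, service=price_order)
for n in range(10):
    monitor.submit(f"txn-{n}", {"qty": n, "price": 3})
monitor.submit("txn-bad", {"qty": 1})   # no price, so the service will fail
monitor.wait()
```

The user submitting a transaction never needs to know which server handled it; the monitor's status table is the single place where each message can be traced, whether it succeeded or failed.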

Date: 1998/04/26 18:28:40

