At one time ZOIS had to service a contract
in Sheffield. As you may be aware, or have learned from the Home Page,
ZOIS's company headquarters are in Cockermouth. On that
particular day our consultant was without a car, so he decided to go by train.
When he finally arrived in Sheffield, he was a wiser man: he had been
introduced to the reliability of serial systems.
Let's describe the journey by train from Cockermouth (or rather a nearby town, Maryport) to Sheffield. You cannot go directly; you have to change, first in Carlisle, then in Preston and finally in Manchester, so the journey is made up of four trains. The companies that operate the trains reckon that 80% of their trains arrive on time, which suggests that the reliability of each train is 0.8. Let's now calculate the probability that our consultant would be late arriving in Sheffield. If a train is late you miss the connection (a bold statement, but we're trying to demonstrate a point). This means that, with the train from Maryport to Carlisle, there was a 0.8 chance that the consultant would catch his train from Carlisle to Preston. Similarly, there would then be a 0.8 chance compounded by a 0.8 chance (= 0.64) that the consultant would catch the train from Preston to Manchester, and so it goes on. Based on these figures, the probability that the consultant would be on time for his meeting in Sheffield is the product of the probabilities of success of the individual components: 0.8 × 0.8 × 0.8 × 0.8 ≈ 0.41, or 41%. The consultant would more than likely be late.
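The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the example does) that each train is on time independently of the others and that a single missed connection ruins the journey; the function name `serial_reliability` is hypothetical.

```python
from math import prod  # math.prod needs Python 3.8+

def serial_reliability(probabilities):
    """Probability that every component in a chain succeeds.

    Assumes the components fail independently and that any single
    failure (one missed connection) ruins the whole journey.
    """
    return prod(probabilities)

# Maryport -> Carlisle -> Preston -> Manchester -> Sheffield:
# four trains, each assumed to be on time 80% of the time.
legs = [0.8, 0.8, 0.8, 0.8]
print(f"{serial_reliability(legs):.4f}")  # 0.4096

# Each extra change in the chain pulls the overall reliability down further.
for n in range(1, 7):
    print(n, f"{serial_reliability([0.8] * n):.2f}")
```

The loop shows how quickly reliability falls as components are added in series, even though each individual train looks quite dependable.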
It's probably too late already
The above is not a criticism of the state of rail travel in the north of England, but rather a demonstration of how serial systems reduce reliability and do so quite dramatically. Distributed systems represent a collection of fallible components potentially arranged into serial systems.
Computer components are subject to failure. These failures take two broad forms: pernicious and outright. An outright failure occurs when a component simply stops working altogether. In a pernicious failure the component does not stop but carries on operating, giving erroneous results. A designer's goal is to make components Fail Fast: that is, to detect that a component has suffered a pernicious failure and to convert it into an outright failure. Outright failures are easily detected, and the component can then be repaired or replaced.
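One common way to make a component fail fast is to check its own output against an invariant and stop outright when the check fails, rather than passing a plausible-looking but wrong result downstream. The sketch below assumes a Python setting; the exception `ComponentFailure`, the decorator `fail_fast` and the example function `on_time_probability` are all hypothetical names chosen for illustration.

```python
import functools

class ComponentFailure(Exception):
    """Raised when a component detects that its own output cannot be trusted."""

def fail_fast(check):
    """Wrap a function so that a result failing `check` becomes an outright failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not check(result):
                # Pernicious failure detected: stop here rather than
                # hand an erroneous result to the next component.
                raise ComponentFailure(
                    f"{func.__name__} produced invalid result {result!r}")
            return result
        return wrapper
    return decorator

@fail_fast(check=lambda p: 0.0 <= p <= 1.0)
def on_time_probability(on_time, total):
    # A probability outside [0, 1] can only come from bad input or a bug.
    return on_time / total

print(on_time_probability(8, 10))   # 0.8
try:
    on_time_probability(12, 10)     # 1.2 is not a valid probability
except ComponentFailure as exc:
    print("outright failure:", exc)
```

The check itself is the design choice: the stricter the invariant, the more pernicious failures are caught, at the cost of extra work on every call.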
How can computer components acting in concert be made to fail fast, particularly when one component may fail in the middle of communication with another? Each component should maintain a state and use the concept of Transactions to ensure that the states are stable. The components therefore do not get out of step with each other over missed or partially successful messages.
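The all-or-nothing property that transactions give a component's state can be illustrated within a single process. This is only a sketch, assuming state is held in ordinary Python objects; it is not a distributed protocol, and the class `TransactionalComponent` and the operation `book_seat` are hypothetical. Changes are made to a private copy and committed in one step only if every operation succeeds, so a partially successful message leaves the state untouched.

```python
import copy

class TransactionalComponent:
    """A component whose state changes only through all-or-nothing transactions."""

    def __init__(self, state):
        self.state = state

    def apply(self, operations):
        """Apply every operation, or none of them if any one fails."""
        # Work on a private copy so that a failure part-way through
        # leaves the committed state exactly as it was.
        working = copy.deepcopy(self.state)
        for operation in operations:
            operation(working)     # an exception here aborts the transaction
        self.state = working       # commit: replace the state in a single step

def book_seat(seat):
    """Hypothetical operation: reserve a seat, refusing double bookings."""
    def operation(state):
        if seat in state["booked"]:
            raise ValueError(f"seat {seat} already booked")
        state["booked"].add(seat)
    return operation

component = TransactionalComponent({"booked": set()})
component.apply([book_seat(1), book_seat(2)])
try:
    # The second operation fails, so seat 3 must not be booked either.
    component.apply([book_seat(3), book_seat(2)])
except ValueError:
    pass
print(sorted(component.state["booked"]))  # [1, 2]
```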
Returning once more to our consultant's train journey: how could the reliability be improved? Getting even more hypothetical than before, we could assume that a different route has the same probability of success (Oh! all right: Maryport, Lancaster, Leeds, Sheffield, say). We could promise the Sheffield customer that a consultant would arrive for the meeting, but send two, one via the first route and one via the second. What would the probability be that at least one would arrive in time? The answer is about 65%: each consultant is late with probability 1 − 0.41 = 0.59, both are late with probability 0.59 × 0.59 ≈ 0.35, so at least one arrives on time with probability 1 − 0.35 ≈ 0.65. In parallel systems such as the one we have just devised, the probability of success is one minus the product of the probabilities of failure of the individual components: 1 − (1 − R1) × (1 − R2), where R1 and R2 are the reliabilities of the two routes.
Of course, if the goal were for both consultants to be on time, the chances of that would be only about 17% (0.41 × 0.41), because requiring both to succeed puts the two routes back in series.
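The parallel and "both on time" figures can be checked with a short sketch. It assumes, as the example does, that each route has the same reliability of 0.8⁴ ≈ 0.41 and that the two journeys succeed or fail independently; the function names are hypothetical.

```python
from math import prod

def serial_reliability(probabilities):
    """All components must succeed."""
    return prod(probabilities)

def parallel_reliability(probabilities):
    """At least one of several independent alternatives must succeed.

    Computed as 1 - (probability that every alternative fails).
    """
    return 1 - prod(1 - p for p in probabilities)

route = serial_reliability([0.8] * 4)   # about 0.41 for each route
at_least_one = parallel_reliability([route, route])
both = serial_reliability([route, route])
print(f"one on time: {at_least_one:.2f}, both on time: {both:.2f}")
# one on time: 0.65, both on time: 0.17
```

The same two functions compose: a parallel arrangement of serial chains is simply `parallel_reliability` applied to the results of `serial_reliability`.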
Transaction Processing Monitors (TPMs) attempt to parallelize the tasks
with which they are charged, and so increase reliability, by running
a number of copies of the processing programs and distributing work
among them. The processing programs need not be on the
same physical computer, which increases reliability even further (a program on
a computer is a serial system too, albeit of only two components!).
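A rough sketch of the idea: a dispatcher sends each request to one copy of a processing program and moves on to the next copy if that one fails outright. This is not how any particular TPM is implemented; `dispatch`, `AllReplicasFailed`, `replicated_reliability` and the two example programs are hypothetical, and the reliability figures 0.99 and 0.95 are made-up values for illustration. The reliability function treats each copy as a two-component serial system (program on a computer) and the copies, on separate computers, as a parallel system.

```python
from typing import Callable, Sequence

class AllReplicasFailed(Exception):
    """Raised when no copy of the processing program could handle a request."""

def dispatch(request, replicas: Sequence[Callable]):
    """Send a request to each copy of a processing program in turn until one succeeds."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:    # treat any exception as an outright failure
            errors.append(exc)
    raise AllReplicasFailed(errors)

def replicated_reliability(p_program, p_computer, copies):
    """Each copy is a serial system (program on a computer); the copies,
    each on its own computer, are assumed to fail independently."""
    p_copy = p_program * p_computer
    return 1 - (1 - p_copy) ** copies

# Hypothetical processing programs: one on a host that is down, one healthy.
def on_failed_host(request):
    raise ConnectionError("host unreachable")

def on_healthy_host(request):
    return f"processed {request}"

print(dispatch("order-42", [on_failed_host, on_healthy_host]))  # processed order-42
print(f"{replicated_reliability(0.99, 0.95, 1):.4f}")
print(f"{replicated_reliability(0.99, 0.95, 3):.4f}")
```

The independence assumption matters: copies sharing one computer would share its failures, which is why spreading them across machines helps.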
Date: 1998/08/01 19:25:15