The standard NOS/BE operating system contained a table in system memory
in which the PP loader could find either disk "pointers" to the PP programs
on disk or pointers to a central memory location. The table also contained a number
of "bytes" to point to PP code in case this code was preloaded in extended memory.
The Polish CDC 6400 had no extended memory, and neither did our system.
The Polish colleague added a small number of instructions to the PP loader
code. Whenever a PP program (overlay) was loaded, this code incremented a counter
that was stored in these unused bytes.
Using a very simple Fortran program that called a PP to copy the table into user space, we could subtract the counters stored during a previous run from the current ones. This showed which PP programs were loaded very often and which were loaded in "bursts". By default, part of the very limited central memory (96 Kwords of 60 bits) was used to store PP overlays that either were used many times per second or had to be preloaded because they were needed to handle error situations in which, for instance, disks could not be reached. Such PP code could then be loaded by a direct read from memory, without any disk I/O. The counters made clear that the standard NOS/BE preload set of PP code was far from optimal. A manual optimisation led to a 10% performance increase.
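The counter comparison can be illustrated with a short sketch. This is a Python illustration under assumptions, not the original Fortran program: the PP program names and counter values below are made up, and the helper names `load_counts` and `rank_by_loads` are hypothetical.

```python
def load_counts(previous, current):
    """Return the number of loads per PP program between two snapshots
    of the counter table (cumulative counter per program name)."""
    return {name: current[name] - previous.get(name, 0) for name in current}


def rank_by_loads(deltas):
    """Sort PP program names from most to least frequently loaded.
    Ties are broken alphabetically so the result is deterministic."""
    return sorted(deltas, key=lambda name: (-deltas[name], name))


# Hypothetical snapshots: program name -> cumulative load counter.
previous = {"PPA": 1200, "PPB": 300, "PPC": 950, "PPD": 4}
current = {"PPA": 2900, "PPB": 340, "PPC": 2150, "PPD": 4}

deltas = load_counts(previous, current)
ranking = rank_by_loads(deltas)
print(ranking)
```

Programs at the top of such a ranking are the natural candidates for preloading in central memory; programs with a delta of zero are candidates for removal from the preload set, unless they are needed for error handling.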
Even so, the system programmers were not yet satisfied. Loading dozens of PP programs per second from the relatively slow disk could be optimised further. After the initial load of a complete new version of the operating system from a deadstart tape, we could determine the location of the PP code on disk from the disk addresses in the PP loader table mentioned above. This told us where the roughly 3.5 cylinders (14 disk platters stacked above each other) that held the PP program library started. By reordering the PP code on the magnetic deadstart tape, we arranged the library so that the less frequently used PP code occupied the disk area up to the end of the initial cylinder. The next cylinder held all the most frequently used PP code (leaving out the PP code preloaded in memory). The following cylinders held the rest of the PP library. This greatly increased the chance that the disk read heads were already positioned over the right cylinder. Cylinder-to-cylinder movements were then no longer needed, and on average only half a rotation of the disk was required before the loading of PP code could start. This improved system performance by another 5-10%, depending on the type of load.
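The layout idea can be sketched as a simple greedy partition. This is an illustration only: the program names, sizes (in hypothetical "sectors"), cylinder capacity and the helpers `take_while_fits` and `plan_library_layout` are assumptions, and the real reordering was done by hand on the deadstart tape.

```python
def take_while_fits(names, sizes, capacity):
    """Take names in order until the next one would exceed the capacity."""
    chosen, used = [], 0
    for name in names:
        if used + sizes[name] > capacity:
            break
        chosen.append(name)
        used += sizes[name]
    return chosen


def plan_library_layout(sizes, loads, preloaded, first_cyl_free, cyl_size):
    """Split the on-disk PP library into three consecutive regions:
    filler (rest of the initial cylinder), hot (one full cylinder), rest."""
    on_disk = [n for n in sizes if n not in preloaded]
    by_heat = sorted(on_disk, key=lambda n: (-loads[n], n))
    hot = take_while_fits(by_heat, sizes, cyl_size)
    remaining = [n for n in by_heat if n not in hot]
    # The coldest programs pad out the partially used initial cylinder.
    filler = take_while_fits(list(reversed(remaining)), sizes, first_cyl_free)
    rest = [n for n in remaining if n not in filler]
    return filler, hot, rest


# Hypothetical library: sizes in sectors and load counts per measurement run.
sizes = {"PPA": 4, "PPB": 3, "PPC": 4, "PPD": 2, "PPE": 3, "PPF": 5, "PPG": 2}
loads = {"PPA": 1700, "PPB": 1200, "PPC": 40, "PPD": 0, "PPE": 5,
         "PPF": 900, "PPG": 9000}
filler, hot, rest = plan_library_layout(
    sizes, loads, preloaded={"PPG"}, first_cyl_free=5, cyl_size=12)
print(filler, hot, rest)
```

Writing the three lists to tape in the order filler, hot, rest puts the most frequently loaded programs together in a single cylinder, so consecutive loads rarely require a head movement.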
Apart from the primary system PP library, the system placed "temporary" system changes (EDITLIB(SYSTEM)) in a second library. By changing some code in the operating system (technically: moving the SYS bit), we moved this library from the original system disk to a second disk. Playing the same cylinder tricks, we were able to replace frequently used PP programs with copies that were loaded from this second disk. The result was a balanced, well-tuned operating system that was much faster than the original NOS/BE system. In more than 95% of cases, a PP program could be loaded directly from one of the two system disks without the dozens of milliseconds needed to reposition the disk heads. The users gained a lot of performance. Interactive users gained the most, and performance also became more consistent. Jobs saw higher CPU performance as well, because the system spent less time waiting for input/output.
Despite the "Polish and our own smart code", in mid-1981 users complained about the interactive response times on the CYBER 74. A program written by Edo Roos Lindgreen at SARA ran on an Apple IIe microcomputer and issued a number of simple interactive commands to the CYBER every 5 minutes. The Apple measured the time between the carriage return (Enter) and the first character of the response to that key action. These timings were sent to the CYBER for post-processing. The Apple displayed the last twenty response times on its monitor for the operators, who thus got almost immediate feedback when they admitted too many batch jobs to the system or when other system performance problems occurred.
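The measuring principle of such a probe can be sketched in a few lines. This is a hedged modern Python illustration, not the original Apple program: the class `ResponseMonitor`, the `send_command` callable (assumed to return as soon as the first response character arrives) and the fake clock in the usage example are all hypothetical.

```python
from collections import deque
import time


class ResponseMonitor:
    """Times interactive commands and keeps the most recent results."""

    def __init__(self, send_command, clock=time.monotonic, history=20):
        self.send_command = send_command
        self.clock = clock
        # Only the last `history` response times are kept for display.
        self.recent = deque(maxlen=history)

    def probe(self, command):
        """Send one command and return the elapsed time until the
        first character of the response."""
        start = self.clock()
        self.send_command(command)
        elapsed = self.clock() - start
        self.recent.append(elapsed)
        return elapsed


# Usage with a fake clock, so the example is deterministic.
ticks = iter([0.0, 1.5, 10.0, 12.0])
monitor = ResponseMonitor(lambda command: None, clock=lambda: next(ticks))
first = monitor.probe("hypothetical command 1")
second = monitor.probe("hypothetical command 2")
print(list(monitor.recent))
```

A bounded queue mirrors the display of the last twenty response times: older measurements drop off automatically as new ones arrive.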
Using this tool, we measured the behaviour of a temporarily extended configuration
of the CYBER with 14 PPs (instead of 10) and an
additional disk controller. The measurements showed that either the CYBER configuration
had to be extended or the CYBER had to be replaced in order to remove a number
of bottlenecks. Apart from the Apple IIe measurements,
we measured inside the system itself as well. Every ten seconds, a PP program took
a snapshot of the system tables, PP occupancy and I/O activity. This PP program and
the analysis code originated from the European Centre for Medium-Range Weather
Forecasts (ECMWF) and was
called User Performance Measurement (UPM).
The four additional PPs and the additional disk controller reduced the average interactive response time (reaction time to input in an application) from 1.5 to 0.4 seconds. The average execution time of 95% of the interactive commands dropped from ten to 0.5 seconds. The system could "breathe" again.
The execution of batch jobs depended heavily on priority. When a user submitted, for instance, twenty jobs in one go, operators usually set most of these jobs to priority 0 (not eligible for execution). Because it required manual intervention, making the jobs eligible again was often forgotten until the system was almost empty. This wasted CPU seconds, through either idle time or rerun time, and the late activation of jobs resulted in annoyed users. In addition, "long" jobs were often moved to the end of the queue, frequently on the basis of the personal preferences and experience of the operators. Sometimes this turned out to be a very good decision, but often it was counterproductive. Many complaints were received. In May 1982, a new priority algorithm was introduced for batch jobs. The following formula was used to determine the start priority in the queue:
2200B - log2(CM * (T + ß * IO)) + log2(waiting time in seconds)
The parameter ß was 2 for normal batch jobs and 0.5 for tape jobs. The queue was aged in a smart way. As a result, each batch job was started fairly, without much operator intervention. Users who estimated their execution and I/O requirements well were rewarded: their jobs reached the top of the waiting queue sooner.
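As a worked illustration, the formula can be evaluated as follows. This sketch rests on several assumptions: 2200B is read as an octal constant (the usual CDC notation), CM, T and IO are taken to be the requested central memory, the estimated execution time and the estimated I/O requirement in unspecified units, and the clamping to 1 is added here only to keep the logarithms defined. The function name `start_priority` is hypothetical.

```python
import math

BASE_PRIORITY = 0o2200  # 2200B read as octal, i.e. 1152 decimal (assumption)


def start_priority(cm, t, io, seconds_waiting, tape_job=False):
    """Evaluate 2200B - log2(CM * (T + beta * IO)) + log2(waiting time)."""
    beta = 0.5 if tape_job else 2.0
    cost = max(cm * (t + beta * io), 1)   # clamp: assumption, avoids log2(0)
    waited = max(seconds_waiting, 1)      # clamp: assumption, avoids log2(0)
    return BASE_PRIORITY - math.log2(cost) + math.log2(waited)


# Hypothetical job: CM = 64, T = 12, IO = 2, waiting for 256 seconds.
# cost = 64 * (12 + 2 * 2) = 1024, so the priority is 1152 - 10 + 8.
print(start_priority(64, 12, 2, 256))   # 1150.0
print(start_priority(64, 12, 2, 512))   # 1151.0 after twice the waiting time
```

Because both terms are base-2 logarithms, a job that asks for twice the resources starts one priority point lower, and it gains that point back once it has waited twice as long.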
As said before, the CYBER 74 had a water cooling system that was supposed to carry off the heat transferred by the freon cooling circuits in the bays. Water leakage meant that the system regularly had to be topped up with additional water. At a certain moment, one of the CYBER 74 bays indicated temperature problems. The built-in temperature meter showed very strange behaviour: a very fast switching between too hot and too cool. Switching from the main cooling circuit to the secondary circuit did not solve the problem. It was decided to replace the three-way valve in the cooling circuit of the bay. When the pipes were detached, the cause of the problem became clear. As the Laboratory had iron pipes in its primary cooling circuit, the addition of oxygen-rich water had over time caused a lot of rust on the inside of the pipes. Chips of rust came loose once in a while and were carried to the smaller cooling pipes in the CYBER system, where they became trapped. In this way, our computer was the first system with a "cardiac arrest". Cleaning the internal cooling system of the CYBER took a couple of days of work. Obviously, the entire primary circuit needed replacement, something that required more time and planning. In order to continue computer services, it was decided to couple the CYBER to an external "blood circulation". The "emergency cooling" was supplied by connecting a fire hose to the cooling input side of the CYBER and, at the output, a second fire hose that discharged the heated water through the window onto the courtyard of the Laboratory. It took three weeks to do all the replacement fitting. It took only a short while before the whole area around the Laboratory became part of the extended water production area for The Hague (Haagse Duinwaterleiding).
Photo of a CDC 6600, which was a look-alike of the CYBER 74. The cooling
controls can be seen at the end of the bay. The CPU interconnection wires
are clearly visible.
(Photo courtesy of http://ed-thelen.org/comp-hist/.)
In order to optimise the "restore" process after emergencies such as disk crashes, a Fortran program was developed that optimised the use of the magnetic tape units. Both magnetic tape units could run in parallel. In addition, the PP programs required during the restore were preloaded into memory, which greatly reduced the downtime. Permanent files that could not be retrieved from the most recent back-up tapes were restored from the most recent weekly back-up tapes.