ETA and CONVEX: between -40 and +40 Centigrade
The early ideas about “high-speed computing” at TNO Waalsdorp
The ETA line of supercomputers was officially announced in Paris on October 15, 1987. Many TNO institutes were considering (mini-)supercomputing: the KRI, the Groundwater exploration institute, TNO-PML, and last but not least, TNO-FEL at Waalsdorp. In early 1988, a first market survey was made. At the same time, a representative benchmark set was prepared. In addition to synthetic programs, a number of the larger simulation programs of TNO-PML and TNO-FEL that then ran on the VAXes and the CYBER 840A were prepared for inclusion in the benchmark.
As we were highly interested in advances in compiling techniques and in mini-supercomputing, especially in vector and parallel processing, we included in the benchmark set a mathematical kernel designed and developed by Professor van der Vorst at the Technical University Delft. This kernel measured the Mflops rate of a system across a range of increasing vector lengths. From these measurements the program could calculate the so-called n½, the vector length at which half the peak rate of the system is reached. The kernel measured these values for over fifty different mathematical calculations, consisting of a mixture of vector-vector and vector-scalar operations.
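How n½ can be derived from such measurements is sketched below. This is a minimal illustration, not the Delft kernel itself: it assumes the commonly used two-parameter timing model t(n) = (n + n½) / r∞, in which r∞ is the asymptotic (peak) rate, and fits a straight line through measured times. The function names, the one-flop-per-element assumption and the sample data are all hypothetical.

```python
def fit_peak_rate_and_n_half(lengths, times_us):
    """Least-squares fit of t(n) = (n + n_half) / r_inf.

    lengths  : vector lengths n (one floating point operation per element assumed)
    times_us : measured times in microseconds, so rates come out in Mflops
    Returns (r_inf, n_half).
    """
    m = len(lengths)
    mean_n = sum(lengths) / m
    mean_t = sum(times_us) / m
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times_us))
    slope = sxy / sxx                      # equals 1 / r_inf
    intercept = mean_t - slope * mean_n    # equals n_half / r_inf (start-up time)
    return 1.0 / slope, intercept / slope


def rate(n, r_inf, n_half):
    """Achieved rate in Mflops at vector length n under the same model."""
    return r_inf * n / (n + n_half)


# Synthetic timings (not measurements from any real machine):
# a pipeline with a 100 Mflops peak rate and n_half = 50.
lengths = [10, 20, 50, 100, 200, 500, 1000]
times_us = [(n + 50.0) / 100.0 for n in lengths]

r_inf, n_half = fit_peak_rate_and_n_half(lengths, times_us)
print(f"r_inf = {r_inf:.1f} Mflops, n_half = {n_half:.1f}")
print(f"rate at n_half = {rate(n_half, r_inf, n_half):.1f} Mflops")
```

A small n½ means that short vectors already run close to peak speed; a large n½ means that the start-up cost of the vector pipeline dominates until the vectors become long.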
In the preparation phase, this kernel was also compiled and run on the CYBER 840A using the new and, according to CDC, much faster FTN2/VE compiler. This compiler was designed particularly for the CYBER 990, a system with special vector hardware. It turned out that the kernel job as a whole ran 30% faster on the CYBER 840A than the version compiled with the ‘old’ FTN/VE compiler.
Further analysis, however, revealed remarkable differences for several basic mathematical functions on this scalar system. Determining the maximum value of an array (vector), for instance, required only five Compass (assembler) instructions with the old compiler, whereas the new compiler generated about thirty. As the CPU instruction stack (cache) could not hold that many instructions and not all functional units could be fired at once, it became obvious why this kernel section was many times slower. Based on the detailed analysis of this kernel, we showed Control Data that another ten per cent speed increase was feasible by using the code generated by the old compiler. The suggested improvements were incorporated in a later version of the FTN2/VE compiler.
The preparation of our own benchmark jobs required much more effort than expected. Many of the CPU-intensive programs turned out never to have been compiled with the optimiser option switched on (FTN4 OPT=2; FTN5 OPT=HIGH). The users explained that they did not use that mode because their program would not work when optimised! It took a great deal of sweat to remove all the bad programming habits that blocked optimisation. First, we compiled the program with the FTN/VE compiler using OPT=HIGH. If this produced the same results as the unoptimised run, we used the new FTN2/VE compiler to make the source code more amenable to vectorisation. To rule out CDC-specific Fortran dialect features, the program was then run on the DEC VAX 8350. All program improvements were communicated to the program owners. As a side effect, a speed improvement of up to 30% was achieved on the existing system!
The benchmark programs were then run on the CYBER 205 national supercomputer and the CYBER 995 (NOS/VE system with vector facility) at the Academic Computing Center in Amsterdam (nowadays SURF/SARA), on an ETA-10P in Minneapolis, and on the dual-CPU Alliant FX/40 system at DGV-TNO. In October 1988, after extensive contract negotiations, TNO-FEL (Waalsdorp) decided to lease an ETA-10P mini-supercomputer.
ETA-10P architecture: re-use of ‘slow chips’
The ETA-10P, codenamed “Piper”, was designed by Neil Lincoln under the supervision of J.E. Thornton, the technical designer of the CDC 6600, who was technical director of ETA. The ETA-10P was built with CPU chips that performed ‘slow’ and could not be used in the production of the liquid nitrogen cooled ETA-10E processors. The marketing decision to re-use the slow chips was taken by Carl Ledbetter, who came from IBM. This relatively cheap entry-level supercomputer could even be exported to the third world. In that case, however, certain powerful hardware instructions (e.g. scatter-gather) that could be used for the simulation and design of nuclear reactions (e.g. calculation of 10K * 10K determinants and matrix calculations) were disabled. The CPU chip had 284 pins; only with a microscope could one see whether the automatic soldering process had been successful. The ETA-10P processor sat on an air-cooled, high-tech 44-layer board with 20,000 very thin drill holes (1.5 mils). Each CPU contained a shared memory of 4 Mwords, 64 bits wide. In addition, the ETA-10P had a common memory of 8 Mwords of 64 bits. The clock cycle time was 24 nanoseconds. With its ‘pipelining’ technique, the ETA-10P had a theoretical peak speed of 146 Mflops.
An Apollo/Domain Unix workstation was used as operator console to monitor and control the proper functioning of the ETA-10P via a local area connection.
On January 4, 1989, the two system programmers went to Minneapolis to be trained to operate and maintain the ETA Unix System V operating system. TNO would receive the third ETA system equipped with Unix System V. The other ETA systems used the ETA Operating System EOS, a one-to-one copy of VSOS, the operating system of the CYBER 200 series.
During the course period of 2.5 weeks, the average outside temperature was -20 C (-4 F). On one of the days, the outside temperature dropped to -40 C, a special experience, as the extremely low humidity caused electrostatic discharges whenever one touched a metal object.
In the ETA Systems factory, which was located right next to a railway yard, we could closely watch the manufacturing process of the CPU boards. The aforementioned precision holes could only be drilled when no train was passing. With the aid of seismometers placed at some distance, the drilling system, which was stabilised on thick rubber blocks, was switched off in time.
Installation and aborted acceptance procedure of the ETA-10P
At the end of February 1989, the ETA-10P, nicknamed “PIPER 23”, was installed at Waalsdorp. The first acceptance tests indicated some stability problems in the beta version of the Unix operating system. In anticipation of the production release of the Unix operating system, which had yet to be delivered, work progressed on developing operational procedures and running a number of large programs. It turned out that some of the TNO-PML programs (Reagas, Burnex) ran three times faster on the ETA-10P than on a CRAY X-MP.
On 17 April 1989, 8 AM Minneapolis time, a press release announced that Control Data would stop further developments at ETA Systems. The reason was the need for the CDC group to achieve a better financial position, to which ETA Systems could not yet contribute. A personal account of the termination by one of the managers is on the web. The “ETA Yearbook” is also available on the web; it contains a collection of personal memories of ETA staff, recorded in 1994, five years after the closure.
All acceptance work was immediately discontinued. TNO decided to terminate the contract with Control Data and to look for a replacement system within the same lease contract. In June 1989, benchmarks were performed on the Alliant FX/4 system of TNO-DGV and on an Alliant FX/40 system with four CPUs at Alliant Netherlands. The set of benchmark programs was also implemented and run on a Convex C220 system. In addition to the technical evaluation, both suppliers were asked to offer an attractive configuration within the financial scope of the existing lease contract. After much consideration, it was decided to acquire a Convex C220 system.
The CONVEX C220: Texas technology
Because the Convex systems designed by Steven J. Wallach had a very different architecture and a Unix BSD-based operating system to which we were not accustomed, the same system programmers attended courses held at Convex in Dallas, Texas, in June/July. It was ‘hot’ on arrival. The weather forecast on TV indicated 40 degrees C, with the warning that ‘an unprotected stay of more than 12 minutes in the sun could lead to a heat stroke’. Quite a different temperature from a few months before in Minneapolis!
The CONVEX C230 minisupercomputer
The installation and acceptance of the Convex C220 system took place on 1 November 1989. The Convex C220 had two CPUs. Each CPU had a peak rate of 50 Mflops (million floating point operations per second). Through parallelisation, the CPUs could work together on one job, reducing its turnaround time. The system memory was 256 Mbytes. After a year, the Convex was expanded with a third CPU to a CONVEX C230. At the same time, two extra disks were installed. A total of 6 GB of disk space was installed, which could be accessed in ‘stripe mode’ via four controllers.
On 7 March 1991, the ConvexOS/Secure operating system was installed. According to an agreement with Convex, TNO was the beta-test site for this operating system and had to report on its experiences. We reported a number of security issues with this system and awaited corrective patches.
However, a cultural difference emerged between the European and American ways of thinking. Security issues that we identified in the network software (telnet, ftp) were not resolved by Convex. Our test used a PC program to generate consecutive usernames and submitted each of them in a telnet and ftp login attempt on the Convex. The response of ConvexOS/Secure to a login attempt revealed whether or not the submitted username was valid. Passwords could be tried in a similar way. This could be repeated thousands of times without the login hacking attempts being logged. Further attempts after a number of login errors were neither blocked nor delayed. This was a well-known hole in Unix systems that had not been plugged in a ‘secure’ operating system.
Formally, however, the Trusted Computer System Evaluation Criteria (TCSEC) C2 security level was defined only for a batch environment. In our European way of thinking, this problem had to be solved ‘in the spirit of’ better security, even though a fix was not required outside the formal C2 ‘security class’ environment.
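The three weaknesses described above (different answers for valid and invalid usernames, no logging of failed attempts, and no blocking or delay after repeated errors) can be countered along the following lines. This is a hedged sketch of the general technique only; it is not how ConvexOS/Secure or any telnet or ftp daemon was implemented, and the class name, the thresholds and the plain-text password table are hypothetical simplifications.

```python
import hmac
import time

MAX_FAILURES = 5        # hypothetical threshold, not taken from the source
LOCKOUT_SECONDS = 300   # hypothetical lockout period
GENERIC_FAILURE = "Login incorrect"   # same answer for every kind of failure


class LoginGuard:
    def __init__(self, accounts, clock=time.monotonic):
        self.accounts = accounts   # username -> password (plain text for illustration only)
        self.clock = clock
        self.failures = {}         # source -> (failure count, time of last failure)
        self.audit_log = []        # (time, source, username, outcome)

    def attempt(self, source, username, password):
        now = self.clock()
        count, last = self.failures.get(source, (0, now))
        if count >= MAX_FAILURES:
            if now - last < LOCKOUT_SECONDS:
                # Too many errors from this source: refuse and log, even if correct.
                self.audit_log.append((now, source, username, "blocked"))
                return GENERIC_FAILURE
            count = 0              # lockout period has expired

        # Always perform a comparison, so unknown usernames take the same path.
        stored = self.accounts.get(username, "")
        matches = hmac.compare_digest(stored.encode(), password.encode())
        if username in self.accounts and matches:
            self.failures.pop(source, None)
            self.audit_log.append((now, source, username, "accepted"))
            return "Login ok"

        self.failures[source] = (count + 1, now)
        self.audit_log.append((now, source, username, "failed"))
        return GENERIC_FAILURE


guard = LoginGuard({"alice": "s3cret"})
print(guard.attempt("pc-1", "nobody", "guess"))   # unknown username
print(guard.attempt("pc-1", "alice", "guess"))    # valid username, wrong password
```

Both calls print the same message, so a program that generates usernames learns nothing from the answer, and every attempt leaves a trace in the audit log.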
Incidentally, we kept the ftp security gap quiet for a long time. It was only disclosed at the beginning of 1998, to the Internet Engineering Task Force (IETF) group working on a newer ftp standard.
Finally, in September 1992, the safe(r) version of the Unix operating system was formally accepted. In October 1992, TNO presented its experiences with cybersecurity and Convex OS/Secure at the European Convex User Conference in Hamburg.
During the entire lease period, which ended in November 1994, the Convex CPUs were highly utilised. The average utilisation over five years (365 days * 24 hours) was 58.5%; the average utilisation during working hours was 63.8%. There were periods when the system was fully utilised for several months in a row. Amongst the main customers were the research groups of TNO-PML.