PP code versus CPU code
In the early days of CDC operating systems, much of the operating system was coded on Peripheral Processors (PPs). Probably, some motivating factors were:
- The desire to keep (the very limited) central memory free for user programs.
- The CPU’s inability to do I/O (although many operating system responsibilities have nothing to do with I/O).
- The fact that there were more PPs than CPUs.
Most CDC machines had only a single CPU, but some models, such as the 6500, had two. The I/O frame contained seven PPs, ten PPs (the initial configuration of the Control Data 6400 at TNO), 14 PPs (which we paid for later) or 20 PPs. Note that some PPs were pre-allocated or occupied from deadstart onwards:
- PP0: the PP monitor (MTR)
- PP1: the console driver (DSD)
- one PP per disk channel (1SP, 1SQ)
- the network PP driver (1MR – Internet 4, 1ND – Internet 5)
- JANUS card, plot, print (1IR) services when active
All other PP programs were either loaded on user request or started regularly in response to a wake-up timer or a monitored bit change.
For the OS “kernel” itself (though that word was never used), both CPU and PP components were written entirely in assembly language. PPs did not have an RA register to bias addresses, and the CPU Operating System (OS) code always ran with RA=0. As I recall, only a limited amount of overlaying was done with CPU OS code. The exceptions were Fortran-based overlay programs and additional system ‘utilities’ such as TNO’s Single User Editor (SUEDI/SUEDA). But if you like overlays, PPs were the place for you.
There were two basic types of PP programs: those that used the standard “PP Resident” library, and those that did not. PP Resident was a very tightly coded collection of useful utility routines used by nearly all PP programs. A few important PP programs were so space-critical that they were written without the benefit of this library. PP Resident was inexplicably named STL; some people guessed that might stand for SysTem Library.
All PP programs reserved locations 0 – 77B as direct cells. For PP programs that used STL, STL was located in locations 100B – 777B, with the PP program itself starting at 1000B. Other programs simply started at 100B.
Because memory was so limited in the 4K PPs, many PP programs were written using overlays. By convention, overlays were written to load at a multiple of 1000 octal. PP programs were given names with three characters. By convention, the first character of an overlay was a digit representing where the program should load (the address divided by 1000B). Main overlays typically loaded at 1000B, so many programs had names that started with 1. For instance, 1AJ (Advance Job) was called when a command in a job was completed and the next control card needed to be read, parsed, and executed. Child overlays loaded at higher address locations, so their names started with larger digits, such as 4.
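The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function name overlay_load_address is hypothetical, and the restriction to the digits 0 to 7 is an assumption based on the 4K (10000B-word) PP memory.

```python
from typing import Optional


def overlay_load_address(name: str) -> Optional[int]:
    """Return the load address implied by a PP program name, or None.

    Assumption: a leading octal digit d means "load at d * 1000B".
    Names that start with anything else carry no load-address hint.
    """
    if len(name) != 3:
        raise ValueError("PP program names have exactly three characters")
    first = name[0]
    if first in "01234567":
        return int(first) * 0o1000
    return None


if __name__ == "__main__":
    for program in ("1AJ", "4XX", "MTR"):  # "4XX" is a made-up overlay name
        address = overlay_load_address(program)
        shown = "no hint" if address is None else oct(address)
        print(program, shown)
```

Under this reading, 1AJ maps to 1000B, while a name such as MTR or DSD says nothing about where the program loads.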
Another reason for using a digit at the start of a PP name was security. Only PP programs whose names started with a letter could be called by users, and then only when the access level of the PP program fell within the user’s authorisation bounds.
Important PP programs
Notable PP programs included:
- MTR: System monitor
This was the most important program in the system, although some of the monitor functionality, as well as the job scheduler tasks, was assigned to the CPU monitor, CPUMTR. MTR was a very tightly coded program. MTR kept track of the time and noted CPU requests for intervention (a user PP call or CPU exchange request). CDC CPUs resembled early PCs in that they did not come with a time-of-day clock, but they did have access to a source of precisely timed pulses. One hardware channel was reserved for read-only access to a 12-bit counter which constantly incremented. Each time through its loop, MTR would read this special channel and see whether the counter was smaller than last time. If so, the counter must have overflowed. MTR knew how often the counter overflowed and used this information to update a date-and-time data structure in memory. OS requests for the time of day were handled by giving them a copy of this data structure, just as PC BIOSes do today. When the OS was very busy, MTR would sometimes loop so slowly that it missed a counter overflow, and the software clock would lose time.
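The overflow-counting scheme, and the way it loses time under load, can be illustrated with a small Python simulation. This is a sketch rather than MTR's actual logic: the class and function names are hypothetical, and time is measured in abstract counter ticks.

```python
COUNTER_BITS = 12
COUNTER_MODULUS = 1 << COUNTER_BITS  # the counter wraps every 4096 ticks


class SoftwareClock:
    """Counts wraps of a free-running counter by comparing successive readings."""

    def __init__(self, first_reading: int = 0) -> None:
        self.last = first_reading
        self.overflows = 0

    def poll(self, reading: int) -> None:
        # A reading smaller than the previous one means the counter wrapped.
        # Two wraps between polls are indistinguishable from one (or none).
        if reading < self.last:
            self.overflows += 1
        self.last = reading


def count_overflows(poll_interval: int, total_ticks: int) -> int:
    """Poll the counter every poll_interval ticks and return the wraps seen."""
    clock = SoftwareClock()
    for t in range(poll_interval, total_ticks + 1, poll_interval):
        clock.poll(t % COUNTER_MODULUS)
    return clock.overflows


if __name__ == "__main__":
    ticks = 10 * COUNTER_MODULUS
    print("fast loop:", count_overflows(1024, ticks))
    print("slow loop:", count_overflows(5000, ticks))
```

As long as the loop polls more often than the counter wraps, every wrap is seen. Once the polling interval exceeds one wrap period, wraps go unnoticed and the derived clock falls behind.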
- DSD: Operator console
The “Dynamic System Display” was the routine that ran the operator’s console. It was heavily overlaid. For a description of using DSD, see Console Commands.
- 1SP/1SQ: Disk I/O
The above PP routines were unusual in that they hogged a PP; once loaded, they stayed running in that PP forever. Most PP programs were transient. They were loaded into a PP, did their work in typically a fraction of a second, and then the PP was marked as available for loading another PP program. PP resident (STL) was responsible for the loading. There was one important PP routine that was a hybrid: 1SP (later 1SQ), the Stack Processor. 1SP was responsible for the actual disk I/O. It processed a list of disk I/O requests that were organised in priority lists, the so-called stacks. The stack processor tried to optimize head movements and sector selections to obtain the highest overall throughput and to minimise waiting times. Responsive disk I/O was very important to system performance, of course, so the system made sure that a copy of 1SP was always loaded into at least one PP, even if there were no outstanding disk I/O requests. In fact, since there were multiple disk controllers and disk units, the system could do true simultaneous disk I/O, and therefore tried to keep multiple copies of 1SP loaded to allow this to happen. The system dynamically adjusted the number of copies of 1SP/1SQ in PPs. If there was a lot of disk I/O on multiple units for a while, more copies of 1SP would be loaded. However, you wouldn’t want to tie up too many PPs with idle copies of 1SP, so the number would be allowed to dwindle when the I/O load decreased.
Most PP routines were stored on disk, but the master copy of 1SP was kept in central memory, as was the code of some other PP programs and DSD overlays. This code had to reside in expensive main memory because, for example, it was needed to handle disk error situations or monitored tasks.
CDC operating systems implemented an unusual system call mechanism. System requests – referred to as PP requests even if no PP program was involved – were made by placing a specially-formatted word at address 1 of a program’s field length (i.e., RA+1). This location was scanned periodically by MTR (or CPUMTR). When the system noticed that a job’s RA+1 was non-zero, it would zero the location and start servicing the request. By convention, applications would loop, waiting for RA+1 to zero both before and after issuing a request. It certainly was necessary for an application to ensure that RA+1 was zero before issuing a request, lest a previously-issued but as yet unserviced request be overwritten. But this could have been done by consistently checking either before or after each request.
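The RA+1 handshake can be modelled roughly as follows. This is a simplified sketch: the Job and Monitor classes are hypothetical, request words are arbitrary non-zero integers, and issue_request returns False where a real application would loop until RA+1 cleared.

```python
class Job:
    """A job's field length; memory[1] plays the role of RA+1."""

    def __init__(self, field_length: int = 8) -> None:
        self.memory = [0] * field_length

    def request_pending(self) -> bool:
        return self.memory[1] != 0

    def issue_request(self, word: int) -> bool:
        """Post a request only if RA+1 is clear; report whether it was posted."""
        if word == 0:
            raise ValueError("a request word must be non-zero")
        if self.request_pending():
            return False  # posting now would overwrite an unserviced request
        self.memory[1] = word
        return True


class Monitor:
    """Stands in for MTR/CPUMTR scanning each job's RA+1."""

    def __init__(self) -> None:
        self.serviced = []

    def scan(self, jobs) -> None:
        for job in jobs:
            word = job.memory[1]
            if word != 0:
                job.memory[1] = 0  # zero RA+1, then start servicing
                self.serviced.append(word)


if __name__ == "__main__":
    job, monitor = Job(), Monitor()
    print(job.issue_request(0o1234))  # posted
    print(job.issue_request(0o5670))  # refused: RA+1 still occupied
    monitor.scan([job])
    print(job.issue_request(0o5670))  # posted after the monitor cleared RA+1
```

The check inside issue_request is the "before" test from the text. Waiting again after the request is posted adds nothing to the protection against overwriting.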
A PP could also request the CPU monitor to reserve a resource. As this was done asynchronously from the CPU execution, the request had to be handled in such a way that it could not be interrupted. CPU monitor (CPUMTR) would test whether the requested resource was available. If not, the PP would wait a couple of milliseconds before reissuing the request. If the resource was available, the CPUMTR would lock the resource and inform the PP. To lock the resource, CPUMTR Program Mode coding made use of the fact that the CPU is interrupted at word boundaries only. Thus a read, test, set and rewrite were all performed in executing one word to prevent an XJ-interrupt of the process.
* B2 contains the address of the word with the bit to be tested and set (if free)
* X6 contains the bit mask with the bit to be tested and set
+         SA1    B2          Read the word with the resource to obtain
*                            (the + is COMPASS force-upper: start at a word boundary)
          BX3    X1*X6       Extract the resource bit's current state (logical AND)
          BX7    X1+X6       OR the bit into the word and put the result in X7
          SA7    A1          Rewrite the interlock word; if the bit was already set, nothing changes
*                            If the resource was free, it is locked now.
CPUMTR can now test X3. If X3 is non-zero, the resource was already occupied, and this must be signalled to the PP so that it can reissue the request later. If X3 is zero, the PP has acquired and locked the resource.
When the CPU was in Monitor Mode (on a machine with CEJ/MEJ hardware), the monitor could not be interrupted by subsequent exchange requests. Thus, no semaphores were necessary to change the status of resources (e.g. lock-bits in tables). Note that the PP had to verify whether its request was honoured or ignored. On dual-CPU systems, the hardware ensured that only one CPU could be in Monitor Mode at a time.
In the early days, a significant amount of the system’s CPU time (probably 5-10%) was spent by applications looping, waiting for the system to notice their RA+1 requests. An optional instruction, the Central Processor Exchange Jump (XJ), was available to allow an application to transfer control to the OS and have it notice the request. The XJ instruction was a kind of hardware-supported software interrupt.
By the way, channel requests were handled by PP MTR (PP0). A PP program wanting to request a channel placed its request in its output register, a special reserved word in CPU memory. PP MTR scanned all PP output registers at regular intervals. When it found a request, it granted channel access as soon as the channel was free. Each PP program, after obtaining access to a channel, was responsible for releasing it again in good time. The only exception was a deliberate hang by a PP when it detected some inconsistency in the system. By going into a hung state, the PP could avoid further corruption of the system.
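The scan-and-grant cycle can be pictured with the following Python sketch. It is a loose model under stated assumptions: the ChannelArbiter class is hypothetical, an output register holds either nothing or a single channel number, and the real request format and MTR's scan order are not reproduced here.

```python
class ChannelArbiter:
    """Toy model of PP MTR granting channels from PP output registers."""

    def __init__(self, pp_count: int, channel_count: int) -> None:
        self.output_registers = [None] * pp_count    # pending request per PP
        self.channel_owner = [None] * channel_count  # PP holding each channel

    def request(self, pp: int, channel: int) -> None:
        # The PP program writes its request into its own output register.
        self.output_registers[pp] = channel

    def scan(self) -> list:
        """One pass over all output registers; return the PPs granted a channel."""
        granted = []
        for pp, channel in enumerate(self.output_registers):
            if channel is not None and self.channel_owner[channel] is None:
                self.channel_owner[channel] = pp
                self.output_registers[pp] = None
                granted.append(pp)
        return granted

    def release(self, pp: int, channel: int) -> None:
        # Returning the channel is the PP program's own responsibility.
        if self.channel_owner[channel] != pp:
            raise ValueError("PP does not hold this channel")
        self.channel_owner[channel] = None


if __name__ == "__main__":
    arbiter = ChannelArbiter(pp_count=10, channel_count=12)
    arbiter.request(2, 5)
    arbiter.request(3, 5)
    print(arbiter.scan())  # PP 2 gets channel 5, PP 3 keeps waiting
    arbiter.release(2, 5)
    print(arbiter.scan())  # now PP 3 is granted the channel
```

In this model a PP that never calls release starves every other requester of that channel, which is why timely release mattered.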
(with special thanks to Mark Riordan, MSU who provided the basis for this page)