Ideas about Automatic Translating Systems (1960)
As early as 1947, the Physics Laboratory started research into digital technologies, which were still in their infancy at the time. In 1951 a small calculator was made.
Contemplating digitalisation, IJsbrand Boxma, the Director of the Laboratory, explored the theoretical possibilities of automatic translation of texts from other languages in the period 1957 – 1960. (ed. His lectures concentrate on the translation processes and take for granted that literature could be made “machine-readable” by typing it on a “teletype” or “telex machine”.)
The fact that it is impossible to read all scientific literature, which appears in many languages, strongly hinders the progress of science. This is, of course, particularly evident for Russian, Japanese and other non-Western European languages. But in France, England and America, literature in English, French or German is not always directly accessible either.
Below follows an excerpt of his lectures about this topic in the years 1958 – 1960:
The construction of translating machines is based on the knowledge of physical sciences as well as on linguistics, or better said, it brings us to the intersection of mathematics, physics and technology on the one hand and linguistics on the other. One will have to get to know the properties of the language and try to describe them in such a way that they can be physically nailed down. For the physical realisation of a translation machine, one currently [1960!] only thinks of modern information processing equipment. The definition of the language properties will then have to have a mathematical character and have to be entered into the machine in the form of programming rules and constants.
Around 1947, the first suggestion of trying machine translation with an electronic calculation machine was made by Warren Weaver of the Rockefeller Foundation in a letter to Professor Norbert Wiener of the Massachusetts Institute of Technology. He wrote, among other things: ‘When I look at an article in Russian, I say: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’
However, deciphering a code, which leaves the language unchanged, is a different process from translating, in which the whole system of expressing thoughts changes. We see a gradual transition from the study of calculation machine experts to the study of linguists. It has been demonstrated on many electronic calculation machines in England, the United States of America, and Russia that, with a very limited dictionary, machine translation of appropriate sentences is possible.
Boxma provides a brief description of why the electronic calculation machines of those days are more or less suitable for performing translations, and of the problems encountered when setting up an automatic translation program. He then discusses the state-of-the-art components: transistor circuits and storage media such as punched tape, core memories and coded disks:
“In many cases, the numbers will be entered into the calculation machine using a punched tape, where a hole is a one. In the memory of the machine, the ones and zeros that are entered, as well as those to be remembered, must be stored, for which many systems are in use depending on the requirements made.”
“Nowadays one often uses magnetic memory elements. One piece of equipment is a magnetic drum which is covered with a layer of nickel. While the drum rotates fast, a small element of the surface is magnetised by a so-called write head. By applying very short impulses, this magnetised surface can be kept very small so that a large number of numbers can be written along its perimeter.”
Some of the state-of-the-art techniques at the time are shown in the accompanying photos on this web page.
A photographic plate or film can also be used as a permanent memory where the ones and zeros are represented by translucent and opaque blocks. Retrieving information can take a long time when the reading system needs to be repositioned. An advantage is the large capacity: with sophisticated photographic techniques 10,000 bits per mm² can be captured! The table shows that increasing speed [ed. at the time] is accompanied by an increase in volume.
|Type of memory||Time in seconds||Time in seconds||Volume in cm³ per bit|
|Magnetic core memory||10⁻⁵||10⁻⁵||2|
|Magnetic drum||10⁻⁵||10⁻⁵ – 10⁻²||0.2|
|Photo disk||—||10⁻⁴ – 10⁻¹||0.04|
Technical difficulties in using the calculation machine as a translator
In order to use an electronic calculation machine as a translator, you may need to enter words instead of numbers. This is very easy by replacing each character with a number under 32. These numbers can then be stored in the machine in binary form. In fact, something similar is already happening with the telex. In this way, a 10-character word will be entered into the machine as a number with 10 x 5 = 50 ones and zeros. For example, if encoding were used where a = 1 , b = 2, etc., the word ‘table’ would be encoded as 20, 1, 2, 12, 5 and thus entered into the machine as 10100 00001 00010 01100 00101.
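The encoding described above can be sketched in a few lines of Python. This is only an illustration of the a = 1, b = 2 scheme from the text; the function names `encode_word` and `to_bits` are invented for this sketch and do not come from the lectures.

```python
def encode_word(word: str) -> list[int]:
    """Map each letter to its alphabet position (a = 1, ..., z = 26)."""
    return [ord(ch) - ord("a") + 1 for ch in word.lower()]


def to_bits(codes: list[int]) -> str:
    """Render each code as a group of five ones and zeros."""
    return " ".join(format(code, "05b") for code in codes)


codes = encode_word("table")
print(codes)           # [20, 1, 2, 12, 5]
print(to_bits(codes))  # 10100 00001 00010 01100 00101
```

Because every code is below 32, five bits per character suffice, which is where the 10 x 5 = 50 ones and zeros for a 10-character word come from.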
In Russia, a Chinese text is entered in a similar manner using the telegraph code. Each Chinese character is numbered for the purpose of telegraphic transmission.
It is thus possible to enter text in the form of numbers into the machine, while words can also be stored in the machine’s memory. If the corresponding word in the other language is written behind each word in the memory, of course also in the form of a number, then a dictionary is created. When a word to be translated enters the machine, for example on a punched tape, this word -or if you will this number- can subsequently be subtracted from the words in the memory. If the result is zero, the correct word has been found. The corresponding word in the other language can then be fed to the output of the machine, after which it can be typed with the aid of, for example, a telex machine.
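A minimal sketch of such a dictionary lookup by subtraction is given below. It assumes the 5-bit letter code from the earlier example; the three Dutch-English entries and all function names are hypothetical and serve only to show the principle.

```python
def word_to_number(word: str) -> int:
    """Pack a word into one integer, 5 bits per letter (a = 1, ..., z = 26)."""
    number = 0
    for ch in word.lower():
        number = (number << 5) | (ord(ch) - ord("a") + 1)
    return number


def number_to_word(number: int) -> str:
    """Unpack a 5-bit-per-letter integer back into letters."""
    letters = []
    while number:
        letters.append(chr((number & 0b11111) + ord("a") - 1))
        number >>= 5
    return "".join(reversed(letters))


# Hypothetical dictionary: each source number has its translation stored behind it.
DICTIONARY = [
    (word_to_number(src), word_to_number(dst))
    for src, dst in [("tafel", "table"), ("woord", "word"), ("taal", "language")]
]


def translate(word: str) -> str:
    """Subtract the incoming number from each stored number; zero means a match."""
    incoming = word_to_number(word)
    for source, target in DICTIONARY:
        if source - incoming == 0:
            return number_to_word(target)
    return word  # no entry found: pass the word to the output unchanged


print(translate("tafel"))  # table
```

Passing unknown words through unchanged is a choice made for this sketch; it matches the behaviour the text later describes for words that have no translation.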
Now, it is useful to point out two technical problems that arise when we want to store a dictionary in memory. One problem concerns the volume required, the other the speed.
The volume of the dictionary memory
Suppose you want to record 60,000 words and also the 60,000 corresponding words in another language. If each word has an average length of 6 characters, each word then requires 30 bits. A total of 3,600,000 bits must then be memorised. In reality, this number will be considerably larger due to double meanings and other causes, so a dictionary size of several million bits must be considered. Because of this large number, attempts are made in various ways to limit the dictionary size.
The second problem is the speed. If a word enters the machine on the punched tape via the input device, this word must be compared with the words in the memory. This can be done in different ways:
- If the incoming ‘word number’ is successively subtracted from the N ‘word numbers’ in memory, then on average N/2 subtractions will have to be performed before the result is zero, or in other words the right word is found. Suppose 0.001 seconds is needed to read the word and perform the subtraction, then for an N=60,000, the look-up of each word to be translated takes an average of 30 seconds.
- A faster result is achieved by placing the words in an order that is determined by the frequency distribution of the words in the relevant language, or possibly in that of the relevant scientific field. If we assume that in a given text a word of rank r occurs k/r times (k is assumed constant according to Zipf’s law), then if the dictionary contains N words, the total number of words in this text is equal to the sum of k/r for r = 1 .. N. This number of words must therefore be looked up and is approximately equal to k · ln(N). The word of rank r occurs k/r times and needs r lookup procedures each time, so each rank r accounts for k lookup procedures, and all N ranks together for k · N procedures. On average, therefore, (k · N) / (k · ln(N)) = N/ln(N) procedures are required per word. Using our example figures, the resulting value for N = 60,000 becomes about 5,500, which takes a time of 5.5 seconds.
- A considerably faster method is obtained by placing the ‘word numbers’ in an ascending number sequence and making use of the possibility that the machine can determine its next operation from the previous result. If we subtract the incoming word from the middle word from the memory, the result can be positive or negative. The searched word will then be located in the first or second half of the memory. The process is repeated for the middle of that half, fixing the correct quarter. Continuing like this, 16 subtractions indicate the location to the nearest 1:65,536. This binary search method is therefore sufficient for 60,000 words. In this way, the word is found in 16 * 0.001 sec = 0.016 seconds. The time required to reach the correct position in the memory is now undoubtedly greater than the time required for the subtractions. The result will therefore be less favourable than suggested above.
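The three lookup strategies above can be compared with a short Python sketch. The figures N = 60,000 and 0.001 seconds per step are taken from the text; the function names are invented here, and `average_zipf` uses the exact sum of 1/r instead of the ln(N) approximation, so it comes out somewhat lower than the 5,500 quoted above.

```python
import math


def average_linear(n: int) -> float:
    """Unordered dictionary: on average half of the n entries are tried."""
    return n / 2


def average_zipf(n: int) -> float:
    """Frequency-ordered dictionary under Zipf's law.

    A word of rank r occurs k/r times and costs r subtractions, so the total
    is k*n subtractions spread over k*(1 + 1/2 + ... + 1/n) words; k cancels.
    """
    harmonic = sum(1 / r for r in range(1, n + 1))
    return n / harmonic


def binary_search(sorted_numbers: list[int], incoming: int) -> tuple[int, int]:
    """Return (index or -1, number of subtractions), steering by the sign."""
    low, high, steps = 0, len(sorted_numbers) - 1, 0
    while low <= high:
        middle = (low + high) // 2
        difference = sorted_numbers[middle] - incoming
        steps += 1
        if difference == 0:
            return middle, steps
        if difference > 0:
            high = middle - 1  # searched word lies in the lower half
        else:
            low = middle + 1   # searched word lies in the upper half
    return -1, steps


N = 60_000
SECONDS_PER_STEP = 0.001  # figure used in the text

print(average_linear(N) * SECONDS_PER_STEP)  # about 30 seconds
print(round(average_zipf(N)))                # about 5,200 steps
print(round(N / math.log(N)))                # about 5,500 steps, as in the text
print(math.ceil(math.log2(N)))               # 16 steps for binary search
```

The sketch counts subtractions only. As the text notes, the time needed to reach the right position in a drum or photographic memory is not included, so real timings would be less favourable.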
Linguistic problems when designing a translation engine
Translation is really nothing more than replacing one language with another with the intention of expressing the same ideas. A dictionary forms the basis both when the translation is done by man and when it is performed by a machine. With a word-for-word translation, one almost never fully achieves the translation objective.
Among other things, many words have more than one meaning, each of which corresponds to a different word in the other language. However, there is another reason why word-for-word translation does not suffice: the grammar also contains a considerable part of the information to be conveyed, which is expressed, among other things, in a more or less strictly prescribed sentence structure and in the inflections. It has been put as follows: the words indicate what is being spoken of, the grammar what is said about it.
Therefore, in addition to a dictionary, it is necessary to enter grammatical rules for the translation into the translation engine. Like the dictionary, these rules must be stored in memory and must be arranged in such a logical way that they can be followed by a machine. This problem is certainly not simple and requires a careful study of the structure of the language. It should be noted that the accuracy and completeness of the grammatical rules depend on the designer’s ability to derive generalities from a large number of language details.
Some solution directions for machine translations of texts of reasonable size and complexity are indicated below:
On the one hand, it is necessary to recognise the inflexions and conjugations in the text in order to draw conclusions for the translation. On the other hand, it is necessary to correctly inflect the words of the translation. These two sides of the inflexion problem only partially cover each other. For example, in some languages, inflexions can be replaced by words, such as prepositions or articles.
Example 1: German ‘der’ can mean ‘of the’. Example 2: Norwegian kake => cake; kaken => the cake; kaker => cakes; kakene => the cakes.
You may wonder whether all forms of inflexion should be present in memory. When using electronic calculation machines for translation, one has to minimise memory use, no matter how large these memories appear to be. Therefore, many attempts have been made to remove prefixes and suffixes. For example, the machine can try to see if the last character(s) of the word denotes a plural form, e.g. an ‘s’ in English [ed. and ‘en’ in Dutch]. After stripping this suffix, the dictionary can be checked for the remainder. This remainder is then marked with an indication of the word type, while the machine finds in a suffix dictionary how to handle the translation of the identified suffix. To get an idea of the gain in volume and time from this split, let’s assume that the original dictionary of N words has been replaced by n main dictionaries, each with N_i words, while each main dictionary contains m_i suffixes (i = 1, 2, 3, ..., n).
For a dictionary with N words, the number of lookups P is equal to the smallest whole number above log2(N) when using the binary search method. An approximation for P in the split dictionary is then given by P = log2(sum of N_i for i = 1 .. n, times the sum of m_i for i = 1 .. n). This formula is not really valid for several reasons. Boxma provides an example of a 60,000-word dictionary split into four main dictionaries, each with a different number of translation results (4,000 x 5; 3,000 x 10; 2,000 x 4; and 2,000 x 1). The sum over N_i is 11,000; the sum over m_i is 20, and the sum over N_i x m_i is 60,000. The gain in the number of procedures is a factor of two as compared to 16 (log2 60,000). This gain is disappointing if you consider the search time. The memory footprint is smaller. However, it seems likely that a real translator’s memory will become so large that any inflexions can be included in the dictionary.
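The suffix-stripping idea can be illustrated with a small Python sketch. The two dictionaries below are hypothetical English-Dutch examples chosen because their plurals happen to be regular; a real system would need many more rules and exceptions.

```python
# Hypothetical main dictionary: stem -> (translation, word type).
STEMS = {
    "book": ("boek", "noun"),
    "word": ("woord", "noun"),
}

# Hypothetical suffix dictionary: how to render an identified suffix.
SUFFIXES = {
    "s": "en",  # English plural 's' becomes Dutch plural 'en'
}


def translate_with_suffix(word: str) -> str:
    """Try the whole word first, then the word with a known suffix stripped."""
    if word in STEMS:
        return STEMS[word][0]
    for suffix, target_suffix in SUFFIXES.items():
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in STEMS:
            return STEMS[stem][0] + target_suffix
    return word  # unknown word: pass it through unchanged


print(translate_with_suffix("books"))  # boeken
```

With this split, only the stems and the suffixes are stored instead of every inflected form, which is the memory saving the text refers to.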
The problem of sentence structure also has two sides. In the first place, the syntax of the entered text can be very significant, for example in determining the correct meaning of words with more than one meaning. Secondly, the sentence structure of the translation can be of great significance for a proper understanding.
It is most obvious to have the machine perform an analysis of the sentence structure of the text. For this, each word in the memory must have an indication, such as ‘article’ or ‘verb’, as far as this is possible. For those words for which this is not possible, the combination with the known words can usually provide some information. After that, the sentence structure of the translation will have to be constructed using the construction rules of the second language.
Words with more than one meaning
Finding the correct translation of the words for which more than one translation is possible is the most difficult problem for the designer of a translation machine. This problem has already been addressed in many ways by several researchers. It is impossible to give more than a first impression of the problem at hand. The number of words that have more than one meaning is very large, even if one is satisfied that synonyms are not used. Presumably, each word has on average two meanings. This number decreases considerably if the translation engine is only intended for a certain field, such as mathematics.
It makes sense to make a divide between multiple grammatical and non-grammatical meanings. Distinguishing between the meanings is easy for the first group (weather – the weather) if the word type has already been determined. If the word type has not been determined and also for the non-grammatical double meanings (e.g. light), the distinction must become apparent from the remainder of the text. The part of speech can sometimes be determined from characteristic connections between parts of the text (e.g. adjective – noun), or even because there is not yet a verb in the sentence and the word in question is most likely this verb. In some cases, the entire sentence or even more sentences will have to be searched for an indication to determine the meaning of the word. Such an indication may belong to a particular field. If these are indicated with numbers, it can be checked, for example, which number occurs most often in the text. There is a good chance that the translation of the word in question must also bear this number.
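The idea of marking translations with field numbers and counting which number dominates the text can be sketched as follows. The English-German entries, the field numbering (1 = mechanics, 2 = hydrology) and the function names are all invented for this illustration.

```python
from collections import Counter
from typing import Optional

# Hypothetical entries: each translation carries a field number.
ENTRIES = {
    "spring": [("Feder", 1), ("Quelle", 2)],
    "force": [("Kraft", 1)],
    "mass": [("Masse", 1)],
    "water": [("Wasser", 2)],
    "river": [("Fluss", 2)],
}


def dominant_field(words: list[str]) -> Optional[int]:
    """Count the field numbers of all unambiguous words in the text."""
    counts = Counter(
        ENTRIES[w][0][1] for w in words if w in ENTRIES and len(ENTRIES[w]) == 1
    )
    return counts.most_common(1)[0][0] if counts else None


def translate_word(word: str, words: list[str]) -> str:
    """Pick the translation whose field number occurs most often in the text."""
    options = ENTRIES.get(word)
    if not options:
        return word
    field = dominant_field(words)
    for translation, number in options:
        if number == field:
            return translation
    return options[0][0]  # no evidence in the text: fall back to the first entry


text = "the spring exerts a force on the mass".split()
print(translate_word("spring", text))  # Feder
```

Only words with a single translation vote in this sketch, so an ambiguous word cannot bias its own disambiguation.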
Words can be divided into two groups. One group provides the essential content of the text. They are parts of the text such as verbs, nouns, and adjectives. They usually have a few different meanings. The other group of words is sometimes referred to as ‘glue words’ because they glue the words from the first group together, e.g. prepositions. They usually have many meanings and are much more common than the words in the first group (for example, the English word ‘to’ can mean to, at, of, towards, against, in, up to, for, at, and in comparison). It should be noted that even when compiling a dictionary, it should be taken into account that the words that appear most frequently are not always the most important. Omitting an infrequently occurring word may have more serious consequences than omitting a glue word.
Words which, in combination with another word, have a special translation (e.g. in idiomatic expressions) can also be counted among those words with more than one meaning. For example, the combination ‘private person’ can be translated as ‘citizen’. The word ‘private’ will therefore have to contain a clue in the dictionary to check whether it is followed by ‘person’.
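A dictionary entry with such a clue could look roughly like the Python sketch below. It reuses the ‘private person’ example from the text; the entry layout with "default" and "clues" fields is an assumption made for this illustration, and other words are simply mapped to themselves.

```python
# Hypothetical entries. A clue lists follow-up words that trigger a special
# translation of the whole combination.
DICTIONARY = {
    "private": {"default": "private", "clues": {"person": "citizen"}},
    "person": {"default": "person", "clues": {}},
}


def translate(words: list[str]) -> list[str]:
    """Translate word by word, checking each entry's clues against the next word."""
    output, i = [], 0
    while i < len(words):
        entry = DICTIONARY.get(words[i])
        following = words[i + 1] if i + 1 < len(words) else None
        if entry and following in entry["clues"]:
            output.append(entry["clues"][following])
            i += 2  # the clue word has been consumed as well
        else:
            output.append(entry["default"] if entry else words[i])
            i += 1
    return output


print(translate(["a", "private", "person"]))  # ['a', 'citizen']
```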
Proper names, e.g. Herr Schwartz, present a peculiar difficulty in this group of words with more than one meaning. It cannot be avoided that the machine tries to find the proper name in its memory. If it succeeds, the translation (‘Mr. Black’) will be issued. If the name does not exist in memory, it is carried to the exit without further ado, like any word for which there is no translation. A capital letter can sometimes be an indication of a proper name, but of course, that does not always provide certainty.
Finally, there are the words that change meaning when translated. For example, the word “vier” (four) in the sentence “Het woord twee heeft vier tekens” (The word two has four characters) cannot simply be translated literally into English: the English word “two” has three characters, so “vier” would have to become “three”. [ed. A test in July 2021 shows that Google Translate does not interpret the text to be translated substantively and therefore fails on this point]
Some considerations about translation engine capabilities
The current situation 
The largest interest in translation machines comes from mathematicians, physicists and engineers, followed by practitioners of other sciences and military agencies. The explanation lies on the one hand in the fact that they are most familiar with the possibilities offered by general calculation machines [ed. not yet computers at that time], and on the other hand feel the most need to take cognizance of scientific publications in other languages. Related to this is that the dictionary lists, which are designed to use an electronic calculation machine as a translation machine, are based on usage in these special fields. As a result, the number of words decreases considerably, while the multiple meanings in particular decrease strongly. A not entirely correct sentence structure is accepted if the content of the paper can be understood. A good translator, on the other hand, will probably notice when he sees the result that he can do it faster and better himself. However, the lack of good translators certainly gives translation machines a raison d’être.
A machine designed for translation, not a programmed calculation machine, will be able to be fully adapted to its task and the results such a machine will produce may therefore be satisfactory. The translator, which is currently under construction in the United States of America for the Rome Air Development Center, features a 30-million-bit photographic memory, capturing approximately half a million words. [ed. see AN/GSQ-16]. The system is designed for translations from Russian into English. As far as we know, this will be the first real translation machine.
Both American and Russian literature show that research is underway into the possibility of constructing a translation machine that is not only capable of translating one language into another, but that can process more languages in both directions. With N languages you then have N * (N-1) translation options. However, if one treats a language as a central language, one can direct all translations through this language, so one only needs to have 2 * (N-1) translation options. The (N-2) * (N-1) possibilities, which are not covered here, therefore take place in two steps, which of course have drawbacks. In Russia, this system is being considered for the languages German, Chinese, Japanese and Russian, with Russian as the central language, so that most translation work can still be done in one step. It should be noted that a system of translation rules only applies to two languages in one direction.
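The counting argument above can be checked with a few lines of Python. The function names are invented for this sketch; the example uses the four languages mentioned for the Russian proposal.

```python
def direct_options(n: int) -> int:
    """Every language into every other language, in both directions."""
    return n * (n - 1)


def central_options(n: int) -> int:
    """Only translations into and out of one central language."""
    return 2 * (n - 1)


def two_step_options(n: int) -> int:
    """Directions that must pass through the central language."""
    return direct_options(n) - central_options(n)  # equals (n - 2) * (n - 1)


n = 4  # German, Chinese, Japanese and Russian
print(direct_options(n), central_options(n), two_step_options(n))  # 12 6 6
```

The saving grows quickly with the number of languages: for ten languages the sketch gives 90 direct options against 18 through a central language.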
Opponents of the use of English or Russian as an intermediate language point out the strange consequence of also having to carry out a translation from Chinese into Japanese via this intermediate language. Then the neutral Chinese words would first have to be transferred in singular or plural forms and articles would have to be added to them. Then the words would have to be returned to their old state. In this context, one therefore also thinks of an approach per language area.
In America it has been proposed to use a logically constructed language without words with more than one meaning as an intermediate language. Since almost all translation work then takes place in two steps, it seems logical to draw up an artificial machine language for this purpose, for which the Dutch name “Machinees” [ed. a play on “Chinees”, the Dutch word for Chinese] has been used. The name “Metalanguage” has also been proposed. In addition to a real language, a language with logical symbols can also be used. To capture half a million concepts we don’t need more than 19 bits per concept. That corresponds to less than four characters, so that such a language requires relatively little memory space.
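The 19-bit figure follows from a short calculation, sketched here in Python. The function name `bits_needed` is invented; the 5 bits per character come from the character encoding described earlier in the text.

```python
def bits_needed(concepts: int) -> int:
    """Smallest number of bits that gives every concept its own code."""
    return (concepts - 1).bit_length()


bits = bits_needed(500_000)
print(bits)       # 19
print(2 ** bits)  # 524288 distinct codes, enough for half a million concepts
print(bits / 5)   # 3.8, i.e. less than four 5-bit characters
```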
The future [in 1960]
The fact that the text to be translated always has to be typed over on some kind of telex machine in order to be able to enter it as a punched tape into the translation machine will undoubtedly lead to further investigation of the input problem. However, it remains questionable whether translation machines can be developed that can give an excellent translation of prose, or even poetry.
Sources [in Dutch]
- IJ. (IJsbrand) Boxma (1959), Wat is en doet een vertaalmachine?, Natuurkundige voordrachten: Nieuwe reeks 1958-1959 (37).
- IJ. (IJsbrand) Boxma (1960), Vertaalmachines, Tijdschrift van het Nederlands Radiogenootschap, deel 25(3), pp. 131-146.