|
 4.0 Coding for Reduced Redundancy in Computing and Cognition
In this section, I describe a variety of observations and ideas from the fields of artificial computing and natural cognition which can plausibly be seen as examples of information compression. Concepts from cognate fields such as theoretical linguistics are included in the discussion.
Earlier examples relate mainly to the brain and nervous system while later examples come mainly from computing. But the separation is not rigid because similar ideas appear in both areas.
4.1 Adaptation and Inhibition in the Nervous System A familiar observation is that we are more sensitive to changes in stimulation than to constant stimulation. We notice a sound which is new in our environment - eg the hum of a motor when it is first switched on - but then, as the sound continues, we adapt and cease being aware of it. Later, when the motor is switched off, we notice the change and are conscious of the new quietness for a while until we adapt again and stop giving it attention.
This kind of adaptation at the level of our conscious awareness can be seen also at a much lower level in the way individual nerve cells respond to stimulation. The two studies to be described are of nerve cells in a horseshoe crab (Limulus) but the kinds of effects which have been observed in this creature have also been observed in single neurone studies of mammals and appear to be widespread amongst many kinds of animal, including humans. There are more complex modes of responding but their existence does not invalidate the general proposition that nervous tissue is relatively sensitive to changes in stimulation and is relatively insensitive to constant stimulation.
Figure 1 shows how the rate of firing of a single receptor (ommatidium ) in the eye of Limulus changes with the onset and offset of a light (Ratliff, Hartline & Miller, 1963). The receptor responds with a burst of spike potentials when the light is first switched on. Although the light stays bright for some time, the rate of firing quickly settles down to a background rate. When the light is switched off, there is a brief dip in the rate of firing followed by a resumption of the background rate.
This relative sensitivity to changes in stimulation and relative insensitivity to constant stimulation can be seen also in the spatial dimension. Figure 2 shows two sets of recordings from a single ommatidium of Limulus (Ratliff & Hartline, 1959). In both sets of recordings, the eye of the crab was illuminated in a rectangular area bordered by a dark rectangle of the same size. In both cases, successive recordings were taken with the pair of rectangles in successive positions across the eye along a line which is at right angles to the boundary between light and bright areas. This achieves the same effect as - but is easier to implement than - keeping the two rectangles in one position and taking recordings from a range of receptors across the bright and dark areas.
In the top set of recordings (triangles) all the ommatidia except the one from which recordings were being taken were masked from receiving any light. In this case, the target receptor responds with frequent impulses when the light is bright and at a sharply lower rate when the light is dark.
In the bottom set of recordings (circles) the mask was removed so that all the ommatidia were exposed to the pattern of bright and dark rectangles. In this situation, the target receptor fires at or near a background rate in areas which are evenly illuminated (either bright or dark) but, near the border between bright and dark areas, positive and negative responses are exaggerated. In the spatial dimension, as with the temporal dimension, changes in stimulation are more significant than constant stimulation.
It is now widely recognised that this sensitivity to temporal and spatial discontinuities in stimulation is due to the action of inhibitory pathways in the nervous system which counteract the excitation of nerve cells. The way inhibition may explain the observations which have just been described and a range of other phenomena (Mach bands, simultaneous contrast, motion sensitivity) is well described and discussed by Von Bekesy (1967).It has also been recognised for some time that these phenomena may be understood in terms of principles of economy in neural functioning (see, for example, Barlow, 1969). In the terms discussed earlier, adaptation and inhibition in nervous functioning have an effect which is similar to that of the run-length coding technique for data compression. Economy is achieved in all these cases by coding changes in information and reducing or eliminating the redundancy represented by sequences or areas which are uniform (Von Bekesy, 1967).
4.1.1 `Neural' Computing The idea that principles of economy may apply to neural functioning is recognised in some of the theory associated with artificial `neural' computing, eg `Hopfield nets' (Hopfield, 1982) and `simulated annealing' (Hinton & Sejnowski, 1986). But with some notable exceptions (eg Mahowald & Mead, 1991), there seems to have been little attempt in the development of artificial neural networks to exploit the kinds of mechanisms which real nervous systems apparently use to achieve information compression.
4.2 Objects and Classes in Perception and Cognition
4.2.1 Seeing the World as Objects Some things in our everyday experiences are so familiar that we often do not realise how remarkable they are. One of these is the automatic and unconscious way we see the world as composed of discrete objects. Imagine a film taken with a fixed camera of a tennis ball crossing the field of view. Successive frames show the ball in a sequence of positions across a constant background. Taken together, these frames contain a very large proportion of redundancy: the background repeats in every frame (apart from that part of the background which is hidden behind the ball) and the ball repeats in every frame (let's assume that it is not spinning). Uniform areas within each frame also represent redundancy.
Any simple record of this information - on cinema film or digitised images - is insensitive to the redundancy between frames or within frames and has no concept of `ball' or `background'. But people automatically collapse the several images of the ball into one coherent concept and, likewise, see the background as the `same' throughout the sequence of frames.
This is a remarkable piece of information compression, especially since it is performed in real time by nerve cells which, by electronic standards, are exceedingly slow. On this last point, Mahowald & Mead's work, referenced above, throws useful light on how this kind of compression may be achieved in a simulated mammalian retina and how the necessary speed can be achieved with slow components. In keeping with what was said earlier about neural functioning, inhibitory pathways and processes are significant.
Of course, the concepts we have of real-world objects are normally more complicated than this example suggests. The appearance of a typical object varies significantly depending on orientation or view point. In cases like this, each concept which we form must accommodate several distinct but related views. What we normally think of as a unitary entity should, perhaps, be regarded more accurately as a class of inter-related snapshots or views (more about classes below).
Notwithstanding these complexities, our everyday notion of an object is similar to the previously-described concept of a chunk. Like chunks, objects often have names or tags but again, as with chunks, not every object has a name. An object (with or without a name) may, like a chunk, be seen as the product of processes for extracting redundancy from information.
4.2.2 Stereoscopic Vision and Random Dot Stereograms "In an animal in which the visual fields of the two eyes overlap extensively, as in the cat, monkey, and man, one obvious type of redundancy in the messages reaching the brain is the very nearly exact reduplication of one eye's message by the other eye."(Barlow, 1969, p 213).
Stereoscopic vision and the phenomenon of random dot stereograms provides further evidence of information compression by our nervous systems. It also provides a striking illustration of the connection between redundancy extraction and our tendency to see the world as discrete objects.
Each of the two images in Figure 3 is a random pattern of black and white pixels. Each image, in itself, contains little or no redundancy. But when the two images are taken together, there is substantial redundancy because they have been designed so that they are almost, but not quite, the same. The difference is that a square area in the left image has been shifted a few pixels to the right compared with a corresponding square area in the right image, as is illustrated in Figure 4.
When the images are viewed through a stereoscope - so that the left image is seen by the left eye and the right image by the right eye - one's brain fuses the two images so that the two areas around the squares are seen as the `same' and the two square areas are merged into a single square object which appears to stand out vividly in front of its background.
Random dot stereograms like this are normally used to illustrate and study human abilities to perceive depth using stereoscopic vision. But they are also good examples of our ability to extract redundancy from information by merging matching patterns.
In Figure 3, the central square, and its background, are chunks in the sense described earlier, each of which owes its perceptual existence to the merging of matching patterns. It is the redundancy which exists between the two images, coupled with our ability to find it, which gives coherence to the objects which we see. The vivid boundary which we can see between the square and its background is the product of search processes which successfully find the maximum possible unification between the two images.
In everyday vision (eg the tennis ball example discussed above) recognition of an object may owe something to redundancy within each frame. Since each of the two images in a random dot stereogram contains little or no redundancy, our ability to see coherent objects in such stereograms demonstrates that we do not depend on redundancy within each image but can derive an object concept exclusively from redundancy between images.
4.2.3 Classes and Sub-Classes It is commonly recognised that objects may be grouped into classes and classes may themselves be grouped into higher level classes. Cross-classification with overlapping classes (eg `woman' and `doctor') is also seen.
A class may be defined extensionally by listing its constituent objects or classes. It may also be defined intensionally in terms of attributes which members of the class have in common. Although many commonly-used classes are `polythetic' - no single attribute need necessarily be shared by all members of the class - it is similarity amongst members of a class - some degree of sharing of attributes - which gives `natural' classes their coherence.
Grouping things by their similarity gives us a means of compressing the information which describes them. An attribute which is shared by all members of a class (or, in the case of polythesis, a set of alternative attributes which is shared by members of a class) need be recorded only once and not repeated for every member. The shared attributes of a class constitute a `schema' in the sense discussed earlier, which may be `corrected' for each member by the addition of more specific information about that member.
The widespread use of classes and subclasses in our thinking and in language, coupled with their obvious value in compressing information, strongly suggests that we do store classes and attributes in this way. But it is difficult to obtain direct confirmation of this idea. Attempts to verify the idea experimentally (eg Collins & Quillian, 1972) have proved inconclusive. This is probably more to do with the difficulties of making valid inferences from experimental studies of human cognition than any intrinsic defect in the idea.
4.3 Natural Languages Samples of natural language - English, French etc - are normally about 50% redundant (Miller & Friedman, 1958). This redundancy often serves a useful purpose in helping listeners or readers to correct errors and to compensate for noise in communication -and this is almost certainly the reason why natural languages have developed in this way.
Despite the existence of redundancy in natural languages, they provide a further example of economical coding in cognition. Every `content' word in a natural language (eg noun, verb, adjective or adverb) may be regarded as a tag or label for its meaning. The meaning of a typical word is a relatively complex chunk of knowledge associated with the word.
Without a convenient brief label like `table', the concept of a `horizontal platform with four, sometimes three, vertical supports, normally about three feet high, normally used for ...' would have to be long-windedly repeated in every relevant context rather like the slow language of the Ents in Tolkien's The Lord of the Rings . A sentence is normally a highly `coded' and compressed representation of its meanings.
An even more obvious example of tags in natural language is the use of `references' in books or articles. "(Zipf, 1949)" is an example which appears in the second paragraph of this article and also in the paragraph after this; it references the fuller details given at the end of the article. These details may themselves be seen as a label for the whole book. Like any tag, a reference can circumvent the need to repeat information redundantly in two or more contexts. But it can be and often is convenient to use this device for reasons of consistency and style when a given reference appears only once in a book or article.
Before leaving this section on natural language, it is relevant to comment on Zipf's extensive studies of the distribution of words in natural languages (Zipf, 1935) and his Principle of Least Effort (Zipf, 1949), mentioned earlier, which he proposed to explain these observations and others. Zipf's arguments are interesting and quite persuasive but, as Mandelbrot (1957) and others have pointed out, the phenomena described by `Zipf's law' could be due to nothing more profound than a random process for creating words and the boundaries between words in natural languages. However, in George Miller's words (1965), "It is impossible to believe that nothing more is at work to guide our choice of letter sequences than whatever random processes might control a monkey's choice, or that the highly plausible arguments Zipf puts forward have not relevance at all." (p vii). The jury is still out!
4.4 Grammars A grammar may be regarded as a compressed version of the language which it represents. More accurately, the notational conventions which are used in grammars may be regarded as a set of devices which may be, and normally are, used to encode information in an economical form. They are not necessarily used to good effect in any one grammar.
4.4.1 Context-free Phrase Structure Grammars Compression is illustrated by the grammar shown in Figure 5. This grammar is written according to the notational conventions of context-free phrase structure grammar (CF-PSG), a simple type of grammar which is essentially the same as Backus Normal Form (BNF), commonly used to represent the syntax of computer languages.
Figure 5. A context-free phrase structure grammar.
This grammar represents the set of terminal strings (sentences) which include the boy meets the girl, the girl likes the boy etc but with none of the redundancy represented by repeated instances of individual words - boy, girl etc - or of `noun phrase' groupings like the boy, the girl etc. These repeating elements are `chunks' of information in the sense described earlier. The symbols S, NP, V, D and N are `tags' used to identify their corresponding chunks in each of the contexts in which they occur.
Notice that a grammar like the one just shown says nothing about the order in which the sentences may appear and it says nothing about how many of each sentence may appear. A grammar is typically a `lossy' compression of any one sample of the language which it represents.
The `chunks with tags' device also permits the encoding of recursion which can itself be seen as a form of run-length coding. The fragment of grammar shown in Figure 6 generates phrases like the very very very tall girl, a very very short boy etc. Notice that the number of instances of very in any one phrase is not specified: recursion like this represents lossy compression of any finite set of terminal strings.
Figure 6. A CF-PSG with recursion.
4.4.2 More Powerful Grammars It has been recognised for some time (and pointed out most notably by Chomsky, 1957) that CF-PSGs are not `powerful' enough to represent the structure of natural languages effectively. As shown by the examples just given, CF-PSGs can be used to represent simple sub-sets of English in a succinct form. But the full complexity of English or other natural language can only be accommodated by a CF-PSG, if at all, at the cost of large amounts of redundancy in the grammatical description.
The phenomenon of `discontinuous dependencies' highlights the shortcomings of CF-PSGs. In a sentence like The winds from the West are strong , there is a `number' dependency between winds and are: the plural noun must be followed by a plural verb and likewise for singulars. The dependency is `discontinuous' because it jumps over the intervening structure (from the West in the example) and this intervening structure can be arbitrarily large.
To represent this kind of dependency with a CF-PSG requires one set of rules for singular sentences and another set of rules for plurals - and the two sets of rules are very similar. The consequent redundancy in the grammar can multiply substantially when other dependencies of this kind are included.
This problem can be largely overcome by using a more `powerful' kind of grammatical system like a Definite Clause Grammar (Pereira & Warren, 1980) or a Transformational Grammar (see, for example, Radford, 1988). This kind of system may be seen as a superset of a CF-PSG with additional mechanisms which, amongst other things, allow discontinuous dependencies to be represented without undue redundancy.
It is not appropriate here to discuss these systems in detail. In general, they can be seen as variations on the `schema-plus-correction' idea which was described above. They provide the means of representing sentence structure as a `schema' which may, in the example given earlier, be `corrected' by the addition of singular or plural components at appropriate points in the structure.
4.5 Computer Programs Functions in computing and mathematics are often defined `intensionally' in terms of rules or operations required to perform the function. But functions are also defined `extensionally' by specifying one or more outputs for every input or combination of inputs (Sudkamp, 1988). This idea applies to functions of various kinds (`total', `partial' and 'multi') and also to `programs' and information systems in general.
Elsewhere (Wolff, 1994) I have discussed how a computer program or mathematical function may be seen as a compressed representation of its inputs and its outputs, how the process of designing programs and functions may be seen to be largely a process of information compression, and how the execution of programs or functions may also be seen in terms of information compression.
In this section, I view computer programs more conventionally as a set of `operations' and discuss how principles of compression may be seen in the way programs are organised. In the the same way that the notational conventions used in grammars may be regarded as a means of compressing linguistic information, the conventions used in computer programs may be seen as devices for representing computing operations in a succinct form. As with grammars, the provision of these facilities does not in itself guarantee that they will be used to best effect.
Compression techniques may be seen in `functional', `structured' and `logic' programming and, a fortiori in `object-oriented' programming.
4.5.1 Functional and Structured Programming Chunking and tags . If a set of statements is repeated in two or more parts of a program then it is natural to declare them once as a `function', `procedure' or `sub-routine' within the program and to replace each sequence with a `call' to the function from each part of the program where the sequence occurred. This is an example of compression: the function may be regarded as a `chunk' and the name of the function is its `tag'.
Whether or not a programmer chooses to create a function in a situation like this - or to leave repeated sequences as `macros' - depends, amongst other things, on whether the sequence is big enough to justify the `cost' of giving it a name and whether the run-time overhead which is typically associated with the use of functions in conventional computers is acceptable for the application in hand.
Run-length coding. If a body of code is repeated in one location within the program then it may be declared as a function or marked as a `block' and the fact of repetition may be marked with one of the familiar conventions for showing iteration: repeat .... until, while ... do, for ... do . Each of these is a form of run-length coding.
Recursion, which is available in most procedural programming languages, is another means of showing repetition of program operations. It is essentially the same as recursion in grammars.
Schema plus correction . It often happens in software design that two or more sets of statements are similar but not identical. In these cases, it is natural to merge the parts which are the same and to provide conditional statements (if ... then ... else statements or case statements) to select alternative `paths' within the software. The complete set of statements, including the conditional statements, may be regarded as a `schema' describing a set of behaviours for the program, much as a grammar describes a set of terminal strings.
To specify a particular path through the program, it is necessary to supply the information needed in the conditional statements so that all the relevant choices can be made. This additional information is the `correction' to the schema represented by the program code. If the program statements have been encapsulated in a function then these corrections to the program schema will normally be supplied as arguments or parameters to the function.
4.5.2 Logic Programming Similar ideas appear in logic programming languages like Prolog. For example, the chunking and tags idea can be seen in the structure of a Prolog Horn clause. The predicate in the head of the clause may be seen as a label or tag while the body of the clause may be seen as the chunk which it labels.
As with functional and structured programs, a Prolog program may be seen as a schema which represents the set of possible behaviours of the program. The information supplied in a Prolog query serves as a correction to the schema which reduces the range of possible behaviours of the program.
Repetition in Prolog programs is coded using recursion.
4.5.3 Object- Oriented Programming Object- oriented programming (OOP), as it was originated in Simula and has been developed in Smalltalk, C++ and several other languages, embraces the kinds of mechanisms just described but includes an additional set of ideas which may also be understood in terms of information compression.
One important idea in OOP, which was introduced in Simula, is that there should be a one-for-one correspondence between objects in the real world and `objects' in the software. For example, an OO program for managing a warehouse will have a software object for every person employed in the warehouse, a software object for every shelf or bay, a software object for every item stored, and so on. In OO terms, a software object is a discrete piece of program: data structures and associated procedural code or either of these without the other. For example, an object for a `person' would be a program which can respond to `messages' such as a request for the person's name, an instruction to move an item from one location to another, and so on.
Superficially, this is an extravagant - and redundant - way to design software because it seems to mean that the same code is repeated in every object of a given type. But, of course, this is not how things are done. As with `real world' objects, economy can be achieved by the use of classes and sub-classes.
In OO programming languages, every object belongs to a class (in some OO languages an object can be assigned to more than one class), the code for that class is stored only once within the program and is inherited by each instance of the class. There can be a hierarchy of classes with inheritance of code from any level down to individual objects.
As with individual objects, a recognised principle of OOP is that the classes which are defined in an OO program should correspond, one for one, with the classes which we recognise in the real world. In the warehouse example, there would be a class for each of `person', `shelf' and `item'. Each class - `person', for example - may be divided into sub-classes like `manager', `foreman', `operative' etc.
The ideas which have been described - software objects, classes and inheritance - are further examples of the way information compression pervades the organization of computer programs:
- As discussed earlier, objects and classes as we see them in the real world may be understood in terms of our subjective compression of perceptual information received from the world. By using similar devices in software design we can make the structure of software reflect patterns of redundancy in the real world which our brains are apparently so efficient at exploiting..
- As was mentioned in connection with natural classes, the mechanisms of class hierarchies with inheritance may be seen as an example of the schema plus correction technique for information compression: a class definition is a `schema' and information which is supplied when a new instance of the class is created (eg the name of that object) is a `correction' to the schema. In a similar way, a high level class may be seen as a relatively abstract schema which is refined or `corrected' by the more specific information contained in lower level classes. It is perhaps pertinent to comment, in passing, on the advantages of designing software in this way: .
- One advantage is psychological: software which reflects the structure of our established concepts is easier to understand than software which does not..
- A more subtle, but nonetheless important, advantage is that software designed in this way will normally be easier to modify than otherwise: the fact that the structures reflected in the software are persistent (repeating) patterns in the world means that new versions of software are more likely to be refinements or rearrangements of already established objects and classes than radical reorganizations of the code..
- There is a third advantage for software designers in using these mechanisms for information compression: if any given piece of information is recorded only once within a program then any change to that information needs be made only once and there is no need to check that it is correctly repeated in other parts of the program
4.6 Other Aspects of Computing We have already seen several uses for identifiers or tags in computing - as the names of functions, procedures or sub-routines, as the names for OOP objects and classes of objects and as identifiers for rules in grammars. Some other examples of tags in computing include:
- Names of tables or records in databases.
- Names of fields in tuples or records.
- Names of files.
- Names of variables, arrays and other data structures used in computer programs.
- Labels for program statements (for use with the now shunned `go to' statements).
The names of directories in Unix, MS-DOS and similar operating systems are also tags in the sense of this article. But an hierarchical directory structure may also be seen as a simple form of class hierarchy and, as such, it may be seen as a restricted form of schema plus correction. The name of a directory is a name which applies to all the files and sub-directories within that directory and to all files and sub-directories at lower levels. Rather than redundantly repeat the name for every file and sub-directory to which it applies, the name is recorded once and, in effect, `inherited' by all the objects at lower levels. |