June, 1993
Conventionalist theories of pictorial representation have enjoyed much popularity in recent decades. At their extreme, these theories have difficulty accounting for some of our recognitional capacities. In particular, they cannot offer a cogent explanation for the fact that certain pictures can be successfully interpreted by people in pictorially innocent cultures. At first blush, this fact seems to add credibility to naturalist theories of pictorial representation. But, at their extreme, naturalist theories are vulnerable to the barrage of objections which have been devised by conventionalists. Clearly, a moderate theory is required. We must account for the success of the pictorially innocent without giving in to naive resemblance theories. In the following, we will try to motivate the adoption of a moderate theory, and we will make some preliminary suggestions about the form such a theory should take. We will begin with an account of interpretation among the pictorially innocent. Then, we will evaluate the success of strong conventionalism and naive naturalism in light of this data. After that, we will sketch a minimal theory which avoids many of the objections which threaten its competitors.
Fortunately, there are cultures which provide better test subjects than our own. Anthropologists have discovered populations which are pictorially innocent. To our knowledge, members of these cultures do not engage in pictorial practices and have not been exposed to pictures created by others. Their innocence has provided fertile soil for investigations into the minimum requirements for picture recognition. Over the years many anthropologists have recorded how pictorially innocent people react to pictures. These accounts are remarkably inconsistent. Some anthropologists observed that pictorially innocent people cannot interpret pictures at all (prior to some kind of training);{3} others say that pictures are readily interpreted, and that they are often mistaken for the things which they depict;{4} some record a discrepancy in the recognitional capacities of young and old subjects;{5} still others claim that certain pictures (e.g., frontal photographs of human faces) are easily interpreted while others (e.g., photos of people in motion) remain mysterious.{6} Regrettably, many of these observations have been made under less than ideal experimental conditions. Thus, their reliability cannot be taken for granted.
Recent research has been motivated by the desire for more conclusive data about innocent interpretation.
In order to adjudicate between the inconsistent observations made through casual experiments, some anthropologists have returned to the field with a new emphasis on rigor.
Among these is a team headed by E.S. Muldrow, W.F. Muldrow and J.B. Deregowski.
This team conducted a series of experiments with the Me'en, an isolated, pictorially innocent tribe in Ethiopia.{7}
They showed the Me'en a series of three representational drawings printed on a coarse fabric.{8}
The first two were depictions of animals with which the Me'en were very familiar (a buck and a leopard, see Figure 1).
When asked to identify what they saw, Me'en of all ages interpreted these pictures with remarkable success.
With the exception of a couple of subjects who were probably intimidated by the testing situation, all those tested were able to accurately identify the objects in the pictures.{9}
It is interesting to note that their identifications never came immediately.
They usually recognized particular elements of an image (a tail, a foot, horns, etc.) before piecing together the whole.
A third picture shown to the Me'en depicted a hunting scene with pictorial depth cues (relative size and overlapping).
In this case, decipherment proved more difficult.
The Me'en (along with other pictorially innocent people) were unable to detect pictorial depth.
If the Me'en saw the images as actual animals, then their pictorial experience was very different from the experience of pictorially sophisticated viewers. As Wollheim and others have made clear, we can see a picture both as a picture and as its depictum concurrently. If the Me'en could not do this, the tests might give little insight into how we see pictures. However, this objection is without foundation. As we shall see, even if the Me'en saw the pictures as actual animals, their success in recognizing those animals casts doubt on certain theories of pictorial interpretation. Thus, their success remains relevant.
Elsewhere, I argued that we have to realize that something is a picture in order to determine what it depicts. Clearly, we must know that p is a picture if we are to conclude that p is a depiction (and not an instantiation or exemplification) of some object O. But we do not need to know this when we simply (and perhaps ambiguously) identify p as O. When we say that someone associates a two-dimensional inanimate image with an object, we might mean that she identifies the image with the object or that she sees the image as a representation of the object.{11} Both these forms of association are possible just in case there is something about the viewer's experience of the image which brings into her consciousness a thought of the object of that image. By recognition we will mean this minimal (and vague) kind of association. When we say that pictorially innocent people can correctly recognize pictures, we simply mean that, while experiencing a picture p of O, they have a thought of O as a direct result of experiencing p. It may be that they see pictures as pictures, but I no longer think this is necessary. As long as they minimally associate a picture with its depictum without any previous exposure to pictorial representations, we can infer that certain theories of interpretation are inadequate.
Another objection to the data about the Me'en questions the plausibility of pictorial innocence. Our reasons for describing the Me'en as pictorially innocent may be insufficient. The fact that the Me'en produce no pictures does not mean that they do not see pictorially. For example, they may have a practice of seeing images in clouds or trees, they may have seen reflections in water, etc. In order to see a cloud as an antelope or a reflection as the object it reflects, the Me'en must have a way of interpreting one thing as another thing. Does this count as a form of pictorial sophistication? I think not. Clouds and reflections are not representations. They are not produced by an agent in order to inspire a thought of a certain kind in a viewer. In the case of clouds, we cannot speak of a correct interpretation.{12} To see a cloud as an object is not to recognize the object which that cloud represents. Identifying reflections is a slightly different phenomenon. Reflections can be correctly identified. However, identifying a reflection must be sharply contrasted with pictorial identification. When we identify a reflection, we recognize a (proximate) object which is the source of the reflection. With pictures, such an object is rarely present. We can also make a Gricean distinction. Reflections naturally indicate their content, while pictures non-naturally denote their content. It is unlikely that pictorially innocent people mistake pictures for reflections. If they do, this provides evidence against constructivist accounts, because the relationship between a reflection and its content can be defined without reference to an observer. The objector who questions the pictorial innocence of the Me'en believes that picture recognition is a learned ability. If this objector thinks that learning to see pictures requires learning conventions, then she should not cite previous exposure to reflections as an instance of previous exposure to pictures. 
Interpreting reflections involves recognition of a natural relationship. The arguments against the pictorial innocence of the Me'en are inconclusive and question begging. They rest on unsubstantiated analogies between perceiving pictures and other kinds of perceptual activities. The value of these analogies can only be measured after we develop a theory of perceiving pictures. The Me'en may be less innocent than we think, but they are far more innocent than you or I. In this regard, they can give us a more accurate measure of the minimum requirements for picture recognition.
The Me'en study demonstrates that certain man-made representational images can be correctly interpreted (according to our notion of minimal association) without previous exposure to man-made representational images. It also suggests, in the case of depth depiction, that this ability may be limited. Regardless of these limitations, the recognitional success of the Me'en has implications for the theory of pictorial representation.
Some a priori arguments have been raised against this breed of conventionalism. We will only defend an empirical objection. If strong conventionalism were correct, we would not expect the Me'en to be successful.{15} On this account, the artist who generated the pictures shown to the Me'en was operating according to a conventional system of representation. The Me'en would only be able to decode these pictures successfully if they had had previous exposure to a similar system. There is absolutely no evidence that this is the case. Without pictorial practices, it seems unlikely that the Me'en have (even tacitly) a system of pictorial representation. If they did have such a system, it would be hard to explain its similarity with the system used by the artist. Unlike Gombrich, Goodman does not privilege different systems of representation. He maintains that the realism of a picture depends on the familiarity of a particular code. If one set of conventions for generating and interpreting pictures is prevalent in a society, pictures which conform to those conventions will be regarded as realistic.{16} Thus, Goodman cannot appeal to inherent superiority of symbol systems in explaining the coincidental correspondence between the putative system of the Me'en and the system of the artist.
Appealing to previous exposure to pictures or postulating a coincidental similarity between two conventional codes developed in isolation from one another will not help the conventionalist. Neither route has any plausibility. However, there is another explanation of the Me'en's success which seems more attractive. One could argue that the Me'en could learn the code for deciphering the pictures just by looking at those pictures. Consider the success of cryptologists. They can decipher codes without looking at any kind of key or translation scheme. If this is possible for a cryptologist, why couldn't it be possible for the Me'en?{17} The analogy cannot succeed. First, cryptologists work with a great deal of information and resources at their disposal. They know how codes are generated; they know important statistics like the frequency of a certain letter's appearance in a language; they can make helpful guesses about the subject or meaning of the code before deciphering it; since World War II, they have used sophisticated instruments; the difference between a code and its referent is merely alphabetic, because the referent is another code (i.e., in a natural language); etc.{18} Another disanalogy is evidenced by the fact that it is possible that the Me'en mistake pictures for the things they depict. This misidentification is impossible for the cryptologist; upon deciphering a code, the cryptologist will not mistake it for a sentence in her natural language nor for the content of that sentence. This suggests that the connection between a code and its meaning differs markedly from the connection between a picture and its depictum. Finally, if all pictures include enough information to be decoded, we would expect all pictures to be interpretable by the Me'en. This is not the case. We have already seen that they could not correctly decipher the depth depiction. The conventionalist could reply that some pictures have inherent keys for decipherment and some do not. 
But this would do violence to their position. If some pictures employ manifestly decipherable codes, those pictures should be privileged. Their content could be explained without reference to the knowledge of an observer. Why doesn't the conventionalist count these pictures as more realistic than other pictures? A strong conventionalist would abhor this move. Thus, it would be imprudent for her to appeal to inherent translation keys and the cryptology analogy.
The shortcomings of strong conventionalism do not rule out a weaker, restricted conventionalism. It is possible that certain elements of representational art are conventional while others are not. Common depth cues -- like overlapping, relative size and, most notoriously, linear perspective -- might fall into the conventional category. This would explain why the Me'en could not decipher the picture which utilized these cues. However, the failure of the Me'en is not conclusive. Many authors have argued that an inability to decipher pictorial depth cues might come from cultural or ecological differences. For example, living in a culture with few carpentered objects or living in a region with few open spaces might limit people's ability to identify pictorial perspective. Similarly, the fact that viewers seem to learn how to interpret depth cues is inconclusive, because perceptual abilities can also be acquired. When we learn how to interpret information in our visual field, we are not necessarily learning a code. Furthermore, some of the empirical evidence used to defend the conventionality of perspective has been invalidated. Goodman says that individuals accustomed to depth-depiction in Far Eastern arts will have difficulty deciphering linear perspective, and they will not regard perspective artworks as more realistic. Alan Tormey persuasively argues that both of these points are disconfirmed by history.{19}
Convention certainly plays some role in art. For example, many elements in Flemish Renaissance paintings have conventional significance. But here it is useful to draw a distinction between depiction and representation.{20} A picture might include a depiction of a skull, and that skull might represent Golgotha. The relationship between depictions and what they represent is likely to rely heavily on convention. The important question concerns the relationship between depictions and what they depict. We have seen that a strong conventionalist explanation of that relationship cannot adequately account for our recognitional capacities. We have not argued decisively against restricted conventionalism, but we have suggested that commonly cited empirical evidence is insufficient to confirm or disconfirm this view. The plausibility of a restricted conventionalism will be measured in part by the comparative plausibility of competing views.
At first blush, resemblance theories seem to be compatible with the data taken from the Me'en study.
If there is an actual resemblance between a picture and its depictum, that resemblance should be recognizable to anyone who can recognize the depictum.
This might explain the interpretive success of the pictorially innocent.
However, it seems to predict too much success.
Some photographs and drawings cannot be successfully interpreted by pictorially innocent people.
Are we to conclude that some photographs resemble their objects while others do not?
Such a claim would require a strong defense.
It also seems possible that a picture and an object could have similar information without causing us to recognize that the picture is a depiction of the object.
This is one lesson of the threadbare duck-rabbit (Figure 2).
When we see the duck-rabbit as a duck picture, it resembles a rabbit no less.
The reason why we do not see it as a rabbit while seeing it as a duck involves something about the way we perceive images.
Resemblance theorists tend to overemphasize the properties of pictures; they neglect the cognitive contribution to pictorial perception.
A version of a resemblance theory based on vision is defended by Snyder.{21} Snyder, insofar as I understand him, seems to think there is no natural or automatic connection between pictures and the pictured. In addition, there is no natural or automatic connection between the way we see pictures and the way we see the pictured. However, this does not prevent us from seeing pictures and the pictured in the same way. We can look at the world in many different ways. The artist who wishes to capture the world on a two-dimensional surface is compelled to look at the world in a two-dimensional way, a way which conforms to her objectives. She can actually abstract reality in her mind, and see the world as a picture. Once done, she merely needs to paint what she sees. Similarly, a person well versed in a particular mode of pictorial representation will learn to construct the world in her mind in accordance with the pictorial rules used to construct those familiar pictures. She will see the world as a picture. If vision is "pictured" in this way, then we can easily explain our firm faith in the realism of certain images. Images which conform to the standardized models through which we experience the real world will appear to be accurate reflections of the world. This accuracy will not lie in the relationship between the image and the world, but in the way we see the image and the way we see the world.
Snyder's view is quite attractive,{22} but it does not give a completely satisfying explanation of the Me'en test results. In a pictorially innocent society, it is unlikely that individuals have "pictured" their vision. They have no need to cognitively enact the reductions which translate perceived objects into the kinds of objects we would perceive in a photograph. Thus, Snyder cannot explain the success of their interpretations. On his account, the prerequisites for picturing vision are not fully articulated, but they seem to be along the following lines. To acquire a model of vision which would permit us to interpret pictures, either we would have to have the experience of trying to create a two-dimensional representation of a three-dimensional object, or we would have to be exposed to many pictures which have accomplished this task. As far as we know, the members of the Me'en had never tried to create pictures, and they had only seen the three pictures used in the test. Thus, their success remains mysterious.
What we propose is a revision of (our reading of) Snyder's theory.
Snyder emphasizes pictorial ways of seeing, but there seem to be other ways of seeing which have nothing to do with modes of representation.
When we see the world, much of the information received by our retina is superfluous.
In any act of vision, we organize retinal stimulation by focusing on things, excluding things, identifying things, and distinguishing things.
Somehow, we are able to determine which stimuli are relevant to a certain purpose, say identifying an object in our visual field, and which are not.
This is particularly manifest in our capacity to recognize patterns.
One locus classicus for pattern recognition research is our ability to identify letters of the alphabet.
We can recognize a given letter in thousands of different typefaces and in (potentially) infinitely many different handwritings.
We can account for this in one of two ways.
First, we could say that each concrete token of a particular letter, say 'R', has features in common.
We (tacitly) recognize those features whenever we correctly identify an 'R'.
But this explanation may rely too much on inherent properties of the letter tokens.
If we examine different R-tokens, it is hard to locate a fixed set of invariant characteristics.
Consider the samples in Figure 3.
These tokens could not all fit into an internalized R-template.
It is more likely that we internalize a disjunctive set of properties the preponderance of which must be possessed by an inscription in order for us to classify that inscription as an 'R'.{23}
This set of properties is not alone sufficient for identification.
Letter identification usually utilizes a collection of clues in and around an inscription along with various expectations, memories, and beliefs.
As Goodman has shown, a single token can have the invariant properties of a 'd' and the invariant properties of an 'a'.
Contextual clues (e.g., surrounding letters) along with knowledge of English (e.g., recognizing possible words) are used to disambiguate.{24}
Thus, letter recognition seems to involve more cognitive activity than the invariant property view suggests.
When we identify two things as an 'R' we don't necessarily mean that they resemble each other.
Instead, there is something in common about the way we see the two of them.
Without giving a detailed account, we can call this common way of seeing 'R-seeing.'
When we R-see a letter we combine perceived data about that letter along with previous beliefs to see that letter as an 'R'.
We can R-see letters which are not R's, such as a 'K' or a 'B'.
Importantly, R-seeing a 'K' or 'B' does not require that we believe that the 'K' or 'B' is an 'R'.
It merely requires that we can see each of them as an 'R', or have an experience in seeing them which we consciously recognize as similar to our experience of seeing an 'R'.{25}
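The disjunctive-set account of letter recognition sketched above can be illustrated with a toy model. What follows is only a sketch under stated assumptions: the feature names, the contents of each letter set, and the 0.5 threshold standing in for "preponderance" are all hypothetical and are not drawn from the pattern recognition literature. It shows two things from the discussion: a 'K' or a 'B' can be R-seen because it possesses most of the R-properties, and an ambiguous token is resolved by what the context permits.

```python
# Hypothetical feature inventories; names and memberships are illustrative only.
LETTER_SETS = {
    "R": {"vertical_stem", "upper_bowl", "diagonal_leg", "closed_top", "baseline_foot"},
    "K": {"vertical_stem", "diagonal_leg", "upper_arm", "open_top", "baseline_foot"},
    "B": {"vertical_stem", "upper_bowl", "lower_bowl", "closed_top", "closed_bottom"},
    "d": {"round_bowl", "right_stem", "ascender"},
    "a": {"round_bowl", "right_stem", "short_stem"},
}


def match_score(token_features, letter):
    """Fraction of the letter's internalized properties found in the token."""
    letter_set = LETTER_SETS[letter]
    return len(token_features & letter_set) / len(letter_set)


def can_see_as(token_features, letter, threshold=0.5):
    """True when the token has more than `threshold` of the letter's properties.

    The threshold is an assumed stand-in for 'preponderance'.
    """
    return match_score(token_features, letter) > threshold


def identify(token_features, allowed_by_context):
    """Pick the best-scoring letter among those the context permits.

    `allowed_by_context` plays the role of surrounding letters and knowledge
    of possible words; it returns None when no permitted letter can be seen.
    """
    candidates = [l for l in allowed_by_context if can_see_as(token_features, l)]
    return max(candidates, key=lambda l: match_score(token_features, l), default=None)


# A 'K' and a 'B' each share three of the five assumed R-properties,
# so both can be R-seen without anyone believing they are R's.
print(can_see_as(LETTER_SETS["K"], "R"), can_see_as(LETTER_SETS["B"], "R"))

# A token with a bowl and a right-hand stem fits 'd' and 'a' equally well;
# only the context settles which letter it is read as.
ambiguous = {"round_bowl", "right_stem"}
print(identify(ambiguous, ["a"]), identify(ambiguous, ["d"]))
```

Note that `can_see_as` returns a verdict about how a token can be seen, not about what the viewer believes it to be, which is the distinction drawn above between R-seeing a 'K' and believing that the 'K' is an 'R'.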
Letter recognition capacities depend on a kind of recognitional competence. To be competent with regard to a particular letter is to be able to identify infinite variations of that letter and (ceteris paribus) to be able to generate infinite variations of that letter.{26} An R-competent individual will generally be able to identify an 'R' in a typeface which she has never seen. Similar pattern recognition skills underlie many of our perceptual activities. We are able to identify a single form through countless variations. How this works is still the subject of debate; that it works is not. One thing about how it works is clear: convention does not play a necessary role. In all likelihood, we are hardwired to detect patterns.{27}
We think that pattern recognition abilities underlie pictorial competence. In seeing p as a picture of O, we must recognize something in p which we recognize in O. For example, we might recognize something in p which causes an experience similar to the experience we have when we are perceiving antlers of a certain variety. This might lead us to believe that p is a picture of a particular kind of antelope. At the same time certain properties which we normally ascribe to such antelopes will be missing (e.g., color, depth, volume, smell, movement, etc.). But these missing elements might not impair our recognitional capacity. Recent neurological research points to massive functional specialization within the visual cortex.{28} The success of our antelope identifying device depends upon its ability to filter out stimuli which are variant across our perceptual encounters with antelopes. These might include color, depth, and so on. Perceptual information about color and depth might be inessential to antelope recognition in much the same way serifs and stroke width are inessential in identifying letters. If that is correct, a depiction of an antelope can exclude such information and remain recognizable. Whether or not a certain property is dispensable for recognition might be determined by societal convention.{29} But other properties will be counted as dispensable for more natural reasons. It is certainly plausible that color, depth and size are dispensable when identifying animals. After all, if one views an animal at night, in the distance, and with one eye closed, it will appear gray, small and flat, but it will still be identifiable. This could explain the Me'en's ability to interpret animal pictures despite the marked discrepancies between pictures and reality.
At first blush, our theory looks like another naive resemblance theory. But it is more subtle than it appears. Recognizing letters is a complex and mysterious process. Recognizing other kinds of patterns and connections can be vastly more complex. When we become competent in recognizing a particular kind of thing, T, we internalize a group of stereotypical properties, P. When we recognize a preponderance of those properties in a perceived object, we are likely to see that object as a T. But recognizing those properties also requires competence. For every p in P, we must be p-competent. To be p-competent, we must internalize a group of properties P' which typify things with the property p, and so on. This regress through finer and finer levels of competence can explain why a picture and its depictum do not resemble each other in obvious ways. Two things can trigger the same recognitional capacity without having much in common. If we tabulate the stereotypical properties used to identify a T along with the stereotypical properties of those properties, and so on, we will amass a huge set of properties which constitute our T-competence. Let us call this huge set of properties the T-set. It should be clear that two distinct objects can each possess enough members of the T-set to be classified as T's without possessing identical subsets of the T-set. It is unlikely that two such objects will possess disjoint subsets of the T-set, but any overlap in T-properties will be accompanied by other similarities which are not essential to identifying a T. Thus, it will appear that any resemblance between the two objects would be insufficient to explain their being classified together. The two things will not resemble each other in any interesting way. However, they will both resemble the T-set. The T-set, we should recall, is not a set of properties possessed by all T's; it is a set of properties which plays a cognitive role in our ability to identify and classify things. 
Thus, seeing two things as T's does not require seeing a resemblance between those things; it only requires seeing a resemblance between the way those two things are cognitively related to us.
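The claim that two T's can share little with each other while both resembling the T-set can be made concrete with a small counting argument. This is a sketch under one explicit assumption that is not in the text: that possessing "enough" members of the T-set means possessing a strict majority of them. The symbols are introduced here for illustration: the T-set is written as a finite set, and T(x) is the subset of it which an object x possesses.

```latex
\[
  x \text{ is classified as a } T \iff |T(x)| > \tfrac{1}{2}\,|\mathcal{T}|,
  \qquad T(x) \subseteq \mathcal{T}.
\]
\[
  |T(a) \cap T(b)| \;=\; |T(a)| + |T(b)| - |T(a) \cup T(b)|
  \;\ge\; |T(a)| + |T(b)| - |\mathcal{T}| \;>\; 0 .
\]
```

On the majority reading, then, two objects classified as T's must share at least one member of the T-set, but they need not share many. If the T-set has ten members t1 through t10, an object possessing t1 through t6 and another possessing t5 through t10 both qualify, yet they have only t5 and t6 in common. If "enough" is read as a threshold below one half, the bound no longer holds and disjoint subsets become possible.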
This leads us to the following minimal theory of pictorial recognition. S perceives p as an O just in case (i) S is O-competent; (ii) p has a preponderance of properties belonging to the O-set (the O-set consists of all known stereotypical, visual properties of an O); and (iii) S recognizes that preponderance in p.{30} Returning to our adverbial model, we can summarize these conditions as: S perceives p as O iff S O-sees p. Here, seeing-as is deliberately vague. Seeing p as O neither requires nor excludes the recognition that p is a picture. We can O-see an O, we can O-see a picture, we can O-see a cloud; and we can O-see empty space (hallucination). Generally this O-seeing is accompanied by knowledge about the object of our seeing (i.e., knowledge that it is an O, or a picture, or a cloud, etc.), but it need not be. This explains why the Me'en might see the leopard picture as a leopard without seeing it as a picture. It also explains why we can say they saw the picture as a leopard without committing to the view that they mistook it for a leopard or that it resembled any real leopard in an interesting or readily detectable way. Finally, it explains the methodical process used by the Me'en when interpreting pictures. They examined the test pictures piece by piece, identifying individual properties and then identifying the object characterized by the sum of those properties. They exhibited their tail-, spot- and paw-competence before they could exhibit their leopard-competence.
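The three conditions of the minimal theory can be written out as a single predicate. The following is a rough sketch rather than a model of actual perception: the `Viewer` class, the property names, the flattening of the nested p-competences into one `recognizable` set, and the 0.5 threshold for "preponderance" are all assumptions made for illustration. The function deliberately takes no argument saying whether the stimulus is a picture, a cloud, or a real animal, which mirrors the intended vagueness of seeing-as.

```python
from dataclasses import dataclass, field

# Hypothetical O-set for leopards; the property names are illustrative only.
LEOPARD = {"spots", "tail", "paws", "whiskers", "feline_head"}


@dataclass
class Viewer:
    # Kinds the viewer is competent with, mapped to their internalized O-sets.
    o_sets: dict = field(default_factory=dict)
    # Properties the viewer can herself recognize (p-competence, flattened).
    recognizable: set = field(default_factory=set)


def o_sees(viewer, kind, stimulus_properties, threshold=0.5):
    """Return True when the viewer O-sees the stimulus, on the three conditions."""
    # (i) S is O-competent.
    if kind not in viewer.o_sets:
        return False
    o_set = viewer.o_sets[kind]
    # (ii) p has a preponderance of properties belonging to the O-set.
    present = stimulus_properties & o_set
    if len(present) <= threshold * len(o_set):
        return False
    # (iii) S recognizes that preponderance in p.
    recognized = present & viewer.recognizable
    return len(recognized) > threshold * len(o_set)


# A line drawing lacks color and depth but keeps four of five leopard properties.
drawing = {"spots", "tail", "paws", "feline_head", "flat", "monochrome"}
competent = Viewer(o_sets={"leopard": LEOPARD}, recognizable=set(LEOPARD))
print(o_sees(competent, "leopard", drawing))   # conditions (i)-(iii) all hold
print(o_sees(Viewer(), "leopard", drawing))    # fails (i): no leopard-competence
```

Because condition (iii) is checked property by property, the sketch is also compatible with the piecemeal procedure described above, in which tail-, spot- and paw-competence are exercised before leopard-competence.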
This last point raises an important issue. In our received conception of realism, depicting a single entity from multiple vantage points in a single picture conflicts with realist depiction.{31} In other cultures, this may not be the case. It has been noticed that members of many African societies prefer depictions of cubes showing all of their faces in spite of the fact that their faces would not all be visible from any fixed vantage point; Northwestern Native Americans produce many images of animal heads which combine different views; ancient Egyptian artists painted figures with frontal torsos and profile heads and legs; and children often draw animals with all their appendages clearly delineated even though they would not all be clearly visible from a single view. On our first approximation of the definition of 'realism', each of these forms of depiction is realistic, because each includes information from the set of recognizable properties. But, our everyday conception of 'realism' precludes these forms of art. Thus, 'realism' must be a more relativistic concept than our original definition implies. Wherein lies the relativity of realist depiction?
Up until now, we have said little about how the set of stereotypical properties which are criterial for the identification of a given object are acquired. We should now note that they can be acquired in different ways and with different constraints. There are countless procedures for identifying a given object. Which procedures we use depends largely on how we typically encounter that object. If an object has a property which is rarely perceptible in our typical encounters with that object, this property might not trigger our recognitional capacity. If an object has a property which is invariably perceptible when we encounter that object, and if that property is unique to that object, it is likely to play a pivotal role in recognition. Many of these differences in familiarity with stereotypical properties can be described as ecological. For example, depth is stereotypically perceived in a number of ways. It can be seen by observing converging lines which are known to be parallel; it can be detected by noticing overlapping objects; it can be evinced by the way colors and textures fade and blur in more distant parts of the visual field; it can be seen by noticing height difference in the visual field, etc. Often these cues will work collectively, but, in some societies, one cue is privileged above the others. This privileging is generally influenced by the kind of environment in which the members of a society live. In highly-carpentered societies, converging lines will dominate; in densely forested societies, overlapping will be key; in flat plain societies, blurring textures will be emphasized; and in mountainous societies, comparative height will play a more important role. Perhaps the difficulty the Me'en had in deciphering depth depiction derives from such ecological differences. Other differences in the way we hierarchize visual clues might be more arbitrary or historical.{32} These differences can result in relativism. 
If realism is defined in terms of agreement with stereotypical properties, cultural differences in what properties are stereotyped will result in varieties of realism.
We began our discussion with an account of the interpretive capacities of pictorially innocent people. We discovered that such people have a remarkable capacity to recognize the content of simple pictures which we would ordinarily describe as realistic. Their success suggests that the ability to interpret certain pictures does not require prior exposure to a code. This conclusion cannot be accommodated on typical conventionalist accounts. Thus, there is an inclination to retreat back to naive resemblance theories which explain recognition in terms of similarities between pictures and their depicta. The implausibility of such theories inspired us to look for a more subtle kind of resemblance. We found our goal in studying human letter recognition abilities. The resemblance between letters cannot be defined solely in terms of the properties inherent in two tokens of a single letter. Instead, we must appeal to cognitive structures -- internalized sets of properties which are stereotypically predicated of a given letter. In pattern recognition, resemblance moves from an object level to a cognitive level. We defined picture recognition along the same lines. We suggested that there are (culturally informed) cognitive structures which can be triggered by both a picture and its depictum despite ostensible differences between the two. Thus, pictures have important similarities with their contents, but those similarities cannot be described without reference to the mind of an agent. Similarity is in the eye of the beholder.
[Click here to read "Conventionalism and Non-Naive Resemblance," comments on this paper by Jonathan Cohen.]