Image from Arenamontanus (CC BY-NC 2.0)
Guest post by Joe Gladstone ( email@example.com ), PhD Candidate – Cambridge Judge Business School, Cambridge University
Connectionist models are widely held to have had a revolutionary impact on cognitive science (Marcus, 2001). However, they are also employed in support of a highly controversial doctrine known as ‘eliminative materialism’, which claims that the central posits of our common understanding of human psychology, including our conception of ‘beliefs’, are entirely false (Ramsey, Stich & Garon, 1990). If such arguments are accepted, a radical reorientation is necessary in how we perceive and predict human behaviour, one that does not allow for human desire, intention or beliefs of any kind. The cluster of psychological constructs under threat is known as ‘propositional attitudes’, with much of the controversy stemming from Ramsey, Stich and Garon’s (1990, RS&G from here) assertion that “If connectionist hypotheses…turn out to be right, so too will eliminativism about propositional attitudes” (p.500). This does not mean merely that reference to beliefs will be dropped from psychological vocabulary, but that beliefs and related states will turn out not to exist, and never to have existed, in any form.
This essay discusses ‘beliefs’ in the context of common sense (or ‘folk’) psychology (Churchland, 1989). This is simply the set of assumptions people generally use to explain and predict human behaviour. For example: if X wants Y, and believes that Z is necessary for Y, then X will do Z. So, if John is hungry, and he believes food will satisfy that hunger, then he will eat food if it is provided. The eliminative materialist position, argued most forcefully by RS&G, holds that connectionist networks do not contain these folk psychological states, nor do they require such states to perform cognitive tasks. Therefore, if we presume that such networks are faithful to how the brain operates, then beliefs and related constructs must be false. As it is difficult to imagine a world where beliefs do not exist, or to find an adequate alternative account of the motivations for human behaviour without them, any argument that purports to disprove beliefs should undergo considerable critical examination.
Structuring the Discussion
With this in mind, there are two clear ways in which beliefs can be defended against the charges raised by RS&G, and these shape the essay that follows. The first is to discredit the logic underlying RS&G’s argument by attempting to show that connectionism and our common sense understanding of beliefs are not incompatible. This has been attempted by many authors (Egan, 1995; Forster & Saidel, 1994; Stich & Warfield, 1995), with little agreement in method or analysis between their arguments. The second approach confronts RS&G’s position more directly by locating the beliefs which RS&G claim are missing from neural networks (Skokowski, 2009). This second approach, if accepted, reconciles beliefs with connectionist networks at a more fundamental level: connectionist models cannot have ‘killed off beliefs’ if beliefs are in fact contained within them. Before these two critical arguments are presented, a summary of RS&G’s position is given below.
The Connectionist Case Against Belief
Ramsey, Stich and Garon (RS&G, 1990) start with the assumption that our common sense understanding of psychology is a theory, and that cognitive representational states such as beliefs and desires are posits of that theory. They argue that this theory is a prime candidate for replacement because it cannot possibly be telling us all there is to know about human behaviour. In its most essential form, RS&G’s claim is that, according to common sense psychology, propositional attitudes are characterised by a cluster of three features called ‘propositional modularity’. Propositional modularity holds that propositional attitudes such as beliefs 1) are functionally discrete, 2) are semantically interpretable, and 3) play a causal role (in mental and behavioural output). The problem is that when the kind of connectionist networks which would be expected to contain propositional attitude-like states are examined, no such features are found. Therefore, if future evidence and theory support the principles of connectionism, RS&G argue, we will be forced to conclude that propositional attitudes do not exist.
Illustrating the Argument: Networks A and B
Following from this, RS&G provide two models which serve to illustrate their argument. Network A is trained using back-propagation to ‘judge’ the truth or falsity of sixteen propositions such as ‘Cats have paws’ and ‘Fish have scales’. It is a three-tiered feed-forward network with sixteen input nodes, four hidden units and one output node, trained on a set of sixteen distinct input strings (representing the propositions). RS&G interpret an output higher than 0.9 as a positive response (‘true’) and an output lower than 0.1 as a negative response (‘false’).
RS&G report that when a second network, Network B, with exactly the same structure of units, was trained on a set of inputs that included just one additional encoded proposition (‘Fish have eggs’), they obtained a completely different set of weights and biases. Therefore, although Networks A and B produce the same output for the propositions they share, there is no functionally discrete property contained within the networks that could produce that output, because “these networks have no projectable features in common that are describable in the language of connectionist theory” (Ramsey, Stich & Rumelhart, 1991, p. 213). Thus, if beliefs or other folk psychological concepts are responsible for our behaviour, why do they not appear to exist within connectionist models? RS&G argue, furthermore, that one could construct many other networks that store the same information (e.g. that dogs have fur), and these would be as different from each other as Networks A and B are from one another.
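The two-network setup described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reconstruction of RS&G's simulation: the sixteen-bit input codes and truth values are randomly generated stand-ins for their encoded propositions, and the learning rate, epoch count and helper names (`init_network`, `train`, `judge`) are hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_network(seed, n_in=16, n_hidden=4):
    """Small random starting weights for a 16-4-1 feed-forward network."""
    rng = np.random.default_rng(seed)
    return {"W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, (n_hidden, 1)), "b2": np.zeros(1)}

def forward(net, X):
    """Return hidden activations and the single output unit's activation."""
    H = sigmoid(X @ net["W1"] + net["b1"])
    y = sigmoid(H @ net["W2"] + net["b2"])[:, 0]
    return H, y

def train(net, X, t, epochs=2000, lr=0.5):
    """Plain batch back-propagation on squared error (assumed settings)."""
    for _ in range(epochs):
        H, y = forward(net, X)
        d_out = ((y - t) * y * (1.0 - y))[:, None]      # error signal at the output
        d_hid = (d_out @ net["W2"].T) * H * (1.0 - H)   # propagated to hidden units
        net["W2"] -= lr * (H.T @ d_out)
        net["b2"] -= lr * d_out.sum(axis=0)
        net["W1"] -= lr * (X.T @ d_hid)
        net["b1"] -= lr * d_hid.sum(axis=0)
    return net

def judge(output):
    """The 0.9 / 0.1 response criterion described in the text."""
    if output > 0.9:
        return "true"
    if output < 0.1:
        return "false"
    return "undecided"

# Stand-in data: 17 distinct 16-bit strings with made-up truth values.
rng = np.random.default_rng(0)
codes = rng.choice(2**16, size=17, replace=False)
X_b = ((codes[:, None] >> np.arange(16)) & 1).astype(float)
t_b = rng.integers(0, 2, size=17).astype(float)
X_a, t_a = X_b[:16], t_b[:16]   # Network A sees the first sixteen only

# Same architecture and same starting weights; B has one extra proposition.
net_a = train(init_network(seed=1), X_a, t_a)
net_b = train(init_network(seed=1), X_b, t_b)

weight_gap = np.abs(net_a["W1"] - net_b["W1"]).max()
print("largest input-to-hidden weight difference:", weight_gap)
```

In this sketch both networks start from identical weights, so any gap in the trained weights comes from the single extra training item, which is the kind of units-and-weights divergence RS&G point to. Whether either network reaches the 0.9/0.1 criterion on every item depends on the assumed settings.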
Following from this premise, RS&G posit that in contrast with classical models, connectionist networks like Networks A and B have no distinct states or parts that serve to represent particular beliefs, or other propositional contents. They argue that information storage is distributed across the network and is holistic. Thus, any particular unit or weight value can encode information about many different contents. Since connectionist networks lack modular propositional states, they will not have the discrete features required to make them fall under psychological generalisations.
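For contrast, a classical, propositionally modular store can be sketched as a data structure in which each belief is a separate, individually addressable item. This is a hypothetical illustration of the property RS&G say connectionist networks lack; the class name `ClassicalBeliefStore` and its methods are invented for this sketch and do not come from RS&G.

```python
class ClassicalBeliefStore:
    """Each proposition is stored as its own discrete, labelled entry."""

    def __init__(self):
        self._beliefs = {}

    def add(self, proposition, truth_value):
        self._beliefs[proposition] = truth_value

    def remove(self, proposition):
        # Deleting one entry touches nothing else in the store.
        self._beliefs.pop(proposition, None)

    def judge(self, proposition):
        # Returns True/False if a belief is held, None otherwise.
        return self._beliefs.get(proposition)


store = ClassicalBeliefStore()
store.add("Dogs have fur", True)
store.add("Cats have paws", True)
store.add("Fish have scales", True)

store.remove("Cats have paws")
print(store.judge("Cats have paws"))   # the removed belief is gone
print(store.judge("Dogs have fur"))    # the others are unaffected
```

Removing one entry leaves the others untouched, which is what functional discreteness amounts to in this toy case. In a distributed network there is no corresponding single weight or unit to delete.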
To summarise, Networks A and B were designed by RS&G to model one of the capacities in which it would be natural to find belief-like states, i.e. the capacity to judge whether one or another of a specific group of propositions is true or false. Executing this capacity, they argue, would involve beliefs if they existed. This is because our common sense account of how we judge the truth or falsity of a presented proposition is that we judge proposition X to be true if we believe X, and we judge X to be false if we believe that X is not the case. If Networks A and B can execute this function, but there is no tangible property in either network that can be considered a ‘belief’, and if connectionist models are an accurate reflection of our cognitive architecture, then connectionism has indeed ‘killed off beliefs’.
Critical Argument One: Discrediting RS&G’s Logic
There are many critics of the logic used by RS&G, though there is little agreement in the reasoning behind their arguments. Egan (1995), Stich and Warfield (1995) and Von Eckardt (2005) all take issue with the overall conclusions, but differ widely over where the disagreement lies and which arguments to contest. This section takes as an example the criticism of Clark (1995), which both provides a strong objection to RS&G and exemplifies the disagreements among those tackling the eliminativist position.
A Higher Level of Description?
Clark’s (1995) critique of RS&G constitutes a rejection of one of their central positions. Clark argues that just because Networks A and B differ significantly in their weights and biases, it does not follow that models producing the same output can have no common functional parts which could be defined as a belief. Instead, Clark claims there is a “higher level of description” that will unify what seems “at the units-and-weights level, to be a chaotic disjunction of networks” (p.347, 1995). To illustrate this point, Clark gives the example of Sejnowski and Rosenberg’s (1987) post-hoc cluster analysis of NETtalk, a network trained to negotiate text-to-speech transformations. The crucial finding from this research is that, although separate training runs led to versions of NETtalk with very different descriptions at the units-and-weights level, all these versions yielded almost identical clustering profiles when subjected to post-hoc statistical analyses. Clark concludes that “There may be higher-level descriptions which are both scientifically well grounded and which capture commonalities between networks which are invisible at the units-and-weights level of analysis” (p.347, 1995).
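The idea of a description that stays constant while the units-and-weights description changes can be illustrated with a toy calculation. The sketch below is an assumption-laden example, not Sejnowski and Rosenberg's analysis: the activation values are made up, the second "run" is simply the first with its hidden units relabelled, and a pairwise-distance matrix stands in for a full cluster analysis.

```python
import numpy as np

def pairwise_distances(H):
    """Euclidean distance between every pair of items' hidden-unit patterns."""
    diff = H[:, None, :] - H[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)

# Made-up hidden activations: 6 items x 4 hidden units.
H_run1 = rng.random((6, 4))

# A second "training run" in which the same patterns sit on different units.
H_run2 = np.roll(H_run1, shift=1, axis=1)

# Unit by unit, the two runs look different...
same_units = np.allclose(H_run1, H_run2)

# ...but the similarity structure among the items is unchanged.
same_profile = np.allclose(pairwise_distances(H_run1),
                           pairwise_distances(H_run2))

print("identical at the unit level:", same_units)
print("identical distance profile:", same_profile)
```

This covers only the easiest case, in which two networks differ by a relabelling of their hidden units. It says nothing about networks with different numbers of hidden units, where no such guarantee holds.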
However, many other theorists who are also opposed to RS&G’s arguments have disagreed with Clark’s (1995) analysis (e.g. Egan, 1995; Stich & Warfield, 1995). They argue that the logic does not follow even within the example of NETtalk, and even less so when generalised to all connectionist models, as Clark attempts. One prominent criticism comes from Stich and Warfield (1995), who argue that, on the basis of Clark’s analysis, it might be plausible to conjecture that all networks with a NETtalk-like structure will yield similar clustering profiles. However, it is almost certainly true that one could build many NETtalk models which do not conform to or even resemble the original model’s structure. For example, it is possible to build a model which produces the same output as NETtalk, i.e. accurate text-to-speech transformations, with 80 hidden units (as in Sejnowski and Rosenberg’s 1987 model) or with 8000 units. This matters because, when trained using back-propagation, systems with hidden units to spare tend to find strategies very different from those found by systems with fewer hidden units (Sanger, 1989). Thus Stich and Warfield (1995) argue there is no evidence that all NETtalk networks, whatever their structure, will exhibit much the same clustering profile, which undermines the foundation of Clark’s argument for a ‘higher level of description’.
Despite many differences in the opinions of those arguing against RS&G, there exists some common ground. A frequent critique in this area is that RS&G’s analysis misinterprets the construct ‘belief’, with adverse consequences for their overall position (e.g. Bickle, 1993; Egan, 1995). Botterill (1994) makes just such a point, asserting that a distinction is needed between an active (occurrent) belief and a belief as a mere disposition. He argues further that once this distinction is made, the alleged incompatibility between the way connectionist models function and the way folk psychology represents us as functioning simply disappears. The distinction rests on the argument that beliefs can persist as dispositions over long periods of time, but while they are dispositions they are mere potentialities to think certain occurrent thoughts. This means that beliefs-as-dispositions are only dispositions to enter states that are functionally discrete. Therefore, if belief is accepted to conform to this definition, it undermines RS&G’s argument that beliefs must be ‘propositionally modular’, as this requires that all beliefs are functionally discrete all of the time.
This disagreement over what constitutes a belief is a problem both for RS&G’s position and for those who oppose it. It weakens the arguments on both sides because there is no agreed foundation from which to approach the topic. If every criticism can be met by a new interpretation of belief, then there is little hope for progress in the field. As this topic is investigated by both philosophers and cognitive scientists, it might seem that the divide between the disciplines is the source of the disagreement, but Von Eckardt (2005) illustrates that there appears to be genuine confusion among researchers both across and within disciplines over what should constitute ‘belief’.
Critical Argument Two: Finding Beliefs in Connectionist Models
In the first critical argument, an outline was presented of attempts to discredit the position of RS&G. This highlighted that little agreement exists between researchers and that the one consistent line of argument, that of questioning the nature of belief, further undermines the position. A stronger argument, which would discredit RS&G at a more fundamental level, would be to find the beliefs within the network that appear so elusive. Many researchers have claimed to have found beliefs in neural networks (e.g. Forster & Saidel, 1994; O’Brien, 1993). Let us take the most recent of these claims (Skokowski, 2009): explaining Skokowski’s argument in some detail will make apparent many of the prevailing issues in making such a claim.
Skokowski’s (2009) Case for Locating Beliefs
Skokowski (2009) argues that ‘beliefs’ in connectionist networks are made up of three properties, which he calls I, H and W. I refers to the pattern created on the input units when a stimulus is presented to the model. Skokowski makes the case that, if connectionist models are going to explain cognitive phenomena, then the input units are crucial, as they reflect the senses and their wiring in the brain. H refers to the activation pattern of the hidden units after learning has taken place in the network. This property is distinguished from the learned final configuration of weights, which he calls W.
If we return to RS&G’s original argument, they stated that beliefs must 1) be functionally discrete, 2) be semantically interpretable, and 3) play a causal role (in mental and behavioural output). Other claims for the location of beliefs have been unsuccessful because whichever component of the model is asserted to represent beliefs does not adequately conform to these key principles (Von Eckardt, 2005). In fact, RS&G pre-empt attempts to use the hidden units (H) or the weights (W) as belief states. They argue that neither can be considered to be beliefs or memories: the activation pattern H is transient, whereas beliefs are supposed to be enduring, and the weight structure W is not acceptable because weights do not encode content in functionally discrete ways.
Skokowski (2009), in contrast, holds that there can be encodings in the weights (W), citing evidence from language research that used Principal Component Analysis, a dimensionality-reduction technique that can reveal clustering (Elman, 1991). This is crucial because RS&G accept there may indeed be some system of encoding in the weights that they are unfamiliar with, and, “Moreover we concede that if such a covert system were discovered, then our argument would be seriously undermined” (p. 502). Skokowski uses this to attest that trained networks have the capacity to encode contents both within activations (I and H) and within their weight states (W). This means that in a trained network it is possible to pinpoint discrete states (meeting the first criterion of ‘propositional modularity’), which Skokowski calls beliefs, that, in conjunction with the trained weight state W, are semantically interpretable (second criterion) and play a causal role in behaviour (third criterion). This reasoning is distinct from that of previous theories in that it interprets beliefs as a combination of elements in the connectionist model (I, H and W), and it also uses this to make a clear dissociation between occurrent and enduring beliefs. Skokowski concludes that “It appears we have indeed found an excellent candidate for belief in networks.” (p.468, 2009).
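The three properties can be made concrete with a short sketch. It is an illustration under stated assumptions rather than Skokowski's or Elman's procedure: the weights are random rather than trained, the input patterns are made up, the function name `hidden_pattern` is hypothetical, and principal component analysis is applied to the hidden activation patterns via a singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)

# W: the weight state, which persists between stimuli
# (random stand-in values here, not a trained network).
W = {"in_hid": rng.normal(size=(16, 4)), "hid_out": rng.normal(size=(4, 1))}

def hidden_pattern(I, W):
    """H: the hidden activation pattern produced by input pattern I under W."""
    return 1.0 / (1.0 + np.exp(-(I @ W["in_hid"])))

# I: sixteen made-up input patterns, one per row.
inputs = rng.integers(0, 2, size=(16, 16)).astype(float)

# H exists only while an input is presented; W is the same for every row.
H_all = hidden_pattern(inputs, W)                # shape (16, 4)

# Principal component analysis of the hidden patterns via SVD.
centered = H_all - H_all.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)              # variance share per component
projected = centered @ Vt[:2].T                  # items in the top-2 component space

print("variance explained per component:", np.round(explained, 3))
```

The sketch shows only the bookkeeping: I and H are occurrent, changing with each stimulus, while W endures. Whether any structure found in `projected` deserves to be called a belief is exactly the point at issue.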
Have we really found beliefs?
However, several general observations make it difficult to accept that beliefs have actually been established by Skokowski (2009). The arguments provided do not let us identify a specific part of the connectionist model where beliefs are located. In fact, by using multiple elements and their interactions, Skokowski’s case may be considered closer to the ‘higher level’ arguments of Clark (1995) described earlier, which have already been criticised for failing to locate a single functioning area (Stich & Warfield, 1995). This problem, of not locating a direct construct of belief in the networks, is compounded by the absence of evidence for the position. There is not even an illustrative model of how the interaction between H, I and W takes place, as most other theories provide (e.g. Forster & Saidel, 1994). Furthermore, Skokowski’s theory is premised on a deconstructed notion of belief states, which RS&G and others do not accept, and it therefore offers little progress to the field while this issue remains unresolved.
The purpose of this paper was to discuss the position “Connectionist models have killed off beliefs”. It has met this purpose by presenting a systematic case against RS&G’s arguments, with attention both to the complexities of particular arguments (Clark, 1995; Skokowski, 2009) and to a broader discussion of the area. The summary of RS&G’s position suggested that beliefs do not exist in neural networks because Networks A and B should contain a belief-like state if beliefs exist, but as nothing is shared between them, no such state seems possible. The first critical argument questioned the central logic of RS&G’s assertions. However, a multitude of conflicting views limited this position, which was further undermined by confusion over the nature of beliefs. The second critical argument accepted the logic that if beliefs could not be found in connectionist models they would not exist, while asserting that they have in fact been found (Skokowski, 2009). Yet this most recent addition to the debate still appears flawed, as it relies on multiple interacting elements within the network, provides no evidence for its conclusions and rests on a contested definition of belief.
In conclusion, the sheer variety of arguments against RS&G’s position, as well as the intuitive difficulty of understanding the world otherwise, makes it difficult to accept that connectionist models have ‘killed off beliefs’. However, it is almost equally difficult to accept the position that beliefs and other common sense psychological constructs are perfectly congruent with connectionist models. There remains confusion and disagreement between researchers in this area, much of which seems to stem from the idea that belief may not be a single state which can be located in networks (Von Eckardt, 2005). It is surprising that, despite 20 years of research in this field, belief in the context of this debate remains ambiguous. If a concept is so difficult to substantiate, despite different levels of analysis from philosophers and cognitive scientists, does this not suggest that it may be a prime candidate for replacement, just as RS&G suggest? In this sense, the myriad arguments against RS&G’s position over the years perhaps serve, cumulatively, as the best justification for the view that belief and other propositional attitudes do not truly exist. Connectionist networks may therefore not have “killed off beliefs”, but the attempted murder by RS&G has clearly focused our minds on the concept’s need for considerable revision.
Bickle, J. (1993). Connectionism, Eliminativism, and the Semantic View of Theories. Erkenntnis, 39, 359-82.
Botterill, G. (1994). Beliefs, functionally discrete states, and connectionist networks: a comment on Ramsey, Stich and Garon. British Journal for the Philosophy of Science, 45, 899-906.
Churchland, P. M. (1989). Folk Psychology and the Explanation of Human Behaviour. In P. M. Churchland (Ed.), A Neurocomputational Perspective (pp.209-221). Cambridge, MA: MIT Press.
Clark, A. (1995). Connectionist Minds. In C. MacDonald & G. MacDonald (Eds.), Connectionism: Debates on Psychological Explanation (pp.339-356). Oxford: Blackwell.
Egan, F. (1995). Folk Psychology and Cognitive Architecture. Philosophy of Science, 62, 179-96.
Elman, J. (1991). Distributed representations, simple recurrent networks and grammatical structure. Machine Learning, 7, 195–224.
Forster, M. & Saidel, E. (1994). Connectionism and the fate of Folk Psychology: a reply to Ramsey, Stich and Garon. Philosophical Psychology, 7, 437-452.
Marcus, G. F. (2001). The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, Mass.: MIT Press.
O’Brien, G. (1993). The connectionist vindication of folk psychology. In S. Christensen & D. Turner (Eds.), Folk Psychology and the Philosophy of Mind (pp.108-132). Lawrence: Erlbaum.
Ramsey, W., Stich, S. P., & Rumelhart, D. E. (Eds.). (1991). Philosophy and connectionist theory. Hillsdale, NJ: Erlbaum.
Ramsey, W., Stich, S., & Garon, J. (1990). Connectionism, eliminativism, and the future of folk psychology. In J. Tomberlin (Ed.), Philosophical Perspectives, 4: Action Theory and Philosophy of Mind (pp.499–533). Atascadero, CA: Ridgeview.
Sanger, D. (1989). Contribution analysis: A technique for assigning responsibilities to hidden units in connectionist networks (Technical Report CU-CS-435-89). University of Colorado, Boulder, Department of Computer Science.
Sejnowski, T. J. & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145-168.
Skokowski, P. (2009). Networks with Attitudes. AI & Society, 10, 461-470.
Stich, S. & Warfield, T. (1995). Reply to Clark and Smolensky: Do Connectionist Minds Have Beliefs? In C. MacDonald & G. MacDonald (Eds.), Connectionism: Debates on Psychological Explanation (pp.395-411). Oxford: Blackwell.
Von Eckardt, B. (2005). Connectionism and the Propositional Attitudes. In C. Erneling & D. Johnson (Eds.), The Mind as a Scientific Object: Between Brain and Culture (pp.225-243). New York: Oxford University Press.
Ramsey, W., Stich, S., & Garon, J. (1990). Connectionism, Eliminativism and the Future of Folk Psychology. Philosophical Perspectives, 4. DOI: 10.2307/2214202