Computational Modelling of Reading and Dyslexia – Symbolic vs. Connectionist Approaches


Reading is a new skill in evolutionary terms, so it is unlikely that sufficient time has passed for any adaptive benefits to become coded in the human genotype. Reading therefore represents a novel skill to be learnt, presumably in the absence of any inherited predisposition to acquire the necessary specific skills. It is thus of particular interest to ascertain how humans read, given the number of systems involved; for example, when reading just a single letter, one must recognise the visual symbol, match it to a previously stored representation of the letter, retrieve the name of the letter and its sound (phoneme), and be able to integrate these not only at the letter level but also at the word level. Additionally, as one becomes a more skilled reader, one has to develop the ability to recognise words as wholes (by sight), as well as a system for decoding non-words by mapping symbols to their sounds. This enables the reader to deal with regular words (e.g. gave/pave, hint/mint) as well as irregular words (e.g. pint, have), a skill particularly essential for less regular languages. The regularity of a language refers to the complexity of the rules by which the visual patterns of words (orthography) are converted into their sounds (phonology), with some languages (e.g. German and Italian) having more transparent (shallow) orthographies than English, which is quasi-regular (Plaut, McClelland, Seidenberg & Patterson, 1996). Given the complexity of the reading process discussed above, psychologists and neuroscientists have sought methods to examine how reading processes (both skilled and abnormal) emerge.

Computational modelling is one such method: combined with behavioural and brain data, it can be used to develop and test theories of reading in ways that would otherwise be difficult to achieve conceptually or accomplish practically. This is particularly important for largely unconscious processes such as reading, where it can be difficult for researchers to transcend their intuitions about the hypothesised mental operations. A prime example is the posited existence of a mental lexicon (a dictionary of the words an individual knows) that is amended and consulted during learning and reading. Until the emergence of a particular form of modelling, called connectionist modelling, no plausible alternative explanation existed; since then, however, effective models of reading have been developed that do not rely on a lexicon. Computational modelling also affords researchers the possibility of going beyond correlational data to test causal hypotheses; in the case of reading, by creating impaired models of reading that test the potential causal roles of correlated deficits (Seidenberg, in press).

Computational modelling is a tool for the development and testing of cognitive and behavioural theories, and comes in two forms: symbolic and connectionist. Symbolic models are computationally encoded versions of box-and-arrow cognitive theories involving inputs, processes, and outputs. Their development requires the full definition of a theory, thus casting light on unspecified or incomplete parts. These models can produce simulations from theory, allowing comparison between a model's behaviour and empirical data. This permits assessment of how closely a theoretical model matches human information processing, leading to the theory being refined to better encapsulate the empirical data. However, even when a model does accurately represent the real-world data and is sufficiently specified, this does not mean that the underlying theory is 'true'; but until a theory and associated model with better explanatory power is developed, this is the best that can be achieved with a scientific approach (Bechtel & Abrahamsen, 2002). Symbolic models are in essence functionalist: they seek to make generalisations about behaviour, but not particularly in reference to the underlying neurophysiological mechanisms. Although this approach has its roots in the paucity of neurobiological knowledge when such models were originally developed, these models become increasingly untenable as our understanding of brain-behaviour links develops.

Connectionist models differ from symbolic models in several ways. Firstly, they make no a priori assumptions about how information is represented or internally processed. They are biologically inspired, being based on neural network activity, and learn through backpropagation during training: the connection weights of hidden units, which encode distributed representations of the training data and mediate between the input and output layers, are gradually adjusted. Through training, patterns of activation are thus encoded that represent the model's 'knowledge' (an emergent property of the network). In this way connectionism models how real neuronal activity might give rise to a functional reading network through exposure to relevant stimuli. Importantly, such models reference a wider approach to understanding how skills are acquired, implicating more general brain mechanisms, neuroplasticity, and statistical learning; inferences drawn from the efficacy of any connectionist model therefore have implications beyond the specific phenomena being studied. Secondly, whereas representations in symbolic models are stored locally, in connectionist models they are mostly distributed, with multiple units of 'knowledge' (in the case of reading, those representing orthography, phonology, semantics, and context) being activated in response to a particular input (McLeod, Plunkett & Rolls, 1998).
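As a concrete illustration of the learning mechanism described above, the sketch below trains a tiny network by backpropagation on an XOR-style toy mapping, a task that cannot be solved without hidden units. All sizes, rates, and patterns here are invented for illustration; this is pure Python, not any published reading model.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy task: 2-bit inputs to a 1-bit output (XOR), a mapping that a network
# without hidden units cannot learn, so the hidden layer must do real work.
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

N_IN, N_HID = 2, 3
# Random starting weights: the network begins with no knowledge of the task.
w_ih = [[random.uniform(-1, 1) for _ in range(N_IN + 1)] for _ in range(N_HID)]
w_ho = [random.uniform(-1, 1) for _ in range(N_HID + 1)]  # the +1 slots are biases

def forward(inp):
    hid = [sigmoid(sum(w * x for w, x in zip(row, inp + [1.0]))) for row in w_ih]
    out = sigmoid(sum(w * h for w, h in zip(w_ho, hid + [1.0])))
    return hid, out

def total_error():
    return sum(abs(t - forward(i)[1]) for i, t in patterns)

err_before = total_error()
LR = 0.5
for _ in range(20000):
    for inp, target in patterns:
        hid, out = forward(inp)
        # Backpropagation: the output error is propagated backwards to
        # apportion blame to each hidden unit, and all weights are nudged.
        d_out = (target - out) * out * (1 - out)
        d_hid = [d_out * w_ho[j] * hid[j] * (1 - hid[j]) for j in range(N_HID)]
        for j in range(N_HID):
            w_ho[j] += LR * d_out * hid[j]
        w_ho[N_HID] += LR * d_out
        for j in range(N_HID):
            for i in range(N_IN):
                w_ih[j][i] += LR * d_hid[j] * inp[i]
            w_ih[j][N_IN] += LR * d_hid[j]

print(err_before, "->", total_error())  # total error falls as the weights self-organise
```

The point is not the toy task but the procedure: weight changes are driven entirely by the discrepancy between produced and target output, and the hidden units' distributed representations emerge from training rather than being specified in advance.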

A good model of reading should fit the empirical data for normal reading and accurately account for the abnormal reading processes observed in the real world. Additionally, a model should show how this behaviour arises from the training data, thus fitting a normal developmental trajectory; however, this is only possible in a connectionist model, where learning occurs and can be observed, as opposed to a hard-wired symbolic model, where variables and thresholds are predefined. Reasons for selecting one modelling technique over another may be determined or constrained by the nature and scope of the research question, or by the philosophical allegiances of the researchers. Overall, however, computational modelling has promise as a testing platform for pitting competing theories against each other (Coltheart, Rastle, Perry, Langdon & Ziegler, 2001).

Two main computational models of reading have arisen from the symbolic vs. connectionist debate: the Dual Route Cascaded (DRC) model and the Parallel Distributed Processing (PDP) models, respectively. While both appear to have good explanatory power, a deeper examination reveals that although the DRC can account for a whole slew of reading phenomena, the means by which this is achieved (hard-wiring the model and fitting it to discrete datasets) vastly reduces its validity. In comparison, the PDP family of models, while not able to explain as broad a collection of phenomena, does so in a more plausible way, based on more general assumptions about how brain-based statistical learning occurs.


Dual Route Cascaded (DRC) model


The DRC (Coltheart, Curtis, Atkins & Haller, 1993), a symbolic model, makes two main assumptions about reading: firstly, that there are two information processing routes, and secondly, that information processing is cascaded in nature. It was developed based on findings indicating that correct pronunciation is predicted by the regularity and frequency of words (Waters & Seidenberg, 1985), on the quasi-regular structure of English, and on the Orthographic Depth Hypothesis (Katz & Feldman, 1981; Frost, Katz & Bentin, 1987), which suggests that two information processing routes exist, both usable for orthography-to-phonology coding as well as word recognition, depending on the relative difficulty/transparency of the orthography in question. Specifically, the DRC posits these pathways to be a grapheme-to-phoneme (non-lexical) route for regular/new words and a lexical semantic route for irregular words. The cascaded component of the DRC is based on McClelland & Rumelhart's (1981) Interactive Activation and Competition (IAC) model, an early cascaded model of reading, which accurately reproduces real-life data on context effects. A cascaded model passes activation on to later modules after any amount of activation, as opposed to threshold processing, which, it has been argued, is less suitable for modelling reading. The DRC builds on the IAC's feature- and letter-level strengths, incorporating it as a nested component of its model (Coltheart et al., 2001). It uses two lexicons, an orthographic and a phonological, and has no semantics defined. However, 31 parameters governing excitation and inhibition within the model are defined a priori, based on a somewhat brute-force search for values that accommodate the most difficult classes of words, namely exception words and non-words.
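The difference between cascaded and thresholded processing can be sketched in a few lines. This is an illustrative toy with made-up rates and criteria, not the DRC's actual parameters:

```python
# Two processing stages: stage 1 feeds stage 2 either in cascaded fashion
# (any activation is passed on immediately) or in thresholded fashion
# (nothing is passed on until stage 1 exceeds a criterion).
RATE, THRESHOLD, STEPS = 0.2, 0.8, 10

def run(cascaded):
    s1 = s2 = 0.0
    history = []
    for _ in range(STEPS):
        s1 += RATE * (1.0 - s1)                          # stage 1 charges towards 1.0
        feed = s1 if (cascaded or s1 >= THRESHOLD) else 0.0
        s2 += RATE * (feed - s2)                         # stage 2 follows what it receives
        history.append(round(s2, 3))
    return history

print("cascaded:   ", run(True))   # stage 2 starts rising from the very first step
print("thresholded:", run(False))  # stage 2 stays at zero until the criterion is met
```

In the cascaded run, later stages begin working on partial evidence straight away, which is the property the DRC inherits from the IAC model.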

In addition to these parameters being predefined, the DRC model's internal structure is hard-wired to reflect the box-model architecture on which its theory is based (see Coltheart et al., 1993; 2001). The decision to hard-wire the model raises a number of issues, the most prominent being its biological plausibility: it cannot learn from a random starting state, arguably rendering it developmentally invalid. Coltheart et al. (2001) proffer several arguments in support of hard-wiring, claiming that if the reading system uses localist representations, or two or more levels of representation, then backpropagation will not reveal its structure, because the representations backpropagation learns are distributed. However, these are rather self-supporting arguments, based on little more than conjecture.

Proponents claim that the DRC's broad range of accurate performance makes it the model of choice. Although these results are seemingly unrivalled (for the full list see Coltheart et al., 2001, p. 251), the assumption that a model is better simply because it fits a larger number of criteria should be questioned. Indeed, the accuracy of the DRC is largely dependent on data from single experiments, such that if the well-fitting model were applied to a dataset from a different experiment, its explanatory power would be greatly reduced (Seidenberg & Plaut, 2006). While tailoring the DRC to a particular dataset is tempting, it leads to overfitting, leaving the model open to criticisms of its generalisability, along with the lack of any learning process leading to such impressive performance.

 

Parallel Distributed Processing (PDP) models


PDP models, a series of connectionist models, aim to provide a "single-route" account of reading based on lexical knowledge, not rules. Due to their ability to learn and self-organise, PDP models can simulate both normal and abnormal reading processes, and arguably offer a more parsimonious explanation of reading development. A number of PDP models have emerged since Seidenberg & McClelland's (1989) influential SM89 model; what follows is a précis of this and subsequent models, followed by a discussion of the theoretical strengths and limitations of current PDP models.

The SM89 is a simplified model: it did not simulate the gradual buildup of activation (using a single-sweep method instead), limited input to monosyllabic words, compressed word frequencies, and used a phoneme encoding scheme (triples) with better local sensitivity. When tested with non-word data from earlier studies, the SM89 demonstrated some inadequacies, later replicated by Besner, Twilley, McCann & Seergobin (1990). Seidenberg & McClelland (1990) attributed this poor performance to the relatively small amount of training data, along with limitations imposed by the phonological representational system used (triples), which represented letters in terms of adjacent elements rather than relative or absolute position. This dispersed knowledge about contexts in ways that prevented the model from generalising effectively. However, the amount of training data is arguably not a valid issue, as the DRC learnt to read non-words from a similarly sized corpus; design issues are the more likely cause.
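A rough sketch of a triples-style encoding makes the dispersal problem concrete. This is a hypothetical simplification (SM89 actually encoded triples of phonological features, not raw letters):

```python
def triples(word):
    """Represent a word as the set of its letter triples, '_' marking boundaries."""
    padded = "_" + word + "_"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

print(sorted(triples("gave")))  # ['_ga', 'ave', 'gav', 've_']
print(sorted(triples("save")))  # ['_sa', 'ave', 'sav', 've_']

# Each triple encodes a letter only relative to its immediate neighbours, not
# its absolute or relative position in the word, so knowledge of the shared
# body 'ave' is fragmented across context-specific units ('gav' vs. 'sav'),
# which hampers generalisation to novel items.
print(sorted(triples("gave") & triples("save")))  # ['ave', 've_']
```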

The majority of connectionist models which have superseded SM89 use a 'triangle' architecture, referring to the three components of the model: orthographic input, phonological output, and a semantic component. The addition of the semantic component reflects the assumption that word recognition relies on a combination of phonological and semantic information. The three components are linked in differing ways: in earlier models everything is marshalled by hidden layers, while later ones also include direct orthographic-to-phonological and orthographic-to-semantic links. Additionally, information flows from the orthographic to the hidden layer in one direction only, such that phonological output cannot affect the orthographic representation.

Plaut et al. (1996) built on the SM89 model, producing the PMSP96 triangle model, which replaced SM89's phonological encoding system with one that condensed the regularities between phonological and orthographic representations. Although the PMSP96 performed more accurately on non-words, the phonological and orthographic representations in a number of its simulations were specifically defined: they were separated by hand to take constraints into account rather than emerging from the data, thus detracting from the biological plausibility that a connectionist approach brings to reading.

The issue of a priori assumptions, and hence the lack of biological plausibility, in the PMSP96 model was addressed by Harm & Seidenberg (1999) in their HS99 model through the use of attractor networks, whose units "…interact and update their states repeatedly in such a way that the initial pattern of activity generated by an input gradually settles to the nearest attractor pattern". These attractors were deployed for phonological representations and demonstrated distinct advantages over previous connectionist models. More recent models (e.g. Harm & Seidenberg, 2004) have focused on word semantics, namely how orthography directly and indirectly (via phonology) influences understanding of meaning.
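The settling behaviour quoted above can be illustrated with a minimal Hopfield-style attractor. This is an assumed toy with one arbitrary stored pattern, not HS99's actual phonological network:

```python
# One binary pattern is stored in Hebbian weights; repeatedly updating the
# units pulls any nearby starting state towards that stored pattern.
stored = [1, -1, 1, 1, -1, -1, 1, -1]
N = len(stored)
W = [[0 if i == j else stored[i] * stored[j] for j in range(N)] for i in range(N)]

def settle(state, sweeps=5):
    state = list(state)
    for _ in range(sweeps):
        for i in range(N):
            net = sum(W[i][j] * state[j] for j in range(N))
            state[i] = 1 if net >= 0 else -1   # each unit follows its net input
    return state

noisy = list(stored)
noisy[0], noisy[3] = -noisy[0], -noisy[3]      # corrupt two of the eight units
print(settle(noisy) == stored)                 # True: the input settles to the attractor
```

The appeal for phonology is that a noisy or partial pattern of activation is cleaned up by the network's own dynamics, rather than by externally imposed rules.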

While the evolution of PDP is impressive, important questions remain regarding its foundations, namely the plausibility of backpropagation and the learning process. Does a similar mechanism operate at the neurophysiological level? Seidenberg (2006) argues that while the algorithm may accurately capture some brain-based processes, it does so at a level removed from the neurophysiology. Additionally, the backpropagation process by which PDP models 'learn' lacks ecological validity in one of its key assumptions: that learning can only occur when a pronounced response is corrected by an imagined teacher. Germane to this is the observation that real-life feedback is not consistently provided (and is sometimes inaccurate), arguably leading to more generalisable learning (as opposed to the word-specific knowledge that may result from consistent and correct feedback).

Coltheart (2006b) echoes these concerns, drawing attention to the high number of learning trials connectionist models need to pronounce words correctly. He cites catastrophic forgetting (the process whereby learning a later set of data causes connectionist models to 'forget' the former) as an objection to a simpler model of learning, in which children sequentially/progressively learn small word-sets; note, however, that Coltheart conveniently omits mentioning interleaved learning (where datasets are learnt in a parallel/overlapping manner). Additionally, children not only have their pronunciation corrected, but are given additional feedback which computational models do not receive (e.g. the sounds/names of letters and the pronunciation of letter groups). Their learning also differs in that they can both comprehend and produce speech, meaning that feedback-based learning may occur, with produced speech being adjusted to match previously correctly pronounced words. Despite these discrepancies, connectionist models clearly learn well, even if backpropagation does not precisely replicate natural learning.
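The contrast between sequential and interleaved training can be demonstrated with a single trainable unit. The two tiny "word-sets" and all parameters below are invented for illustration: training on a second set alone tends to degrade performance on the first, while interleaving the two sets protects it.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Two tiny "word-sets" sharing input features, so learning one can disturb the other.
set_a = [([1, 1, 0, 0], 1), ([1, 0, 1, 0], 0)]
set_b = [([0, 1, 1, 0], 0), ([0, 0, 1, 1], 1)]

def train(batches, epochs=2000, lr=0.5):
    w = [random.uniform(-0.1, 0.1) for _ in range(4)]
    b = 0.0
    for patterns in batches:            # each batch is trained to completion in turn
        for _ in range(epochs):
            for x, t in patterns:
                out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                d = (t - out) * out * (1 - out)
                w = [wi + lr * d * xi for wi, xi in zip(w, x)]
                b += lr * d
    return w, b

def error(model, patterns):
    w, b = model
    return sum(abs(t - sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b))
               for x, t in patterns)

sequential = train([set_a, set_b])      # set A first, then set B alone
interleaved = train([set_a + set_b])    # both sets mixed together

print("error on set A after sequential training: ", round(error(sequential, set_a), 2))
print("error on set A after interleaved training:", round(error(interleaved, set_a), 2))
# Training on set B alone drags the shared weights away from the set A solution;
# interleaving keeps both sets accurate -- the "catastrophic forgetting" contrast.
```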

 

Dyslexia

Thus far only normal reading processes have been discussed, but in evaluating a model of reading one must also examine its ability to account for abnormal reading. The most common abnormality is dyslexia: impaired performance in reading, spelling, and decoding written language that is unexpected given the absence of other perceptual or cognitive deficits (International Dyslexia Association, 2007). It takes two main forms, surface and phonological. In surface dyslexia, readers have problems at the orthographic level, which impairs their performance on irregular words: they over-rely on a sub-lexical strategy of sounding words out, leading them to over-regularise, so that an irregular word such as 'island' is likely to be pronounced IZ-land (Manis, Seidenberg, Doi, McBride-Chang & Petersen, 1996). In phonological dyslexia, readers have problems at the phonological level, which impairs their performance on unfamiliar and non-words; their reading is relatively fluent, however, as long as the words have previously been learnt.

The DRC model can theoretically explain both surface and phonological dyslexia. According to this model, in surface dyslexia damage to the lexical route results in all words being processed by the grapheme-to-phoneme route, so that irregular words (e.g. pint) are regularised and mispronounced. Once an accurate representation of normal reading has been achieved, these hypotheses can be tested by selectively 'lesioning' parts of the DRC model. There are two main ways of simulating these sub-types with the DRC. The first is to 'lesion' the corresponding route entirely, which produces an extreme form of the dyslexia hypothesised above; however, this is poor modelling, because dyslexia lies on a continuum, and parts of the model's reading abilities which should not be detrimentally affected are. The second, far more plausible, method involves finer adjustment of the responsiveness of units, and was therefore used to attempt the simulation of both sub-types. For surface dyslexia, the orthographic lexicon was lesioned and the phoneme activation adjusted to reflect the speed at which surface dyslexics are tested and perform; for phonological dyslexia, the time the model took to scan each word, letter by letter, was increased. Both approaches yielded results in line with real-life data.
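The logic of these lesioning simulations can be sketched in miniature. The toy two-route reader below, with an invented lexicon and rule set, is not the DRC itself, but it shows how knocking out the lexical route yields the surface-dyslexia pattern:

```python
# Lexical route: whole-word look-up. Non-lexical route: letter-to-phoneme rules.
LEXICON = {"mint": "/mInt/", "pint": "/paInt/"}              # stored pronunciations
RULES = {"m": "m", "i": "I", "n": "n", "t": "t", "p": "p"}   # regular mappings only

def rule_route(word):
    return "/" + "".join(RULES[letter] for letter in word) + "/"

def read(word, lexical_intact=True):
    if lexical_intact and word in LEXICON:
        return LEXICON[word]       # known words come straight out of the lexicon
    return rule_route(word)        # everything else is sounded out by rule

print(read("pint"))                        # /paInt/ : correct, via the lexicon
print(read("pint", lexical_intact=False))  # /pInt/  : regularised to rhyme with mint
print(read("nint", lexical_intact=False))  # /nInt/  : non-words remain readable
```

Lesioning the lexical route leaves non-word reading intact but regularises exception words (the surface pattern); disabling the rule route would conversely leave known words readable but non-words impossible (the phonological pattern).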

There have been similar successes using PDP. One simulation (PMSP96) explicitly attempted to model acquired surface dyslexia by creating a feed-forward model with localist input and output representations and an additional phonological input, such that the model's reliance on input phonology for correctly reading irregular words grew as training progressed (Plaut et al., 1996). Although the simulation was somewhat successful in replicating data from a number of cases, it depended on the idea that damage to the semantic system causes surface dyslexia, an idea refuted both by neuropsychological evidence and by the general implications of the claim (i.e. that everyone with a semantic impairment would display surface dyslexia). However, later models have successfully simulated developmental surface and phonological dyslexia by manipulating hidden-layer quantities/behaviours and phonological clean-up units respectively (Coltheart, 2006a).

Although both symbolic and connectionist accounts of dyslexia have demonstrated some explanatory power, suggesting that phonological deficits play one of the causal roles in dyslexia, data from the DRC (or indeed any symbolic model) should be questioned due to the lack of a developmental learning process.


Closing remarks

We have examined symbolic and connectionist approaches to modelling reading and dyslexia, and discussed their relative merits in terms of how well their simulations account for the empirical and case-series data. The a priori nature of the DRC and its implied structure raises questions regarding its cognitive evolution and biological cost, casting doubt on its existence in its current, complex form. Overall, connectionist accounts are more biologically plausible, provide insight into developmental trajectories of reading, and perform as well as skilled readers in many ways.


References

Bechtel, W. & Abrahamsen, A. (2002). Connectionism and the mind: Parallel processing, dynamics, and evolution in networks. Second Edition. Oxford: Blackwell.

Besner, D., Twilley, L., McCann, R. S., & Seergobin, K. (1990). On the connection between connectionism and data: Are a few words necessary? Psychological Review, 97, 432-446.

Coltheart, M. (2006a). Acquired dyslexias and the computational modelling of reading. Cognitive Neuropsychology, 23(1), 96-109.

Coltheart, M. (2006b). Dual route and connectionist models of reading: an overview. London Review of Education, 4(1), 5-17.

Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed processing approaches. Psychological Review, 100(4), 589-608.

Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108(1), 204-256.

Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.

Frost, R., Katz, L., & Bentin, S. (1987). Strategies for visual word recognition and orthographic depth: A multilingual comparison. Journal of Experimental Psychology: Human Perception and Performance, 13, 104–115.

Harm, M. W. & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491-528.

Harm, M. W. & Seidenberg, M. S. (2004). Computing the meaning of words in reading: Division of labor between visual and phonological processes. Psychological Review, 111, 662-720.

International Dyslexia Association (2007). Frequently Asked Questions about Dyslexia.  Retrieved 25th March, 2010 from the World Wide Web: http://www.interdys.org/FAQ.htm

Katz, L. & Feldman, L. B. (1981). Linguistic coding in word recognition: Comparisons between a deep and a shallow orthography. In A. M. Lesgold & C. A. Perfetti (Eds.) Interactive processes in reading (p. 85–106). Hillsdale, NJ: Erlbaum.

Manis, F. R., Seidenberg, M. S., Doi, L. M., McBride-Chang, C., & Petersen, A. (1996). On the bases of two subtypes of developmental dyslexia. Cognition, 58, 157-195.

McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.

McLeod, P., Plunkett, K. & Rolls, E.T. (1998). Introduction to Connectionist Modelling of Cognitive Processes. Oxford: Oxford University Press.

Park, K., Lyu, K., Yu, W. & Lim, H. (2007). A Computational Model of Korean Lexical Decision task and its Comparative Analysis by Using Connectionist Model. Third International Conference on Natural Computation, IEEE.

Patterson, K. (1990). Alexia and neural nets. Japanese Journal of Neuropsychology, 6, 90-99.

Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. E. (1996). Understanding normal and impaired reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56-115.

Seidenberg, M. S. & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.

Seidenberg, M. S., & McClelland, J. L. (1990). More words but still no lexicon: Reply to Besner et al. (1990). Psychological Review, 97, 447-452.

Seidenberg, M. S. (2006). Connectionist models of reading. In G. Gaskell (Ed.), The Oxford Handbook of Psycholinguistics. Oxford: Oxford University Press.

Seidenberg, M. S. (in press). Computational models of reading: Connectionist and dual-route approaches. In M. Spivey, K. McRae, & M. Joanisse (Eds.), Cambridge Handbook of Psycholinguistics. Cambridge: Cambridge University Press.

Seidenberg, M. S. & Plaut, D. C. (2006). Progress in understanding word reading: Data fitting versus theory building. In S. Andrews (Ed.), From Inkmarks to Ideas: Current Issues in Lexical Processing. Hove, UK: Psychology Press.

Waters, G. S. & Seidenberg, M. S. (1985). Spelling-sound effects in reading: Time-course and decision criteria. Memory and Cognition, 13, 557-572.
