Language models: A new perspective on language and cognition

Editorial Assistants: Matilde Tassinari and Zoey Chapman.

Note: An earlier version of this article was published in the Dutch edition of In-Mind.

How do computers help us understand language acquisition? What do ChatGPT and text readability scores have in common? Language models are no longer just a useful technology; they are a window into our own linguistic abilities. Discover how these systems not only generate text but also challenge and deepen insights from the psychology of language. 

Figure 1.

Language models are a remarkable technology that affects almost every sector, and scientific research is no exception. They are a powerful tool for language research, used to study reading behaviour, determine text difficulty and even challenge old theories of language acquisition. What does ChatGPT have to do with reading levels? Are we born with our ability for language, or is it learned? How do computers manage to imitate human language use? Talking to robots once seemed like science fiction; it has now become a source of new insights into how we acquire, process and understand language.

Language models and their associated applications, such as ChatGPT or Microsoft Copilot, serve both as a source of entertainment and as a tool for handling routine tasks. Who hasn’t drafted an email with the help of a language model? More surprisingly, these systems have also become an important tool in the scientific study of language and reading development.

Anyone who has ever learned a second language can attest to this: language is immensely complex. This makes it all the more astonishing that computer models have begun to produce comprehensive and (seemingly) high-quality answers to a wide range of questions. Need a quick summary of the history of New York? You will have your answer in two seconds. Want a poem about frozen lasagna in the style of Shakespeare? No problem! With ease, language models generate poems that make it sound as if the classic playwright had a fondness for fine dining. To the public, this capability seemed to come out of nowhere, but scientists from various disciplines had long been experimenting with methods to simulate language production using computer programs.

How to build a language model

Building a successful language model essentially boils down to deciphering the recurring patterns in our language. There are many such patterns; one example is the order in which the subject, object and verb appear in a sentence. In English, this is typically subject (S) – verb (V) – object (O): for example, Mark (S) drinks (V) a coffee (O). In Japanese, the order of verb and object is reversed: Mark (S) a coffee (O) drinks (V). To place words in the correct order, we must of course correctly deduce the roles (subject, object, verb) that each word can play, which in essence requires assessing the meaning of each word. That is a much harder nut to crack: how can a computer learn what words mean?

The answer lies in our language itself. As the linguist John Firth put it: ‘you shall know a word by the company it keeps’ [1]. The idea is that words expressing similar concepts tend to occur together more often in language. For example, there are far more texts that contain both the words “leopard” and “antelope” than there are texts that include “leopard” and “goldfish.” A leopard and an antelope are both mammals living on the African savanna, whereas a goldfish and a leopard share little more than the fact that they are animals. If we extend this exercise to all words, using a huge amount of text, we get an estimate of the relative meaning of the words. You can think of the meanings of words as locations in a word cloud: words related in meaning are located close together. This word cloud forms the basis for the next step.
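
To make Firth’s idea concrete, here is a minimal sketch of how word relatedness can be read off co-occurrence counts. The four-sentence “corpus”, the same-sentence counting window and the cosine-similarity measure are all illustrative assumptions; real language models learn from billions of words.

```python
# A toy illustration of "you shall know a word by the company it keeps":
# each word is represented by how often it co-occurs with other words,
# and related words end up with similar co-occurrence profiles.
# The tiny corpus below is invented for illustration, not real data.
import math
from collections import defaultdict

corpus = [
    "the leopard chased the antelope across the savanna",
    "the antelope fled from the hungry leopard",
    "the goldfish swam in its small bowl",
    "a child fed the goldfish in the bowl",
]

# Build a co-occurrence vector per word: counts of same-sentence neighbours.
vectors = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for w in words:
        for other in words:
            if other != w:
                vectors[w][other] += 1

def cosine(a: str, b: str) -> float:
    """Similarity of two words' co-occurrence profiles (0 = unrelated)."""
    va, vb = vectors[a], vectors[b]
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b)

print(cosine("leopard", "antelope"))  # relatively high: many shared contexts
print(cosine("leopard", "goldfish"))  # lower: few shared contexts
```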

The aim is now to use this word cloud to deduce how the different words work together to form a coherent message. For a language model to produce human-like text, it must learn what constitutes a plausible continuation of a sentence. This is where recent language models (particularly transformer models) have made enormous progress. Using modern machine learning techniques and vast amounts of text (billions of words), a language model learns to predict which word is likely to come next in a sentence. It is this new class of transformer models that is typically referred to as large language models. By tracing the connections within the word cloud, the model learns which paths and associations occur naturally in our language and are therefore “permitted”. In this way, the word cloud is transformed from a rough map showing how concepts are generally related into a detailed city plan that allows us to map out complex routes. These routes then combine the words in the cloud into meaningful messages.
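
To give a rough sense of what “predicting the next word” looks like in practice, the sketch below asks a small pretrained transformer for the most likely continuations of a short context. The choice of GPT-2 and the Hugging Face transformers library is purely illustrative; the article is not tied to any particular model.

```python
# A minimal sketch of next-word prediction with a pretrained transformer.
# GPT-2 is used purely as a freely available example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = "Mark drinks a"
input_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits

# Turn the scores for the position after the context into probabilities.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
```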

Obtaining such a language-wide “GPS system” does not come cheaply: multiple servers process massive amounts of text continuously for weeks. A child does far better in terms of cost efficiency. By the age of thirteen, a child has been exposed to roughly 100 million words. An average language model, by contrast, is typically trained on three to four orders of magnitude more words, with some outliers processing up to 15 trillion words (e.g., Llama-3.1-405B) [2]. Still, the resulting models can produce text that is often indistinguishable from human writing. In an experiment modelled on the original Turing test, researchers investigated whether the newest language models can already make us believe we are talking to a fellow human [3]. The setup was simple: participants entered a chat with either another person or a language model. They could ask their conversation partner any question they wanted, and afterwards they had to guess whether they had been speaking to a human or a machine. In 54% of cases, GPT-4 was judged to be human, while real people were correctly identified as human in 67% of cases, making the model almost indistinguishable from a real person. Language models are therefore eagerly used by psychologists and linguists to gain more insight into language acquisition and language use.

Can a language model predict words like people do? 

Even though some models have managed to pass the Turing test, language models and humans are profoundly different. A language model may be able to simulate a human level of language use, but the way it achieves that level is completely different from how our human brain accomplishes the same task. Language models therefore cannot offer direct insight into how humans process language, but they can serve as valuable tools for studying it.

One of the questions that has preoccupied psycholinguistics for years is which characteristics of a word influence our reading behaviour. Short words are obviously easier to read, but we also process words that we are very familiar with, or that occur frequently in our language, much faster. You will pause considerably longer on a word like “miraculous” than you would on “mysterious”. An open question within this field of research is whether people also predict the next word while reading. In other words, are our reading behaviour and reading speed partly determined by the predictability of a word? Predicting the next word is precisely what language models are trained to do. A language model can therefore be used to estimate how likely a word is to appear in a given sentence. Take, for example, the sentence “The weather in Belgium is …” A language model estimates that the probability of continuing the sentence with “not” is twice as high as the probability of continuing it with “fantastic”. From its training data, the model has learned that people generally speak negatively about Belgian weather. Whereas language researchers used to rely on human estimates to determine which words are more predictable, a language model now fulfils this task: it lets us estimate, word by word and very accurately, the probability of encountering each word in its current context. This opens new possibilities for studying how readers anticipate the next word while reading.
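
For readers curious how such an estimate is obtained in practice, here is a hedged sketch that compares the two continuations from the example above. GPT-2 and the Hugging Face transformers library are again illustrative stand-ins, and the exact probabilities (including whether “not” really comes out twice as likely) depend on the model used.

```python
# A sketch of estimating word predictability in context: how likely are
# "not" and "fantastic" after "The weather in Belgium is"? GPT-2 is an
# illustrative stand-in for whatever model a researcher actually uses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = "The weather in Belgium is"
input_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    # Probability distribution over the vocabulary for the next position.
    next_word_probs = torch.softmax(model(input_ids).logits[0, -1], dim=-1)

for word in [" not", " fantastic"]:  # leading space matters to GPT-2's tokenizer
    token_id = tokenizer.encode(word)[0]  # probability of the word's first token
    print(f"P({word.strip()!r} | context) = {next_word_probs[token_id].item():.4f}")
```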

By calculating the predictability of words in a text using a language model, scientists were able to demonstrate that both our behaviour and our brains are strongly attuned to these patterns in language. The more predictable a word is, the easier it is for us to process. We spend less time thinking about a predictable word, and the brain also anticipates the most likely options for the next word [4], [5], [6], [7]. For adult readers, this finding is not particularly surprising, but it does raise new questions. Does the same principle of word predictability in reading behaviour also apply to beginning readers: young children who have not yet developed the same reading fluency? Do children compensate for limited reading skills by relying more heavily on word predictability? Or does the act of decoding individual words demand so much cognitive effort that there is little capacity left to form expectations about the text that follows?

For children, it is not yet entirely clear which of these explanations best describes their reading process, but there is some evidence supporting the first option. Young, beginning readers seem to rely more on the surrounding context than older, more fluent readers [8]. Younger readers take more time to read a word, but they show a greater relative speed-up when encountering a predictable word. The cause of this acceleration on predictable words seems to differ between children and adults: the advantage in children stems from faster integration of words with the preceding context, while adults actually seem to predict upcoming words [9]. These and other studies were, however, limited to manipulating the predictability of a single word in a context, and they often relied on subjective estimates of that predictability.

In our research, we want to take advantage of the fact that language models enable us to accurately map the predictability of all words in a text on a large scale. This not only gives us a more complete picture of the reading process but also offers opportunities for measuring reading proficiency. If children are indeed sensitive to the predictability of each word in a text, this has important implications for how we measure reading ability or, more precisely, how we determine text difficulty. Most existing formulas for assessing text difficulty are based on the average length of words and sentences [10]. If predictability turns out to play an important role, it can be included in the formulas to determine the difficulty of a text more accurately. This, in turn, opens new doors; for example, we can ask ourselves whether the extent to which a child uses the predictability of a word is also related to the degree of text comprehension and whether we can use this in teaching reading comprehension. It’s an exciting time for research at the intersection of language, cognition, and reading. 
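
As an illustration of the kind of formula involved, the sketch below computes a toy length-based difficulty score and shows where a predictability term could slot in. The weights, the function and the optional surprisal term are invented placeholders; this is not the LiNT formula from [10] or any validated instrument.

```python
# A toy readability score in the spirit of classic length-based formulas,
# with an optional predictability term. All weights are made up for
# illustration; this is not a validated readability instrument.
import re

def toy_difficulty(text: str, mean_surprisal: float | None = None) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    avg_word_len = sum(len(w) for w in words) / len(words)  # classic ingredient 1
    avg_sent_len = len(words) / len(sentences)              # classic ingredient 2
    score = 4.0 * avg_word_len + 0.5 * avg_sent_len
    if mean_surprisal is not None:
        # Hypothetical extension: texts whose words are less predictable
        # (higher mean surprisal under a language model) score as harder.
        score += 2.0 * mean_surprisal
    return score  # higher = more difficult, on an arbitrary scale

print(toy_difficulty("The cat sat on the mat. It purred softly."))
```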

A new tool for psycholinguistic research 

Language models play a significant role in psycholinguistic research because they allow us to capture complex patterns in language with great precision. In this way, they have become a valuable part of the psychologist’s toolkit. On the one hand, we can use them as a tool, for example to better determine the difficulty of a text. On the other hand, they also teach us something about how language can be learned. For a long time, it was thought that the language input children receive is too limited for language to be learned without an innate language faculty in the human brain. The fact that language models can reach such a level of linguistic competence purely on the basis of the co-occurrence of words has led many to re-evaluate the idea of an innate language faculty in humans. These are just two examples of how language models can be used; the possibilities extend much further and continue to expand rapidly within various domains of psycholinguistic and cognitive research. ChatGPT is therefore not only good for writing whimsical poetry but may also lead to new scientific insights into language and human cognition.

Bibliography 

[1] J. R. Firth, ‘A synopsis of linguistic theory 1930–1955’, in Studies in Linguistic Analysis. Oxford: Blackwell, 1957.

[2] A. Warstadt et al., ‘Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora’, in Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, Singapore: Association for Computational Linguistics, 2023, pp. 1–6. doi: 10.18653/v1/2023.conll-babylm.1. 

[3] C. R. Jones and B. K. Bergen, ‘People cannot distinguish GPT-4 from a human in a Turing test’, May 09, 2024, arXiv: arXiv:2405.08007. doi: 10.48550/arXiv.2405.08007. 

[4] S. Boeve and L. Bogaerts, ‘A Systematic Evaluation of Dutch Large Language Models’ Surprisal Estimates in Sentence, Paragraph, and Book Reading’, Dec. 20, 2024. doi: 10.31219/osf.io/vqnw6. 

[5] C. Caucheteux and J.-R. King, ‘Brains and algorithms partially converge in natural language processing’, Commun Biol, vol. 5, no. 1, pp. 1–10, Feb. 2022, doi: 10.1038/s42003-022-03036-1. 

[6] A. de Varda, M. Marelli, and S. Amenta, ‘Cloze probability, predictability ratings, and computational estimates for 205 English sentences, aligned with existing EEG and reading time data’, Behav Res, Oct. 2023, doi: 10.3758/s13428-023-02261-8. 

[7] E. G. Wilcox, T. Pimentel, C. Meister, R. Cotterell, and R. P. Levy, ‘Testing the Predictions of Surprisal Theory in 11 Languages’, Transactions of the Association for Computational Linguistics, vol. 11, pp. 1451–1470, Dec. 2023, doi: 10.1162/tacl_a_00612. 

[8] R. L. Johnson, E. C. Oehrlein, and W. L. Roche, ‘Predictability and parafoveal preview effects in the developing reader: Evidence from eye movements’, J Exp Psychol Hum Percept Perform, vol. 44, no. 7, pp. 973–991, July 2018, doi: 10.1037/xhp0000506. 

[9] S. P. Tiffin-Richards and S. Schroeder, ‘Context facilitation in text reading: A study of children’s eye movements’, Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 46, no. 9, pp. 1701–1713, Sept. 2020, doi: 10.1037/XLM0000834. 

[10] H. Pander Maat, S. Kleijn, and S. Frissen, ‘LiNT: een leesbaarheidsformule en een leesbaarheidsinstrument’ [LiNT: a readability formula and a readability instrument], Tijdschrift voor Taalbeheersing, vol. 45, no. 1, pp. 2–39, Dec. 2023, doi: 10.5117/TVT2023.3.002.MAAT.

Pictures

Figure 1: https://pixabay.com/photos/boy-book-reading-literature-read-5731001/  
