Bridging the data gap between children and large language models

April 24, 2024
10:00 AM – 11:15 AM
Online

Speaker: Michael C. Frank, Stanford University

TPC Seminar Series

Abstract: While large language models require billions of words of text to show zero-shot generalization and in-context learning, children show the same emergent behaviors with just a few million words of language input. What accounts for this difference? I’ll discuss some of our attempts to measure and understand how language models and multimodal models can be compared productively with children’s learning, using datasets and evaluations from developmental psychology.

Bio: Michael C. Frank is the Benjamin Scott Crocker Professor of Human Biology in the Department of Psychology at Stanford University and Director of the Symbolic Systems Program. He received his PhD in Brain and Cognitive Sciences from MIT. He studies children’s language learning and development, with a focus on the use of large-scale datasets to understand the variability and consistency of learning across cultures. He is a founder of the ManyBabies Consortium and has led open-data projects including Wordbank and the ongoing LEVANTE project.