Emin Orhan: Learning Through the Eyes of a Child

This entry is part of the NYU Center for Data Science blog’s recurring guest editorial series. Emin Orhan is a CDS Postdoctoral Associate.

“What a piece of work is a man! How noble in reason! How infinite in faculty! In form and moving how express and admirable! In action how like an angel! In apprehension how like a god! The beauty of the world! The paragon of animals! And yet, to me, what is this quintessence of dust? Man delights me not.” — Hamlet, Act II, Scene 2

In his famous “what a piece of work is a man” monologue, Hamlet expresses the two extreme feelings one can have toward human nature: on the one hand, we seem able to achieve the most astonishing feats, unparalleled in the animal kingdom; on the other hand, we are merely dust. This poignant tension between “we are special” and “we are not special” runs through the whole of human intellectual history, including the sciences. In psychology, it is felt most directly in the famous nature vs. nurture debate over the origin of our minds. Simplifying somewhat, the nature side claims that we are born with special, unique capacities hard-wired into our brains, and that these capacities allow us to become “the paragon of animals”. The nurture side claims that our newborn brains, although powerful, are nothing but generic (not special) learning machines (dust), which owe most of their magic to the richness and abundance of the sensory data we constantly receive in our interaction with the world.

Perhaps unlike Hamlet’s existential plight, the nature vs. nurture question is fortunately a thoroughly empirical issue that we can hope to resolve one day through experimental methods. How so? The basic idea would be to start with a powerful but generic learning machine that we know does not contain any special mechanisms. We would then drop this learning machine into the same kind of environment that a newborn baby finds itself in, and expose it to the same kind and amount of sensory data that the baby receives over the course of its development. Would our generic learning machine then grow up to be a paragon too, just like the human baby? That is the question. If so, the nurture side wins; if not, the nature side wins. It’s that simple! Of course, in this ludicrously ambitious form, the experiment still resides squarely within the sci-fi genre, given (to put it mildly) the considerable difficulties involved in “dropping” a learning machine into the “same kind of environment” as a newborn baby.

But no matter! We can scale back our ambitions and ask the same type of nature vs. nurture question about smaller parts of our minds. Take the development of visual concepts. How do babies learn basic visual categories, like table, chair, cat, dog, car, etc.? Is it possible to learn these concepts just by taking in visual data of the type babies receive while they’re growing up and then running some extremely generic (not special) learning mechanism on that data? Or does learning these concepts require more intricate, special, hard-wired mechanisms?

This is precisely the question my colleagues and I asked in recent work. To address it, we used a large, longitudinal video dataset (called SAYCam) collected by Jessica Sullivan and colleagues from head-mounted cameras worn by three young children over a 2.5-year period in their early development (6–32 months). We applied state-of-the-art generic self-supervised learning algorithms to this dataset and analyzed what exactly these algorithms could learn without assuming any intricate hard-wired mechanisms. In the end, our generic self-supervised learning models successfully learned many basic visual categories that would be behaviorally relevant for a young child, like table, chair, crib, door, window, car, cat (see this short video clip for an example demonstrating how one of our models recognizes cats).

In ongoing work, we’re very interested in understanding how far we can push these generic learning mechanisms. For example, can we learn what developmental psychologists call “intuitive physics”, i.e. an understanding of basic physical principles like object permanence (that objects don’t pop in or out of existence) or shape constancy (that objects tend to maintain their shapes), in this generic way? Or does learning these principles require special innate machinery, as many developmental psychologists have argued?

By Emin Orhan