The document discusses a study that aimed to reconstruct visual experiences from brain activity evoked by natural movies using fMRI. The study:
1. Recorded brain activity from subjects watching hours of movie trailers to build dictionaries linking shapes, edges and motion to brain activity.
2. Tested the dictionaries by recording brain activity to new movie trailers and selecting the clips most similar to the observed activity.
3. Successfully identified the specific movie stimulus that evoked the observed brain activity 95% of the time, far above chance, demonstrating that dynamic visual processing can be decoded from fMRI.
04.18.13
1. April Gardner
April 2013
Reconstructing visual experiences
from brain activity evoked by
natural movies
2. Background
Neural decoding of early visual information – color,
shape, location – has been explored in past studies.
Reconstructing still images
fMRI is the tool of choice, but it has a built-in time lag: the level of
oxygen in the blood doesn't change until about 4 seconds after neural activity
Neural decoding had to account for this in a multi-step process
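One standard way to account for the ~4 second hemodynamic lag, sketched below, is to build the regression design matrix from several time-delayed copies of the stimulus features so that a linear model can fit the delay. The function name, lag values, and array shapes here are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

def lagged_design(features, lags=(2, 3, 4, 5)):
    """Stack time-delayed copies of stimulus features so a linear model
    can absorb the ~4 s hemodynamic delay.

    features: (n_timepoints, n_features) stimulus feature matrix
    lags:     delays in samples (TRs); illustrative values only
    returns:  (n_timepoints, n_features * len(lags)) design matrix
    """
    cols = []
    for lag in lags:
        shifted = np.zeros_like(features)
        # the feature value at time t contributes to BOLD at time t + lag
        shifted[lag:] = features[:-lag]
        cols.append(shifted)
    return np.concatenate(cols, axis=1)
```

Fitting ordinary ridge regression on this expanded matrix then estimates one weight per feature per lag, which is one common way encoding models handle the delayed BOLD response.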
3. What is the problem area?
How does our visual perception work to process what
we see in everyday life?
Theory: Seeing is like watching a movie – a dynamic
experience.
Hypothesis: Dynamic brain activity of natural image
processing can be decoded with fMRI technology.
The problem is important!
Vision is the dominant channel through which we receive information
High implications for those who are disabled, and for decoding dreams
5. Question 6
What is neural decoding?
Reconstruction of sensory and other stimuli from
information that has already been encoded and
represented in the brain.
Can we predict what sensory stimuli the subject is
receiving, purely based on action potentials?
7. Author Claims
The first new motion-energy encoding model that is
optimized with fMRI
The model reveals how motion information is represented in
early visual areas
The model provides reconstructions of natural movies from
evoked BOLD signals
8. Methods
Obtain BOLD signals while watching a series of
natural color movies
Fixation task to control eye position
Two separate data sets obtained
TRAINING DATA: BOLD signals from 7,200 s of color natural movies,
each presented once
TEST DATA: BOLD signals from 540 s of color natural movies,
each repeated ten times
9. Methods
[1] Record brain activity while the subject watches
several hours of movie trailers.
[2] Build dictionaries to translate between
shapes, edges and motion in the movies and
measured brain activity.
[3] Record brain activity to a new set of movie
trailers that will be used to test the quality of
dictionaries and reconstructions.
[4] Build a new library of ~18,000,000 s of video.
Select & average the 100 clips whose predicted activity is most
similar to the observed brain activity.
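Step [4] – scoring the library clips against the observed activity and averaging the best matches – can be sketched as below. The correlation metric, array shapes, and the k=100 default are illustrative assumptions; the paper's actual selection procedure is more involved.

```python
import numpy as np

def reconstruct(observed, predicted_library, clips, k=100):
    """Sketch of decoding step [4]: rank library clips by how well their
    model-predicted brain activity matches the observed activity, then
    average the top-k clips to form the reconstruction.

    observed:          (n_voxels,) observed BOLD pattern
    predicted_library: (n_clips, n_voxels) predicted pattern per clip
    clips:             (n_clips, h, w) grayscale frames, one per clip
    """
    # Correlate each predicted pattern with the observed pattern
    obs = observed - observed.mean()
    pred = predicted_library - predicted_library.mean(axis=1, keepdims=True)
    corr = (pred @ obs) / (
        np.linalg.norm(pred, axis=1) * np.linalg.norm(obs) + 1e-12
    )
    # Average the k best-matching clips: this is the reconstruction
    top = np.argsort(corr)[-k:]
    return clips[top].mean(axis=0)
```

Averaging many plausible clips is why published reconstructions look blurry: it keeps features the top matches agree on and washes out the rest.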
11. Question 5
What is first-order motion perception?
Responds to moving luminance patterns
Detected by early, simple "motion sensors"
12. Results
Success.
Motion-energy encoding model identified specific movie stimulus that evoked an
observed BOLD signal 95% of the time (464 of 486 volumes), within +/- one volume.
Far above chance (<1%).
15. Data: Consistent with Claims?
Each model was unique to the person
Intersubject variability - how much does the model from
Subject A vary from that of B or C?
Model is perception agnostic.
Does the person merely see the clips, or actually attend to them? How
would the results differ if they didn't?
In this study, measuring early visual areas
Model was not accurate in higher level areas
Does a person who sees hallucinations, register
anything similar in V1, V2, or V3?
Is this a Visual Experience? No.
16. Question 4
What is bottom-up, and top-down processing?
Bottom-up
Processing driven by information from the physical stimulus itself
(rather than from a general context)
Top-down
Knowledge and memory play a role
Bugelski and Alampay (1961)
18. Question 2
Describe the neural correlates of consciousness
(NCC).
The minimal set of neuronal events and mechanisms sufficient
for a specific conscious percept.
NCC research seeks to explain the exact relationship between
subjective mental states and brain states –
the nature of the relationship between the conscious mind and the
electro-chemical interactions in the body
Block N (1996) How can we find the neural correlate of consciousness? Trends Neurosci 19:456–459.
Rock I, Linnet CM, Grant P, Mack A (1992) Perception without attention: results of a new method. Cognit Psychol 24:501–534.
20. Question 1
What is the materialist theory, as applied to
consciousness?
As opposed to the dualist theory – mind is a nonphysical
substance
Mental = physical
All mental states, properties, processes, and operations are
identical to physical ones
Behaviorists maintain that all talk of mental causes stems from
environmental stimuli and behavioral responses
Tononi, Giulio (2004) An information integration theory of consciousness. BMC Neuroscience 5:42.
21. What’s Next?
Reproduce images of the mind that no one else sees
INTERNAL IMAGERY
Dreams
Hallucinations
Communicate with those who cannot speak, combating
Coma
Stroke
Neurodegenerative disease
Editor's notes
Reconstruction refers to the ability of the researcher to predict what sensory stimuli the subject is receiving based purely on neuron action potentials.
Area V1 is the largest single processing module in the human brain. Its function is to represent visual information in a very general form by decomposing visual stimuli into spatially localized elements. Signals leaving V1 are distributed to other visual areas, such as V2 and V3. Although the function of these higher visual areas is not fully understood, it is believed that they extract relatively more complicated information about a scene. For example, area V2 is thought to represent moderately complex features such as angles and curvature, while high-level areas are thought to represent very complex patterns such as faces. The encoding model used in our experiment was designed to describe the function of early visual areas such as V1 and V2, but was not meant to describe higher visual areas. Parcellation of the human brain based on responses to natural movies: brain activity evoked by natural movies was measured using functional MRI. For visualization, the volumetric fMRI signals were projected onto a flattened representation of the neocortical sheet and colored by similarity. In this figure the large maps are the flattened left and right hemispheres of one observer. Known functional areas are identified by white outlines, and major cortical sulci are identified by long black lines. Similar colors on these maps indicate brain regions that responded similarly to the movies. Natural movies tend to elicit similar responses from early and intermediate visual areas, and from motor regions associated with eye movements (bright purple and red). Higher visual areas, areas associated with the default network, and some regions of frontal cortex appear to form a separate functional cluster (yellow and orange). [Attribution: Shinji Nishimoto, Alex G. Huth, An Vu and Jack L. Gallant, UC Berkeley, 2011.]
[1] Record brain activity while the subject watches several hours of movie trailers. [2] Build dictionaries (regression model) to translate between the shapes, edges and motion in the movies and measured brain activity. A separate dictionary is constructed for each of several thousand points in the brain at which brain activity was measured. [3] Record brain activity to a new set of movie trailers that will be used to test the quality of the dictionaries and reconstructions. [4] Build a random library of ~18,000,000 seconds of video downloaded at random from YouTube (that have no overlap with the movies subjects saw in the magnet). Put each of these clips through the dictionaries to generate predictions of brain activity. Select the 100 clips whose predicted activity is most similar to the observed brain activity. Average those clips together. This is the reconstruction. [Attribution: Shinji Nishimoto & Jack L. Gallant, UC Berkeley, 2011.]
3 subjects – ages 30, 34, and 23. All subjects were healthy and had normal or corrected-to-normal vision. It takes several hours to acquire sufficient data to build an accurate motion-energy encoding model for each subject, and naive subjects find it difficult to stay still and alert for this long. The authors were motivated to be good subjects, to ensure their data were of high quality. These high-quality data enabled them to build detailed and accurate models for each individual subject. The experiment focuses solely on the early part of the visual system, and this part of the brain is not heavily modulated by intention or prior knowledge. The movies used to develop encoding models for each subject and those used for decoding were completely separate, and there is no plausible way that a subject could have changed their own brain activity in order to improve decoding. Many fMRI studies use much larger groups of subjects, but they collect much less data on each subject. Such studies tend to average over a lot of the individual variability in the data, and the results provide a poor description of brain activity in any individual subject.
The Directional Motion-Energy Model Captures Motion Information. (A) Top: the static encoding model includes only Gabor filters that are not sensitive to motion. Bottom: prediction accuracy of the static model is shown on a flattened map of the cortical surface of one subject (S1). Prediction accuracy is relatively poor. (B) The nondirectional motion-energy encoding model includes Gabor filters tuned to a range of temporal frequencies, but motion in opponent directions is pooled. Prediction accuracy of this model is better than the static model. (C) The directional motion-energy encoding model includes Gabor filters tuned to a range of temporal frequencies and directions. This model provides the most accurate predictions of all models tested.
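The idea behind these directional filters can be illustrated with a toy quadrature pair of spatiotemporal Gabor-like filters in one spatial dimension: squaring and summing the even- and odd-phase outputs gives a phase-invariant "motion energy" that is large for a grating drifting in the preferred direction and small for the opposite direction. The frequencies and the global (rather than spatially localized) filter support are simplifying assumptions, not the paper's actual filter bank.

```python
import numpy as np

def motion_energy(stimulus, sf=0.1, tf=0.2):
    """Toy 1D-space x time motion-energy unit (quadrature Gabor pair).

    stimulus: (n_t, n_x) space-time luminance pattern
    sf, tf:   spatial/temporal frequency in cycles per sample
              (illustrative values, not the paper's parameters)
    """
    n_t, n_x = stimulus.shape
    t = np.arange(n_t)[:, None]
    x = np.arange(n_x)[None, :]
    phase = 2 * np.pi * (sf * x - tf * t)  # phase of a drifting grating
    # Quadrature pair: even (cosine) and odd (sine) filter outputs
    even = np.sum(stimulus * np.cos(phase))
    odd = np.sum(stimulus * np.sin(phase))
    # Motion energy: phase-invariant response to motion in this direction
    return even**2 + odd**2
```

A grating drifting in the filter's preferred direction yields a large energy; the same grating drifting the opposite way yields a near-zero response, which is what makes the model directional.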
The motion-energy encoding model identified the specific movie stimulus that evoked an observed BOLD signal 95% of the time (464 of 486 volumes), within +/- one volume (1 s; subject S1; Figures 3A and 3B). Far above what would be expected by chance (<1%).
What constitutes a visual experience? The visual recognition process is performed by a hierarchy of cortical areas in the ventral stream (Barone et al. 2000; Felleman and Van Essen 1991; Lerner et al. 2001). Processing in the first cortical region, V1, detects movement and color, whereas processing at higher levels detects more complex patterns by combining inputs from lower levels (Reid 2001; Tanaka 1996). This bottom-up (B-U) information flow is fairly well known. However, there is also a top-down (T-D) flow of information that is less clear. Anatomical studies have demonstrated the existence of massive connections from higher-level areas back to lower-level areas (Rockland and Pandya 1981; Salin and Bullier 1995). These T-D connections can strongly affect neuronal function (Cauller and Kulics 1991; Lee et al. 1998; Tomita et al. 1999) and provide a way for high-level information to affect perception (Ress et al. 2000). For example, in "semantic priming," recognition of words in a category is enhanced if subjects know the category (Lorch et al. 1986; Neely 1991; Neely et al. 1989). Thus contextual expectancies may affect the visual system even before the stimulus arrives.
Psychophysical studies in humans show that a stimulus remains unnoticed during specific states of the subject, such as inattention or absent-mindedness (Rock et al., 1992; Block 1996). This implies that the success of stimulus detection depends on the state of the subject, i.e., on the internal state of the visual cortex.
Neuroscientists generally assume that all mental processes have a concrete neurobiological basis. Under this assumption, as long as we have good measurements of brain activity and good computational models of the brain, it should be possible in principle to decode the visual content of mental processes like dreams, memory, and imagery. The computational encoding models in our study provide a functional account of brain activity evoked by natural movies. It is currently unknown whether processes like dreaming and imagination are realized in the brain in a way that is functionally similar to perception. If they are, then it should be possible to use the techniques developed in this paper to decode brain activity during dreaming or imagination. (Jerry Fodor, The Mind-Body Problem.)
Consider the following thought experiment. You are facing a blank screen that is alternately on and off, and you have been instructed to say "light" when the screen turns on and "dark" when it turns off. A photodiode – a very simple light-sensitive device – has also been placed in front of the screen, and is set up to beep when the screen emits light and to stay silent when the screen does not. The first problem of consciousness boils down to this. When you differentiate between the screen being on or off, you have the conscious experience of "seeing" light or dark. The photodiode can also differentiate between the screen being on or off, but presumably it does not consciously "see" light and dark. What is the key difference between you and the photodiode that makes you "see" light consciously? When the blank screen turns on, the photodiode enters one of its two possible alternative states and beeps. As with a coin flip, this corresponds to 1 bit of information.
However, when you see the blank screen turn on, the state you enter, unlike the photodiode's, is one out of an extraordinarily large number of possible states. According to the theory, the key difference between you and the photodiode has to do with information integration.
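The arithmetic behind this contrast is simple: distinguishing among N equally likely states carries log2(N) bits, so the photodiode's two states carry exactly 1 bit, while a system with vastly many distinguishable states carries correspondingly more. A minimal illustration (the example state counts are arbitrary):

```python
import math

def bits(n_states):
    """Information gained by distinguishing among n equally likely states."""
    return math.log2(n_states)

# The photodiode: two states (beep / silent) -> 1.0 bit
photodiode = bits(2)

# A hypothetical system with 2**20 (~a million) distinguishable
# states -> 20.0 bits; a brain's repertoire is vastly larger still
rich_system = bits(2**20)
```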