Our approach to studying vision and perceptual decision making begins by posing and answering two questions: (1) what is the optimal computational strategy (the ideal observer) for performing a particular perceptual task given known human limitations, and (2) what additional limitations must be imposed on the optimal computation in order to account for measured human performance? We have developed a computational-experimental framework for (a) measuring the information content of complex stimuli within the context of a task, (b) describing the corresponding optimal computation in the form of an ideal observer, (c) investigating potential constraints and predicting probable deviations from optimality, and (d) designing experiments to reveal the underlying computations and uncover the constraints acting on human observers.
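To make the ideal-observer logic concrete, the following is a minimal sketch (in Python) of a likelihood-ratio observer for detecting a known signal in independent Gaussian noise; the signal profile, noise level, and function names are illustrative assumptions rather than any of the specific models described below.

```python
import numpy as np

def ideal_detection_statistic(observation, signal, noise_sd):
    """Log-likelihood ratio for 'signal present' vs. 'signal absent' when the
    observation equals the signal (or zero) plus i.i.d. Gaussian noise."""
    ll_present = -np.sum((observation - signal) ** 2) / (2 * noise_sd ** 2)
    ll_absent = -np.sum(observation ** 2) / (2 * noise_sd ** 2)
    return ll_present - ll_absent  # > 0 favors 'signal present' under equal priors

rng = np.random.default_rng(0)
signal = 0.5 * np.ones(64)                 # hypothetical known target profile
obs = signal + rng.normal(0.0, 1.0, 64)    # a noisy 'signal present' observation
print(ideal_detection_statistic(obs, signal, noise_sd=1.0))
```

The constrained models described below start from a computation of this kind and then impose specific limitations (e.g., memory capacity or position uncertainty) on it.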
The primary focus of our current research program is overt visual search, along with transsaccadic integration and transsaccadic memory, which are critical components of overt search. Visual search is an ideal platform for studying perceptual integration and decision making. Search is ubiquitous, occurring as a component of many natural tasks. It is also behaviorally rich, incorporating many important features of more complex visual tasks. Despite this richness, search can be simple enough to allow a tractable formal analysis.
In addition to visual search, our research efforts also include work on perceptual learning and on population coding in visual cortex.
Natural visual tasks, such as visual search, typically involve at least several saccadic eye movements, with visual information collected during the intervening fixations. Integrating this visual information across eye movements requires memory. However, very little is known about this transsaccadic memory and how it constrains visual search performance. In the projects described below, we have started to answer these questions. In particular, we measure the limited capacity of transsaccadic memory, we show that this limited capacity can substantially degrade search performance, and we explore how search performance is affected by various alternative memory allocation strategies.
This work has four aims: (1) to derive normative (ideal observer) models of memory-limited visual integration in overt search, (2) to use these models, along with human psychophysics, to estimate transsaccadic memory capacity, (3) to determine how this measured transsaccadic memory capacity limits visual search performance, and (4) to characterize the specific memory allocation strategies used by human observers in transsaccadic integration. Over the past few years, we have made considerable progress on these aims. In particular, we derived a memory-limited ideal searcher and devised a pair of tasks that allowed us to estimate the capacities of both visual short-term memory (VSTM) and transsaccadic memory (TSM) in task-independent units (bits). Our results suggest that TSM plays an important role in visual search tasks, that the effective capacity of TSM is greater than that of VSTM (a novel result), and that the TSM capacity of human observers significantly limits performance in multiple-fixation visual search tasks (Kleene & Michel, 2018).
We have also been exploring both normative and human dynamic memory allocation strategies. Using the memory-limited ideal searcher, we showed how the optimal memory allocation strategy changes systematically as a function of available capacity, with equitable allocation strategies (in which available capacity is allocated equally among potential targets) becoming less optimal and max-like allocation strategies (in which available capacity is only allocated to the most likely target) becoming more optimal as capacity decreases (Kleene & Michel, 2018). In another study, we are using multiple-target, multiple-interval discrimination tasks, together with temporal reverse correlation, to determine how human observers actually allocate TSM across time. Preliminary results suggest that observers tend to exhibit a recency effect, allocating more capacity to later than to earlier information, and that the magnitude of this recency effect increases with increasing memory load (Bittner & Michel, in preparation).
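The qualitative contrast between these two allocation strategies can be sketched as follows; the bit budget, posterior values, and function names are toy assumptions for illustration, not the memory-limited ideal searcher of Kleene & Michel (2018).

```python
import numpy as np

def allocate_memory(posterior, capacity_bits, strategy="equitable"):
    """Toy allocation of a fixed memory budget (in bits) across candidate target
    locations: 'equitable' splits the budget evenly, whereas 'max-like' assigns
    the entire budget to the currently most probable location."""
    bits = np.zeros(len(posterior))
    if strategy == "equitable":
        bits[:] = capacity_bits / len(posterior)
    elif strategy == "max-like":
        bits[np.argmax(posterior)] = capacity_bits
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return bits

posterior = np.array([0.05, 0.55, 0.25, 0.15])   # hypothetical target posterior
print(allocate_memory(posterior, capacity_bits=4.0, strategy="equitable"))
print(allocate_memory(posterior, capacity_bits=4.0, strategy="max-like"))
```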
Efficient performance in visual search and detection requires that observers exclude signals from irrelevant locations. However, phenomena such as crowding and illusory feature conjunctions, as well as evidence from position discrimination studies, suggest that the ability to localize features and thus ignore irrelevant information declines rapidly in the periphery. In the projects described below, we characterize this intrinsic position uncertainty and show that by modeling its effects, we can account for various systematic patterns of behavior in both fixed-gaze and multiple-fixation visual searches.
To measure intrinsic position uncertainty and its role in peripheral detection sensitivity, we devised a paradigm to measure both target detection and localization performance of human observers in the periphery, under varying levels of extrinsic position uncertainty. We found that an ideal observer with intrinsic position uncertainty (the intrinsic uncertainty observer) predicts both target localization errors and detection psychometric functions as a function of eccentricity and level of extrinsic noise. These results provide definitive evidence that position uncertainty is an important factor limiting detection and localization performance in the periphery (Michel & Geisler, 2011).
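As a schematic illustration of how intrinsic position uncertainty can enter a detection decision, the sketch below pools responses over a positional neighborhood (via a weighted max) whose spread grows with eccentricity; the growth function and parameter values are assumptions for illustration, not the fitted intrinsic uncertainty observer.

```python
import numpy as np

def uncertain_position_statistic(responses, cue_loc, eccentricity, base_sd=0.5):
    """Schematic decision variable for an observer with intrinsic position
    uncertainty: rather than reading out only the cued location, it takes a
    weighted max over nearby locations, with positional spread increasing
    with eccentricity."""
    locs = np.arange(len(responses))
    pos_sd = base_sd * (1.0 + 0.2 * eccentricity)   # assumed growth with eccentricity
    weights = np.exp(-0.5 * ((locs - cue_loc) / pos_sd) ** 2)
    return float(np.max(weights * responses))

rng = np.random.default_rng(1)
responses = rng.normal(0.0, 1.0, 21)   # hypothetical responses at 21 locations
responses[10] += 1.5                   # weak target at the cued location
print(uncertain_position_statistic(responses, cue_loc=10, eccentricity=8.0))
```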
In a follow-up project, we have extended this work to measure the role of intrinsic position uncertainty in overt search (i.e., search involving eye movements). Using a constrained ideal searcher model in which the searcher is limited by the same peripheral intrinsic uncertainty measured for human observers, we can accurately predict search times and accuracy as a function of the density of visual clutter in the search display (Semizer & Michel, 2017).
While most real-world tasks involve multiple visual fixations, visual performance thresholds are typically measured during stable fixation. Electrophysiological investigations of visual neurons in tasks involving saccadic eye movements have repeatedly shown that these neurons dramatically change their tuning during the perisaccadic interval (near the onset of an eye movement). Though the perceptual consequences of these changes in neural tuning are not well understood, they suggest that perceptual computations made in the interval preceding an eye movement may differ significantly from those made during stable fixation.
A critically important factor in visual search tasks is the rapid decline in sensitivity to target signals in the peripheral visual field. When measurements of human peripheral sensitivity are incorporated into an ideal observer model of visual search, these sensitivity patterns account for many aspects of human search performance.
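The sketch below illustrates, schematically, how such a peripheral sensitivity (visibility) map can enter a searcher model: the evidence at each candidate location is weighted by its detectability (d'), which falls off with distance from the current fixation. The linear falloff, parameter values, and one-dimensional display are assumptions for illustration, not the measured human visibility map.

```python
import numpy as np

def posterior_update(prior, evidence, fixation, locations, d0=3.0, slope=0.3):
    """Schematic Bayesian update for one fixation of a search: assuming the
    template response at the target location is N(d', 1) and N(0, 1) elsewhere,
    the log-likelihood ratio at each location is d'*x - d'**2/2, with d'
    falling off (here linearly) with eccentricity from the current fixation."""
    eccentricity = np.abs(locations - fixation)
    d_prime = np.maximum(d0 - slope * eccentricity, 0.1)   # assumed visibility map
    log_post = np.log(prior) + d_prime * evidence - 0.5 * d_prime ** 2
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

locations = np.arange(16)
prior = np.full(16, 1.0 / 16)
rng = np.random.default_rng(2)
evidence = rng.normal(0.0, 1.0, 16)
evidence[5] += 2.5                  # target at location 5
print(posterior_update(prior, evidence, fixation=6, locations=locations).round(3))
```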
In an effort to determine the flexibility of saccadic strategies in response to dramatic changes (i.e., as might result from disease or injury) in visual sensitivity, we used gaze-contingent displays to train observers to perform search with modified visibility maps (Michel & Geisler, 2009, 2010). Using an ideal observer model of visual search, we found that, while normal observers could not adapt to changes that required a wholesale shift in saccadic strategy, they adapted quickly to transformations of the visual field consistent with deficits encountered in degenerative macular diseases (e.g., AMD, Stargardt’s disease). This result suggests that, for visual search at least, the impact of rehabilitative eye movement training may be limited for patients with bilateral central field loss.
In a project completed in collaboration with Karl Gegenfurtner's lab at the University of Giessen in Germany, we used the ideal searcher model to examine the efficiency of visual search in low-light (scotopic) environments, for which the visual sensitivity map changes naturally, shifting the point of maximum sensitivity away from the center of the visual field. We found that human searchers made patterns of fixations that qualitatively matched those of an ideal searcher that has human scotopic sensitivity across the visual field and, importantly, that these fixation patterns were different from those predicted by an ideal searcher with human photopic sensitivity, which is highest in the center of the visual field. In general, we found that while human searchers are not optimal under scotopic conditions, they do make principled adjustments in their search behavior as ambient light levels decrease (Paulun, Schütz, Michel, Geisler, & Gegenfurtner, 2015).
In contrast to most laboratory searches, real-world searches typically involve a great deal of uncertainty regarding the target object, whose appearance in natural scenes varies greatly due to differences among individual objects (exemplars) within a target category, to differences in pose, lighting, and viewing angle, and to partial and self-occlusions. In addition, the ability to detect and localize objects in real-world scenes is affected by scene clutter, by scene context, and by the manner in which the search target is indicated to the observer.
In ongoing projects involving search for categorical targets (e.g., cell phones, keys, pens, eyeglasses), we are modeling and quantifying visual clutter to determine how intrinsic position uncertainty limits search in real-world scenes. In one of these projects, we select images of real-world scenes to manipulate clutter independently of the relevant search set size. The results of this project extend our previous finding — for searches in synthetic noise displays (Semizer & Michel, 2017) — to show that intrinsic position uncertainty and clutter act similarly in real-world searches, with clutter degrading search performance independently of the search set size (Semizer & Michel, 2018). In a second project, we are attempting to model and measure the effects of different sources of clutter. Within vision research, clutter is typically measured in a task-independent way, without any consideration of task-relevant features. In this project, we are developing additional measures of clutter that take into account which scene features are relevant given the target of a search task. Our goal is to determine how the distribution of these features in the search scene ought to affect search and to measure how alternative measures of clutter (i.e., task-independent clutter, target-category relevant clutter, and target-exemplar relevant clutter) actually influence human performance in real-world search tasks.
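As a toy illustration of the distinction between task-independent and task-relevant clutter, the sketch below scores clutter as the average similarity between local scene feature vectors and the target's feature vector; the feature representation, similarity measure, and data are hypothetical placeholders rather than the measures under development.

```python
import numpy as np

def task_relevant_clutter(scene_features, target_features):
    """Toy task-relevant clutter score: mean cosine similarity between each
    local scene patch's feature vector and the target's feature vector.
    A task-independent measure would ignore the target entirely (e.g., score
    overall feature density or variance in the scene)."""
    t = target_features / np.linalg.norm(target_features)
    s = scene_features / np.linalg.norm(scene_features, axis=1, keepdims=True)
    return float(np.mean(s @ t))

rng = np.random.default_rng(3)
scene = rng.random((200, 16))   # hypothetical feature vectors for 200 patches
target = rng.random(16)         # hypothetical target-category feature vector
print(task_relevant_clutter(scene, target))
```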
In another ongoing project, we are using computer vision techniques, along with ideal-observer models of visual search, to compute normative (optimal) saccade trajectories for categorical target searches in real-world scenes (Kleene & Michel, in preparation). The computed trajectories are influenced by the pattern of perceptual sensitivity to the target across the visual field, by the distribution of target-relevant features across the scene, and by contextual information that can be used to constrain a priori the possible locations of targets within a scene. We will compare the normative trajectories with those of human observers to determine how efficiently human observers use information from these various sources when selecting fixations.
In another ongoing project, we characterize the specificity of a verbal label by measuring the distributions implicit in the named target categories (e.g., named hues and shapes) and measuring how differences in specificity lead to differences in search performance (Nikiforova & Michel, in preparation). Moreover, by manipulating the specificity of image cues to the target, we can determine the extent to which performance differences across cue types (i.e., verbal label vs. image cue) are driven by cue specificity. Preliminary results with color cues indicate that differences in the precision of the information conveyed by color labels vs. displayed colors can completely account for any observed differences in search performance.
One of the most iconic and medically important forms of visual search is the search of x-ray and other medical images for abnormalities. The judgments made by expert radiologists in these searches have real-world stakes, which makes radiological search a compelling applied domain in which to study search. Any insights that can improve these judgments, especially for screenings such as routine mammography, would be highly consequential.
Mammalian primary visual cortex (V1) is topographically arranged such that neurons that are spatially adjacent tend to be sensitive to similar features and spatial locations. A fundamental question in systems neuroscience is: what is the functional significance of this topographical organization? I have been examining this question in an ongoing collaboration with Eyal Seidemann’s lab at the University of Texas, which uses electrophysiology and various fluorescence imaging techniques to study visual physiology in behaving monkeys. Our key insight is that if input signals to higher cortical areas involve a pooling of responses from V1 neurons within a local neighborhood, then the local topography should systematically influence the information available for perceptual judgments.
Motivated by this insight, we used computational modeling to predict a novel relationship between the representation of texture and shape in V1, and to predict a new shape illusion based on this relationship. We then used voltage-sensitive dye imaging in awake, behaving monkeys and psychophysics in human observers to confirm the predicted shape illusion (Michel, Chen, Geisler, & Seidemann, 2013).
In related projects, we have used similar techniques to examine the population coding of collinear contour elements (Michel, Chen, Seidemann, & Geisler, 2018) and to investigate whether the decoding of cortical signals used in visual perception is more consistent with a sparse "lower-envelope" coding strategy (mediated by a handful of neurons most sensitive to the stimulus) or with a more distributed coding strategy (mediated by the combined responses of all relevant neurons; Chen, Michel, Geisler, & Seidemann, 2012).
Investigators debate the extent to which neural populations use high-order statistical dependencies (correlations) among neural responses to represent information about visual stimuli. Characterizing these high-order dependencies can be difficult, and researchers often decode population responses under the assumption that these correlations contain little information.
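The sketch below illustrates the distinction in the simplest (second-order, linear) case: a decoder that ignores correlations uses only the diagonal of the noise covariance, whereas the optimal linear decoder uses the full covariance matrix. The population size, tuning values, and covariance matrix are hypothetical.

```python
import numpy as np

def linear_decoder_weights(mu_a, mu_b, cov, ignore_correlations=False):
    """Weights of the optimal linear decoder for discriminating two stimuli from
    a noisy population response (w = C^-1 (mu_a - mu_b)). Setting
    ignore_correlations=True zeroes the off-diagonal covariance terms, i.e.,
    decodes as if the neurons' noise were independent."""
    if ignore_correlations:
        cov = np.diag(np.diag(cov))
    return np.linalg.solve(cov, mu_a - mu_b)

# Hypothetical 3-neuron population with correlated noise
mu_a = np.array([2.0, 1.0, 0.5])
mu_b = np.array([1.0, 1.5, 0.5])
cov = np.array([[1.0, 0.4, 0.1],
                [0.4, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
print(linear_decoder_weights(mu_a, mu_b, cov))
print(linear_decoder_weights(mu_a, mu_b, cov, ignore_correlations=True))
```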
Consider a radiologist who learns to detect lesions or tumors in images generated using a particular imaging method and must later detect similar targets in images generated using a different imaging method. The structure of the noise can change across the two types of images such that different sets of features become diagnostic.