neuromorphic imaging

The digital era of signal processing is quite mature: one can process, and even generate, high-definition video on sleek hand-held devices; create incredibly rich-sounding music; detect intruders and raise an alarm using tiny sensors; and much more that we often take for granted. However, there are fundamental limitations to conventional methods of signal processing.

For instance, can you record a video in the darkest of dark tunnels and still capture the oncoming traffic at the bright end of it? The difficulty lies in assigning dynamic range to details in the high-amplitude intervals when the sensor is tuned to resolve details in the low-amplitude intervals: the classical problem of high-dynamic-range (HDR) imaging. Smartphone cameras are perfectly capable of capturing HDR images using multi-bracket exposure, but that approach is impractical for video. Next, can you record high-frame-rate video, say on the order of a million frames per second, without compromising spatial resolution? While the former example, one of my favourite subjects, has been addressed extensively by Ayush Bhandari using computational sensors based on the self-reset ADC, the latter problem is the subject of my PhD.
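To make the self-reset idea concrete, here is a minimal sketch of the folding nonlinearity at the heart of such modulo sensors: the signal wraps around whenever it crosses the converter's range, so a bounded sensor can encode an unbounded dynamic range. The `self_reset_adc` helper and the threshold value are illustrative assumptions, not a model of any particular hardware.

```python
import numpy as np

def self_reset_adc(signal, threshold=1.0):
    # Illustrative folding model: wrap the signal into [-threshold, threshold),
    # mimicking an ADC that resets whenever its range is exceeded.
    return np.mod(signal + threshold, 2 * threshold) - threshold

# A sinusoid with amplitude far beyond the ADC range folds into
# bounded measurements that still retain local signal structure.
t = np.linspace(0, 1, 1000)
x = 10 * np.sin(2 * np.pi * 3 * t)    # high-dynamic-range input
y = self_reset_adc(x)                 # folded output stays within [-1, 1)
```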

Advancements in neuromorphic computing have made it possible to develop a novel class of visual sensors called dynamic vision sensors (DVS), which asynchronously record a scene as a stream of events, each marking a change in the incident log-intensity. Such neuromorphic encoding schemes open up new research avenues for computational sensing that break conventional barriers of signal acquisition, here enabling temporal resolutions as fine as one microsecond.
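For intuition, here is a minimal sketch of the idealized event-generation model, assuming a single per-pixel contrast threshold and an instantaneous reference reset; real sensors exhibit noise and refractory periods, and can emit multiple events for a large intensity jump, none of which is modelled here.

```python
import numpy as np

def simulate_dvs(frames, timestamps, contrast=0.2, eps=1e-6):
    # Idealized DVS model: a pixel fires an event whenever its
    # log-intensity deviates from a stored reference by >= contrast.
    log_ref = np.log(frames[0] + eps)
    events = []  # (time, x, y, polarity) tuples
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_frame = np.log(frame + eps)
        diff = log_frame - log_ref
        fired = np.abs(diff) >= contrast
        ys, xs = np.nonzero(fired)
        for py, px in zip(ys, xs):
            events.append((t, px, py, 1 if diff[py, px] > 0 else -1))
        log_ref[fired] = log_frame[fired]  # reset reference where events fired
    return events

# Example: events from ten 4x4 frames sampled at 1 kHz.
frames = np.random.rand(10, 4, 4)
events = simulate_dvs(frames, np.arange(10) * 1e-3)
```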

High-speed video reconstruction from dynamic vision sensors is a large-scale, ill-posed inverse problem. Temporally, the encoding is akin to time encoding as studied in the sampling literature (see our contributions here). Typically, inverse problems take the form

\[\hat{x} = \arg\min_{x}\frac12\|y-Ax\|_2^2 + \lambda f(x),\]

where \(y\) denotes the measurements, \(A\) denotes the acquisition model (written here as a linear operator; the event-generation process itself is nonlinear and must be suitably linearized), and \(f\) is a regularization function that enforces desired priors on the reconstruction. Several questions are unanswered here: how are the measurements acquired; how is the acquisition process modelled; what is a suitable regularization function; can this large-scale problem be solved in reasonable time with limited compute resources; and, most importantly, is this even the best approach to video reconstruction? These are the questions I address over the course of my PhD. The answers are positive, except for the last one, which indeed requires some thinking! Please see my papers for more information.
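As one illustration of how such a variational problem might be solved, here is a hedged sketch of a proximal-gradient (ISTA) iteration with \(f(x) = \|x\|_1\), chosen purely because its proximal operator is simple soft thresholding; the actual operator, regularizer, and problem scale in my work differ, and at DVS resolutions one would use matrix-free operators rather than an explicit matrix.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, n_iters=200):
    # Proximal-gradient descent for
    #   min_x 0.5 * ||y - A x||_2^2 + lam * ||x||_1.
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y)             # gradient of the data-fidelity term
        x = soft_threshold(x - step * grad, lam * step)
    return x

# Example: recover a sparse vector from noisy random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
x_true = np.zeros(128)
x_true[[5, 40, 99]] = [1.0, -2.0, 0.5]
y = A @ x_true + 0.01 * rng.standard_normal(64)
x_hat = ista(A, y, lam=0.1)
```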