Compression along the visual pipeline
- Vision is one of our primary senses and means of obtaining information about the world; indeed, video accounts for upwards of 80% of internet traffic today. Imagine the journey of a video: from the capture of visual information, through its compression, to its transmission. Once the compressed video is received and decoded, the human retina transduces the visual information into electrical signals, resulting in salient perception. This thesis is dedicated to various aspects of this visual pipeline, with a particular focus on compression. In it, I present my research on brain-machine interfaces (BMIs) and video compression as they pertain to the visual pipeline. The first part of the thesis is dedicated to BMIs, in particular epiretinal prostheses. First, we describe a lossy compression framework, called wired-OR, that enables large-scale electrical recordings for BMIs. Simulation results using data obtained ex vivo from primate retina with a 512-channel electrode array show average compression rates of up to ~40X while missing fewer than 5% of the cells. Next, we provide a mechanism to avoid inadvertent stimulation-based percepts in epiretinal prostheses, known as axon-bundle artifacts. We provide a simple and principled algorithm for determining axon-bundle thresholds that detects the onset of axon-bundle activation for 88% of the electrodes within ±10% of the manually identified threshold, with a correlation coefficient of 0.95. The second part of the thesis is dedicated to video compression, in particular how the properties of human perception can be exploited both to detect characteristic video artifacts and to achieve extreme compression of videos given prior information. First, we describe CAMBI (Contrast-Aware Multiscale Banding Index), which predicts the visibility of banding artifacts in visual content. CAMBI correlates well with subjective perception of banding, with a correlation coefficient of 0.93, while using only a few visually motivated hyperparameters. Finally, we conclude the thesis with Txt2Vid, which enables ultra-low-bitrate compression of webcam videos by using state-of-the-art generative models as decoders. Txt2Vid achieves a two to three orders of magnitude reduction in bitrate compared to standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience.
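The "two to three orders of magnitude" claim for Txt2Vid can be sanity-checked with a back-of-envelope calculation: if only transcribed text is transmitted, the payload is roughly the speaking rate times the byte cost per word. The bitrates below are illustrative assumptions for this sketch, not figures from the thesis.

```python
import math

# Assumed, for illustration only: a conservative webcam A/V stream bitrate.
codec_bps = 100_000  # ~100 kbps

# Txt2Vid transmits text: assume ~150 spoken words/min at ~6 bytes/word.
txt2vid_bps = 150 / 60 * 6 * 8  # = 120 bits per second

reduction = codec_bps / txt2vid_bps
orders = math.log10(reduction)
print(f"reduction ~{reduction:.0f}x, ~{orders:.1f} orders of magnitude")
# With these assumed numbers, the reduction lands in the claimed
# two-to-three-orders-of-magnitude range.
```

Higher-bitrate source streams (e.g. HD video at several hundred kbps) push the ratio toward the upper end of the claimed range.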
|Stanford University, Department of Electrical Engineering
|Statement of responsibility: Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.), Stanford University, 2022.
- © 2022 by Pulkit Tandon
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).