Compression along the visual pipeline



Abstract
Vision is one of our primary senses and means of obtaining information about the world. In fact, videos represent upwards of 80% of the internet traffic as of today. Imagine the journey of a video — from capturing of the visual information, through its compression, to its transmission. Once this compressed video is received and decoded, the human retina transduces the visual information into electrical signals which results in some salient perception. My thesis is dedicated to various aspects of this visual pipeline, with particular focus on compression. In this thesis, I present my research on Brain-Machine-Interfaces (BMIs) and video compression as pertaining to the visual pipeline. The first part of the thesis is dedicated to BMIs, in particular, epiretinal prostheses. First we describe a lossy compression framework, called wired-OR, to enable large scale electrical recordings for BMIs. Simulation results using data obtained from primate retina ex-vivo with a 512-channel electrode array show average compression rates of up to ~40X while missing less than 5% of the cells. Next, we provide a mechanism to avoid inadvertent stimulation-based percepts in epiretinal prostheses, called axon-bundle artifacts. We provide a simple and principled algorithm to determine axon bundle thresholds which is able to detect the onset of axon bundle activation for 88% of the electrodes within +-10% of the manually identified threshold, with a correlation coefficient of 0.95. The second part of the thesis is dedicated to video compression, in particular how we can exploit the properties of human perception for detecting characteristic video artifacts as well as doing extreme compression of videos with prior information. First we describe CAMBI, Contrast-Aware Multiscale Banding Index, to predict visibility of banding artifacts in visual content. 
CAMBI correlates well with subjective perception of banding, achieving a correlation coefficient of 0.93, while using only a few visually motivated hyperparameters. Finally, we conclude this thesis with Txt2Vid, which enables ultra-low-bitrate compression of webcam videos by using state-of-the-art generative models as decoders. Txt2Vid achieves a two-to-three-orders-of-magnitude reduction in bitrate compared to standard audio-video codecs (encoders-decoders), while maintaining an equivalent Quality-of-Experience.
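As a back-of-the-envelope illustration of what a "two to three orders of magnitude" bitrate reduction means, the sketch below compares an assumed conventional video-call bitrate against an assumed text-driven stream. The specific bitrates are illustrative assumptions, not figures from the thesis.

```python
import math

# Assumed, illustrative bitrates -- NOT figures reported in the thesis.
VIDEO_CALL_BPS = 1_000_000   # ~1 Mbps: a plausible webcam audio-video stream
TEXT_STREAM_BPS = 1_000      # ~1 kbps: a plausible text/transcript stream

def orders_of_magnitude_reduction(baseline_bps: float, compressed_bps: float) -> float:
    """Return log10 of the bitrate ratio between baseline and compressed streams."""
    return math.log10(baseline_bps / compressed_bps)

# Under these assumed numbers, the reduction is 3 orders of magnitude.
print(orders_of_magnitude_reduction(VIDEO_CALL_BPS, TEXT_STREAM_BPS))  # → 3.0
```

With these illustrative numbers, sending text in place of encoded video frames yields a 1000x smaller stream, consistent in scale with the reduction the abstract describes.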

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2022; ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Tandon, Pulkit
Degree supervisor Weissman, Tsachy
Thesis advisor Weissman, Tsachy
Thesis advisor Murmann, Boris
Thesis advisor Sra, Misha
Degree committee member Murmann, Boris
Degree committee member Sra, Misha
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Pulkit Tandon.
Note Submitted to the Department of Electrical Engineering.
Thesis Ph.D. Stanford University 2022.
Location https://purl.stanford.edu/tn910kv2791

Access conditions

Copyright
© 2022 by Pulkit Tandon
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
