Compression along the visual pipeline

Tandon, Pulkit

Abstract/Contents

Abstract: Vision is one of our primary senses and a principal means of obtaining information about the world. In fact, video accounts for upwards of 80% of internet traffic today. Imagine the journey of a video: from the capture of visual information, through its compression, to its transmission. Once the compressed video is received and decoded, the human retina transduces the visual information into electrical signals, which result in some salient perception. My thesis is dedicated to various aspects of this visual pipeline, with particular focus on compression. In this thesis, I present my research on brain-machine interfaces (BMIs) and video compression as they pertain to the visual pipeline. The first part of the thesis is dedicated to BMIs, in particular epiretinal prostheses. First, we describe a lossy compression framework, called wired-OR, to enable large-scale electrical recordings for BMIs. Simulation results using data obtained from primate retina ex vivo with a 512-channel electrode array show average compression rates of up to ~40x while missing less than 5% of the cells. Next, we provide a mechanism to avoid axon-bundle artifacts, which are inadvertent stimulation-based percepts in epiretinal prostheses. We provide a simple and principled algorithm to determine axon bundle thresholds; it detects the onset of axon bundle activation for 88% of the electrodes within ±10% of the manually identified threshold, with a correlation coefficient of 0.95. The second part of the thesis is dedicated to video compression, in particular how we can exploit the properties of human perception to detect characteristic video artifacts and to perform extreme compression of videos with prior information. First, we describe CAMBI, the Contrast-Aware Multiscale Banding Index, which predicts the visibility of banding artifacts in visual content. CAMBI correlates well with subjective perception of banding, with a correlation coefficient of 0.93, while using only a few visually motivated hyperparameters. Finally, we conclude this thesis with Txt2Vid, which enables ultra-low bitrate compression of webcam videos by using state-of-the-art generative models as decoders. Txt2Vid achieves a two to three orders of magnitude reduction in bitrate compared to standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2022; ©2022
Publication date	2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Tandon, Pulkit
Degree supervisor	Weissman, Tsachy
Thesis advisor	Weissman, Tsachy
Thesis advisor	Murmann, Boris
Thesis advisor	Sra, Misha
Degree committee member	Murmann, Boris
Degree committee member	Sra, Misha
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Pulkit Tandon.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/tn910kv2791

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
