Fine-grained importance for perceptual video compression

Pergament, Evgenya

Abstract/Contents

Abstract: The proliferation of video consumed by humans over the internet has accelerated the search for better video compression algorithms. Traditional video compression algorithms reduce file sizes by removing spatial, temporal, and coding redundancies. Because spatio-temporal regions of a video differ in their relative importance to the human viewer, compression can be improved further by also removing perceptual redundancy. However, it is challenging to infer how important different regions are to the viewer, or even to collect such fine-grained information. Indeed, such information is often not used during compression beyond low-level heuristics. In this dissertation, we present a framework that facilitates research into fine-grained subjective importance in compressed videos, which we have used to improve the rate-distortion performance of existing video codecs. The contributions of this dissertation are threefold. (1) We designed the Perceptual Importance Map collection Tool (PIMTool), a novel interactive web tool that allows scalable collection of fine-grained perceptual importance by having users paint spatio-temporal maps over encoded videos. While users paint, the videos shown to them are continuously updated based on their maps, showing the trade-off between increasing the importance of some areas and decreasing the importance of others. The tool also lets users control the magnitude of the increase or decrease in each area, resulting in detailed relative importance maps. (2) Using PIMTool, we collected a dataset of 178 videos, with a total of 14,443 frames of human-annotated spatio-temporal importance maps, which we call the Perceptual Importance Map Dataset (PIMD). Via a subjective study, we demonstrate that encoding the videos in our dataset while taking the importance maps into account leads to higher perceptual quality at the same bitrate: the videos encoded with importance maps were preferred 1.8x over the baseline videos. (3) We used this dataset to train a lightweight machine learning model, the Perceptual Importance Map Model (PIMM), that predicts these spatio-temporal importance regions. For the 18 videos in our test set, the importance maps predicted by PIMM lead to higher perceptual quality, with the resulting videos preferred 2x over the baseline at the same bitrate.
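
Illustrative note: the abstract does not say how an importance map is applied by an encoder. One common way to apply a spatial importance map in block-based codecs is to turn it into per-block quantization parameter (QP) offsets, where a lower QP means finer quantization. The Python sketch below shows that idea under stated assumptions; it is not the dissertation's method. The function name `block_qp_offsets`, the parameters `block`, `strength`, and `max_offset`, and the centring step (a crude stand-in for the trade-off between raising some areas and lowering others) are all hypothetical.

```python
from typing import List


def block_qp_offsets(importance: List[List[float]], block: int = 2,
                     strength: float = 6.0, max_offset: int = 10) -> List[List[int]]:
    """Map a per-pixel importance map to per-block QP offsets (hypothetical helper).

    Assumption: higher importance should receive a lower QP (finer quantization).
    Offsets are centred on the mean block importance before rounding, so gains in
    some blocks are paid for by losses in others. This does not guarantee an
    identical bitrate; a real encoder's rate control would still be needed.
    """
    rows, cols = len(importance), len(importance[0])

    # Average the importance inside each block (edge blocks may be smaller).
    means: List[List[float]] = []
    for r in range(0, rows, block):
        row_means = []
        for c in range(0, cols, block):
            cells = [importance[y][x]
                     for y in range(r, min(r + block, rows))
                     for x in range(c, min(c + block, cols))]
            row_means.append(sum(cells) / len(cells))
        means.append(row_means)

    # Centre on the frame-wide mean of the block averages.
    flat = [m for row in means for m in row]
    centre = sum(flat) / len(flat)

    # Scale, negate (important => lower QP), round, and clip to a safe range.
    return [[max(-max_offset, min(max_offset, round(-strength * (m - centre))))
             for m in row]
            for row in means]


# A 2x4 frame whose left half was painted as important.
frame = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0]]
print(block_qp_offsets(frame))  # [[-3, 3]]
```

With the default settings, the painted left block receives a QP offset of -3 and the unpainted right block +3. A uniformly painted frame yields all-zero offsets, which matches the intuition that only relative importance matters.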

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2023
Publication date	2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Pergament, Evgenya
Degree supervisor	Katti, Sachin
Thesis advisor	Katti, Sachin
Thesis advisor	Rippel, Oren
Thesis advisor	Weissman, Tsachy
Degree committee member	Rippel, Oren
Degree committee member	Weissman, Tsachy
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Evgenya Pergament.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/vr891bv7735

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
