Fine-grained importance for perceptual video compression
- The proliferation of videos consumed by humans over the internet has accelerated the search for better video compression algorithms. Traditional video compression algorithms reduce file sizes by removing spatial, temporal, and coding redundancies. Because different spatio-temporal regions of a video differ in their relative importance to the human viewer, there is an opportunity to improve video compression further by removing perceptual redundancy. However, it is challenging to infer how important different regions are to the viewer, or even to collect such fine-grained information. Indeed, such information is often not used during compression beyond low-level heuristics. In this dissertation, we present a framework that facilitates research into fine-grained subjective importance in compressed videos, which we have used to improve the rate-distortion performance of existing video codecs. The specific contributions of this dissertation are threefold: (1) we designed a novel tool, the Perceptual Importance Map collection Tool (PIMTool), an interactive web tool that enables scalable collection of fine-grained perceptual importance by having users paint spatio-temporal maps over encoded videos. As users paint, the videos presented to them are continuously updated based on their spatio-temporal maps, showing them the trade-off between raising the importance of some areas and lowering the importance of others. The tool also lets users control how much the importance of each area is increased or decreased, resulting in detailed relative importance maps; (2) using PIMTool, we collected a dataset of 178 videos, totaling 14,443 frames, with human-annotated spatio-temporal importance maps. We call this dataset the Perceptual Importance Map Dataset (PIMD).
Through a subjective study, we demonstrate that encoding the videos in our dataset with their importance maps taken into account leads to higher perceptual quality at the same bitrate: viewers preferred the videos encoded with importance maps 1.8x over the baseline videos; and (3) we used our curated dataset to train a lightweight machine learning model that predicts these spatio-temporal importance regions. We call this model the Perceptual Importance Map Model (PIMM). Our results show that for the 18 videos in our test set, the importance maps predicted by PIMM lead to videos of higher perceptual quality, preferred 2x over the baseline at the same bitrate.
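The abstract does not specify how an importance map is passed to an existing codec. One common mechanism, assumed here and not necessarily the one used in this dissertation, is to convert per-block importance into per-block quantization-parameter (QP) offsets. The sketch below assumes importance values in [0, 1] on a block grid; the function `importance_to_qp_offsets` and its `strength` and `max_offset` parameters are hypothetical. Centering on the mean importance models the trade-off PIMTool shows its users: blocks marked more important get finer quantization, and less important blocks get coarser quantization, so bits are shifted between regions rather than added.

```python
from typing import List

# Hypothetical representation: one importance value in [0, 1] per block
# (for example, per 16x16 macroblock) of a single frame.
ImportanceMap = List[List[float]]


def importance_to_qp_offsets(
    importance: ImportanceMap,
    strength: float = 6.0,
    max_offset: int = 10,
) -> List[List[int]]:
    """Hypothetical helper: map per-block importance to per-block QP offsets.

    Blocks above the mean importance receive negative offsets (finer
    quantization, more bits); blocks below the mean receive positive offsets
    (coarser quantization, fewer bits). Because offsets are centered on the
    mean, bits are redistributed rather than added, although rounding,
    clamping, and content complexity mean the real bitrate is only
    approximately preserved.
    """
    values = [v for row in importance for v in row]
    if not values:
        return []
    mean = sum(values) / len(values)

    offsets: List[List[int]] = []
    for row in importance:
        out_row: List[int] = []
        for v in row:
            # Higher-than-average importance -> negative (lower) QP offset.
            raw = -strength * (v - mean)
            q = int(round(raw))
            # Clamp to the assumed range the encoder accepts.
            out_row.append(max(-max_offset, min(max_offset, q)))
        offsets.append(out_row)
    return offsets


if __name__ == "__main__":
    # Top-left block painted as important, top-right as unimportant.
    print(importance_to_qp_offsets([[1.0, 0.0], [0.5, 0.5]]))
```

For the 2x2 example map, the mean importance is 0.5, so the offsets are `[[-3, 3], [0, 0]]`. The important block is quantized more finely, and the unimportant block pays for it. A uniform map yields all-zero offsets, which reduces to ordinary encoding.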
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Stanford University, School of Engineering
|Stanford University, Department of Electrical Engineering
|Submitted to the Department of Electrical Engineering.
|Thesis Ph.D. Stanford University 2023.
- © 2023 by Evgenya Pergament
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).