Fine-grained importance for perceptual video compression



The proliferation of videos consumed by humans over the internet has accelerated the search for better video compression algorithms. Traditional video compression algorithms reduce file sizes by removing spatial, temporal, and coding redundancies. Because different spatio-temporal regions of a video differ in their relative importance to the human viewer, there is an opportunity to improve compression further by also removing perceptual redundancy. However, it is challenging to infer how important different areas are to the viewer, or even to collect such fine-grained information. Indeed, such information is often left unused during compression beyond low-level heuristics. In this dissertation, we present a framework that facilitates research into fine-grained subjective importance in compressed videos, and we use it to improve the rate-distortion performance of existing video codecs. The contributions of this dissertation are threefold:

(1) We designed a novel tool, the Perceptual Importance Map collection Tool (PIMTool), an interactive web tool that enables scalable collection of fine-grained perceptual importance by having users paint spatio-temporal importance maps over encoded videos. As users paint, the videos shown to them are continually updated to reflect their maps, revealing the trade-off between increasing the importance of some areas and decreasing the importance of others. Users can also control how much the importance of each area is increased or decreased, which yields detailed relative importance maps.

(2) Using PIMTool, we collected the Perceptual Importance Map Dataset (PIMD): 178 videos, totaling 14,443 frames, annotated with human-painted spatio-temporal importance maps. Via a subjective study, we demonstrate that encoding the videos in our dataset while taking their importance maps into account leads to higher perceptual quality at the same bitrate, with the importance-map encodings preferred 1.8x over the baseline encodings.

(3) We used PIMD to train a lightweight machine learning model, the Perceptual Importance Map Model (PIMM), that predicts these spatio-temporal importance regions. For the 18 videos in our test set, the importance maps predicted by PIMM lead to higher perceptual quality, with the resulting videos preferred 2x over the baseline at the same bitrate.
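One common way to act on an importance map at encode time is to convert it into per-block quantization parameter (QP) offsets, where a lower QP means finer quantization and higher quality. This is offered only as a hypothetical sketch, not as the method used in this dissertation; the function name `importance_to_qp_offsets` and its `strength` and `max_offset` parameters are illustrative assumptions. Centering the offsets on the map's mean mirrors the trade-off PIMTool exposes: spending more bits on important blocks is paid for by spending fewer on the rest.

```python
from typing import List


def importance_to_qp_offsets(
    importance: List[List[float]],
    strength: float = 6.0,
    max_offset: int = 10,
) -> List[List[int]]:
    """Hypothetical sketch: map a per-block relative importance map to integer
    QP offsets (negative = finer quantization = higher quality).

    Offsets are centered on the map's mean, so before rounding and clipping they
    sum to zero: raising quality in some blocks lowers it in others.
    """
    flat = [v for row in importance for v in row]
    if not flat:
        return []
    mean = sum(flat) / len(flat)
    # Largest deviation from the mean; fall back to 1.0 for a uniform map.
    spread = max(abs(v - mean) for v in flat) or 1.0

    offsets: List[List[int]] = []
    for row in importance:
        out_row: List[int] = []
        for v in row:
            # More important than average -> negative offset -> lower QP.
            off = -strength * (v - mean) / spread
            # Round to an integer QP step and clip to [-max_offset, max_offset].
            out_row.append(int(max(-max_offset, min(max_offset, round(off)))))
        offsets.append(out_row)
    return offsets


if __name__ == "__main__":
    # A 2x2 block map where only the bottom-right block is marked important.
    print(importance_to_qp_offsets([[0, 0], [0, 4]]))
```

Because bitrate does not scale linearly with QP, zero-sum offsets only approximate a fixed bit budget; an encoder's rate control would still need to hold the target bitrate.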


Type of resource: text
Form: electronic resource; remote; computer; online resource
Extent: 1 online resource
Place: [Stanford, California]
Publisher: [Stanford University]
Copyright date: ©2023
Publication date: 2023
Issuance: monographic
Language: English


Author: Pergament, Evgenya
Degree supervisor: Katti, Sachin
Thesis advisors: Katti, Sachin; Rippel, Oren; Weissman, Tsachy
Degree committee members: Rippel, Oren; Weissman, Tsachy
Associated with: Stanford University, School of Engineering; Stanford University, Department of Electrical Engineering


Genre: Theses; Text

Bibliographic information

Statement of responsibility: Evgenya Pergament.
Note: Submitted to the Department of Electrical Engineering.
Thesis: Ph.D., Stanford University, 2023.

Access conditions

© 2023 by Evgenya Pergament
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
