It’s one of the biggest clichés in crime and science fiction: An investigator pulls up a blurry photo on a computer screen and asks for it to be enhanced, and boom, the image comes into focus, revealing some essential clue. It’s a wonderful storytelling convenience, but it’s been a frustrating fiction for decades — blow up an image too much, and it becomes visibly pixelated. There isn’t enough data to do more.
“If you just naïvely upscale an image, it’s going to be blurry. There’s going to be a lot of detail, but it’s going to be wrong,” said Bryan Catanzaro, vice president of applied deep learning research at Nvidia.
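To make Catanzaro’s point concrete, here is a minimal sketch (in Python, using the Pillow imaging library; the filename is a placeholder) of what naïve upscaling actually does. Classical interpolation only spreads the existing pixels over a larger grid, so the enlarged image contains no information that wasn’t already there.

```python
# A minimal sketch of naive upscaling, assuming Pillow is installed and
# "photo_small.jpg" stands in for any low-resolution image.
from PIL import Image

low_res = Image.open("photo_small.jpg")      # e.g. a 256 x 256 photo
width, height = low_res.size

# Bicubic resampling interpolates between existing pixels. The result has
# 16 times as many pixels but exactly the same information content, which
# is why it looks soft rather than sharp.
upscaled = low_res.resize((width * 4, height * 4),
                          resample=Image.Resampling.BICUBIC)
upscaled.save("photo_naive_4x.png")
```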
Recently, researchers and professionals have begun incorporating artificial intelligence algorithms into their image-enhancing tools, making the process easier and more powerful, but there are still limits to how much data can be retrieved from any image. Luckily, as researchers push enhancement algorithms ever further, they are finding new ways to cope with those limits — even, at times, finding ways to overcome them.
In the past decade, researchers started enhancing images with a new kind of AI model called a generative adversarial network, or GAN, which could produce detailed, impressive-looking pictures. “The images suddenly started looking a lot better,” said Tomer Michaeli, an electrical engineer at the Technion in Israel. But he was surprised to find that images made by GANs scored poorly on distortion, a measure of how far an enhanced image strays from the underlying reality of what it shows. GANs produced images that looked pretty and natural, but they did so by making up, or “hallucinating,” details that weren’t accurate, and those invented details registered as high distortion.
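The mechanics behind that hallucination can be sketched in a few lines. The following is an illustrative loss function, not any particular published model, written in PyTorch with the generator and discriminator networks left as placeholders: the pixel-wise term rewards matching the true high-resolution image, while the adversarial term rewards merely looking natural to a discriminator, whether or not the details are real.

```python
# Illustrative sketch of a GAN-style super-resolution training objective.
# Assumes PyTorch; `discriminator` is a placeholder network and `adv_weight`
# is a made-up name for the knob that trades accuracy against realism.
import torch
import torch.nn.functional as F

def generator_loss(generated_hr, true_hr, discriminator, adv_weight=1e-3):
    # Distortion term: penalizes pixel-level differences from the ground truth.
    pixel_loss = F.mse_loss(generated_hr, true_hr)

    # Adversarial term: rewards outputs the discriminator judges to be
    # natural-looking, regardless of whether their details are accurate.
    fake_score = discriminator(generated_hr)
    adversarial_loss = F.binary_cross_entropy_with_logits(
        fake_score, torch.ones_like(fake_score))

    # Turning up adv_weight yields sharper, more plausible textures -- and
    # more invented detail that the original scene never contained.
    return pixel_loss + adv_weight * adversarial_loss
```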
Michaeli watched the field of photo restoration split into two distinct sub-communities. “One showed nice pictures, many made by GANs. The other showed data, but they didn’t show many images, because they didn’t look nice,” he said.
In 2017, Michaeli and his graduate student Yochai Blau looked into this dichotomy more formally. They plotted the performance of various image-enhancement algorithms on a graph of distortion versus perceptual quality, using a known measure for perceptual quality that correlates well with humans’ subjective judgment. As Michaeli expected, some of the algorithms resulted in very high visual quality, while others were very accurate, with low distortion. But none had both advantages; you had to pick one or the other. The researchers dubbed this the perception-distortion trade-off.
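A rough sense of what such a plot looks like can be given with a few lines of Python and matplotlib. The numbers below are invented for illustration, not taken from Blau and Michaeli’s measurements; the point is only that no algorithm lands in the corner where both distortion and the perceptual score are low.

```python
# Illustrative perception-distortion plot with invented scores (not the
# measurements from Blau and Michaeli's paper). Lower is better on both axes:
# distortion compares the output to the ground truth, while the perceptual
# score is a stand-in for a no-reference measure of how natural it looks.
import matplotlib.pyplot as plt

algorithms = {
    "bicubic interpolation":   (0.20, 0.90),
    "MSE-trained network":     (0.10, 0.70),
    "GAN, small adv. weight":  (0.14, 0.35),
    "GAN, large adv. weight":  (0.25, 0.15),
}

for name, (distortion, perception) in algorithms.items():
    plt.scatter(distortion, perception, label=name)

plt.xlabel("Distortion (lower = closer to the original scene)")
plt.ylabel("Perceptual score (lower = more natural-looking)")
plt.title("Perception-distortion trade-off (illustrative values)")
plt.legend()
plt.show()
```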
Michaeli also challenged other researchers to come up with algorithms that could produce the best image quality for a given level of distortion, to allow fair comparisons between the pretty-picture algorithms and the nice-stats ones. Since then, hundreds of AI researchers have reported on the distortion and perception qualities of their algorithms, citing the Michaeli and Blau paper that described the trade-off.
Sometimes, the implications of the perception-distortion trade-off aren’t dire. Nvidia, for instance, found that some lower-definition visual content didn’t render well on high-definition screens, so in February it released a tool that uses deep learning to upscale streaming video. In this case, Nvidia’s engineers chose perceptual quality over accuracy, accepting the fact that when the algorithm upscales video, it will make up some visual details that aren’t in the original video. “The model is hallucinating. It’s all a guess,” Catanzaro said. “Most of the time it’s fine for a super-resolution model to guess wrong, as long as it’s consistent.”