Secret Messages Can Hide in AI-Generated Media
Source: https://www.quantamagazine.org/secret-messages-can-hide-in-ai-generated-media-20230518/#comments (2023-05-22 21:58:10)

The result comes from the world of information theory, which provides a mathematical framework for understanding communication of all sorts. It’s an abstract and tidy field, in contrast to the complicated messiness of practical steganography. The worlds don’t often overlap, said Jessica Fridrich, a researcher at Binghamton University who studies ways to hide (and detect) data in digital media. But the new algorithms bring them together by satisfying long-standing theoretical criteria for security and suggesting practical applications for hiding messages in machine-generated content. The new algorithms could be harnessed by spies like the New York Russians, but they could also help people trying to get information in or out of countries that prohibit encrypted channels.

Shaved Heads and Other Strategies

The schemes of steganography, Greek for “covered writing,” predate digital media by millennia.

The earliest known examples show up in The Histories by Herodotus, written in the 5th century BCE. In one story, a message is written on wooden tablets and hidden by a layer of wax to avoid interception during its journey. Another technique, attributed to Aeneas the Tactician, places dots of invisible ink over certain letters of an innocent-looking text; read in order, the marked letters spell out the true message. In a more extreme example, the tyrannical leader Histiaeus wants to communicate a strategy to his nephew without detection, so he shaves the head of a slave, tattoos his message on the man’s head and waits for the hair to grow back before sending the messenger. Upon arrival, the nephew shaves the messenger’s head, revealing the plans.

These strategies have persisted, and technology has allowed for new ones. German spies during World War II found ways to transmit information via microdot: They copied and reduced a document until it was as small as the dot of an “i,” which appeared innocent but could be revealed through magnification.

Politicians, too, have turned to the deceptive art. In the 1980s, after a series of press leaks, the British prime minister Margaret Thatcher allegedly had the word processors of her ministers reprogrammed so that each produced its own nearly undetectable but unique pattern of word spacing. That slight modification allowed leaked documents to be traced to their source.

The approach continues to flourish in the 21st century, for good and evil. Modern steganographic strategies include writing messages in invisible ink (another tactic used by the Russian spies in New York), concealing artist signatures in painting details, and designing audio files with a hidden or backward track. Fridrich says steganographic approaches in digital media can also help hide images in voicemail files or, as in the case of the Russian spies, place written text in doctored photographs.

Formalizing Secrecy

It wasn’t until the 1980s that mathematicians and computer scientists began to seek formal, mathematical rules for steganography, Cachin said. They turned to information theory, a field that had begun with Claude Shannon’s seminal 1948 paper “A Mathematical Theory of Communication,” which established an analytical approach to thinking about sending and receiving information through a channel. (Shannon modeled telegraph lines, but he laid the groundwork for today’s digital technologies.) He used the term “entropy” to quantify the amount of information in a variable, such as the average number of bits required to encode a letter or message, and in 1949 he hammered out rules for perfectly secure cryptography. But Shannon didn’t address security in steganography.
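To make Shannon’s notion concrete, here is a minimal sketch of entropy in bits, H(X) = -Σ p(x) log2 p(x), applied to a few made-up distributions. The function name and the example distributions are illustrative assumptions, not taken from the article. A fair coin carries exactly 1 bit per toss, a uniformly random letter about 4.7 bits, and a heavily biased coin well under 1 bit, which is why predictable sources can be encoded with fewer bits on average.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits: H(X) = -sum p(x) * log2 p(x), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical example distributions.
fair_coin = [0.5, 0.5]              # two equally likely outcomes
uniform_letters = [1 / 26] * 26     # every letter of the alphabet equally likely
biased_coin = [0.9, 0.1]            # a very predictable source

print(shannon_entropy(fair_coin))                    # 1.0
print(round(shannon_entropy(uniform_letters), 3))    # 4.7
print(round(shannon_entropy(biased_coin), 3))        # 0.469
```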

Almost 50 years later, Cachin did. His approach, in the spirit of Shannon, was to think about language probabilistically. Consider two agents, Alice and Bob, who want to communicate a message via steganography and keep it secret from Eve, their adversary. When Alice sends an innocuous message to Bob, she selects words from the entire English lexicon. Those words have probabilities associated with them; for example, the word “the” is more likely to be chosen than, say, “lexicon.” Altogether, the words can be represented as a probability distribution. If Alice uses steganography to send an encoded message to Bob, that message will have its own probability distribution.
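As a rough illustration of words “represented as a probability distribution,” the sketch below estimates word probabilities from raw counts in a tiny, made-up sample sentence. This is an assumption for illustration only: Cachin’s model concerns the true distribution of innocuous messages, which simple word counts approximate only crudely, and the helper name `word_distribution` is hypothetical.

```python
from collections import Counter

def word_distribution(text):
    """Estimate P(word) from word frequencies in a text sample (a crude stand-in for real language)."""
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Hypothetical toy sample: 13 words, 4 of which are "the".
sample = "the cat sat on the mat and the dog sat on the rug"
dist = word_distribution(sample)

print(round(dist["the"], 3))   # 0.308  (4 / 13)
print(dist["the"] > dist["dog"])  # True: "the" is more likely than "dog"
```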

Information theorists use a measure called relative entropy (also known as Kullback–Leibler divergence) to compare probability distributions. It’s like measuring an abstract kind of distance: If the relative entropy between the distribution of innocuous messages and the distribution of encoded messages is zero, “you cannot rely on statistical analysis” to uncover the secret, said Christian Schroeder de Witt, a computer scientist at the University of Oxford who worked on the new paper. In other words, if future spies develop a perfectly secure algorithm to smuggle secrets, one whose encoded messages follow exactly the same distribution as innocuous ones, no statistics-based surveillance will be able to detect it. Their transmissions will be perfectly hidden.
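A minimal sketch of relative entropy, D(P ‖ Q) = Σ P(x) log2(P(x) / Q(x)), comparing a hypothetical distribution of innocuous words (the “cover”) with two hypothetical encoders. The word lists and probabilities are made up for illustration. The measure is not symmetric, which is one reason it is only “an abstract kind of distance”; here we compute D(cover ‖ stego). An encoder that reproduces the cover distribution exactly scores zero, while one that overuses a rare word scores above zero and is, in principle, statistically detectable.

```python
import math

def relative_entropy(p, q):
    """D(P || Q) = sum_x P(x) * log2(P(x) / Q(x)), in bits.

    p and q map outcomes to probabilities. Returns math.inf if P puts
    mass on an outcome that Q treats as impossible.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total

# Hypothetical distributions over a three-word vocabulary.
cover = {"the": 0.5, "a": 0.3, "lexicon": 0.2}          # innocuous messages
stego_same = dict(cover)                                 # encoder matching the cover exactly
stego_skewed = {"the": 0.4, "a": 0.3, "lexicon": 0.3}    # encoder that overuses a rare word

print(relative_entropy(cover, stego_same))        # 0.0
print(relative_entropy(cover, stego_skewed) > 0)  # True
```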

But Cachin’s proof depended on a critical assumption about the message hiding the secret, known as the cover text. In order to come up with a new message indistinguishable from the original, innocuous one, you have to create a perfect simulation of the cover text distribution, Cachin said. In a written message, for example, that means using some tool that can perfectly simulate a person’s language. But human-generated text is just too messy. It’s possible to come close — ChatGPT and other large language models can produce convincing simulations — but they’re not exact. “For human-generated text, this is not feasible,” Cachin said. For that reason, perfectly secure steganography has long seemed out of reach.

Fridrich, whose research focuses on the complicated real-world intricacies of hiding messages in human-made digital media like photographs and text messages, said perfect simulation is a condition that will never be met. “The problem with digital media is that you will never have that real model,” she said. “It’s too complex. Steganography can never be perfect.”

Achieving Perfection

But machine-generated text, of course, is not created by humans. The recent rise of generative models that focus on language, or others that produce images or sounds, suggests that perfectly secure steganography might be possible in the real world. Those models, after all, use well-defined sampling mechanisms as part of generating text that, in many cases, seems convincingly human.
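Part of what makes machine-generated text tractable is that a model’s output distribution is explicit. A minimal sketch, assuming a hypothetical model that has already produced raw scores (logits) for four candidate next tokens: the softmax turns those scores into exact probabilities, and sampling draws from them. The token list, scores, and helper names are made up; real models work over vocabularies of many thousands of tokens.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw model scores into a probability distribution over tokens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample_token(tokens, logits, rng, temperature=1.0):
    """Draw one token from the model's explicit next-token distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical scores a language model might assign after "The cat sat on the".
tokens = ["mat", "sofa", "roof", "lexicon"]
logits = [3.0, 2.0, 1.0, -2.0]

rng = random.Random(0)
print(softmax(logits))                                    # probabilities, largest for "mat"
print([sample_token(tokens, logits, rng) for _ in range(5)])
```

Because the probabilities are computed exactly and the randomness comes from a seeded generator, anyone with the same scores and seed reproduces the same samples. That well-defined sampling step is the kind of property a steganographic encoder can build on, though this sketch by itself hides nothing.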

Sokota and Schroeder de Witt had previously been working not on steganography, but on machine learning. They’d been pursuing new ways to transmit information through various channels, and at one point they learned of a relatively new concept in information theory called a minimum entropy coupling.
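A coupling of two distributions is a joint distribution whose marginals match them; a minimum entropy coupling is the one whose joint entropy is smallest. Finding it exactly is NP-hard in general, so the sketch below uses a simple greedy heuristic (repeatedly pairing the largest remaining masses) that always yields a valid coupling with low entropy but is not guaranteed to be minimal. It is not presented as the researchers’ algorithm. The distributions `p` (a uniformly random secret bit) and `q` (three cover tokens) are hypothetical.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in probs if x > 0)

def greedy_coupling(p, q, eps=1e-12):
    """Greedy heuristic for a low-entropy coupling of two distributions.

    p, q: lists of probabilities, each summing to 1.
    Returns {(i, j): mass} whose row sums equal p and column sums equal q.
    """
    # Python's heapq is a min-heap, so store negated masses to pop the largest first.
    heap_p = [(-pi, i) for i, pi in enumerate(p) if pi > 0]
    heap_q = [(-qj, j) for j, qj in enumerate(q) if qj > 0]
    heapq.heapify(heap_p)
    heapq.heapify(heap_q)
    joint = {}
    while heap_p and heap_q:
        neg_a, i = heapq.heappop(heap_p)
        neg_b, j = heapq.heappop(heap_q)
        a, b = -neg_a, -neg_b
        m = min(a, b)                          # move as much mass as both sides allow
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        if a - m > eps:                        # put back any leftover mass
            heapq.heappush(heap_p, (-(a - m), i))
        if b - m > eps:
            heapq.heappush(heap_q, (-(b - m), j))
    return joint

def marginals(joint, n_p, n_q):
    """Recover the row and column sums of a joint distribution."""
    rows, cols = [0.0] * n_p, [0.0] * n_q
    for (i, j), mass in joint.items():
        rows[i] += mass
        cols[j] += mass
    return rows, cols

p = [0.5, 0.5]            # hypothetical: one uniformly random secret bit
q = [0.5, 0.25, 0.25]     # hypothetical: distribution over three cover tokens
joint = greedy_coupling(p, q)
print(joint)                    # {(0, 0): 0.5, (1, 1): 0.25, (1, 2): 0.25}
print(marginals(joint, 2, 3))   # ([0.5, 0.5], [0.5, 0.25, 0.25])
print(entropy(joint.values()))  # 1.5
```

A joint distribution can never have less entropy than either of its marginals, so max(H(p), H(q)) = 1.5 bits is a lower bound; in this toy case the greedy result meets it. Reading the table row by row, a secret bit of 0 always maps to token 0, while a bit of 1 maps to token 1 or 2 with equal probability, and the cover tokens still appear with exactly the frequencies in `q`.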
