Neural Networks Need Data to Learn. Even If It’s Fake.
Source: https://www.quantamagazine.org/neural-networks-need-data-to-learn-even-if-its-fake-20230616/#comments 2023-06-18 21:58:04

On a sunny day in late 1987, a Chevy van drove down a curvy wooded path on the campus of Carnegie Mellon University in Pittsburgh. The hulking vehicle, named Navlab, wasn’t notable for its beauty or speed, but for its brain: It was an experimental version of an autonomous vehicle, guided by four computers, powerful for their time, in the cargo area.

At first, the engineers behind Navlab tried to control the vehicle with a navigation algorithm, but like many previous researchers they found it difficult to account for the huge range of driving conditions with a single set of instructions. So they tried again, this time using an approach to artificial intelligence called machine learning: The van would teach itself how to drive. A graduate student named Dean Pomerleau constructed an artificial neural network, made from small logic-processing units meant to work like brain cells, and set out to train it with photographs of roads under different conditions. But taking enough photographs to cover the huge range of potential driving situations was too difficult for the small team, so Pomerleau generated 1,200 synthetic road images on a computer and used those to train the system. The self-taught machine drove as well as anything else the researchers came up with.

Navlab didn’t directly lead to any major breakthroughs in autonomous driving, but the project did show the power of synthetic data to train AI systems. As machine learning leapt forward in subsequent decades, it developed an insatiable appetite for training data. But data is hard to get: It can be expensive, private or in short supply. As a result, researchers are increasingly turning to synthetic data to supplement or even replace natural data for training neural networks. “Machine learning has long been struggling with the data problem,” said Sergey Nikolenko, the head of AI at Synthesis AI, a company that generates synthetic data to help customers make better AI models. “Synthetic data is one of the most promising ways to solve that problem.”

Fortunately, as machine learning has grown more sophisticated, so have the tools for generating useful synthetic data.

One area where synthetic data is proving useful is in addressing concerns about facial recognition. Many facial recognition systems are trained with huge libraries of images of real faces, which raises issues about the privacy of the people in the images. Bias is also a problem, since various populations are over- and underrepresented in those libraries. Researchers at Microsoft’s Mixed Reality & AI Lab have tackled these concerns, releasing a collection of 100,000 synthetic faces for training AI systems. These faces are generated from a set of 500 people who gave permission for their faces to be scanned.

Microsoft’s system takes elements of faces from the initial set to make new and unique combinations, then adds visual flair with details like makeup and hair. The researchers say their data set spans a wide range of ethnicities, ages and styles. “There’s always a long tail of human diversity. We think and hope we’re capturing a lot of it,” said Tadas Baltrušaitis, a Microsoft researcher working on the project.

Another advantage of the synthetic faces is that the computer can label every part of every face, which helps the neural net learn faster. Real photos must instead be labeled by hand, which takes much longer and is never as consistent or accurate.
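The labeling advantage can be illustrated with a toy sketch. This is not Microsoft's pipeline: the functions `make_sample` and `make_dataset` are hypothetical, and a filled rectangle on a small grid stands in for a rendered face. The point it shows is only that when a program draws the image, it already knows which pixels belong to the object, so the label comes out alongside the picture with no hand annotation.

```python
import random


def make_sample(rng, size=16):
    """Draw one toy 'image' (a filled rectangle on a blank grid) and its label mask.

    Hypothetical stand-in for a face renderer: the generator picks the shape
    parameters itself, so it can write an exact per-pixel label at no extra cost.
    """
    w = rng.randint(3, size // 2)      # rectangle width
    h = rng.randint(3, size // 2)      # rectangle height
    x0 = rng.randint(0, size - w)      # left edge, chosen so the shape fits
    y0 = rng.randint(0, size - h)      # top edge, chosen so the shape fits

    image = [[0.0] * size for _ in range(size)]
    mask = [[0] * size for _ in range(size)]
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            image[y][x] = rng.uniform(0.5, 1.0)  # varying pixel intensity
            mask[y][x] = 1                       # label written in the same step
    return image, mask, (x0, y0, w, h)


def make_dataset(n, seed=0):
    """Generate n labeled samples; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [make_sample(rng) for _ in range(n)]


if __name__ == "__main__":
    image, mask, box = make_dataset(1)[0]
    print("bounding box (x0, y0, w, h):", box)
    print("labeled pixels:", sum(map(sum, mask)))
```

Every sample arrives with a mask and bounding box that match the image exactly, which is the kind of consistency the paragraph above contrasts with hand labeling.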

The results aren’t photorealistic — the faces look a little like characters from a Pixar movie — but Microsoft has used them to train face recognition networks whose accuracy approaches that of networks trained on millions of real faces.
