In February, Spotify launched its first AI-powered feature with the debut of its AI DJ, a smart audio guide with a convincingly realistic voice. As it turns out, that AI persona is based on a real person: Spotify’s head of Cultural Partnerships, Xavier “X” Jernigan, who had the honor of becoming the first voice model for the AI feature.
TechCrunch sat down with Jernigan to learn more about the process for training the AI and Spotify’s future plans for its AI DJ efforts.
The new AI DJ personalizes the listening experience, curating a selection of music based on each listener’s interests. It also delivers spoken commentary about each song, much like a real radio host.
In addition to his primary role at Spotify, Jernigan also hosts various Spotify podcasts, including “The Window,” “Showstopper” and the now-defunct “The Get Up.” So, he’s used to having his voice heard by millions of listeners. Still, having his voice memorialized as an AI is a unique experience.
Spotify chose Jernigan to be the first voice model because his “voice and personality resonated with a lot of our listeners already,” Jernigan told TechCrunch. “[The company was] fairly confident that I would resonate in this way as well.”
Spotify’s morning show, “The Get Up,” garnered nearly 6 million listeners and was a top 10 podcast on Spotify before it abruptly ended in 2022, demonstrating Jernigan’s pull.
Still, being the voice model for DJ was hard to wrap his head around at first, the podcast host admitted.
“I got pitched on being this voice model for DJ and my mind was blown when it was explained to me,” Jernigan told us. “Imagine if you’re hearing this for the first time, you don’t have anything to look at, and I’m just like, ‘Wait, what? It’s gonna be me but it’s not me, and it’s text and voice, but it’ll sound like me, and it’s AI?’”
“For me, it was a new experience working with AI in this way. I was just blown away,” he added.
Spotify says its AI DJ was built using both Sonantic and OpenAI technologies.
Sonantic is an AI startup that Spotify acquired last year. Its technology generates realistic AI voices, including the one used to recreate Val Kilmer’s voice in “Top Gun: Maverick.”
Prior to the acquisition, Spotify spent a few years researching AI-powered technology and worked on the DJ feature “in some iteration,” Jernigan noted. He declined to share exactly how long the process took but said integrating the Sonantic technology “really kicked it into high gear.”
Jernigan explained the process of training the AI, which entailed going into a studio, reading off a script and speaking in various cadences and inflections to convey different emotions. He also fed the AI words and phrases he naturally uses to make it feel as authentic as possible.
“We use words that I say… I don’t say ‘tunes’ for songs. That’s just not how I talk,” he said. “I say, ‘hits’ or ‘bangers.’ So, you will hear DJ say those kinds of words,” Jernigan continued. “We even did a whole process of like, how do I say ‘hey,’ how do I say ‘hello.’ I carried around a notebook, and I would just write down these different phrases that were something I would say.”
He added that the Spotify team made sure to keep in his natural pauses and breaths so the AI voice would truly sound human-like.
Even Jernigan’s mom gave her stamp of approval to the results.
“[DJ] passed the mama test. I played it for her before it came out, explaining it to her and I’m trying to get her to wrap her mind around it,” he said. “She listened to all my podcasts, so she’s used to hearing my voice recorded and played before and she was like ‘That sounds exactly like you.’ My mama said it sounded like me, so I knew it was spot on.”
Although realistic AI voices already exist, we’d argue that Spotify’s DJ is the calmest and most chill-sounding of those we’ve heard. Google’s Duplex technology may sound authentic, but it’s not necessarily a voice that’s nice to listen to when you’re trying to vibe out to your summer jam playlist.
“For me, doing the performance from a voice acting standpoint, my aim was to connect with people and to converse with people and to think about one person. So, when I was training the AI, I just pictured one person when I was in the studio, talking to them and being their friend,” Jernigan said.
Beyond making the AI voice sound friendly to listeners, Spotify also designed the look of the DJ itself to feel approachable.
The animated green circle that users see when listening to the DJ is a nod to the Spotify logo and moves like a mouth when the AI talks.
“When it came to the design, we thought about the entire experience — how it works, how it sounds, how it looks and how to make it personal for each user,” Emily Galloway, head of Product Design for Personalization at Spotify, told TechCrunch. “Early on for the visual side, we explored some options that felt more technical (imagine things like soundwaves). Yet this didn’t feel right since we wanted to humanize the AI…”
“We wanted to make it look and feel unique. In fact, it was so unique that it was awarded a design patent,” Galloway added.
Jernigan contributed to DJ in other ways besides recording his voice.
For the AI to provide expert commentary about the music, Spotify put together a writers’ room made up of curators, culture experts and music experts.
Jernigan has an extensive background in music, so he also took part in the writers’ room. He previously worked for top artists like Diddy, Amy Winehouse and 2 Chainz, among others.
And while Jernigan is the first voice model for DJ, there’s the potential for listeners to hear more voices in the future.
TechCrunch asked Jernigan if the company had any plans to hire voice models that speak other languages.
“Stay tuned,” he hinted.
The AI DJ is currently only available in English for Premium subscribers in the U.S. and Canada. As of February, the DJ feature is still in beta testing.
“We got a whole bunch of really cool new features coming out across the board,” Jernigan said. “We got really dope stuff that’s coming out.”