“An unprecedented attack on documentary reality”
In Roadrunner, the recently released documentary about the life and tragic death of Anthony Bourdain, his voice narrates an email sent to a friend: “…and my life is sort of shit now. You are successful, and I am successful, and I’m wondering: Are you happy?” Bourdain’s voice is featured throughout the film, but these three lines of dialogue are particularly notable because they were never actually spoken by the late chef and travel documentarian. The film’s director, Morgan Neville, had contracted an AI company and supplied it with dozens of hours of recordings of Bourdain speaking, which the company then used to synthesize the three lines.
What’s more, this information only came to light when Neville was featured in a New Yorker article. In it, he told the interviewer that there were three lines of dialogue he wanted Bourdain’s voice to narrate, but he couldn’t find existing audio to string together or otherwise make work. While it is increasingly understood that documentary filmmakers often offer viewers their own “creative interpretation of reality,” there is still a widespread perception that the non-fiction nature of documentaries means they can be taken as accurate representations of “the truth.” Bourdain did write those words in an email, but can it be considered “truth” if he never actually spoke them out loud?
This is just one example of the brave new world we must navigate: the artificial intelligence driving massive progress is also increasingly blurring the line between fact and fiction, and it certainly doesn’t stop with voices. If you noticed Tom Cruise acting pretty strange on TikTok earlier this year, biting into a bubblegum lollipop or showing off a coin trick, then you experienced the work of visual and AI effects artist Chris Umé and actor Miles Fisher. Fisher acted as a Cruise stand-in for Umé, who used artificial intelligence technology to merge Cruise’s face with Fisher’s body in the video. But don’t think this is something only achievable by high-level tech gurus –– the app Reface allows anybody with a smartphone to accomplish the same thing (albeit with less finesse and accuracy). Released just over a year ago, the face-swap app recently grabbed $5.5 million in a seed round from the Silicon Valley venture firm Andreessen Horowitz.
While it may be fun (and frankly hilarious) to see your face on Jack Sparrow’s body, running across the beach in a way that only Johnny Depp can fully encapsulate, “deepfakes” have the potential to pose an existential threat to the world. Deepfakes “may very well prove to be the biggest and most disruptive deep learning technology yet, the coming impacts of which nobody is currently equipped to deal with,” said Richard DeVaul, a research scientist who has decades of experience within the evolving tech sector including time as innovation leader and director of engineering at Google’s research and development facility X Development (then known as Google X).
DeVaul isn’t alone in his concern about what the future holds for deepfake technology. The Bulletin of the Atomic Scientists, which maintains the internationally recognized Doomsday Clock, a metaphor for threats to humanity from unchecked scientific and technical advances, recently announced that deepfakes were one of the factors that set the hands of the clock to just 100 seconds before midnight, meaning the world is closer to a global catastrophe than ever before.
A portmanteau of “deep learning” and “fake,” a deepfake is defined as a video in which a person’s face or body has been digitally altered so that they appear to be someone else, typically used maliciously or to spread false information. However, as the Bourdain case shows, this definition is rapidly broadening to include audio and other forms of media. The underlying technology traces back to 2011, when DeVaul was working at Google X and Project Brain was being developed, kicking off the machine learning revolution. Today, according to DeVaul, the deep learning technology behind Project Brain, in tandem with reinforcement learning, powers practically every online service, from Google searches and Alexa to the face recognition that unlocks your phone (without your mask, that is).
Among these deep learning systems are GANs (generative adversarial networks), which can create extremely convincing “fakes.” They are called adversarial because the system pits two deep learning networks against each other: one attempts to make a convincing fake while the other tries to distinguish the fakes from reality. It is because of this back-and-forth that deepfakes are advancing at such a rapid pace, and also why building technology to detect deepfakes in the future will be so difficult. “As you make fake detectors better, the fake generators will automatically and necessarily get better in turn,” said DeVaul. “The arms race won’t stop, and past mediocre fakes will be seamlessly replaced by better present ones.”
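The adversarial loop described above can be sketched in a few lines of code. The toy example below is purely illustrative: a hypothetical one-parameter “generator” and a logistic “discriminator” trained on 1-D numbers, nothing like the deep networks behind real deepfakes. What it does capture is the alternating updates that drive the arms race, with each player descending on an objective the other is trying to defeat:

```python
import numpy as np

# Toy 1-D GAN sketch (illustrative only, not a production deepfake system).
# "Real" data is drawn from N(4, 1); the generator is a single learned
# offset g applied to noise, and the discriminator is a logistic
# classifier with parameters (w, b).

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

g = 0.0          # generator parameter: offset added to noise
w, b = 1.0, 0.0  # discriminator: P(sample is real) = sigmoid(w*x + b)
lr = 0.05

for step in range(500):
    real = rng.normal(4.0, 1.0, 32)
    fake = rng.normal(0.0, 1.0, 32) + g

    # Discriminator step: descend binary cross-entropy, pushing
    # D(real) toward 1 and D(fake) toward 0.
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    w -= lr * (np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake))
    b -= lr * (np.mean(d_real - 1.0) + np.mean(d_fake))

    # Generator step: descend -log D(fake), i.e. try to fool the
    # just-updated discriminator into scoring fakes as real.
    d_fake = sigmoid(w * fake + b)
    g -= lr * np.mean((d_fake - 1.0) * w)

# Over training, the generator's offset drifts toward the real mean of 4:
# the fakes become statistically harder to tell apart from the real data.
```

The same alternation, scaled up from one parameter to millions and from numbers to pixels or audio samples, is what makes a GAN-trained fake converge toward something a detector can no longer separate from reality, which is exactly the dynamic behind DeVaul’s arms-race warning.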
If you’re a student of history, you know that Stalin employed teams of photo retouchers to cut his enemies out of supposedly documentary photographs. Try as he might to erase history, many untouched photos survive of him standing beside allies who were later declared enemies, but the practice now reads as foreshadowing of how much the truth of images matters to politics and representative democracy. We know from the most recent elections that disinformation plays a growing role in our democratic process, and our natural cognitive biases only increase the likelihood that we take what is presented to us at face value when we agree with what is being said. “When it is trivial to fake documentary evidence, the validity of all evidence is called into question,” DeVaul said. “When documentary evidence is no longer reliable, what will be the basis for collective decision-making? Say goodbye to the body of shared facts we call consensus reality.”
So what can be done to keep us from sliding down this increasingly treacherous slope? As DeVaul sees it, while we can’t turn back from the deepfake course we are on, we can steer away from one that moves us even closer to midnight on the Bulletin of the Atomic Scientists’ Doomsday Clock. While the nature of GAN machine learning means there is no simple technological fix, DeVaul believes that strengthening the tools and practices of journalism and documentation –– e.g. debating the ethics of failing to disclose the use of a deepfake voice in a documentary –– will help counter the worst aspects of a deepfake future. Additionally, he calls for creating new norms that support fact-based institutions and business models.
Today, deepfakes are generally seen as a form of amusement, but the ramifications of letting them loose unchecked are grave. As DeVaul puts it, “we ourselves often want to be entertained more than we want the truth. However, I believe that the truth is too important to let the deepfakes win.”