The Data Daily

AI Synthetic Media: What to expect and what it will mean

Last updated: 04-20-2021


Apr 10 · 22 min read
Originally published at https://www.theaigroupie.com.
Looking back on the US 2020 election, deepfakes seem to have played a negligible role. The impact of any AI generated misinformation is being dwarfed by our own innate competency for misinformational mischief.
In the near-term we are much more likely to be deceived by deeptweaks than deepfakes — the automated generation of low quality text, the dialling up or down of image attributes (a person’s age, emotion, gender, etc.), or the insertion of short passages of dialogue or new gestures into pre-existing video content. The good news is that these will pose problems enough for us to get our defensive house in order before the real thing arrives.
On a more positive note, new content inspiration, design and production tools will in time become mainstream: tools that will augment rather than displace human creativity. More of us will grow acclimatised to the notion of AI as underlying a powerful and useful human-wielded toolset. In some domains, large numbers of human-machine “centaurs” will be able to be almost as creative as the very best of our human-only peers. This will challenge our sense of who we are.
AI learns from seen data to make predictions about unseen data. What is utterly remarkable is that prediction can underpin extraordinary creativity and mimicry. AI can already generate hyper-realistic fictitious human faces (go to www.thispersondoesnotexist.com and keep hitting refresh), compelling written narratives (see “A robot wrote this entire article”) and convincing clones of human voices (see “This AI clones your voice after listening for 5 seconds” ).
These developments have the potential to unleash an explosion of scale creativity — delivering content design and production tools into the hands of the mass market that have hitherto only been available to large corporations with hefty budgets. Even now — when we are still in the infancy of AI media generation — there are demos, apps and subscription-based services to faceswap individuals into movies (see Zao), turn rough sketches into photorealistic images (try the GauGAN demo here), convert one voice into another (see Respeecher), personalise marketing videos (try the Synthesia demo here), age- and emotion-alter images (see Photoshop’s new Neural Filters), generate face-synched videos of new or translated scripts (see Canny AI), play a video game with characters speaking any of 10 face-synched languages (see Cyberpunk 2077), and play a text-based adventure game with endless dialogue generated by AI (try out the free version of AI Dungeon here). Moreover, the same AI techniques will spawn new applications in a wide range of fields: advertising, architecture, interior design, gaming, song-writing, web design, education, even software development and pure mathematics — in fact anywhere that structured or constrained creativity is key.
Unfortunately, as is ever the case, with the potential for great benefit comes great risk. A 2020 study by UCL (reference 1) — involving the input of academics, the police, defence, the government and the private sector — ranked Audio/Video Impersonation, Tailored Phishing and AI-authored Fake News amongst its highest threat AI-enabled future crimes. Amongst other scenarios, participants imagined the impersonation of children to elderly parents over video calls to gain access to funds, machine learning being used in phishing attacks to discover the most potent message contents that maximise deceit effectiveness, and AI being used to generate many versions of fraudulent text content so as to seem to originate from multiple sources to boost visibility and credibility.
Already in 2017 an analysis of views on net neutrality submitted to the US Federal Communications Commission found that fewer than 800,000 of the over 22 million submitted comments were truly unique, that multiple millions of pro-repeal comments were likely to have been machine-generated, and that it was highly likely that, in contrast, more than 99% of the unique comments were in favour of keeping net neutrality. Only fairly simple detection techniques were needed, matching the crudeness of the language manufacturing techniques used to generate the fraudulent comments.
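The kind of simple duplicate detection that sufficed can be sketched in a few lines of Python. This is a toy reconstruction under my own assumptions, not the researchers’ actual pipeline: normalise each comment, then count exact repeats.

```python
from collections import Counter
import re

def normalize(text):
    # Collapse case, punctuation and surrounding whitespace so that
    # cosmetically tweaked copies map to the same key.
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def duplicate_report(comments):
    # Count exact repeats after normalization -- a crude check, but
    # crude checks suffice when the generation itself is crude.
    counts = Counter(normalize(c) for c in comments)
    unique = sum(1 for n in counts.values() if n == 1)
    return unique, len(comments) - unique

comments = [
    "Repeal net neutrality now!",
    "repeal net neutrality NOW",
    "Repeal net neutrality now.",
    "I rely on an open internet for my small business.",
]
print(duplicate_report(comments))  # → (1, 3)
```

Machine-generated campaigns that merely shuffle case and punctuation fall to exactly this sort of normalisation; it is only when the text itself varies that detection has to get cleverer.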
However, progress in natural language processing in just the three years since then has been staggering. One of the most advanced AI language models built to date — known as GPT-3 — was unveiled by OpenAI in July 2020 (ref. 2). In October 2020 a Reddit user answering questions in a user forum with 30 million members was identified as a bot powered by GPT-3. The content was compelling — offering unique (not copied) new perspectives on life (“The purpose of exercise is to avoid thinking about the fact that you spend your life working for money”), containing compelling self-consistent narratives (with occasional sharp twists at the end — see its answer to “What’s the worst date you’ve experienced?”) and stimulating emotional exchanges with other posters (including on the subject of suicide). Detection was more difficult and relied on the absurd frequency and length of posts. No human could write this much quality content this quickly.
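That frequency-based tell is simple enough to sketch. The 50 words-per-minute threshold below is my own illustrative guess, not anything the Reddit moderators actually used:

```python
def words_per_minute(total_words, minutes):
    return total_words / minutes

def looks_automated(total_words, minutes, human_max_wpm=50):
    # Hypothetical threshold: sustained output above ~50 words per
    # minute of finished, polished prose is implausible for a human.
    return words_per_minute(total_words, minutes) > human_max_wpm

# The GPT-3 bot posted long, polished answers roughly once a minute.
print(looks_automated(total_words=200, minutes=1))   # True
print(looks_automated(total_words=500, minutes=30))  # False
```

The obvious countermeasure — throttling the bot to human speed — shows why rate-based detection alone cannot be the whole answer.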
Earlier in 2020, by way of demonstration, a team from MIT, Canny AI and Respeecher developed a deepfake video of Nixon reading a contingency speech that had been prepared in 1969 in the event of the lunar landing having gone awry (see https://moondisaster.org/ ). It took six months to produce and, in contrast to the multitude of “cheapfakes” that many of us have been exposed to, is extremely convincing. Scientific American invited a group of experts on AI, digital privacy, law and human rights to offer their reactions. Their responses capture well the potentialities of the technology — from the good (“I found myself thinking about the value of deepfakes in immersing people in an alternative reality… a curious anticipation of the fun and education that could be had with it.”) to the less good (“You could have a deepfake that if timed just right, in a very sensitive fashion, could lead to a riot. We’re in a tinderbox right now in the US… polarised politics and society and we’ve got to be on the lookout”).
So how does AI creativity work, what developments should we expect in the coming years and with what possible impacts? To get some sense of the answers to these questions let’s take a look at a couple of representative AI approaches that are behind some of the examples above.
Machine vs. machine: Generative Adversarial Networks (GANs)
In 2014, Ian Goodfellow invented an entirely new conceptual approach to image generation using neural networks: letting two neural networks compete with one another (ref. 3). The first network — the generator network — optimises its tuning parameters so as to best generate images that will fool a second network — the discriminator network — whose job it is to optimise its tuning parameters to identify whether the images it is evaluating are real or generated. Both networks iteratively improve their respective generative and discriminative abilities until performance on both criteria is optimal. The first network acts as a forger whose forgery skills continuously improve, whilst the second acts as a forgery detector whose detective skills are also continuously improving and, crucially, who is providing feedback to the forger on the grounds for its decisions. It’s a sort of cold war, but with a highly successful spy network leaking information between opposing sides, fuelling the pace of escalation.
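The adversarial loop can be made concrete with a deliberately tiny sketch: a one-parameter “generator” trying to forge a single real value, pitted against a logistic-unit “discriminator”. The one-dimensional setup, the learning rate and all the names here are illustrative simplifications of mine, not Goodfellow’s actual architecture:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# "Real data" is the single value 1.0; the generator is one tunable
# parameter theta (its forged sample); the discriminator is a logistic
# unit d(x) = sigmoid(w*x + b).
real = 1.0
params = {"theta": -2.0, "w": 1.0, "b": 0.0}
LR = 0.05

def d(x, p):
    return sigmoid(p["w"] * x + p["b"])

def disc_loss(p):
    # Binary cross-entropy: reward calling the real sample real and
    # the generated sample fake.
    return -math.log(d(real, p)) - math.log(1.0 - d(p["theta"], p))

def disc_step(p):
    # One gradient-descent step on the discriminator's loss.
    d_real, d_fake = d(real, p), d(p["theta"], p)
    p["w"] -= LR * (-(1 - d_real) * real + d_fake * p["theta"])
    p["b"] -= LR * (-(1 - d_real) + d_fake)

def gen_step(p):
    # One gradient step moving theta so that d(theta) rises, i.e. so
    # the forgery better fools the freshly updated discriminator.
    p["theta"] += LR * (1 - d(p["theta"], p)) * p["w"]

for _ in range(3000):
    disc_step(params)
    gen_step(params)

print(round(params["theta"], 2))  # the forgery should drift towards 1.0
```

Alternating those two steps is the entire idea; real GANs simply replace the scalar parameters with deep networks and the single value with images.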
In 2014 Goodfellow demonstrated that such an approach produces passable generated images of 28 by 28 pixel greyscale handwritten digits. Fast forward to 2020 and GANs can now generate novel hyper-realistic, high-resolution faces given broad style requirements (age, gender, skin colour, hair style, etc. — see ref. 4) and compelling novel high-resolution images of pretty much anything else for which training images are available (see the birds, burgers and bungalows of 2019’s BigGAN — ref. 5). Amongst many other things, GAN-based algorithms can now convert hand sketches into equivalent photos (ref. 6), enable image editing to generate new objects and features in images on command (ref. 7), and literally allow any one of us to dance like a professional (refs. 8, 9). Output quality in many cases still leaves something to be desired, but in a few R&D iterations this will have improved.
StyleGAN can produce realistic composite faces from two distinct input faces (ref. 4)
GANs do have shortcomings. They can be problematic (see the “dog-ball” below) and unstable to train. They don’t yet work particularly well for language generation. However, whether GANs remain one of the techniques of choice for content generation and style transfer or not is irrelevant. Perhaps the key takeaway from tracing their development over the last six years is to acknowledge the sheer rate of progress in the scope and quality of their application. Their shortcomings are already stimulating further inventiveness.
AI can at times be reassuringly fallible: BigGAN’s dog-ball (ref. 5)
Novel replication: Language models
Machine learning works with numbers. Words on the other hand, as you may have noticed, are not numbers. More troublesome, words don’t work in isolation — the meaning of a word in a sentence depends on the other words in the sentence (consider the different meanings of the word “face”). Small perturbations to a sentence (say, swapping a word for another of ostensibly similar meaning e.g. replacing “alter” with “manipulate”) can have huge impacts on meaning. Worse than that, long-range dependencies across lengthy text can count. If that weren’t enough, language encapsulates our knowledge about the world — knowledge that is arguably extraneous to language itself. Worse still, we sometimes use language to declare and achieve our intentions. All of these facets combine to make language generation a supremely tricky task for AI ( see Felix Hill’s brilliant lecture for an exposition of the challenging qualities of language that need to be accommodated by AI).
In 2011, a state of the art AI language generation model (ref. 10) given “The meaning of life is” as a prompt produced: “The meaning of life is the tradition of the ancient human reproduction: it is less favorable to the good boy for when to remove her bigger. In the show’s agreement unanimously resurfaced. The wild pastured with consistent street forests were incorporated by the 15th century…” and so on. Compelling stuff. Compare this gobbledy-gook with the contents of the Guardian article or the Reddit posts cited above. This is going to read like the GAN story — only more so.
In the interim, a succession of breakthroughs (and a pantheon of language models named after Sesame Street and Muppets characters — I kid you not) have got us to the point where OpenAI’s 2020 GPT-3 language model can perform well over a wide range of useful language tasks, and extraordinarily well over a subset. It is not always the best performing for any given task, but it is the most general-purpose language model to date, requiring the least bespoking to any given application.
Amongst other things GPT-3 and its peers can reliably summarise large tracts of text, answer arbitrary questions on the contents of long texts, write code to a certain extent and prove selected maths theorems. The implementation of one such state of the art language model is responsible for the recent improvements in Google Search .
How do they work? A key step forward came in 2013 when Tomas Mikolov developed an efficient way of capturing single-word semantics in number form (ref. 11): when these words-as-numbers are mapped one finds that those with shorter distances between them are closer in meaning (see figure below). Since then (and glossing over huge amounts of R&D), successive researchers have slowly worked their way up the hierarchy of language characteristics until today they are able to capture the semantics of long sequences of text in number form. Deep learning-based language models use number representations of seen text to predict the probabilities of the words that could follow that seen text, and select the next word based on these probabilities. The language model does this word by word until it chooses to stop generating text.
Words can be represented as numbers which capture some of their semantic meaning. The smaller the distance between words the closer their meaning.
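A toy illustration of distance-as-meaning, with hand-picked three-dimensional vectors standing in for embeddings that real models learn, across hundreds of dimensions, from billions of words:

```python
import math

# Hand-picked toy vectors; a trained model would learn these.
vectors = {
    "king":  [0.9, 0.80, 0.1],
    "queen": [0.9, 0.75, 0.2],
    "apple": [0.1, 0.20, 0.9],
}

def cosine_similarity(u, v):
    # 1.0 means identical direction (close meaning); near 0, unrelated.
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

# Related words sit closer together than unrelated ones.
print(cosine_similarity(vectors["king"], vectors["queen"]))  # ~0.996
print(cosine_similarity(vectors["king"], vectors["apple"]))  # ~0.30
```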
Generating high quality new text based on seen text requires a model with vast numbers of tuning parameters (so that it can capture the many and subtle patterns that are inherent in language) and lots and lots of training data. In the case of GPT-3, that’s 175 billion tuning parameters and c.500 billion tokens (words or numbers or punctuation marks) of training data.
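The word-by-word prediction loop can be caricatured with the smallest possible language model: a bigram counter with greedy decoding. The corpus and every name here are toy stand-ins of mine for a neural model trained on billions of tokens:

```python
from collections import defaultdict, Counter

corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count, for every word, how often each other word follows it.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(word, max_len=10):
    # Greedy decoding: repeatedly emit the most frequent next word,
    # stopping at "." or at max_len.
    out = [word]
    while word != "." and len(out) < max_len:
        word = follows[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(generate("the"))  # → the cat sat on the cat sat on the cat
```

Note how greedy decoding immediately falls into a repetitive loop. Looping repetition remains a recognisable failure mode even in vastly larger models, which is one reason they sample from the predicted probabilities rather than always taking the single most likely word.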
And yet, notwithstanding its poster-child status, GPT-3 remains a long way from human level language generation. You can see this if you read through the GPT-3 generated Reddit posts referred to above. Kevin Lacker’s blog post ( “Giving GPT-3 a Turing test” ) outlines some of its limitations in a simple question answering context. Whilst outstanding at answering trivia and common sense questions about facts which are obscure yet nonetheless have appeared in its training data, GPT-3 struggles when asked questions which require the application of world knowledge and/or reasoning to overcome the absence of the corresponding information in its training data (e.g., “Which is heavier, a toaster or a pencil?” or “If I have two shoes in a box, put a pencil in the box, and remove one shoe, what is left?”).
OpenAI seems convinced that continuing to scale up the model will overcome its shortcomings. Many others in the field beg to differ (for example, listen to Francois Chollet on Lex Fridman’s podcast). For them, GPT-3 is not so much generalising (i.e., generating radically-different-never-before-written-and-yet-still-compelling text) as acting merely as a vast text database (i.e., acting more as an information retrieval system, generating new text that is not all that different from that which it has seen in the training data). They believe that new paradigms of machine learning will need to be developed in order to overcome the really big language challenges of reproducing world knowledge, reasoning and intent. They point out that humans learn from very few examples (whereas GPT-3’s training data corresponds to around 500,000 years of me reading at my present rate of about a book a month) and that humans learn primarily by doing and interacting with the world and each other rather than by assimilating written content (save for us few introverts).
For me, what’s really exciting about GPT-3 is that there just isn’t much more training data out there. We will very shortly encounter the limits of the “brute force” approaches characterised by the GPT family of models. My hunch (for what it’s worth) is that more parameters will bring more coherent narrative structure over longer lengths of text, but bigger models will not solve the problem of how to capture semantically meaningful world knowledge, will only solve reasoning tasks in classes which are well represented in the training data and will not solve the problem of how to linguistically shape and achieve intent. There just may not be enough information content in everything that has ever been written to capture each of these. The jury is very much out. We may see vast improvements in future GPT-N iterations, but it will be our understanding of the limits of GPT-N that will add stimulus to the kind of paradigm-shifting research that is probably required.
And yet to relegate GPT-3 to the status of “idiot savant” would be to massively underestimate its potentialities. As an illustration, Gwern Branwen’s blog post gives you a strong sense of its possibilities — when skilfully wielded — in the realm of creative fiction. Gwern has GPT-3 write various forms of poetry, folktales, puns, Navy Seal copypasta, job application letters, Dad jokes, and so on. GPT-3’s literary parodies are outstanding:
“‘It’s always the same,’ said Dumbledore sadly. ‘Voldemort has bribed my housekeeper to steal all of my tea and crumpets, and then my students disappoint me. It’s very wearing, Harry.’”
Nonetheless there is an abundance of failure cases in the form of repetitions, GPT-3 not correctly interpreting the instructions or prompts it has been given, generating text that contains logical inconsistencies, etc. Left to its own devices — and notwithstanding the engineering feat that it represents — GPT-3 seems insufficiently reliable to be left on auto-pilot for complex language generation tasks. However, even with its shortcomings, there are certain forms of content where GPT-3 does compete. Poetry (where an admixture of gobbledy-gook can be helpful) is one (see “Artificial Intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry” (ref. 12) — which, it should be noted, used the predecessor model to GPT-3). Social media is another (where an admixture of gobbledy-gook can be a prerequisite). What you do come to realise is that here is a class of tools that, if wielded by expert users providing tailored prompts and carefully winnowing the outputs, could be enormously powerful. I for one would love to see a new class of “centaur” authors working with such machines as sources of inspiration (and don’t underestimate the artistry — as demonstrated by Gwern — of coaxing creativity out of a language model such as GPT-3).
So where might we be headed?
The above overview of a couple of representative techniques suggests that in the near-term (perhaps one to three years?) we should expect that high-quality manipulation of pre-existing image and video content could become mainstream. The dialling up or down of age (a la Photoshop) could probably already be extended to the dial-driven control of emotion, gender, race, etc. High quality tweaks to existing video content should become available to the mass market: translation, changes to sections of dialogue, changes to discrete gestures, etc. Low quality (adequate to the demands of Twitter and perhaps Reddit) new text generation should be almost automatable. Anything of a higher quality threshold (e.g., error-free long-form content written with a specific storyline in mind) probably won’t be. Augmentation of humans by such machine outputs for these higher quality demands could become the norm. We already have examples of humans working with AI to generate utterly stunning, even bewildering (to my untrained eye) artwork (do not read past this paragraph without taking a look at Nvidia’s AI art gallery). The anticipation of authors working with language models as sources of inspiration and idea generation is intriguing. This will be a time of deeptweaks and human augmentation rather than end-to-end deepfakes. The quality threshold for the latter will still be beyond AI.
Medium-term (three years plus?) perhaps we can expect to see the sorts of end-to-end video generation capabilities illustrated by the Nixon Moon-disaster video — today a product of a six month effort by leading commercial and academic research teams — become mainstream. Production will be through the projection of new identities and abilities onto existing video content rather than AI generation of entirely new video content from scratch. In the world of text perhaps we will gain some level of control over generated text — e.g., dialling up or down attributes of AI generated language (e.g., its sentiment or emotional valence, political leaning, tone, etc.). Don’t doubt that even this latter is non-trivial.
To go beyond this requires the R&D world to overcome what are at present some huge insurmountables, examples of which include (but are by no means limited to):
Learning world knowledge and incorporating this understanding into text (e.g., knowing that fires generally don’t take hold underwater, that tables are solid and that pigs don’t fly)
Controlling and generating top-down hierarchically driven narrative structure to facilitate the control and generation of objective- or plot-driven content (GPT-3 for example is by design very bottom-up — generating one word at a time)
Automatically generating human dialogue which respects our social rules of linguistic interplay and which is reflective of the intents and personalities of the speakers
Generating acceptable quality video content of any reasonable duration (i.e., more than a few seconds) and complexity
Integrating the worlds of text and image/video — which are largely still separate realms of research and technique today
It is extremely difficult to predict when the above might be achieved since to do so is probably going to require new paradigm breakthroughs which go further than developing and extending the AI techniques of today.
The good news is that the short- and medium-term potentialities of AI media generation should give us plenty of time to adjust and learn in the meantime.
With what consequences?
The fact that AI can be “creative” — or can certainly do a bloody good job of mimicking creativity (if there’s a difference) — poses a wide range of questions for us. Here’s a smorgasbord of some that intrigue me.
In the short-term, will deeptweaks worsen the scale and believability of misinformation?
A few months on from the US 2020 election and deepfakes seem to have played a negligible role. More primitive techniques — photoshopping images or selectively editing videos — remain much more cost effective and adequately impactful. Nancy Pelosi, the Speaker of the US House of Representatives, probably appreciates this more than most. Over the last couple of years she has been the target of a range of manipulated social media image, text and video content — including posts by President Trump himself — purporting to show that Pelosi suffers from alcohol abuse. The techniques have been simple to the point of absurd — the slowing down of videos, introduction of duplicate frames, simple compilations of short footage, tweeting of excerpts from a satirical article, production of a newspaper page facsimile, etc. — but impactful. Collectively they have been viewed by millions. It may be no exaggeration to say that their aggregate impact over time is such that some portion of the American populace either consciously believes that Pelosi has a drinking problem or unconsciously associates her with mild disrepute.
Adding deeptweaks — inserting physical gestures, changing words or single sentences, altering expressions — into the mix will likely enhance such deleterious possibilities. Instead of merely slowing down a Pelosi speech, what if genuinely slurred words were introduced, or new phrases inserted more characteristic of someone under the influence of alcohol? What if Pelosi lurched or missed the glass of water she reaches for? Automatically generated tweets could amplify social media reactions and conceivably solidify their validity in the public mind. Deeptweaks would be more difficult to quickly identify as fraudulent, and the misperceptions created and damage done during their initial consumption could endure for longer. Convincing people with even a slight disposition to believe the mis-message that such content is false would be harder still. Equally, convincing the public of the veracity of genuine content would become no easier. Such a “liar’s dividend” is already starting to become problematic. Just recently, commenters on social media site Parler asserted that President Trump’s post-insurrection speech was a deepfake, forcing the White House to debunk the claim.
Convincing the public that Pelosi has a drinking problem feels like the brainchild of someone with fairly limited misinformation ambitions. Convincing the public that the US election was stolen or that the UK government is indifferent to the Covid plight of its populace are arguably issues where the promulgation of deeptweaked content would have much more significant consequences.
Is defence realistic?
The oft-quoted response to this question comes from Hany Farid, a professor at the University of California, Berkeley, who specialises in digital image analysis and forensics: “We are outgunned. The number of people working on the video-synthesis side, as opposed to the detector side, is 100 to 1.” If the only response is using AI to fight AI — i.e., developing and deploying machine learning algorithms that can detect whether content has been machine manufactured or doctored — then this view may hold. In terms of pure tech-driven defence I suspect that as those deepest-of-pockets information distribution platforms (Google, Facebook, etc.) continue to accept more responsibility for fact-checking and deploy meaningful resources to detection, the cost of generating fraudulent content that escapes detection on those platforms will grow to the point that only the best funded malefactors will be able to play.
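On the detection side, such algorithms typically start from simple statistical tells. One crude, purely illustrative example is repetitiveness; real detectors use far richer features than this:

```python
def repetition_ratio(text):
    # Fraction of word tokens that are repeats of an earlier word.
    words = text.lower().split()
    return 1 - len(set(words)) / len(words)

bot_like = "buy now buy now buy now limited offer buy now"
human_like = "the quick brown fox jumps over the lazy dog"

print(repetition_ratio(bot_like))    # 0.6
print(repetition_ratio(human_like))  # ~0.11
```

Any single feature like this is trivially gamed once the forger knows it is being measured — which is exactly the escalating cat-and-mouse dynamic Farid describes.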
More optimistically, I think we can expect a range of responses which collectively will serve to minimise the impact:
The return of trusted sources and gatekeepers of news and information
The rapid growth of digital intermediaries’ share of global advertising revenues has placed great pressure on the business models of many newspapers. As Ben Thompson of Stratechery put it, newspapers once solved the old hardest problem of content consumption: Distribution. In the digital age the marginal cost of distribution (putting up a new web page) is now zero. Instead, Google solves the digital age’s hardest problem — Discovery — and thrives in doing so. It would be deliciously ironic if the growth in digital fake content on digital platforms resulted in a new hardest problem — Veracity — and saw the return of third parties valued primarily for the authenticity of their news content. Perhaps Facebook’s setting up of partnerships with news sources around the world is motivated not only by a desire to appease regulatory concern over its share of advertising revenues but also by a wish to mitigate the risk to itself of fake news promulgation.
Rules and regulations, codes of practice
A range of approaches to dealing with misinformation are being contemplated by governments around the world. I’m no lawyer, but I can appreciate that it must be extraordinarily difficult to legislate in this domain. The lawyer interviewed by Scientific American advises her clients that “Not all deepfakes should be banned, but only those that cause individual or social harm and that do not constitute parody or satire”. One can imagine the difficulty of clearly defining and policing the boundary between the acceptable and the unacceptable. Corporate codes of practice, introduced under the pressure of consumer expectations, are conceivably going to be an important part of the answer. Something akin to colour-coded risk labelling of content may be more likely than outright deletion. We will soon find out.
I remember being mightily disappointed years ago when I first heard of the concept of “identity theft” and discovered that it meant only the stealing of a person’s online credentials (not to trivialise the crime nor its impact). AI mimicry means that something closer to real identity theft may be a problem of the future. Avatars that can self-generate movement and speech that faithfully mimic our individual idiosyncratic physical and oral mannerisms and patterns of speech are absolutely foreseeable. This may be a domain where legislative solutions are more realistic, building on existing defamation legislation. It may also be a domain where sufficient commercial incentives exist to drive progress (imagine media companies, actors, artists seeking to protect “identity copyrights” — or indeed to license them for commercial use).
New social and cultural norms
The emphasis thus far has been on the exploitation of deepfake technology by bad actors (criminals, rogue states, etc.). We shouldn’t forget that mainstream media is also very much in the business of stretching the truth in the name of creating compelling content (if Netflix’s The Crown is anything to go by ). We should expect that machine-altered and machine-generated content will merely (albeit significantly) broaden the scope of what used to go by the name of “special effects”. Machined content — both for the good and for the bad — will become part and parcel of our media experience.
How will we cope? Perhaps explicit education ( see the Finnish example ) and/or implicit experience will see us eventually regarding those duped by fake content as objects of social ridicule — heightening our personal incentive not to be duped. Perhaps we will develop new common sense instincts to assess trustworthiness in the online environment. Perhaps the physical world will become a place of more truthful, genuine experience — somewhere increasingly dissimilar and distinct from an online world where different rules of engagement will apply. Perhaps we will simply become more comfortable with a more malleable definition of “truth”. Who knows? In thinking through how we might cope we shouldn’t underestimate the human capacity to adapt — ironically, the very attribute of ourselves that AI will struggle to replicate for some time (if it ever does). We won’t wake up one morning and say to ourselves “Oh, I’ve just noticed, my trust has been eroded”. Without an excess of melodrama, social and cultural norms will gradually shift.
What happens to our sense of our selves when humans are no longer the sole purveyors of creativity?
Under human curation, AI can already reasonably claim to be able to enhance human creativity. If the insurmountables listed above are one by one overcome we may find ourselves no longer the only entities on the planet capable of creative works of art or thought. Worse than this, if one looks under the bonnet of, say, GPT-3 one finds that its notorious 175bn parameters belie the actual level of complexity of the model architecture. GPT-3 is not a hyper-bespoke agglomeration of numerous different, complex components performing different functions. Its vast number of parameters result from the repeated application of a relatively simple core module. Not so much a Heath Robinson machine, more akin to the successive application of magnifying glasses to text, each magnifying glass splicing and blowing up the patterns between words seen by the previous magnifying glass to ever finer levels of probing in search of clues to the most likely words that should follow. That such a simple architecture can come so close to human level writing is extraordinarily humbling.
In time we may be forced to ask ourselves just how much of human creativity is truly creative — in the sense of representing the first appearance of a genuinely novel artefact or thought vs. “merely” cross-referencing and cross-applying insights and patterns seen in one domain to another a la GPT-3. The things which seem to set us apart from other species — and which have allowed some of us to regard humanity as in some way exalted — may one by one fall away. I’m hopeful that rather than provoke some kind of existential crisis, we may instead come to appreciate and cherish ever more highly the simple fact of our existence.
Hopefully the above has provided a sense of where we are and where we might be headed in the realm of AI content generation. Above all, as we think about possible checks and balances let’s make sure we don’t throw the AI baby out with the bathwater. The capacity for as yet unforeseen beneficial use cases is, I suspect, staggering. I for one am looking forward to knowing that my simulacrum will continue to plague my offspring with sound fatherly advice long after I’m dead and buried (although I daresay they may find themselves dialling down some of their father’s more endearing qualities).*
*Just to further prove that truth is stranger than fiction, I wrote this paragraph before reading that a Korean TV show has used AI to create new performances by popular artists who died young, with AI generated 3D holographic images and vocals (see here and here ).
For more big picture views on AI subscribe to the newsletter at https://www.theaigroupie.com .
1. M. Caldwell, J. T. A. Andrews, T. Tanay and L. D. Griffin. AI-enabled future crime. Crime Science 9, 2020.
2. T. B. Brown et al. Language models are few-shot learners. arXiv:2005.14165, 2020.
3. I. J. Goodfellow et al. Generative adversarial networks. arXiv:1406.2661, 2014.
4. T. Karras, S. Laine and T. Aila. A style-based generator architecture for generative adversarial networks. arXiv:1812.04948, 2018.
5. A. Brock, J. Donahue and K. Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv:1809.11096, 2019.
6. P. Isola et al. Image-to-image translation with conditional adversarial networks. arXiv:1611.07004, 2018.
7. D. Bau et al. GAN dissection: visualizing and understanding generative adversarial networks. arXiv:1811.10597, 2018.
8. C. Chan et al. Everybody dance now. arXiv:1808.07371, 2019.
9. T.-C. Wang et al. Few-shot video-to-video synthesis. arXiv:1910.12713, 2019.
10. I. Sutskever, J. Martens and G. Hinton. Generating text with recurrent neural networks. ICML ’11: Proceedings of the 28th International Conference on Machine Learning, pages 1017–1024, 2011.
11. T. Mikolov et al. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013.
12. N. Kobis and L. Mossink. Artificial Intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. arXiv:2005.09980, 2020.
