AI research company OpenAI have created a text generator so effective that they have withheld the full model from the public for fear of misuse.

OpenAI and GPT2

Elon Musk and Sam Altman launched OpenAI in December 2015 with a mission to create artificial general intelligence (AGI) systems that benefit humanity. AGI systems, as envisaged, would outperform humans in exercising intelligence across multiple domains and would be capable of self-improvement. Within a year of launch, OpenAI released Gym, a toolkit for building systems that can learn how to play games, and win, in the manner of DeepMind’s AlphaGo.

In February 2019, OpenAI announced to the world what they presented as a breakthrough, GPT2. GPT2 is a text generator that writes highly intelligible text when prompted. To create it, they fed it text, lots of text. Dario Amodei, their research director, says that GPT2’s models “were 12 times bigger, and the dataset was 15 times bigger and much broader” than previous AI models. The 40GB dataset included around 10m articles, selected from links on Reddit that had received more than three votes.

Earlier text generators have many flaws. They invariably forget what it is they are writing about, they use strange syntax, and the end product is often so bad it is funny and sometimes surreal.

GPT2 has raised the standard considerably. It picks up and runs with the sense and voice of a few lines. Take, for instance, the opening line of George Orwell’s Nineteen Eighty-Four: “It was a bright cold day in April, and the clocks were striking thirteen.” GPT2 continues, “I was in my car on my way to a new job in Seattle. I put the gas in, put the key in, and then I let it run. I just imagined what the day would be like. A hundred years from now. In 2045, I was a teacher in some school in a poor part of rural China. I started with Chinese history and history of science.”

Why the secrecy?

This technology can no doubt be put to good use. It can write, translate and summarise text with greater accuracy than previous systems. However, like Carl Denham returning from Skull Island, OpenAI have emerged from this process with their Kong caged. They released a paper outlining their research but withheld the full model and the millions of web pages used to train the system.

Transparency is in their name. OpenAI say that their “Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world.” Why, then, have they not released everything behind GPT2?

Some have suggested this is a cynical PR attempt to hype up their research, one which gives the AI industry a bad name.

However, given OpenAI’s mission and Charter, this seems unduly harsh. Their mission is clear: for their systems to benefit all of humanity. Their Charter contains a commitment to avoid enabling uses of AI or AGI that harm humanity or unduly concentrate power. These are not mere platitudes. Almost a year to the day before presenting GPT2 to the world, OpenAI co-authored a paper on how to prepare for malicious actors misusing technology such as this.

Speaking about GPT2, Jack Clark, OpenAI’s head of policy, told the Guardian, “We need to perform experimentation to find out what they can and can’t do. If you can’t anticipate all the abilities of a model, you have to prod it to see what it can do. There are many more people than us who are better at thinking what it can do maliciously.”

They have already made a version of GPT2 that generates positive or negative product reviews. And it is foreseeable that a system that feeds on Internet text will quickly become expert in the banalities of Internet communication: insult, bigotry, racism, conspiracy theories, etc. It could even lead the way in that most pernicious of all Internet communications, so-called ‘Fake News’.

Experiments conducted by the Guardian and WIRED have shown how spookily accurate GPT2’s ‘news pieces’ can be. The Guardian fed it some paragraphs from a piece on Brexit, and GPT2 wrote what could pass as credible among many of the public. It even generated ‘quotes’ from Jeremy Corbyn and the Prime Minister’s spokesman. (WIRED fed it “Hillary Clinton and George Soros”. What came out is worth reading: it is chillingly brilliant.)

Conclusion

Their name might imply complete transparency, but their commitments to prevent harm, specifically the undue concentration of power, place important caveats on that transparency. What good is there in being clear when what you have to say might muddy things even further?

Other technology companies are aware of these risks and are exercising more care. Last month, Google released a paper in which it said it would limit how much of its research software is shared, for fear of misuse. It would seem that technology companies have just discovered they are on Skull Island and are pondering whether and how to safely open the island’s secrets to the rest of the world.