Monday, October 26, 2020


Could A.I. ‘un-redact’ the Mueller Report?

Camas tech firm puts artificial intelligence to the test, gets lesson on its strengths, limitations

By , Columbian business reporter

Attorney General William Barr significantly redacted former special counsel Robert Mueller’s 448-page report, including information about ongoing grand jury proceedings.

The redactions were no doubt a source of frustration for millions of Americans eager to learn the full scope of Mueller’s findings after his investigation into Russian interference in the 2016 election and alleged links to President Trump’s campaign. And a public statement from Mueller himself earlier this week offered no revelations beyond what was already in the report. So unless Barr has a sudden and drastic change of opinion, there’s no way for the public to know what’s behind those blacked-out sections.

Or is there?

That was the premise of a meetup event hosted this week by Camas-based technology company Manceps, which brought together about two dozen programmers and software engineers for a high-concept challenge: Could an artificial intelligence program be trained to fill in the blanks and “un-redact” the Mueller report?

The answer is no — or at least, not in a way that would lead to a political revelation buried at the end of this story. No matter how good the algorithm, the original report isn’t available to check the results — and while it never hurts to ask, it seems safe to assume that Mueller isn’t going to offer corrections.

But it’s certainly an eye-catching premise. And Manceps CEO Al Kari said his real goal for the exercise was to spark a fresh round of brainstorming about the potential applications of artificial intelligence and machine learning.

Kari founded Manceps in 2017 after previously working at several big tech companies including Microsoft, Dell and software company Red Hat. His new company — whose name is based on the Latin word for customer — works with corporate clients to develop computer programs to automate processes and functions using machine learning.

In a hypothetical example, if a department store chain wanted to use the information from its sales receipts to develop a new system for more efficiently distributing products based on local demand, it would hire a company like Manceps to train an AI program to comb through the data and pick out relevant patterns.

Another big use for AI programs is natural language processing, Kari said, which enables digital assistants such as Alexa and Siri to parse human speech and generate conversational answers. That’s the side of the field that the company wanted to put to work on the Mueller report.

“It’s not about generating text in a ‘Mad Lib’ format,” Kari said, referring to the popular word game. “We now (can train AI to) understand what’s behind the text.”

AI was put to use immediately after the report’s release, Kari pointed out. The public version was formatted as an image rather than a typed document, making it impossible to select or copy text. News outlets and other groups used optical character recognition — an AI technique for reading text out of images — to quickly extract the report’s contents rather than re-typing the whole document by hand.

“Computer vision came to the rescue,” Kari said.

An educated guessing game

But at Wednesday’s demonstration, Kari and Manceps senior NLP architect Hobson Lane tried to take it a step further.

Language processing programs create new sentences by analyzing existing pieces of writing to learn how often words appear near each other, and then using those statistical models to predict a missing or next word in a sentence based on the words around it.
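The word-frequency idea described above can be illustrated with a toy bigram model. This is a minimal sketch for intuition only; the corpus, function names, and example sentences here are invented, and real systems like BERT use neural networks trained on vastly more text and context.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = following[word.lower()]
    return counts.most_common(1)[0][0] if counts else None

# Tiny invented corpus standing in for a large body of training text.
corpus = (
    "the special counsel filed a report "
    "the special counsel filed a motion "
    "the grand jury reviewed the report"
)
model = train_bigrams(corpus)
print(predict_next(model, "special"))  # 'counsel'
print(predict_next(model, "filed"))   # 'a'
```

Even this crude counting scheme captures the core mechanic: prediction is driven entirely by how often word pairs co-occur in the training text, not by any understanding of what the words mean.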

So would AI be able to guess the redacted words based on the context?

Kari and Lane tested the theory by deploying a version of BERT (Bidirectional Encoder Representations from Transformers), an open-source language model developed by Google, which can draw on cloud computing power to quickly crunch its way through the massive amount of data needed to train a language AI.

After running the report through several times, they picked a snippet of text with a redacted portion and told their new program to take its best guess.

The snippet read “… the pertinent activities of Michael Cohen, Richard Gates …” followed by a small block of redacted text.

On its first try, the program filled in the blank with “in the order that the president was aware.”

That’s a valid word construction that sounds like the writing style of the rest of the report, Kari pointed out, but it’s definitely wrong — especially because the sentence context and the size of the redaction block suggest that the redacted portion is a third name.

The program hadn’t been given enough information to understand that context, Kari said, so for the second attempt they tweaked the settings to focus on predicting names. This time, it filled in the blank with “Kislyak” — the last name of Sergey Kislyak, the former Russian ambassador to the United States.

That still doesn’t mean anything, Kari was quick to stress — it’s little more than a random guess based on the frequency with which various names appear elsewhere in the report, and the actual name behind the redacted block might not even appear anywhere in the unredacted parts of the report.
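Kari’s point — that the guess amounts to picking a statistically likely name — can be sketched in a few lines. Everything below is illustrative: the snippet, candidate list, and function are invented, and this simple counting stands in for the far more sophisticated scoring a model like BERT performs.

```python
from collections import Counter

def guess_redacted_name(text, known_names):
    """Guess a redacted name by picking whichever candidate name
    appears most often in the unredacted text."""
    counts = Counter({name: text.count(name) for name in known_names})
    name, count = counts.most_common(1)[0]
    return name if count > 0 else None

# Hypothetical snippet standing in for the report's unredacted portions.
snippet = ("Cohen met Kislyak. Gates spoke with Kislyak. "
           "Kislyak later denied the meeting.")
candidates = ["Cohen", "Gates", "Kislyak", "Manafort"]
print(guess_redacted_name(snippet, candidates))  # 'Kislyak'
```

The sketch also makes the limitation concrete: if the true redacted name never appears in the unredacted text, it simply isn’t in the candidate pool, and no amount of counting can recover it.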

But it’s also based on just a few minutes of analysis with only the report itself as a starting point, he said.

Any AI’s key limitation

From there, the meetup turned into a discussion about the ways an AI could be better tuned to the task, such as feeding it more data — other documents relating to the Mueller probe, or other writing samples from Mueller and his team, or even unrelated legal documents that could familiarize the AI with legal writing styles.

Kari ended the meetup by turning the task over to the assembled group of software engineers, challenging them to take their best shot at building a more complete version of the evening’s analysis.

But when it comes to the actual redacted portions, Kari and Lane stressed, any AI will still run into a key limitation: It won’t be able to correctly guess redacted names, dates or numbers, because those are real-world facts whose values don’t correlate with the other words in the document.

“(The AI) can’t know anything about the world in general,” Lane said.

So it appears that if the redacted portions of the report ever see the light of day, it’ll have to be through more conventional means.
