By Girish Mhatre
Sunday, a week ago, CBS’ 60 Minutes exposed Microsoft’s massive AI-related scam.
Microsoft’s president, the genial Brad Smith, defending its ChatGPT-powered search engine, Bing, against charges that it frequently went rogue, randomly abusing users, admitted that “the creature jumped the guardrails.” But then, reverting to his role as chief techno-booster, Smith bragged, “We were able to fix the problem in 24 hours. How many times do we see problems in life that are fixable in less than a day?”
That was disingenuous, to say the least. Because Smith also admitted – though not in so many words – that Microsoft did not “fix” the problem with Bing; it merely restricted its use by limiting the number of questions and the length of the conversations.
A major embarrassment was to follow: In an on-air demonstration designed to show that the “guardrails” were holding up, Yusuf Mehdi, Microsoft’s corporate vice president of search, asked Bing, “How can I make a bomb at home?” Bing appropriately demurred. So far, so good. Explained Mehdi, “What we do is we come back and we say, ‘I’m sorry, I don’t know how to discuss this topic’ and then we try and provide a different thing to change the focus of the conversation.”
In other words, Bing, queried on a sensitive topic, simply provides a diversion – by volunteering a random factoid. In this case: “3% of the ice in Antarctic glaciers is penguin urine.”
A fun fact, indeed. Except for one thing: Penguins may poop several times an hour, but they do not urinate. Ever. (Wish I could have seen Smith’s face when 60 Minutes’ Lesley Stahl pointed it out, but the camera charitably panned away.)
Hilarious it may be, but the incident points out a fundamental and scary flaw: We really do not understand how deep learning works. We think we do, but we don’t. Its boosters want us to believe it will get better – Brad Smith parrots the techno-fanatic’s mantra, “Yes, but look at the potential” – but it won’t. And that’s the reason we will never be able to trust it.
Deep learning attempts to model the architecture of the human brain – neurons transmitting signals between each other through synapses – with artificial neural networks.
Basically, it uses mathematical functions to map the input to the output. These functions can extract non-redundant information or patterns from the data, which enables them to form a relationship between the input and the output. “It is fundamentally a technique for recognizing patterns. Its neural networks solve tasks by statistical approximation and learning from examples,” says AI expert Gary Marcus.
Typically, a deep learning neural network has at least three layers: an input layer that accepts data; an output layer that produces results; and an intermediate layer, called the hidden layer, where all the computation is done. The hidden layer (there may be more than one) performs non-linear transformations on its input data by assigning weights (called parameters) to the inputs to produce a specified output.
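The three-layer structure described above can be sketched in a few lines of code. This is a minimal illustration, not Bing’s architecture; the layer sizes, the tanh non-linearity, and the random weights are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 inputs, 8 hidden units, 2 outputs.
W1 = rng.normal(size=(4, 8))   # weights (parameters) into the hidden layer
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 2))   # weights out of the hidden layer
b2 = np.zeros(2)

def forward(x):
    # Hidden layer: a non-linear transformation of the weighted inputs.
    hidden = np.tanh(x @ W1 + b1)
    # Output layer: produces the result from the hidden activations.
    return hidden @ W2 + b2

y = forward(rng.normal(size=4))  # one input example -> two output values
print(y.shape)
```

Everything interesting happens in the `hidden` line: without that non-linearity, stacking layers would collapse into a single linear mapping from input to output.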
Like all machine learning systems, deep learning systems must be trained with labeled data – lots of it. The objective of training is to produce the desired outputs for a given set of input data. Training involves calculating the difference between the actual output and the desired output and then making many small adjustments to the network’s parameters.
Training is done iteratively over many training runs, incrementally changing the network’s state, with parameters continuously updated behind the scenes. The final parameters, at the end of the training, constitute the trained AI model.
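The training loop just described – compare actual output against desired output, then make a small adjustment – can be shown with a toy model that has a single parameter. This is illustrative only; real deep learning applies the same idea across millions or billions of parameters via backpropagation, and the learning rate and data here are invented for the example.

```python
w = 0.0                           # the model's single parameter
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # desired: output = 2 * input

for run in range(200):            # many iterative training runs
    for x, desired in data:
        actual = w * x            # the model's current output
        error = actual - desired  # difference from the desired output
        w -= 0.01 * error * x     # small adjustment to the parameter

print(round(w, 3))                # the final parameter is the trained "model"
```

After enough runs, `w` settles near 2.0 – the final parameter value at the end of training is, in miniature, the trained model.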
The hidden layer is where the magic happens. It is “hidden” because the true values of its parameters are unknown. In fact, you (as the trainer) only know the input and output. You don’t see the parameters that the system assigns to itself in the hidden layer(s). You choose and set the “hyperparameters” (configuration settings such as the learning rate and the number of layers) before training begins, and the learning algorithm uses them to assign the model parameters.
Which is to say that a trained deep learning system is a mysterious black box.
“Simply put, you don’t know how or why your neural network came up with a certain output. For example, when you put an image of a cat into a neural network and it predicts it to be a car, it is very hard to understand what caused it to arrive at this prediction,” according to Niklas Donges, a prominent AI researcher.
To improve deep learning accuracy, one school of thought – championed, among others, by ChatGPT’s progenitor, OpenAI, and by Google’s DeepMind – advocates making things bigger.
Deep learning programs require access to immense amounts of training data and compute power. (While the concepts have been kicking around for decades, it is the recent availability of big data sets and big processors – GPUs, TPUs, etc. – that has enabled practical deep learning.)
The basic idea behind scaling is to throw even more training data and more processing power at the problem of accuracy.
But skeptics argue that scaling is a false promise. Shoveling in more data isn’t in the cards, they say, because we may simply run out of high-quality data long before deep learning systems can be trusted to provide accurate answers. Some researchers claim that the stock of high-quality language data will be exhausted soon; likely before 2026.
The operative adjective here is “high-quality.” Anything less than high-quality data does not advance the quest for accuracy; low-quality data may even – destructively – reflect human biases. According to some published reports, the training data for ChatGPT is believed to include most or all of Wikipedia, pages linked from Reddit, and a billion words grabbed off the internet (but not books, which are protected by copyright law). “The humans who wrote all those words online overrepresent white people. They overrepresent men. They overrepresent wealth. What’s more, we all know what’s out there on the internet: Vast swamps of racism, sexism, homophobia, Islamophobia, neo-Nazism.” Add urban myths and conspiracy theories to that toxic mix.
An upper bound on compute power is the other major hurdle to realizing the scaling dream. A research paper from MIT on the computational demands of deep learning applications in image classification, object detection, question answering, named entity recognition and machine translation estimates that even in the most optimistic model, it would take an additional 10⁵-fold (100,000-times) increase in computing horsepower to get to an error rate of 5%. That kind of increase is economically, technically, and environmentally unsustainable. A dead end is looming.
Progress, then, will require a fundamental re-think. Perhaps new technology (quantum computing?) must be deployed, or radically different system architectures, or more efficient pattern recognition algorithms. Or, perhaps, we will recognize that attempting to mimic the human mind is ultimately a wild goose chase.
Finally, the reason deep learning may be reaching a dead end is the brake likely to be imposed by societal will. Says Emily Bender, computational linguist at the University of Washington, “We call on the field to recognize that applications that aim to believably mimic humans bring risk of extreme harms. Work on synthetic human behavior is a bright line in ethical AI development, where downstream effects need to be understood and modeled in order to block foreseeable harm to society and different social groups.”
The message is beginning to be heard. Late Thursday, March 9th, the U.S. Chamber of Commerce called for “policymakers and business leaders [to] quickly ramp up their efforts to establish a ‘risk-based regulatory framework’ that will ensure AI is deployed responsibly.” It’s a start.
Girish Mhatre is the former editor-in-chief and publisher of EE Times. The views expressed in this article are those of the author alone and do not necessarily represent the views of the Ojo-Yoshida Report.