Digging into ChatGPT and Large Language Models (LLMs) is leading me down a rabbit hole. In my last post, I looked at a paper discussing whether such models ‘understand’ language, and if so, how different that might be from the way humans learn it. The crux was whether statistical correlations are a sufficient mimic that can, for practical purposes, substitute for knowing the causal mechanisms. We don’t really understand what these LLMs are doing when they come up with seemingly novel and surprising responses that they have not been ‘trained’ for (kinda, sorta). It’s a black box even to those who developed these AIs.
Today, I’m looking at a different paper with a catchy title: “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” The article is open access at the Association for Computing Machinery digital library, linked here. It’s a little more technical and assumes some familiarity with LLMs. Since the article is from 2021, it predates ChatGPT and the newer crop of chatbots, but the authors have more than enough data to make their key arguments. The background provided in the paper was very useful for a novice like me: a brief history of how such models were developed and a quick overview of the different LLMs out there. There’s a bunch of them!
Whenever you query ChatGPT, it takes energy and compute time. How much energy? There are various estimates on the internet, and the kWh-per-query number may look small, but a quick calculation shows it is 2-3 orders of magnitude (i.e., 100-1000 times) more than what the human brain would use to answer a comparable question. LLMs are energy guzzlers compared to us. And this doesn’t count the energy used to train the LLMs in the first place to get them to the user-friendly stage. The bigger the LLM, the more energy it guzzles.
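To make that comparison concrete, here is a back-of-envelope sketch in Python. The per-query energy figure and the time a person might take to answer are my own assumptions (published estimates vary widely); the ~20 W figure for the brain is a standard physiology ballpark.

```python
# Back-of-envelope comparison: energy per LLM query vs. the human brain.
# All three numbers below are assumptions for illustration only; reported
# per-query estimates vary widely.

LLM_ENERGY_PER_QUERY_WH = 3.0   # assumed Wh per query (estimates span a wide range)
BRAIN_POWER_W = 20.0            # approximate power draw of the human brain
SECONDS_TO_ANSWER = 5.0         # assume a person takes ~5 s to answer a comparable question

llm_joules = LLM_ENERGY_PER_QUERY_WH * 3600   # 1 Wh = 3600 J
brain_joules = BRAIN_POWER_W * SECONDS_TO_ANSWER

print(f"LLM query:   ~{llm_joules:,.0f} J")
print(f"Human brain: ~{brain_joules:,.0f} J")
print(f"Ratio:       ~{llm_joules / brain_joules:,.0f}x")
```

With those assumed numbers the ratio comes out around a hundredfold; more pessimistic per-query estimates push it toward a thousandfold.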
Reading about the training data was an eye-opener. I’m a computational chemist, so I’m familiar with using data to train computational models. An important thing to keep in mind is GIGO: Garbage In, Garbage Out. How well your program performs its specific task depends on the quality of your training data. Where does the training data come from for something like ChatGPT? It’s not just the things you might expect, like Wikipedia, digitized books, and vetted data repositories. Turns out there are very large web-scraped datasets, such as Common Crawl. Sources also include “scraping outbound links from Reddit” and other user-generated content such as Twitter. Those datasets are biased in terms of who the users are and what they choose to discuss on the internet, so we shouldn’t be surprised at the misogyny that shows up from chatbots. The authors provide numerous examples of why these issues crop up; I recommend reading their article in full.
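Here is GIGO in miniature: a naive lookup that simply returns whichever claim is repeated most often in a pile of scraped snippets. The snippets are invented for illustration; the point is that if the garbage outnumbers the good data, the garbage is what comes back out.

```python
# GIGO in miniature: "most common answer wins" over scraped text.
# If a wrong value is repeated more often than the right one, the wrong
# value is what gets returned. All snippets here are made up.
from collections import Counter

scraped_snippets = [
    "the boiling point of water is 100 C",
    "the boiling point of water is 100 C",
    "the boiling point of water is 212 C",   # garbled unit, but widely repeated
    "the boiling point of water is 212 C",
    "the boiling point of water is 212 C",
]

def most_common_claim(snippets):
    """Return whichever claim appears most often -- quality never enters into it."""
    return Counter(snippets).most_common(1)[0][0]

print(most_common_claim(scraped_snippets))
# -> "the boiling point of water is 212 C": the majority of the garbage wins
```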
Given that the amount of garbage on the internet is growing faster than anything else, things will only get worse. Over the years, I’ve noticed more wrong things (related to chemical data) show up in the top few hits of a standard Google search. Search has, in fact, become more tedious when I’m looking for good data on specific things I care about. For researchers trying to move from Natural Language Processing to Natural Language Understanding, simply throwing more scraped data at the problem is likely to make things harder, especially when so many resources are being poured into these LLM approaches. Is Big Data the answer? The authors make the point that “[language] coherence is in the eye of the beholder… human communication relies on the interpretation of implicit meaning conveyed between individuals… [it] is a jointly constructed activity… even when we don’t know the person who generated the language we are interpreting, we… [intuit] what common ground we think they share with us, and use this in interpreting their words.”
The authors refer to LLMs as Stochastic Parrots. It’s an apt name, given that these models apply statistics and Bayesian probabilities to construct plausible-sounding text. We should think carefully about taking advice from these parrots, and certainly double-check what they’re telling us. But we’re lazy. Why do the hard work you can outsource to a machine? Isn’t it just a useful tool, like a calculator? Well, partly yes, but every year I see students punch their calculators and write down nonsense answers. You could call it user error, but it’s an error that often comes from not understanding how the calculator parses the input you provide to generate the output. I could make the same analogy to the workings of a chatbot, except that things are even fuzzier there: a stochastic wrench has been thrown into the works. All that being said, I’m still finding the stochastic parrot interesting to interact with – it gives me a sense of what an alien intelligence might be like.
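To see the “parrot” part in miniature, here is a toy sketch: a bigram model that strings words together purely from how often one word follows another in a tiny made-up corpus. It produces plausible-looking text with no notion of meaning; the random sampling supplies the “stochastic” part. Real LLMs are vastly more sophisticated, but the spirit is similar.

```python
# A toy "stochastic parrot": a bigram model that generates text purely from
# word-to-word statistics. The corpus is invented for illustration.
import random
from collections import defaultdict

corpus = (
    "the reaction rate depends on temperature . "
    "the reaction rate depends on concentration . "
    "the model predicts the reaction rate ."
).split()

# Count which words follow which, word by word.
followers = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word].append(next_word)

def parrot(start_word="the", length=12, seed=0):
    """Generate text by repeatedly sampling a statistically likely next word."""
    rng = random.Random(seed)
    words = [start_word]
    for _ in range(length - 1):
        options = followers.get(words[-1])
        if not options:          # dead end: no observed continuation
            break
        words.append(rng.choice(options))
    return " ".join(words)

print(parrot())
```

The output reads like it could mean something, but it is assembled entirely from co-occurrence counts.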