Tech Rundown: Fun with Language Maps
While we're waiting for Arnie and Vera to telegraph home their next vacation diary installment, it seemed like a good week for a Tech Rundown. Don't worry, I'm not writing about chatbots again. I'm writing about...ok, guilty, I'm writing about chatbots again. Kind of. Today's Tech Rundown is about the Large Language Models that underpin chatbots like GPT and Bard, and three interesting ways people are using these detailed maps of the English language.
Contexto
Like Wordle, Contexto is a guess-the-word game. Each day there's a new secret word, and you're trying to guess it. You supply a word, and you see how close you came. Then you guess another and try to get closer. But here, 'close' is determined by the similarity of meaning between your word and the secret word. Contexto knows where each word fits on its vast language map, and it ranks the relatedness between its word and every other word. When you guess a word, Contexto tells you how many words in its dictionary are more closely associated with the secret word than yours is. For example, if Contexto gives your word a score of 1,500, there are 1,499 other words more strongly related to the secret word. A score of 8 means you're very close: only seven words in the dictionary are more strongly associated with the secret word.
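Contexto hasn't published its internals, but the behavior it describes maps neatly onto word embeddings, so here's a toy sketch of how such a scorer could work. The three-dimensional vectors and five-word dictionary below are invented for illustration; a real version would use a pretrained embedding model like word2vec or GloVe over tens of thousands of words.

```python
import numpy as np

# Hypothetical 3-D "language map" -- real embeddings have hundreds of
# dimensions. Vectors are made up so that swan-ish words cluster together.
embeddings = {
    "swan":    np.array([0.90, 0.10, 0.30]),
    "goose":   np.array([0.85, 0.15, 0.35]),
    "trumpet": np.array([0.80, 0.20, 0.40]),  # swans trumpet...
    "lake":    np.array([0.70, 0.10, 0.60]),
    "piano":   np.array([0.10, 0.90, 0.20]),  # ...but they never piano
}

def cosine(a, b):
    """Similarity of meaning, measured as the angle between word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contexto_score(secret, guess):
    """Rank of the guess: 1 is the secret word itself, so a score of N
    means N-1 words are more strongly related to the secret word."""
    sims = {w: cosine(embeddings[secret], v) for w, v in embeddings.items()}
    ranking = sorted(sims, key=sims.get, reverse=True)
    return ranking.index(guess) + 1

print(contexto_score("swan", "trumpet"))  # 3 -- close in meaning
print(contexto_score("swan", "piano"))    # 5 -- dead last in this tiny map
```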
I've been a little obsessed with this game because it shows the narrowness of my thinking. Say by guessing I've learned that the secret word is warmly related to tomatoes, boots, and excrement. I'll exhaust myself trying all manner of words related to organic gardening, only to find out later that the secret word was "comedian," and I'd been inadvertently listing throwable objects. Other times I'll be baffled because two similar words get vastly different scores. "Trumpet" scores a 10, for instance, but "piano" scores a 10,000. Then the secret word turns out to be "swan," and, naturally, swans trumpet but they never piano. For a simple game, it shows how maddeningly hard it can be to see the forest in a bunch of trees.
Mental Eavesdropping
Researchers at the University of Texas at Austin have been using a Large Language Model to make surprisingly accurate guesses about what a subject is thinking, using only non-invasive functional MRI (fMRI) scanning.
fMRI scans have long been used to guess what type of thinking a person is doing: blood flow in the frontal lobe could suggest complicated reasoning, and blood flow in the amygdala might suggest heightened emotion. But fMRI is too fuzzy to predict a person's specific thoughts, unless you're Alexander Huth, Jerry Tang, and colleagues. They've found that a predictive language model can be trained to help decode fMRI images and guess the words a person is hearing through a pair of headphones.
"It definitely doesn't nail every word," Huth told Science News. "But that doesn't account for how it paraphrases things. It gets the ideas." For example, when a subject was listening to the words, "I don't have my driver's license yet," the decoder produced, "She has not even started to learn to drive yet."
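For the curious, here's a heavily simplified sketch of the decoding loop as the researchers describe it: a language model proposes candidate word sequences, a per-subject "encoding model" predicts the brain activity each candidate should produce, and the candidates whose predictions best match the actual scan survive. The stub functions, voxel count, and vocabulary below are hypothetical stand-ins, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_continuations(prefix):
    """Stand-in for the language model: suggest plausible next words."""
    return [prefix + [w] for w in ("drive", "license", "lesson")]

def predict_fmri(candidate):
    """Stand-in for the per-subject encoding model (words -> predicted
    voxel activity). Hashing just makes the stub deterministic."""
    seed = abs(hash(" ".join(candidate))) % (2**32)
    return np.random.default_rng(seed).standard_normal(100)  # 100 "voxels"

def decode(observed_scan, beam_width=2, steps=3):
    """Beam search: keep the word sequences whose predicted brain
    activity sits closest to the scan we actually observed."""
    beam = [[]]
    for _ in range(steps):
        candidates = [c for prefix in beam
                        for c in propose_continuations(prefix)]
        candidates.sort(
            key=lambda c: np.linalg.norm(predict_fmri(c) - observed_scan))
        beam = candidates[:beam_width]
    return " ".join(beam[0])

print(decode(observed_scan=rng.standard_normal(100)))
```

The real decoder's trick (and its personalization requirement) lives in the encoding model: it has to be trained on many hours of scans of that one subject before the matching step means anything, which is why the technique can't simply be pointed at a stranger.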
Suddenly 'mental privacy' sounds like a concept we'll be hearing more about, but we don't need to don our tin-foil hats just yet. Each 'decoder' has to be personalized to the individual subject, and the decoders don't work unless the subject is voluntarily participating. Science News reports that participants can thwart the eavesdropping by "simply ignoring the story and thinking about animals, doing math problems or focusing on a different story." So that's a relief. Next time you get caught spacing off, you can tell your teacher, boss, or spouse that you're doing your mental autonomy exercises.
How we're learning to speak Whale
Yes, it's all whales, all the time around here.
One astounding thing we've learned from AI language mapping is that different languages have similar maps. Even when grammar and syntax differ, the maps have similar shapes, since we all share a universal human experience. Language maps overlap so well that we can use them for unsupervised translation: translation with no shared Rosetta Stone.
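One concrete version of the overlap trick: if two languages' embedding clouds really are near-rotations of each other, a single orthogonal matrix lines them up, and you can solve for that matrix directly (the Procrustes solution). The sketch below uses made-up 2-D vectors and a couple of seed word pairs for clarity; fully unsupervised systems manage to find the rotation without any seed pairs at all.

```python
import numpy as np

# Hypothetical embeddings: "language B" is language A's map rotated 90 degrees.
rot = np.array([[0.0, -1.0],
                [1.0,  0.0]])
lang_a = {"water": np.array([1.0, 0.0]),
          "fish":  np.array([0.8, 0.5]),
          "boat":  np.array([0.2, 0.9])}
lang_b = {w: rot @ v for w, v in lang_a.items()}  # pretend these are foreign words

# Solve for the orthogonal W minimizing ||W A - B|| over two seed pairs.
A = np.stack([lang_a["water"], lang_a["fish"]], axis=1)
B = np.stack([lang_b["water"], lang_b["fish"]], axis=1)
u, _, vt = np.linalg.svd(B @ A.T)
W = u @ vt  # the closest pure rotation/reflection

# Translate a word the seed dictionary never mentioned: map it into
# language B's space and grab its nearest neighbor there.
mapped = W @ lang_a["boat"]
print(max(lang_b, key=lambda w: mapped @ lang_b[w]))  # "boat", recovered for free
```

And now a group of marine scientists is aiming to find out if machine learning can pull off the same trick and translate whale speech. They're studying sperm whales, whose complex calls are surprisingly language-like. Sperm whale calves, like human babies, perfect their calls through years of practice.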
With a network of recording buoys installed near Dominica, researchers with Project CETI (https://www.projectceti.org/) are collecting underwater recordings and using a neural network to identify and catalog the whale calls, noting the individual speaker and any context available (food, predators, other whales in the area). When enough recordings have been collected, the researchers will feed the data into neural networks trained for language mapping.
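The cataloging step is the easiest part to picture in code: sperm whale codas are rhythmic patterns of sharp clicks, so a first pass can simply find the clicks and measure the gaps between them. Here's a toy version, with a synthetic 'recording' standing in for the buoy audio; every number in it is invented for illustration, not taken from the CETI pipeline.

```python
import numpy as np

SR = 16_000  # sample rate, in Hz

# Synthesize a two-second "recording": quiet noise plus a 5-click coda.
audio = 0.01 * np.random.default_rng(1).standard_normal(SR * 2)
for t in (0.50, 0.62, 0.74, 0.80, 0.86):          # click onsets, in seconds
    audio[int(t * SR):int(t * SR) + 40] += 1.0    # short, loud burst

def detect_clicks(signal, threshold=0.5, dead_time=0.02):
    """Return click onset times: moments the signal crosses the threshold,
    ignoring re-triggers within `dead_time` seconds of the last loud sample."""
    clicks, last = [], -np.inf
    for i in np.flatnonzero(np.abs(signal) > threshold):
        if i - last > dead_time * SR:
            clicks.append(i / SR)
        last = i
    return clicks

clicks = detect_clicks(audio)
print([f"{c:.2f}s" for c in clicks])           # the five click onsets
print([f"{d:.3f}" for d in np.diff(clicks)])   # inter-click gaps: the coda's rhythm
```

And then, once the language-mapping networks have done their work, we'll finally get to learn who does the most cursing: whales or sailors.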