AI Research Highlights | Week 41, 2023
Updated: Feb 20, 2024
1. Language Models Represent Space and Time
Source: https://arxiv.org/abs/2310.02207
Some linguists have argued that LLMs are just "stochastic parrots". This paper presents evidence that an LLM is not merely a collection of superficial statistics but learns structured knowledge of fundamental dimensions, in the sense of a world model. A group of MIT researchers found that LLMs learn linear representations of space and time at multiple scales, and that these representations are highly robust to changes in prompts. The results also showed that LLMs have individual "space neurons" and "time neurons" that reliably encode spatial and temporal coordinates.
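A linear representation of this kind is usually tested with a linear probe: a least-squares map from hidden activations to coordinates. The following is a minimal sketch on synthetic data, not the paper's code. The array shapes, the noise level, and the helper names `fit_linear_probe`, `predict`, and `r_squared` are assumptions made for illustration.

```python
import numpy as np

def fit_linear_probe(activations, targets):
    """Fit a least-squares linear map (with bias) from activations to targets."""
    # Append a column of ones so the probe can learn an offset.
    X = np.hstack([activations, np.ones((activations.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights

def predict(weights, activations):
    """Apply a fitted probe to new activations."""
    X = np.hstack([activations, np.ones((activations.shape[0], 1))])
    return X @ weights

def r_squared(y_true, y_pred):
    """Coefficient of determination, pooled over all target dimensions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for model activations (assumption: 32-dim, 500 samples).
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 32))
true_map = rng.normal(size=(32, 2))  # hypothetical "latitude, longitude" directions
coords = acts @ true_map + 0.1 * rng.normal(size=(500, 2))

# Fit on one split and evaluate on held-out samples.
train, test = slice(0, 400), slice(400, 500)
probe = fit_linear_probe(acts[train], coords[train])
score = r_squared(coords[test], predict(probe, acts[test]))
print(f"held-out R^2: {score:.3f}")
```

A high held-out score on real activations would suggest that the coordinates are linearly decodable; on this synthetic data it is high by construction.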
2. Working Memory Capacity of ChatGPT: An Empirical Study
Source: https://arxiv.org/abs/2305.03731
Researchers from the University of Oxford and Yale University ran N-back tasks, a commonly used paradigm for testing working memory, on ChatGPT. The results showed that ChatGPT has a working memory capacity limit similar to that of humans.
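In an N-back task, the participant sees a stream of items and must say, for each one, whether it matches the item shown N steps earlier. A minimal sketch of the scoring logic follows. The function names and the example letter stream are hypothetical, and this is not the authors' evaluation code.

```python
def n_back_targets(sequence, n):
    """For each position, report whether the item equals the one n steps back."""
    return [i >= n and sequence[i] == sequence[i - n] for i in range(len(sequence))]

def score_responses(sequence, responses, n):
    """Fraction of positions where the response agrees with the true match flag."""
    targets = n_back_targets(sequence, n)
    if not targets:
        raise ValueError("sequence must not be empty")
    correct = sum(t == r for t, r in zip(targets, responses))
    return correct / len(targets)

letters = "ABABCAC"
flags = n_back_targets(letters, 2)
print(flags)

# A responder that always answers "no match" gets only the non-match trials right.
always_no = [False] * len(letters)
print(score_responses(letters, always_no, 2))
```

Raising `n` makes the task harder because more items must be held in mind at once, which is how a capacity limit shows up in the accuracy scores.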
3. How FaR Are Large Language Models From Agents with Theory-of-Mind?
Source: https://arxiv.org/abs/2310.03051
The authors proposed Thinking for Doing (T4D), an evaluation paradigm that requires models to connect inferences about others' mental states to actions, and presented Foresee and Reflect (FaR), a 0-shot reasoning mechanism that encourages LLMs to anticipate future challenges and reason about potential actions. FaR boosts GPT-4’s performance from 50% to 71% on T4D. The PDF is currently missing; you can find the file here.
4. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Source: https://arxiv.org/abs/2310.03214
This paper provided a new method for helping modern LLMs adapt to ever-changing world knowledge through search engine augmentation. The authors proposed a dynamic QA benchmark called FreshQA and a simple few-shot in-context learning algorithm called FreshPrompt. With these, LLMs can be evaluated on up-to-date knowledge and improved accordingly. The project can be found here.
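Search engine augmentation of this kind amounts to placing retrieved evidence in the prompt ahead of the question. Below is a minimal sketch of that idea. The function name, the evidence fields (`source`, `date`, `snippet`), and the oldest-first ordering are assumptions for illustration, not the exact FreshPrompt format.

```python
from datetime import date

def build_search_augmented_prompt(question, evidences, demonstrations=()):
    """Assemble a prompt from few-shot demonstrations, dated evidence, and a question."""
    # Assumption: order evidence oldest-first so the newest sits next to the question.
    ordered = sorted(evidences, key=lambda e: e["date"])
    blocks = list(demonstrations)
    for e in ordered:
        blocks.append(
            f"source: {e['source']}\n"
            f"date: {e['date'].isoformat()}\n"
            f"snippet: {e['snippet']}"
        )
    blocks.append(f"question: {question}\nanswer:")
    return "\n\n".join(blocks)

# Hypothetical search results; in practice these would come from a search engine.
results = [
    {"source": "example.org", "date": date(2023, 9, 1), "snippet": "newer fact"},
    {"source": "example.com", "date": date(2021, 3, 5), "snippet": "older fact"},
]
prompt = build_search_augmented_prompt("What is the latest value?", results)
print(prompt)
```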
5. Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Source: https://arxiv.org/abs/2310.03710
In this paper, researchers build an autonomous agent to improve the 0-shot reasoning abilities of LLMs. The agent generates task-specific instructions, which then guide LLMs to reason better across a wide set of tasks. The results confirm the overall efficacy of the approach, showing significant improvements across various tasks.
6. Representation Engineering: A Top-Down Approach to AI Transparency
Source: https://arxiv.org/abs/2310.01405
This paper draws inspiration from cognitive neuroscience and proposes Representation Engineering (RepE), a top-down method for opening up the LLM "black box" in the hope of enhancing AI transparency and AI safety. The researchers designed a scanning method called LAT (Linear Artificial Tomography), analogous to brain scans such as PET and fMRI, to observe LLMs' internal activity when they are faced with truths and lies. This project can be found here.
7. Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models
Source: https://arxiv.org/abs/2310.03965
Thought Propagation (TP) was introduced in this paper by researchers from CAS and Yale. TP explores analogous problems related to the input problem and leverages their solutions to enhance the complex reasoning ability of LLMs. TP also allows plug-and-play generalization and enhancement across a wide range of tasks without much task-specific prompt engineering.
8. Large Language Models as Analogical Reasoners
Source: https://arxiv.org/abs/2310.01714
This paper introduced a new prompting approach called analogical prompting, which lets LLMs self-generate relevant exemplars or knowledge in the context before solving the given problem. Analogical prompting outperforms 0-shot CoT and manual few-shot CoT on many tasks, including GSM8K, Codeforces, and BIG-Bench.
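The self-generation step can be pictured as a prompt template that asks the model to recall related problems before answering. The sketch below is illustrative only. The wording, the function name, and the `num_exemplars` parameter are assumptions, not the prompt used in the paper.

```python
def build_analogical_prompt(problem, num_exemplars=3):
    """Assemble a prompt asking the model to recall related problems first."""
    return (
        f"Problem: {problem}\n\n"
        "Instructions:\n"
        f"1. Recall {num_exemplars} relevant and distinct problems. "
        "For each, describe the problem and explain its solution.\n"
        "2. Then solve the initial problem step by step.\n"
    )

prompt = build_analogical_prompt(
    "What is the area of a square with vertices (0,0), (2,0), (2,2), (0,2)?",
    num_exemplars=2,
)
print(prompt)
```

No hand-written exemplars are supplied, which is the difference from manual few-shot CoT.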
9. Think before you speak: Training Language Models With Pause Tokens
Source: https://arxiv.org/abs/2310.02226
This paper proposed training language models with "pause" tokens to allow more computation before the model generates its next token. A sequence of <pause> tokens is appended to the input prefix during training and inference. When models are both pretrained and finetuned with <pause> tokens, performance improves on a variety of tasks. This work raises a range of conceptual and practical research questions about making delayed next-token prediction a widely applicable new paradigm.
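At the token level, the mechanism can be sketched as two small steps: append pause tokens to the input, then read the answer only after the last pause. This is a toy illustration on lists of strings. The function names, the pause count, and the hard-coded model output are hypothetical, and no real model is involved.

```python
PAUSE = "<pause>"

def append_pauses(prefix_tokens, num_pauses):
    """Return the input prefix followed by num_pauses pause tokens."""
    return list(prefix_tokens) + [PAUSE] * num_pauses

def answer_after_pauses(full_sequence):
    """Keep only the tokens that come after the last pause token."""
    if PAUSE not in full_sequence:
        return list(full_sequence)
    last = len(full_sequence) - 1 - full_sequence[::-1].index(PAUSE)
    return full_sequence[last + 1:]

prefix = ["What", "is", "2+2", "?"]
model_input = append_pauses(prefix, 3)

# Stand-in for a model's continuation of the padded input.
generated = model_input + ["4"]
print(answer_after_pauses(generated))
```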
*The researchers behind the publications deserve full credit for their work.