AI Research Highlights | Week 44, 2023
Updated: Feb 20, 2024
1. Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
Source: https://arxiv.org/abs/2310.15127
Researchers from Carnegie Mellon University introduced HELPER (Human-instructable Embodied Language Parsing via Evolving Routines), an embodied agent equipped with external memory of language-program pairs that parses free-form human-robot dialogue into action programs through retrieval-augmented LLM prompting. The project can be found here.
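The retrieval-augmented prompting idea can be illustrated with a small sketch. This is not HELPER's implementation: the memory contents, the bag-of-words similarity, and the names `MEMORY`, `retrieve`, `build_prompt`, and `add_to_memory` are all assumptions made for illustration. It only shows how the language-program pairs most similar to a new instruction could be pulled from memory and placed in a prompt as in-context examples.

```python
import math
from collections import Counter

# Hypothetical memory of (instruction, program) pairs; contents are illustrative only.
MEMORY = [
    ("make a cup of coffee", "goto(mug); pickup(mug); place(mug, coffee_machine)"),
    ("slice the tomato", "goto(knife); pickup(knife); goto(tomato); slice(tomato)"),
    ("put the mug in the sink", "goto(mug); pickup(mug); goto(sink); place(mug, sink)"),
]


def bow(text):
    """Bag-of-words vector; a stand-in for a learned text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, memory, k=2):
    """Return the k memory entries whose instruction is most similar to the query."""
    q = bow(query)
    ranked = sorted(memory, key=lambda pair: cosine(q, bow(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, memory, k=2):
    """Assemble a few-shot prompt from retrieved pairs, ending with the new query."""
    parts = [f"Instruction: {i}\nProgram: {p}" for i, p in retrieve(query, memory, k)]
    parts.append(f"Instruction: {query}\nProgram:")
    return "\n\n".join(parts)


def add_to_memory(memory, instruction, program):
    """Store a new pair so that later queries can retrieve it."""
    memory.append((instruction, program))


if __name__ == "__main__":
    print(build_prompt("please put the mug into the sink", MEMORY, k=1))
```

The string returned by `build_prompt` would be sent to an LLM, and a program that executes successfully could be written back with `add_to_memory`, which is one way a memory of routines could grow over time.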
2. Meta-(out-of-context) learning in neural networks
Source: https://arxiv.org/abs/2310.15047
Researchers from the University of Cambridge established the existence of a phenomenon they call meta-out-of-context learning (meta-OCL). They showed that learning can lead LLMs to update their predictions more when they encounter an example whose features indicate it is reliable, and less when its features indicate it is unreliable, leading to improved generalization performance. The code can be found here.
3. Large Language Models can Share Images, Too!
Source: https://arxiv.org/abs/2310.14804
Researchers introduced a two-stage framework, (1) predicting all possible image-sharing turns and (2) generating image descriptions, to unlock the image-sharing capability of LLMs through in-context zero-shot learning. To elicit this capability at each stage, they present a restriction-based prompt that adds a Restrictions token. The code and the dataset will be available soon at this URL.
4. A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Source: https://arxiv.org/abs/2310.16656
Google Research scientists proposed a new method called RECAP that leverages automatic captioning to substantially improve the quality of a text-to-image (T2I) model in both fidelity and semantics, as measured on a set of 7 standard metrics and in human evaluations. They also analyzed how the alt-text captions used by current training methods suffer from train-inference skew and lack semantic detail, and how different captions mitigate both issues.
5. AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
Source: https://arxiv.org/abs/2310.15479
UCLA researchers proposed AutoDiff, which leverages the power of diffusion models to generate synthetic tabular data. They identified 3 features as the main challenges in tabular data synthesis and showed how AutoDiff can address them. The experimental results showed that AutoDiff adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. The code is available upon request and will be publicly released if the paper is accepted.
6. Human-like systematic generalization through a meta-learning neural network
This paper introduced meta-learning for compositionality (MLC), an optimization procedure for encouraging systematicity through a series of few-shot compositional tasks. The results showed that neural networks can achieve human-like systematic generalization through MLC. The implementation of MLC in this paper uses only common neural networks without added symbolic machinery, and without hand-designed internal representations or inductive biases. Instead, MLC provides a means of specifying the desired behaviour through high-level guidance and/or direct human examples; a neural network is then asked to develop the right learning skills through meta-learning. Data can be found here.
7. Correction with Backtracking Reduces Hallucination in Summarization
Source: https://arxiv.org/abs/2310.16176
Researchers presented Correction with Backtracking (CoBa), an inference-time method for mitigating hallucination that requires no additional models. CoBa detects hallucinations by using the conditional probabilities of the generated tokens and by measuring the distance between the generated text and the context. To correct hallucinated text, it backtracks to a point before the hallucination and re-generates the text, so that decoding avoids ending up in positions with only low-scoring token options. CoBa was shown to produce more factual summaries on various datasets.
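The backtracking step can be sketched on a toy next-token table. This is a simplified illustration, not the paper's algorithm: it uses only a token-probability threshold as the detection signal (the context-distance measure is omitted), and the names `TOY`, `toy_next_probs`, `decode_with_backtracking`, and the threshold value are assumptions.

```python
# Toy next-token table keyed by prefix; a hypothetical stand-in for a language model.
TOY = {
    (): {"the": 0.9, "a": 0.1},
    ("the",): {"cat": 0.6, "dog": 0.4},
    # Dead end: every continuation after "the cat" is low-scoring.
    ("the", "cat"): {"flew": 0.15, "swam": 0.15, "sang": 0.15, "ran": 0.15,
                     "sat": 0.15, "ate": 0.15, "hid": 0.1},
    ("the", "dog"): {"barked": 0.8, "sat": 0.2},
}


def toy_next_probs(prefix):
    """Return a token -> probability dict; unknown prefixes end the sequence."""
    return TOY.get(prefix, {"<eos>": 1.0})


def decode_with_backtracking(next_probs, max_len=10, threshold=0.2, max_steps=1000):
    """Greedy decoding that steps back when only low-probability tokens remain."""
    tokens = []
    banned = {}  # prefix -> tokens already rejected after that prefix
    for _ in range(max_steps):
        if len(tokens) >= max_len:
            break
        prefix = tuple(tokens)
        options = {t: p for t, p in next_probs(prefix).items()
                   if t not in banned.get(prefix, set())}
        best = max(options, key=options.get) if options else None
        if best is not None and options[best] >= threshold:
            tokens.append(best)  # confident step: keep going
            if best == "<eos>":
                break
        elif tokens:
            # Low-scoring position: undo the previous token and ban it there.
            bad = tokens.pop()
            banned.setdefault(tuple(tokens), set()).add(bad)
        elif best is not None:
            tokens.append(best)  # nothing to undo at the start; accept the best option
        else:
            break  # every first token has been rejected
    return tokens


if __name__ == "__main__":
    print(decode_with_backtracking(toy_next_probs))
```

Plain greedy decoding on this table would commit to "cat" and then be forced to choose among weak continuations. With backtracking, "cat" is rejected after the prefix "the" and decoding continues through "dog".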
8. R3 Prompting: Review, Rephrase and Resolve for Chain-of-Thought Reasoning in Large Language Models under Noisy Context
Source: https://arxiv.org/abs/2310.16535
In this paper, researchers proposed a new method named R3 prompting, which interacts with LLMs sequentially to gradually approach the final answer via a thought process of Reviewing, Rephrasing and Resolving. In the review stage, a review prompt is designed to extract the key sentences that state conditions essential for predicting the final answer. In the rephrase stage, LLMs are guided to reformulate the problem narrative into variables, using the extracted key sentences as hints. In the resolve stage, LLMs predict the final answer, taking the generated variables into account. R3 prompting was shown to outperform the previous baselines on 8 datasets.
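The three stages can be pictured as a chain of calls in which each stage's output feeds the next prompt. The sketch below is an assumption-laden illustration: the prompt wording, the function `r3_prompting`, and the stub `make_stub_llm` are hypothetical and are not taken from the paper. The `llm` argument is any callable that maps a prompt string to a reply string.

```python
from typing import Callable, Dict, List, Tuple


def r3_prompting(problem: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Run review -> rephrase -> resolve, passing each result to the next stage."""
    # Stage 1 (review): keep only the sentences the answer depends on.
    key_sentences = llm(
        "Review the problem and quote only the sentences needed to answer it.\n"
        f"Problem: {problem}"
    )
    # Stage 2 (rephrase): turn the narrative into variables, hinted by stage 1.
    variables = llm(
        "Using the key sentences as a hint, rewrite the problem as named variables.\n"
        f"Problem: {problem}\nKey sentences: {key_sentences}"
    )
    # Stage 3 (resolve): answer using the variables from stage 2.
    answer = llm(
        "Compute the final answer from the variables.\n"
        f"Problem: {problem}\nVariables: {variables}"
    )
    return {"key_sentences": key_sentences, "variables": variables, "answer": answer}


def make_stub_llm(replies: List[str]) -> Tuple[Callable[[str], str], List[str]]:
    """Fake LLM for demonstration: replays canned replies and records prompts."""
    prompts: List[str] = []
    queue = list(replies)

    def llm(prompt: str) -> str:
        prompts.append(prompt)
        return queue.pop(0)

    return llm, prompts


if __name__ == "__main__":
    stub, seen = make_stub_llm(
        ["Tom has 3 apples. He buys 2 more.", "start = 3; bought = 2", "5"]
    )
    problem = "Tom has 3 apples. The sky is blue. He buys 2 more. How many apples?"
    print(r3_prompting(problem, stub)["answer"])
```

In this toy problem the sentence about the sky is the noisy context. The review stage is where such a sentence would be dropped before the later stages see the key sentences.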
9. How do Language Models Bind Entities in Context?
Source: https://arxiv.org/abs/2310.17191
Researchers from UC Berkeley analyzed LM representations and identified the binding ID mechanism for solving the binding problem. They found that pretrained LMs can solve the binding task by binding entities and attributes to abstract binding IDs. Then, researchers identified that the binding IDs are vectors from a binding subspace with a notion of distance. Lastly, they found that the binding IDs are used broadly for a variety of binding tasks and are present in all sufficiently large models.
10. PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization
Source: https://arxiv.org/abs/2310.16427
In this paper, the authors presented PromptAgent, an optimization method that autonomously crafts expert-level prompts. Expert-level prompting differs from traditional prompt engineering in how effectively it integrates domain insights and closes the knowledge gap for domain experts. PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. PromptAgent outperforms strong Chain-of-Thought and recent prompt optimization baselines on 12 tasks.
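To make the planning view concrete, here is a minimal Monte Carlo tree search over prompts. It is a generic sketch rather than PromptAgent's algorithm: the `expand` function (which in the paper's setting would revise a prompt from error feedback) and the `reward` function (which would score a prompt on held-out examples) are replaced by hypothetical toy stand-ins, and all names and constants are assumptions.

```python
import math


class Node:
    """One prompt in the search tree, with visit statistics for UCT."""

    def __init__(self, prompt, parent=None):
        self.prompt = prompt
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def uct(self, c=1.4):
        if self.visits == 0:
            return float("inf")  # always try unvisited prompts first
        mean = self.total_reward / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def mcts_prompt_search(root_prompt, expand, reward, iterations=50, max_depth=3):
    """Return the best (prompt, reward) found by selection, expansion, evaluation, backup."""
    root = Node(root_prompt)
    best_prompt, best_reward = root_prompt, float("-inf")
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: follow the highest-UCT child down to a leaf.
        while node.children:
            node = max(node.children, key=lambda n: n.uct())
            depth += 1
        # Expansion: grow a leaf that has already been evaluated once.
        if node.visits > 0 and depth < max_depth:
            node.children = [Node(p, parent=node) for p in expand(node.prompt)]
            if node.children:
                node = node.children[0]
        # Evaluation: score the prompt at the chosen node.
        r = reward(node.prompt)
        if r > best_reward:
            best_prompt, best_reward = node.prompt, r
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += r
            node = node.parent
    return best_prompt, best_reward


# Toy stand-ins (hypothetical): revisions append a hint; reward counts useful hints.
HINTS = ["Think step by step.", "State units.", "Check the sign."]
USEFUL = ["Think step by step.", "Check the sign."]


def toy_expand(prompt):
    return [prompt + " " + h for h in HINTS if h not in prompt]


def toy_reward(prompt):
    return sum(h in prompt for h in USEFUL) / len(USEFUL)


if __name__ == "__main__":
    print(mcts_prompt_search("Answer the question.", toy_expand, toy_reward))
```

The UCT score trades off prompts that have scored well against prompts that have rarely been tried, which is the sense in which the search is strategic rather than a greedy sequence of edits.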
*The researchers behind the publications deserve full credit for their work.