AI Research Highlights | Week 49, 2023
Updated: Feb 19, 2024
1. FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models
Source: https://arxiv.org/abs/2311.15614
In this paper, the authors proposed FreeAL, a novel collaborative learning framework that employs LLMs as active annotators and small language models (SLMs) as weak filters to interactively distill task-related knowledge from the LLMs. FreeAL substantially improves unsupervised learning performance for both the LLMs and the SLMs, even approaching the supervised counterparts in some scenarios. These results demonstrate the feasibility of human-free active labeling in the era of LLMs. According to the authors, the code will be released soon.
2. RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Source: https://arxiv.org/abs/2311.15649
The authors proposed RoboGPT, an LLM-based planner, together with a new robotic dataset that combines the LLMs’ common sense with robotics expertise. After fine-tuning on the new dataset, RoboGPT generalizes well: it can plan hundreds of daily tasks (even finding objects hidden inside containers) and replan based on the environment, surpassing ChatGPT and other planning methods. A Re-Plan module with low computational needs lets the planning process adapt dynamically to the environment, addressing the nomenclature diversity difficulty in instruction tasks. The Re-Plan module receives more precise environmental information from a perception model that integrates Fast SAM, which improves the task execution success rate on ALFRED tasks. The RoboGPT agent outperforms state-of-the-art (SOTA) models on both the ALFRED benchmark and tasks involving generalization.
3. Unlearning via Sparse Representations
Source: https://arxiv.org/abs/2311.15268
In this work, researchers proposed a new approach to unlearning that requires negligible additional computation to unlearn a subset of data. The approach is based on a discrete architectural bottleneck that induces sparse representations. These sparse representations make it possible to unlearn a subset of data from the model with minimal to no performance drop on the rest of the data. The researchers focused on the setting of class unlearning, and their experiments show that the proposed approach, while being compute efficient, performs competitively with, or in some cases better than, a state-of-the-art approach that requires additional compute to perform unlearning. They also found that the proposed approach did not benefit from additional retraining, which indicates that knowledge is highly localized in the Key-Value Bottleneck. Consequently, excising the activated key-value pairs from the model is a highly effective means of unlearning the forget set without disrupting the retain set.
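The excision step described above can be illustrated with a toy sketch. It is a deliberate simplification and not the authors' implementation: it assumes a single codebook, one nearest key per input, and plain dictionaries for keys and values. The names `nearest_key` and `unlearn` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def nearest_key(x: Vector, keys: Dict[int, Vector]) -> int:
    """Return the id of the key closest to x (squared Euclidean distance)."""
    return min(keys, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, keys[k])))


def unlearn(
    keys: Dict[int, Vector],
    values: Dict[int, Hashable],
    forget_inputs: List[Vector],
) -> Tuple[Dict[int, Vector], Dict[int, Hashable]]:
    """Drop every key-value pair that the forget set activates.

    Assumption: each input activates exactly one key. No retraining happens;
    the remaining pairs are left untouched.
    """
    activated = {nearest_key(x, keys) for x in forget_inputs}
    kept_keys = {k: v for k, v in keys.items() if k not in activated}
    kept_values = {k: v for k, v in values.items() if k not in activated}
    return kept_keys, kept_values
```

In this toy version, inputs from the retain set keep selecting the same pairs as before, because only the pairs activated by the forget set are removed.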
4. Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach
Source: https://arxiv.org/abs/2311.13884
In this paper, the authors presented LLaMAC, a novel framework for achieving a comprehensive decision-making process in collaborative tasks involving large-scale LLM-based agents. Within this framework, a centralized critic takes on the role of a coordinator, making suggestions to each actor based on their decision memory. Subsequently, the actors interact with the environment, receive assigned tasks, and execute corresponding actions. The authors introduced a TripletCritic structure, which coordinates multiple critics with the same objective but different preferences, to tackle the exploration-exploitation trade-off inherent in the decision-making process. They also established a comprehensive feedback mechanism that incorporates both internal feedback within the TripletCritic and external feedback between the LLM-based actors (i.e., agents) and the TripletCritic. This mechanism aims to mitigate hallucination issues and bolster the robustness of LLM decision-making. Lastly, they incorporated mechanisms such as redundant memory information deletion and external feedback judgment to construct a token-efficient planning framework.
5. Universal Self-Consistency for Large Language Model Generation
Source: https://arxiv.org/abs/2311.17311
In this paper, researchers presented universal self-consistency (USC), which uses LLMs to enable self-consistency for a wide variety of tasks, especially free-form text generation. First, USC samples multiple responses from the large language model. Then, to select one response as the final answer, it concatenates all responses and constructs a prompt with an instruction asking the language model to select the most consistent response. In this way, USC removes the need to count exact answer frequencies, as standard self-consistency does, and relies on the LLM’s own ability to measure the consistency among different responses.
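The workflow above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the prompt wording is an assumption, and `sample` and `select` are hypothetical callables standing in for whatever LLM client is used.

```python
import re
from typing import Callable, List


def build_usc_prompt(question: str, responses: List[str]) -> str:
    """Concatenate all sampled responses into one selection prompt."""
    numbered = "\n\n".join(
        f"Response {i}: {r}" for i, r in enumerate(responses, start=1)
    )
    # The instruction text below is illustrative wording, not the paper's prompt.
    return (
        f"Question: {question}\n\n"
        f"Candidate responses:\n\n{numbered}\n\n"
        "Select the most consistent response based on majority consensus. "
        'Answer in the form "The most consistent response is Response X".'
    )


def parse_selection(reply: str, n: int) -> int:
    """Return the 0-based index chosen by the model, or 0 if none can be parsed."""
    match = re.search(r"Response\s+(\d+)", reply)
    if match:
        idx = int(match.group(1))
        if 1 <= idx <= n:
            return idx - 1
    return 0  # assumption: fall back to the first sample


def universal_self_consistency(
    question: str,
    sample: Callable[[str], str],   # hypothetical: question -> one sampled response
    select: Callable[[str], str],   # hypothetical: selection prompt -> model reply
    k: int = 5,
) -> str:
    responses = [sample(question) for _ in range(k)]
    reply = select(build_usc_prompt(question, responses))
    return responses[parse_selection(reply, len(responses))]
```

Note that no answer extraction or vote counting takes place; the only parsing is reading back which response the model picked.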
6. Compositional Chain-of-Thought Prompting for Large Multimodal Models
Source: https://arxiv.org/abs/2311.17076
This paper proposed Compositional Chain-of-Thought (CCoT), a zero-shot Chain-of-Thought approach that uses scene graph (SG) representations to extract compositional knowledge from a large multimodal model (LMM). CCoT has two steps: (1) generate a scene graph from the input image and task prompt (e.g., a visual question), which circumvents the need for ground-truth SG data; (2) prompt the LMM with the image, the task prompt, and the generated scene graph to produce a response. Incorporating the scene graph in the prompt eliminates the need for fine-tuning and so avoids the forgetting that fine-tuning can cause. Experimental results showed that CCoT improves the performance of LLaVA-1.5 and InstructBLIP not only on VL compositional benchmarks such as Winoground and WHOOPS! but also on general multimodal benchmarks such as SEEDBench and MMBench, highlighting the effectiveness of this approach.
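The two steps can be sketched as two calls to the same model. This is only an illustration under stated assumptions: `lmm` is a hypothetical callable taking an image and a text prompt, and the prompt wording is invented for the example rather than taken from the paper.

```python
from typing import Callable

# Hypothetical model interface: (image bytes, text prompt) -> generated text.
LMM = Callable[[bytes, str], str]


def ccot_answer(lmm: LMM, image: bytes, task_prompt: str) -> str:
    """Run the two CCoT steps with one model and no fine-tuning."""
    # Step 1: ask the model itself for a scene graph, so no ground-truth
    # scene graph annotations are needed.
    sg_prompt = (
        f"{task_prompt}\n\n"
        "For the image and the question above, generate a scene graph in JSON "
        "listing the relevant objects, their attributes, and their relationships."
    )
    scene_graph = lmm(image, sg_prompt)

    # Step 2: answer the task with the image, the task prompt, and the
    # generated scene graph all in context.
    answer_prompt = (
        f"Scene graph:\n{scene_graph}\n\n"
        f"{task_prompt}\n\n"
        "Use the image and the scene graph as context to answer."
    )
    return lmm(image, answer_prompt)
```

Because both steps are plain prompts, the same function works with any model that accepts an image and text.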
7. IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions
Source: https://arxiv.org/abs/2311.18397
In this paper, researchers from Huawei Poisson Lab proposed an Induction-Augmented Generation (IAG) framework that uses inductive knowledge along with the retrieved documents for implicit reasoning, improving the factuality of knowledge elicited from LLMs. They implemented two variants: (1) IAG-GPT, which improves over strong baseline models and ChatGPT by using knowledge elicited from GPT-3 as auxiliary supporting evidence for the generator; (2) IAG-Student, which removes the dependency on the GPT service at inference time by incorporating a student inductor model, and which outperforms RAG baselines at a small model size.
8. Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses
Source: https://arxiv.org/abs/2312.00763
The researchers introduced ExploreLLM, which allows users to structure their thoughts, explore different options, navigate through choices and recommendations, and more easily steer models toward personalized responses. The ExploreLLM system has an underlying tree-like data structure. Unlike traditional chatbots, it is built on an abstraction of a node that can be nested. A node is a unit of interaction and can represent different forms of interaction (e.g., multi-turn natural language chats or UIs). By default, a new node is created when the user starts interacting with the system. When needed, the system automatically creates child nodes through task decomposition for users to explore. At any given node, users can provide free-form context that the system should be aware of for better personalization. This personal preference context is shared globally across all nodes. In each node, the system takes personal preferences into consideration and dispatches a backend call to present options for the user to choose from. Users can interact with the options via a checkbox UI to indicate their preferences. After sufficient exploration, users may want to tie everything together and get a summary of their journey so far. The system therefore has a “summarize” function that is available on each page. Users can click the button to exit to the root node and get a text summary of their entire interaction across the system.
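The tree of nested nodes described above might look roughly like the following. This is a hedged sketch, not the system's actual code: the `Node` fields and the `summarize` helper are hypothetical, and the summary here is a plain text outline, whereas the real system would presumably generate it with a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One unit of interaction; nodes nest to form a tree."""
    task: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    chosen: List[str] = field(default_factory=list)  # options ticked by the user

    def add_child(self, task: str) -> "Node":
        """Create a sub-task node, as task decomposition would."""
        child = Node(task=task, parent=self)
        self.children.append(child)
        return child

    def root(self) -> "Node":
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def summarize(node: Node, preferences: List[str]) -> str:
    """Jump to the root from any node and outline the whole tree.

    `preferences` stands for the context shared globally across all nodes.
    """
    lines = ["Preferences: " + "; ".join(preferences)]

    def walk(current: Node, depth: int) -> None:
        picked = ", ".join(current.chosen) if current.chosen else "no options chosen"
        lines.append("  " * depth + f"- {current.task} ({picked})")
        for child in current.children:
            walk(child, depth + 1)

    walk(node.root(), 0)
    return "\n".join(lines)
```

Keeping the preferences outside the nodes mirrors the description that they are shared globally rather than stored per node.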
9. TaskWeaver: A Code-First Agent Framework
Source: https://arxiv.org/abs/2311.17541
This paper proposed TaskWeaver, a code-first framework for building LLM-powered autonomous agents. The standout feature of TaskWeaver is its ability to convert each user request into executable code, treating user-defined plugins as callable functions. TaskWeaver overcomes the limitations of existing frameworks by providing support for rich data structures, flexible plugin usage, and dynamic plugin selection. It leverages the coding capability of LLMs to implement complex logic and incorporates domain-specific knowledge through examples. Additionally, TaskWeaver has made considerable efforts towards the secure execution of generated code and provides an easy-to-use interface for developers.
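To make the code-first idea concrete, here is a minimal sketch of treating plugins as callable functions inside generated code. It does not use TaskWeaver's actual API: the `plugin` decorator, `run_generated_code`, and the `detect_anomalies` plugin are all hypothetical, and the "generated" code is a hard-coded string standing in for LLM output.

```python
from typing import Any, Callable, Dict, List

# Hypothetical plugin registry: name -> callable.
PLUGINS: Dict[str, Callable[..., Any]] = {}


def plugin(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a user-defined function so generated code can call it by name."""
    PLUGINS[fn.__name__] = fn
    return fn


@plugin
def detect_anomalies(values: List[float], threshold: float) -> List[float]:
    """Toy plugin: return the values whose magnitude exceeds the threshold."""
    return [v for v in values if abs(v) > threshold]


def run_generated_code(code: str, variables: Dict[str, Any]) -> Dict[str, Any]:
    """Execute code with plugins and session variables in scope.

    Limiting the builtins only narrows what the code can reach. It is NOT a
    real sandbox and is shown only to hint at the secure-execution concern.
    """
    namespace: Dict[str, Any] = {
        "__builtins__": {"abs": abs, "len": len, "sum": sum},
        **PLUGINS,
        **variables,
    }
    exec(code, namespace)
    return {
        name: value
        for name, value in namespace.items()
        if name != "__builtins__" and name not in PLUGINS
    }
```

For example, running the string `"outliers = detect_anomalies(data, 3)"` with `{"data": [1, -5, 2, 7]}` leaves a rich Python object (`outliers`, a list) in the returned namespace, which later code can reuse.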
*The researchers behind the publications deserve full credit for their work.