AI Research Highlights | Week 51, 2023
Updated: Feb 19, 2024
1. Intelligent Virtual Assistants with LLM-based Process Automation
Source: https://arxiv.org/abs/2312.06677
This paper proposed LLM-Based Process Automation (LLMPA), a system with modules for decomposing instructions, generating natural language descriptions, detecting interface elements, predicting next actions, and checking for errors. The system is demonstrated on the Alipay mobile payments app. The authors describe this work as the first real-world deployment and extensive evaluation of an LLM-based virtual assistant in a widely used mobile application with a very large user base.
2. KwaiAgents: Generalized Information-seeking Agent System with Large Language Models
Source: https://arxiv.org/abs/2312.04889
In this paper, the authors introduced KwaiAgents, a generalized information-seeking agent system built on LLMs. KwaiAgents comprises three main components: (1) KAgentSys, an autonomous agent loop that integrates a memory bank, a tool library, a task planner, and a concluding module; (2) KAgentLMs, a suite of open-source LLMs continuously fine-tuned to enhance agent capabilities; and (3) KAgentBench, a benchmark that assesses how well LLMs respond to varied agent-system prompts across different capabilities.
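The KAgentSys loop (planner, tool library, memory bank, concluding module) can be pictured roughly as follows. This is a minimal sketch, not the KwaiAgents code: `MemoryBank`, `run_agent`, and the planner and concluder callables are hypothetical stand-ins, and in a real system the planner and concluder would be LLM calls rather than plain Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool library: tool name -> function that takes a query string.
ToolLibrary = Dict[str, Callable[[str], str]]


@dataclass
class MemoryBank:
    """Hypothetical memory bank: one (step, tool, query, observation) record per action."""
    records: List[Tuple[int, str, str, str]] = field(default_factory=list)

    def add(self, step: int, tool: str, query: str, observation: str) -> None:
        self.records.append((step, tool, query, observation))


# Planner stand-in: returns the next (tool, query) pair, or None to stop.
Planner = Callable[[str, MemoryBank], Optional[Tuple[str, str]]]
# Concluding-module stand-in: turns the task and memory into a final answer.
Concluder = Callable[[str, MemoryBank], str]


def run_agent(task: str, planner: Planner, tools: ToolLibrary,
              conclude: Concluder, max_steps: int = 5) -> str:
    memory = MemoryBank()
    for step in range(max_steps):
        action = planner(task, memory)
        if action is None:  # planner judges that enough information was gathered
            break
        tool_name, query = action
        observation = tools[tool_name](query)  # call the selected tool
        memory.add(step, tool_name, query, observation)
    return conclude(task, memory)


# Usage with toy components: search once, then answer from memory.
def toy_planner(task: str, memory: MemoryBank) -> Optional[Tuple[str, str]]:
    return None if memory.records else ("search", task)


toy_tools: ToolLibrary = {"search": lambda q: f"result for {q}"}
answer = run_agent("weather", toy_planner, toy_tools,
                   lambda task, mem: "; ".join(r[3] for r in mem.records))
print(answer)  # result for weather
```

The loop stops either when the planner returns `None` or after `max_steps` iterations, so a planner that never decides to stop still cannot run forever.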
3. CogAgent: A Visual Language Model for GUI Agents
Source: https://arxiv.org/abs/2312.08914
In this work, the authors presented CogAgent, a visual language foundation model specializing in GUI understanding and planning while maintaining strong performance on general cross-modality tasks. Built upon CogVLM, a recent open-source VLM, CogAgent addresses two challenges in building GUI agents: the difficulty of obtaining suitable training data and the trade-off between high-resolution input and compute cost. The project can be found at: https://github.com/THUDM/CogVLM
4. Towards Verifiable Text Generation with Evolving Memory and Self-Reflection
Source: https://arxiv.org/abs/2312.09075
This paper introduced VTG, a unified framework that guides the generation model using a combination of evolving long short-term memory and a two-tier verifier, providing a self-reflective and nuanced approach to verifiable text generation. The long short-term memory retains both the most valuable and the most up-to-date documents, which helps mitigate the focus-shifting issue. The active retrieval mechanism and diverse query generation increase both the precision and the scope of the retrieved documents. The two-tier verifier and the evidence finder enable in-depth analysis of the relationship between generated sentences and potential evidence. The empirical results show that the method significantly outperforms existing baselines across various metrics.
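The long short-term memory idea can be illustrated with a small sketch, assuming each retrieved document already carries a numeric value score (how VTG actually scores documents is not reproduced here). `Doc` and `LongShortTermMemory` are hypothetical names: the long-term store keeps the highest-scoring documents seen so far, while the short-term store keeps only the most recent retrievals.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    score: float  # assumed value score from some external scorer (hypothetical)


class LongShortTermMemory:
    """Hypothetical memory: top-scoring docs (long-term) plus most recent docs (short-term)."""

    def __init__(self, long_capacity: int, short_capacity: int) -> None:
        self.long_capacity = long_capacity
        self.long_term: List[Doc] = []
        # A deque with maxlen evicts the oldest entries automatically.
        self.short_term: Deque[Doc] = deque(maxlen=short_capacity)

    def update(self, retrieved: List[Doc]) -> None:
        self.short_term.extend(retrieved)
        # Merge old long-term docs with new ones (newer entry wins on duplicate ids),
        # then keep only the highest-scoring documents.
        pool = {d.doc_id: d for d in self.long_term + retrieved}
        self.long_term = sorted(pool.values(), key=lambda d: d.score,
                                reverse=True)[: self.long_capacity]

    def context(self) -> List[str]:
        """Document ids to condition generation on, long-term first, without duplicates."""
        seen, ids = set(), []
        for d in self.long_term + list(self.short_term):
            if d.doc_id not in seen:
                seen.add(d.doc_id)
                ids.append(d.doc_id)
        return ids


memory = LongShortTermMemory(long_capacity=2, short_capacity=2)
memory.update([Doc("a", 0.9), Doc("b", 0.1)])
memory.update([Doc("c", 0.5), Doc("d", 0.2)])
print(memory.context())  # ['a', 'c', 'd']
```

Document "a" survives the second retrieval round only because of its high score, while "d" survives only because it is recent; that split is the point of combining the two stores.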
5. LDM^2: A Large Decision Model Imitating Human Cognition with Dynamic Memory Enhancement
Source: https://arxiv.org/abs/2312.08402
Researchers proposed Large Decision Model with Memory (LDM^2), a framework that enhances standard LLMs with a dynamically updated memory. Unlike traditional imitation learning, LDM^2 includes a dynamic memory refinement stage that enriches the memory with valuable state-action tuples. First, the authors conduct tree exploration to generate all potential decision processes and evaluate them according to the environment rewards. Then, they add the state-action tuples from the best decision process to the memory. This exploration-evaluation-adding cycle mimics the traditional reinforcement learning framework. The refinement stage not only expands the action space of the LLMs but also enables them to handle new situations not covered by the initial memory.
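The exploration-evaluation-adding cycle can be sketched on a toy deterministic environment. This illustration rests on simplifying assumptions and is not the LDM^2 implementation: exhaustive enumeration of action sequences stands in for tree exploration, cumulative reward is the evaluation criterion, and the memory is a plain dict from state to action. `explore_and_refine` and `line_step` are hypothetical names.

```python
from itertools import product
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Action = Hashable


def explore_and_refine(start: State, actions: Sequence[Action],
                       step: Callable[[State, Action], Tuple[State, float]],
                       depth: int, memory: Dict[State, Action]) -> float:
    """Hypothetical sketch: enumerate, evaluate, and add the best state-action tuples."""
    best_return: float = float("-inf")
    best_traj: List[Tuple[State, Action]] = []
    # Exploration: every action sequence of length `depth` is one decision process.
    for seq in product(actions, repeat=depth):
        state, total, traj = start, 0.0, []
        for a in seq:
            traj.append((state, a))
            state, reward = step(state, a)
            total += reward
        # Evaluation: rank decision processes by cumulative environment reward.
        if total > best_return:
            best_return, best_traj = total, traj
    # Adding: write the best process's state-action tuples into memory.
    for s, a in best_traj:
        memory[s] = a
    return best_return


# Toy 1-D environment: move left/right; reward 1.0 for reaching position 2.
def line_step(pos: int, move: int) -> Tuple[int, float]:
    new_pos = pos + move
    return new_pos, (1.0 if new_pos == 2 else 0.0)


mem: Dict[State, Action] = {}
print(explore_and_refine(0, [-1, 1], line_step, depth=2, memory=mem))  # 1.0
print(mem)  # {0: 1, 1: 1}
```

Exhaustive enumeration grows as len(actions) ** depth, so this only works for tiny trees; it is meant to show the data flow from exploration to memory, not to scale.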
6. ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Source: https://arxiv.org/abs/2312.10003
The researchers built a ReAct-style agent with self-critique for long-form question answering. They defined a proxy evaluation metric for the agent based on the Bamboogle and BamTwoogle datasets, with a strong emphasis on auto-evaluation, and demonstrated that the agent's performance could be effectively improved through ReST-style iterative fine-tuning on its reasoning traces. They did this using only stepwise AI feedback, without human-labeled training data. Finally, they showed that the synthetic data produced during this iterative process could be used to distill the agent into models one to two orders of magnitude smaller, with performance comparable to the pre-trained teacher agent.
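A ReST-style improvement loop can be sketched as a grow step (sample traces from the current agent), a filter step (keep traces that an AI-feedback scorer rates at or above a threshold), and a fine-tuning step that produces the next agent. The sketch assumes all three pieces are supplied as plain callables; `grow_and_filter`, `rest_loop`, `score`, and `fine_tune` are hypothetical names, and simple threshold filtering is a simplification of the paper's feedback procedure.

```python
from typing import Callable, List, Tuple

Trace = Tuple[str, str]       # (question, reasoning trace)
Agent = Callable[[str], str]  # hypothetical: question -> reasoning trace


def grow_and_filter(questions: List[str], agent: Agent,
                    score: Callable[[str, str], float],
                    samples_per_question: int, threshold: float) -> List[Trace]:
    """Sample traces from the current agent; keep those AI feedback rates highly."""
    kept: List[Trace] = []
    for q in questions:
        for _ in range(samples_per_question):
            trace = agent(q)
            if score(q, trace) >= threshold:  # stand-in for AI feedback
                kept.append((q, trace))
    return kept


def rest_loop(questions: List[str], agent: Agent,
              score: Callable[[str, str], float],
              fine_tune: Callable[[Agent, List[Trace]], Agent],
              iterations: int, samples_per_question: int = 4,
              threshold: float = 0.5) -> Agent:
    """Each iteration fine-tunes on traces produced by the previous iteration's agent."""
    for _ in range(iterations):
        data = grow_and_filter(questions, agent, score, samples_per_question, threshold)
        agent = fine_tune(agent, data)  # hypothetical training call
    return agent


# Toy usage: the "agent" uppercases, and feedback only approves questions starting with "a".
kept = grow_and_filter(["ab", "cd"], lambda q: q.upper(),
                       lambda q, t: 1.0 if q.startswith("a") else 0.0,
                       samples_per_question=2, threshold=0.5)
print(kept)  # [('ab', 'AB'), ('ab', 'AB')]
```

The same filtered traces are also what a smaller student model would be trained on in the distillation step the paragraph describes.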
7. Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning
Source: https://arxiv.org/abs/2312.08901
In this work, researchers proposed CoT-Max, which pushes the limits of few-shot learning for improving LLM math reasoning. The central idea of CoT-Max is to take lengthy CoT examples as input, identify the examples that matter most for the target LLM, and then prune redundant tokens so the prompt fits within the original LLM context window. The results showed that CoT-Max significantly boosts LLM reasoning, achieving 2.13%-4.55% absolute improvements over state-of-the-art baselines, and establishes a new prompting-based benchmark in math reasoning accuracy without any fine-tuning or additional inference costs.
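The two-stage pruning idea (first drop whole examples, then drop tokens) can be illustrated with a greedy stand-in. CoT-Max learns its pruner with reinforcement learning; here, purely as an assumption for illustration, each example has a precomputed example-level score and each token a precomputed importance, and the sketch keeps the top-scoring examples and then the most important tokens up to a token budget. `prune_to_budget` is a hypothetical name.

```python
from typing import List, Tuple

# Hypothetical format: (example-level score, [(token, token importance), ...]).
Example = Tuple[float, List[Tuple[str, float]]]


def prune_to_budget(examples: List[Example], max_shots: int,
                    token_budget: int) -> List[List[str]]:
    """Greedy stand-in for a learned pruner: shot pruning, then token pruning."""
    # Stage 1: keep the `max_shots` highest-scoring examples.
    chosen = sorted(examples, key=lambda e: e[0], reverse=True)[:max_shots]
    # Index every surviving token as (example index, position, importance).
    flat = [(i, j, imp) for i, (_, toks) in enumerate(chosen)
            for j, (_, imp) in enumerate(toks)]
    # Stage 2: keep the `token_budget` most important tokens overall.
    kept = {(i, j) for i, j, _ in sorted(flat, key=lambda x: x[2], reverse=True)[:token_budget]}
    # Rebuild each example, preserving the original token order.
    return [[tok for j, (tok, _) in enumerate(toks) if (i, j) in kept]
            for i, (_, toks) in enumerate(chosen)]


examples: List[Example] = [
    (0.9, [("x", 0.8), ("=", 0.1), ("2", 0.9)]),
    (0.2, [("filler", 0.5)]),
    (0.7, [("y", 0.7), ("is", 0.05), ("3", 0.6)]),
]
print(prune_to_budget(examples, max_shots=2, token_budget=4))  # [['x', '2'], ['y', '3']]
```

In this toy run the low-scoring "filler" example is removed in stage 1 even though its single token has higher importance than "=" or "is", which is why the two stages are kept separate.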
8. LLMLingua: Innovating LLM efficiency with prompt compression
In this paper, the authors proposed LLMLingua, which uses a well-trained small language model after alignment, such as GPT2-small or LLaMA-7B, to detect unimportant tokens in the prompt and run inference with the compressed prompt on black-box LLMs, achieving up to 20x compression with minimal performance loss. Results showed that GPT-4 can recover all key information from the compressed prompt. The project can be found at: https://github.com/microsoft/LLMLingua
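A much-simplified version of the token-dropping step can be sketched as follows, assuming per-token log-probabilities from a small LM are already available. LLMLingua itself adds a budget controller, iterative token-level compression, and distribution alignment, none of which appear here; `compress_prompt` is a hypothetical name. Tokens the small LM predicts confidently (low surprisal) are treated as the least informative and dropped first.

```python
import math
from typing import List, Sequence


def compress_prompt(tokens: Sequence[str], logprobs: Sequence[float],
                    ratio: float) -> List[str]:
    """Keep about len(tokens) / ratio tokens, preferring high-surprisal ones.

    Assumes logprobs[i] = log p(tokens[i] | tokens[:i]) from a small LM.
    """
    if len(tokens) != len(logprobs):
        raise ValueError("tokens and logprobs must have the same length")
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    target = max(1, math.ceil(len(tokens) / ratio))
    # Surprisal (self-information) is -log p; predictable tokens score low.
    by_surprisal = sorted(range(len(tokens)), key=lambda i: -logprobs[i], reverse=True)
    keep = set(by_surprisal[:target])
    # Emit the kept tokens in their original order.
    return [tok for i, tok in enumerate(tokens) if i in keep]


tokens = ["The", "capital", "of", "France", "is", "Paris"]
logprobs = [-0.1, -3.0, -0.2, -2.5, -0.3, -4.0]  # made-up values for illustration
print(compress_prompt(tokens, logprobs, ratio=2))  # ['capital', 'France', 'Paris']
```

A `ratio` of 20 would correspond to the paper's maximum reported compression; the actual method decides the budget per prompt component rather than applying one global cut.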
*The researchers behind the publications deserve full credit for their work.