multi-agent-paper/gRPC_Based_Interface/docs/references/Wu2023_AutoGen_MultiAgent.pdf

AutoGen: Enabling Next-Gen LLM
Applications via Multi-Agent Conversation
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li
Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu
Ahmed Awadallah, Ryen W. White, Doug Burger, Chi Wang
Microsoft Research, Pennsylvania State University
University of Washington, Xidian University

NOTE: This file contains the text content extracted from the PDF at:
https://arxiv.org/pdf/2308.08155
arXiv:2308.08155v2 [cs.AI] 3 Oct 2023

Abstract
AutoGen is an open-source framework that allows developers to build LLM applications
via multiple agents that can converse with each other to accomplish tasks. AutoGen agents
are customizable, conversable, and can operate in various modes that employ combinations
of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define
agent interaction behaviors. Both natural language and computer code can be used to
program flexible conversation patterns for different applications. AutoGen serves as a
generic framework for building diverse applications of various complexities and LLM
capacities. Empirical studies demonstrate the effectiveness of the framework in many
example applications, in domains including mathematics, coding, question answering,
operations research, online decision-making, and entertainment.

GitHub: https://github.com/microsoft/autogen

1 Introduction

Large language models (LLMs) are becoming a crucial building block in developing powerful
agents that utilize LLMs for reasoning, tool usage, and adapting to new observations in many
real-world tasks. Given the expanding tasks that could benefit from LLMs and the growing task
complexity, an intuitive approach to scale up the power of agents is to use multiple agents that
cooperate.

Our insight is to use multi-agent conversations to achieve this cooperation. There are at least three reasons:
1. Chat-optimized LLMs (e.g., GPT-4) show the ability to incorporate feedback; LLM agents
   can cooperate through conversations with each other or human(s).
2. A single LLM can exhibit a broad range of capabilities; conversations between differently
   configured agents can combine these broad LLM capabilities in a modular fashion.
3. LLMs can solve complex tasks when broken into simpler subtasks; multi-agent conversations
   can enable this partitioning and integration.

We present AutoGen, a generalized multi-agent conversation framework, based on two new concepts:

1. **Customizable and conversable agents.** AutoGen uses a generic design of agents that can
   leverage LLMs, human inputs, tools, or a combination. Developers can easily create agents
   with different roles by selecting and configuring built-in capabilities.

2. **Conversation programming.** AutoGen simplifies and unifies complex LLM application
   workflows as multi-agent conversations. It streamlines development via: (1) defining a set
   of conversable agents with specific capabilities and roles; (2) programming the interaction
   behavior between agents via conversation-centric computation and control.

2 The AutoGen Framework

2.1 Conversable Agents

In AutoGen, a conversable agent is an entity with a specific role that can pass messages to
send and receive information to and from other conversable agents. It maintains its internal
context based on sent and received messages and can be configured to possess a set of capabilities.
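
The definition above can be pictured with a small toy class. This is only a sketch: `ToyAgent`, `send`, `receive`, and `history` are names chosen for illustration and are not AutoGen's `ConversableAgent` API. It shows an agent with a role that passes messages to a peer and keeps its internal context as a per-peer message history.

```python
from collections import defaultdict


class ToyAgent:
    """Hypothetical stand-in for a conversable agent; not the AutoGen class."""

    def __init__(self, name, role):
        self.name = name
        self.role = role
        # Internal context: one message list per conversation partner.
        self.history = defaultdict(list)

    def send(self, content, recipient):
        # Record the outgoing message, then deliver it to the recipient.
        self.history[recipient.name].append({"from": self.name, "content": content})
        recipient.receive(content, sender=self)

    def receive(self, content, sender):
        # Record the incoming message under the sender's name.
        self.history[sender.name].append({"from": sender.name, "content": content})


student = ToyAgent("student", role="asks questions")
tutor = ToyAgent("tutor", role="answers questions")
student.send("What is 2 + 2?", tutor)
tutor.send("4", student)
```

After these two calls, both agents hold the same two-message transcript, each indexed by the other agent's name.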

Agent capabilities powered by LLMs, humans, and tools:
- **LLMs**: LLM-backed agents exploit capabilities such as role playing, implicit state inference,
  providing feedback, adapting from feedback, and coding.
- **Humans**: AutoGen lets a human participate in agent conversation via human-backed agents.
- **Tools**: Tool-backed agents execute tools via code execution or function execution.

Built-in agent classes:
- **ConversableAgent**: highest-level agent abstraction, can use LLMs, humans, and tools
- **AssistantAgent**: pre-configured for AI assistant role (backed by LLMs)
- **UserProxyAgent**: solicits human input or executes code/function calls
- **GroupChatManager**: manages dynamic group chat among multiple agents

2.2 Conversation Programming

AutoGen utilizes conversation programming with two concepts:
1. **Computation**: the actions agents take to compute their response
2. **Control flow**: the sequence (or conditions) under which these computations happen

Design patterns:
1. **Unified interfaces and auto-reply mechanisms**: Once an agent receives a message, it
   automatically invokes generate_reply and sends the reply back unless a termination
   condition is satisfied.
2. **Control by fusion of programming and natural language**: (a) Natural-language control
   via LLMs; (b) Programming-language control via Python; (c) Control transition between
   natural and programming language.
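
The auto-reply pattern can be sketched in a few lines. The class below is a toy under stated assumptions: only the name `generate_reply` comes from the text above, while `AutoReplyAgent`, `is_termination`, `max_auto_replies`, and the `count_down` reply function are hypothetical. Computation lives in the reply function; control flow lives in the termination check.

```python
class AutoReplyAgent:
    """Toy agent illustrating auto-reply; not AutoGen's actual implementation."""

    def __init__(self, name, reply_fn, max_auto_replies=5):
        self.name = name
        self.reply_fn = reply_fn  # computation: how a reply is produced
        self.max_auto_replies = max_auto_replies
        self.replies_sent = 0
        self.log = []

    def generate_reply(self, message):
        return self.reply_fn(message)

    def is_termination(self, message):
        # Control flow: stop on an explicit marker or when the reply budget is spent.
        return "TERMINATE" in message or self.replies_sent >= self.max_auto_replies

    def receive(self, message, sender):
        self.log.append((sender.name, message))
        if self.is_termination(message):
            return
        reply = self.generate_reply(message)
        self.replies_sent += 1
        sender.receive(reply, sender=self)  # the reply goes straight back to the sender


def count_down(message):
    # Stand-in for an LLM or tool: decrement the number until it reaches 1.
    n = int(message)
    return "TERMINATE" if n <= 1 else str(n - 1)


a = AutoReplyAgent("a", count_down)
b = AutoReplyAgent("b", count_down)
b.receive("3", sender=a)  # one initial message drives the whole exchange
```

A single initial message is enough to drive the conversation; no outer control loop is needed because each `receive` triggers the next reply until the termination condition holds.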

3 Applications of AutoGen

Six applications demonstrating AutoGen's capabilities:

**A1: Math Problem Solving**
- System using built-in agents achieves 69.48% accuracy on MATH dataset (vs 55.18% for GPT-4 alone)
- Outperforms ChatGPT+Code Interpreter (48.33%), ChatGPT+Plugin (45.0%)
- Supports human-in-the-loop and multi-user problem solving

**A2: Retrieval-Augmented Code Generation and Question Answering**
- Retrieval-augmented Chat with interactive retrieval feature
- F1: 25.88%, Recall: 66.65% (vs DPR: 22.79%/62.59%)
- Novel interactive retrieval: agent replies "UPDATE CONTEXT" to get more relevant chunks
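
The "UPDATE CONTEXT" exchange can be pictured as a loop between a retrieving proxy and an assistant. The sketch below makes several assumptions that are not in the source: the chunks arrive already ranked by some retriever, each round supplies the next batch of chunks, and `toy_assistant` is a hard-coded stand-in for an LLM. All function names are hypothetical.

```python
def toy_assistant(question, context):
    # Stand-in for an LLM (the question is unused by this toy): it answers only
    # if the context contains the fact it needs, otherwise asks for more context.
    for chunk in context:
        if "capital of france is" in chunk.lower():
            return chunk.split()[-1].rstrip(".")
    return "UPDATE CONTEXT"


def retrieval_chat(question, ranked_chunks, batch_size=1, max_rounds=5):
    """Feed successive batches of chunks until the assistant stops asking for more."""
    rounds = 0
    while rounds < max_rounds:
        context = ranked_chunks[rounds * batch_size:(rounds + 1) * batch_size]
        if not context:
            break  # nothing left to retrieve
        rounds += 1
        reply = toy_assistant(question, context)
        if reply != "UPDATE CONTEXT":
            return reply, rounds
    return None, rounds


chunks = [
    "Berlin is the capital of Germany.",  # top-ranked but unhelpful
    "The capital of France is Paris.",
]
answer, rounds = retrieval_chat("What is the capital of France?", chunks)
```

Here the first batch does not contain the answer, so the assistant requests an update and succeeds on the second round.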

**A3: Decision Making in Text World Environments (ALFWorld)**
- Two-agent system matches ReAct performance (54%)
- Three-agent system with grounding agent: 69% success (vs 54% for ReAct)
- Grounding agent provides commonsense knowledge to prevent error loops

**A4: Multi-Agent Coding (OptiGuide)**
- Commander + Writer + Safeguard architecture
- Core workflow code reduced from 430+ lines to 100 lines (4x reduction)
- Multi-agent design boosts F1 by 8% (GPT-4) and 35% (GPT-3.5) on unsafe code detection
- 3x time saving vs ChatGPT+Code Interpreter; 3-5x fewer user interactions
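
A minimal sketch of the Commander + Writer + Safeguard division of labor follows. Everything in it is an assumption made for illustration: the three functions are plain Python stand-ins for LLM-backed agents, the "SAFE"/"DANGER" labels and the deny-list are invented for this toy, and `exec` with empty builtins is not a real sandbox.

```python
def writer(question, feedback=None):
    # Stand-in for an LLM that writes code; it retries with safer code after feedback.
    if feedback is None:
        return "import os; result = os.listdir('/')"
    return "result = 2 + 3"


def safeguard(code):
    # Toy safety check: flag code that touches a small deny-list.
    banned = ("import os", "import subprocess", "open(")
    return "DANGER" if any(token in code for token in banned) else "SAFE"


def commander(question, max_attempts=3):
    """Coordinate writer and safeguard; execute only code the safeguard cleared."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = writer(question, feedback)
        if safeguard(code) != "SAFE":
            feedback = "code rejected by safeguard"
            continue
        scope = {}
        # Illustration only: a real system should run code in an isolated sandbox.
        exec(code, {"__builtins__": {}}, scope)
        return scope["result"], attempt
    return None, max_attempts


result, attempts = commander("What is 2 + 3?")
```

The first draft is rejected by the safeguard and never executed; the second draft passes and is run by the commander.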

**A5: Dynamic Group Chat**
- GroupChatManager dynamically selects speakers
- Role-play prompt outperforms task-based prompt for speaker selection
- Higher success rate, fewer LLM calls, fewer termination failures
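
The group chat loop can be sketched as select a speaker, collect its reply, and share the reply with everyone. The code below is a toy: `ToyGroupChat` is not AutoGen's `GroupChatManager`, and `select_by_role` is a fixed rule standing in for the LLM-based speaker selection described above. All names are hypothetical.

```python
class ToyGroupChat:
    """Hypothetical sketch of a managed group chat loop."""

    def __init__(self, agents, select_speaker, max_rounds=6):
        self.agents = agents                  # name -> function(history) -> reply
        self.select_speaker = select_speaker  # function(history, names) -> name
        self.max_rounds = max_rounds
        self.history = []                     # shared transcript visible to all agents

    def run(self, task):
        self.history.append(("user", task))
        for _ in range(self.max_rounds):
            name = self.select_speaker(self.history, list(self.agents))
            reply = self.agents[name](self.history)
            # Broadcasting is modeled by appending to the shared transcript.
            self.history.append((name, reply))
            if "TERMINATE" in reply:
                break
        return self.history


def select_by_role(history, names):
    # Toy rule in place of an LLM prompt: the reviewer follows the coder, else the coder speaks.
    last_speaker = history[-1][0]
    return "reviewer" if last_speaker == "coder" else "coder"


agents = {
    "coder": lambda history: "def add(a, b): return a + b",
    "reviewer": lambda history: "Looks good. TERMINATE",
}
chat = ToyGroupChat(agents, select_by_role)
transcript = chat.run("Write an add function.")
```

Swapping `select_by_role` for a different selector changes the conversation pattern without touching the agents themselves.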

**A6: Conversational Chess**
- Human/AI chess players with a board agent for move validation
- Supports AI-AI, AI-human, human-human modes
- Board agent ensures legal moves; without it, illegal moves cause game disruptions
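
The role of the board agent can be illustrated with a toy validator. This is a sketch under loud assumptions: `ToyBoardAgent` is hypothetical, and the per-turn table of allowed moves is a tiny hand-written stand-in for real chess rules, not a rules engine.

```python
class ToyBoardAgent:
    """Hypothetical move validator sitting between two players."""

    def __init__(self, legal_moves_by_turn):
        self.legal_moves_by_turn = legal_moves_by_turn  # stand-in for real chess rules
        self.accepted = []

    def submit(self, player, move):
        turn = len(self.accepted)
        legal = self.legal_moves_by_turn[turn] if turn < len(self.legal_moves_by_turn) else set()
        if move not in legal:
            # Reject and ask the same player to retry; the opponent never sees the move.
            return f"Illegal move {move!r} from {player}; please choose another move."
        self.accepted.append((player, move))
        return f"{player} plays {move}."


board = ToyBoardAgent([{"e4", "d4", "Nf3"}, {"e5", "c5", "Nf6"}])
first = board.submit("white", "e5")   # rejected: not in the table for White's first move
second = board.submit("white", "e4")  # accepted
third = board.submit("black", "e5")   # accepted
```

Because rejected moves never reach the opponent, an invalid suggestion costs one retry rather than derailing the game.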

4 Discussion

AutoGen benefits:
- **Ease of use**: Built-in agents deliver strong performance out-of-the-box
- **Modularity**: Each agent can be developed, tested, and maintained independently
- **Programmability**: Core workflow code in A4 reduced 4x
- **Human involvement**: Native mechanism for human participation and oversight
- **Collaborative/adversarial agent interactions**: Agents share information or work adversarially

Comparison with related systems:

| Aspect | AutoGen | Multi-agent Debate | CAMEL | BabyAGI | MetaGPT |
|--------|---------|-------------------|-------|---------|---------|
| Infrastructure | yes | no | yes | no | no |
| Conversation pattern | flexible | static | static | static | static |
| Execution-capable | yes | no | no | no | yes |
| Human involvement | chat/skip | no | no | no | no |

Future directions:
- Designing optimal multi-agent workflows
- Creating highly capable agents with diverse skill sets
- Enabling scale, safety, and human agency
- Building fail-safes against cascading failures

References (selected):
- Wu et al. (2023). An empirical study on challenging math problem solving with GPT-4. arXiv:2306.01337
- Du et al. (2023). Improving factuality and reasoning in language models through multiagent debate. arXiv:2305.14325
- Liang et al. (2023). Encouraging divergent thinking in large language models through multi-agent debate.
- Hong et al. (2023). MetaGPT: Meta programming for multi-agent collaborative framework. arXiv:2308.00352
- Li et al. (2023b). CAMEL: Communicative agents for "mind" exploration of large scale language model society.
- Yao et al. (2022). ReAct: Synergizing reasoning and acting in language models. arXiv:2210.03629
- Shridhar et al. (2021). ALFWorld: Aligning Text and Embodied Environments for Interactive Learning. ICLR 2021.

Appendix C: Default System Message for Assistant Agent

"You are a helpful AI assistant. Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script
(in a sh coding block) for the user to execute:
1. When you need to collect info, use the code to output the info you need...
2. When you need to perform some task with code, use the code to perform the task...
...
Reply 'TERMINATE' in the end when everything is done."
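
One way a receiving agent might act on replies written under this system message is to extract the fenced code blocks and check for the closing 'TERMINATE'. The parser below is a sketch, not AutoGen's own extraction logic; `parse_reply`, `CODE_BLOCK`, and the sample replies are hypothetical.

```python
import re

FENCE = "`" * 3  # built programmatically so the example can sit inside a fenced block
CODE_BLOCK = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)


def parse_reply(reply):
    """Return (list of (language, code) pairs, whether the reply ends the chat)."""
    blocks = [(lang or "unknown", code.strip()) for lang, code in CODE_BLOCK.findall(reply)]
    done = reply.rstrip().endswith("TERMINATE")
    return blocks, done


reply = (
    "Run this to list the files:\n"
    + FENCE + "sh\nls -l\n" + FENCE + "\n"
    + "Then compute the total:\n"
    + FENCE + "python\nprint(1 + 2)\n" + FENCE + "\n"
)
blocks, done = parse_reply(reply)
final_blocks, final_done = parse_reply("All steps succeeded. TERMINATE")
```

The language tag on each block tells the executor whether to run the code as Python or as a shell script, matching the two block types the system message asks for.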