Advisor: Sommer

Advances in artificial intelligence have led to significant breakthroughs across a wide range of domains. Recent work leverages the capabilities of Large Language Models (LLMs) to construct world-model-based agents that generate hypotheses about their environment from past observations.
Other approaches focus on the In-Context Learning (ICL) abilities of LLM agents, which enable them to improve their performance across episodes. In this paper, we combine these two lines of research into a unified, self-evolving vertical multi-agent framework that learns from past experiences and uses this knowledge to guide future decisions in uncertain environments. We introduce a playbook-based approach that stores past experiences and makes them accessible for future decision-making. The approach is further strengthened by a separation of concerns into two distinct components: a hypothesis playbook, which stores the agent’s beliefs about the environment dynamics, and a strategy playbook, which contains strategies for navigating the environment effectively. Both playbooks are used during a lookahead search in a shadow environment to evaluate the potential outcomes of different actions and to support informed decision-making. The shadow environment is a world model simulated by an agent on the basis of the hypotheses about the underlying environment dynamics.
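The interplay of the two playbooks and the lookahead search can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the hypothesis playbook is reduced to a set of cells believed to be holes on a grid with deterministic moves, the LLM agent that simulates the shadow environment is replaced by a fixed rule-based model, a believed hole is scored with an assumed penalty of -1, and the strategy playbook is only stored, not consulted. All names (`Playbooks`, `ShadowEnv`, `lookahead`, `record_observation`) are hypothetical, and the action encoding (0 = left, 1 = down, 2 = right, 3 = up) follows Gymnasium's FrozenLake convention.

```python
from dataclasses import dataclass, field

# Action encoding borrowed from Gymnasium's FrozenLake: 0=left, 1=down, 2=right, 3=up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


@dataclass
class Playbooks:
    """Hypothetical container for the two playbooks (illustrative only)."""
    size: int = 4
    goal: tuple = (3, 3)
    # Hypothesis playbook: beliefs about the environment dynamics, reduced here
    # to the set of cells the agent believes to be holes.
    believed_holes: set = field(default_factory=set)
    # Strategy playbook: free-text strategies; stored but not used by this sketch.
    strategies: list = field(default_factory=list)


def record_observation(pb, cell, fell_in_hole):
    """Hypothetical helper: update the hypothesis playbook after a real transition."""
    if fell_in_hole:
        pb.believed_holes.add(cell)
    else:
        pb.believed_holes.discard(cell)


class ShadowEnv:
    """Shadow environment simulated purely from the hypothesis playbook.

    Assumption: moves are deterministic and moving into the border keeps the
    agent in place. The framework in the abstract uses an LLM agent instead.
    """

    def __init__(self, pb):
        self.pb = pb

    def step(self, pos, action):
        dr, dc = MOVES[action]
        last = self.pb.size - 1
        nxt = (min(max(pos[0] + dr, 0), last), min(max(pos[1] + dc, 0), last))
        if nxt in self.pb.believed_holes:
            return nxt, -1.0, True   # believed hole: episode ends, assumed penalty
        if nxt == self.pb.goal:
            return nxt, 1.0, True    # goal reached
        return nxt, 0.0, False


def lookahead(shadow, pos, depth):
    """Exhaustive depth-limited search in the shadow environment.

    Returns (best value, best first action); ties keep the lower action id.
    """
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in MOVES:
        nxt, reward, done = shadow.step(pos, action)
        value = reward if done else lookahead(shadow, nxt, depth - 1)[0]
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action
```

For example, with the holes of Gymnasium's default 4x4 map, `Playbooks(believed_holes={(1, 1), (1, 3), (2, 3), (3, 0)})`, the call `lookahead(ShadowEnv(pb), (0, 0), 6)` finds a six-move path to the goal and returns `(1.0, 1)`, that is, move down first. Exhaustive depth-limited search is chosen here only for brevity; the abstract does not specify which search procedure the framework uses.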
To evaluate our approach, we apply the framework to the Frozen Lake environment with varying map sizes, in both slippery and non-slippery configurations. As the underlying LLMs, we use gpt-oss-120b and grok-4.1-fast and compare their performance. The results demonstrate that the proposed framework enables effective learning across episodes, leading to improved success rates in navigating the environment. While gpt-oss-120b exhibits stronger initial performance, grok-4.1-fast achieves a higher learning rate, that is, it improves faster across episodes.
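The slippery and non-slippery configurations differ only in the transition rule. The sketch below follows our understanding of the default slippery behavior of Gymnasium's `FrozenLake-v1` (`is_slippery=True`): the agent moves in the intended direction with probability 1/3 and in each of the two perpendicular directions with probability 1/3. The abstract does not say which Frozen Lake implementation was used, so treat this as an assumption; `transition_probs` is a hypothetical helper, not a Gymnasium function.

```python
from fractions import Fraction

# Gymnasium's FrozenLake action encoding: 0=left, 1=down, 2=right, 3=up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def transition_probs(pos, action, size, slippery):
    """Hypothetical helper: return {next_cell: probability} for one step on a size x size grid."""
    if slippery:
        # Intended direction plus the two perpendicular ones, 1/3 each.
        candidates = [(action - 1) % 4, action, (action + 1) % 4]
    else:
        candidates = [action]
    p = Fraction(1, len(candidates))
    dist = {}
    for a in candidates:
        dr, dc = MOVES[a]
        # Moving into the border keeps the agent in place.
        nxt = (min(max(pos[0] + dr, 0), size - 1), min(max(pos[1] + dc, 0), size - 1))
        dist[nxt] = dist.get(nxt, 0) + p
    return dist
```

For instance, moving right from the top-left corner of a 4x4 map, `transition_probs((0, 0), 2, 4, slippery=True)` spreads probability 1/3 each over `(1, 0)`, `(0, 1)` and `(0, 0)`, whereas the non-slippery case yields `{(0, 1): 1}`.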
Room 04.137, Martensstr. 3, Erlangen
or
Zoom meeting:
https://fau.zoom-x.de/j/68350702053?pwd=UkF3aXY0QUdjeSsyR0tyRWtLQ0hYUT09
Meeting ID: 683 5070 2053
Passcode: 647333