Colloquium lecture: Structured Search over Policy Classes: A Hierarchical Framework for Memory-Augmented Agentic Policy Search in Stochastic Environments, June 16, 2026, 10:15 a.m.

Tutor: Sommer


This paper extends the Agentic Policy Search (APS) framework proposed by Sommer
et al. in “Adaptive Self-Improvement for Smarter Energy Systems using Agentic
Policy Search”[1]. The main objective is to guide the policy search process in a more
interpretable and structured manner, particularly in highly stochastic environments.

While agentic policy search enables flexible and adaptive policy generation, it often lacks transparency and systematic guidance, making it difficult to understand, compare, and reliably improve discovered policies. This limitation becomes especially pronounced in volatile domains such as energy systems, where uncertainty, high variability, and transfer across environments require robust and explainable decision-making strategies. Addressing these challenges motivates structured search over policy classes, which enables a more principled exploration of the policy space while improving interpretability and robustness.

To this end, a hierarchical search approach over policy classes as defined by Warren Powell [2] is introduced. At the class level, a Knowledge Gradient (KG) strategy allocates search effort across the policy classes. Within each class, candidate policies are optimized using the Huxley–Gödel Machine (HGM) [3] to identify strong representative policies. In addition, the simulation environment is extended and reworked to improve the robustness of generated policies and to facilitate transfer to a modified CityLearn environment [4]. A dedicated parameter tuning phase is incorporated to further enhance policy performance, and a memory system is introduced to retain and reuse relevant information across the search process.

Experimental results indicate that appropriately chosen policy classes yield promising performance in highly stochastic settings while providing improved interpretability of the underlying decision structure. However, the stochastic nature of large language models can lead to weak policies even within suitable classes. Moreover, since the proposed approach requires exploration across multiple policy classes, several iterations may be necessary before an effective policy is identified, resulting in increased computational effort.

Room 04.137, Martensstr. 3, Erlangen

Zoom meeting:
https://fau.zoom-x.de/j/68350702053?pwd=UkF3aXY0QUdjeSsyR0tyRWtLQ0hYUT09

Meeting ID: 683 5070 2053
Passcode: 647333

Last update: 2026-05-11 - 11:33