You are more or less right. By “mathematical approaches”, we mean approaches focused on building mathematical models relevant to alignment/agency/learning and finding non-trivial theorems (or at least conjectures) about these models. I’m not sure what the word “but” is doing in “but you mention RL”: there is a rich literature of mathematical inquiry into RL. For a few examples, see everything under the bullet “reinforcement learning theory” in the LTA reading list.

Thanks for the pointer! Yes, RL has a lot of research of this kind. As an empirical researcher, I just get stuck in translation sometimes.