Ordering Subgoals in a Backward Chaining Prover

Kristóf Szabó (Eötvös Loránd University)
Zsolt Zombori (Rényi AI)
5-11 September 2021

World models represent the basic mechanisms of a system and can provide predictions about how transformations (actions) affect the state of the system. Such models have recently gained attention in Reinforcement Learning (RL), and in several domains model-based learning systems have performed comparably to or better than highly tuned model-free variants [1, 8, 12]. World models can increase sample efficiency, since trajectories can be generated without interacting with the environment, and they can aid exploration by yielding a semantically meaningful latent structure that allows for identifying promising directions. Our project asks whether such world models are achievable for theorem proving, i.e., whether the state-of-the-art machine learning toolset is capable of capturing the underlying dynamics of an Automated Theorem Proving (ATP) system. A world model for an ATP system should know which moves are valid and whether the proof search has failed or succeeded.
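To make the requirement in the last sentence concrete, the following is a minimal sketch (our own illustration, not the paper's implementation) of the interface such a world model would have to support: predicting the successor state of an action, judging whether an action is valid in a state, and recognizing proof success. The state and action encodings here are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProofState:
    """Illustrative state: the tuple of open subgoals (empty = proof found)."""
    goals: tuple

class TabularWorldModel:
    """Stub world model: records observed (state, action) -> next-state
    transitions and answers the three questions a world model for an ATP
    system must answer: Is this move valid? What state does it lead to?
    Has the search succeeded?"""

    def __init__(self):
        self.transitions = {}  # (state, action) -> next_state

    def observe(self, state, action, next_state):
        # In a learned model this would be a training step; here we memoize.
        self.transitions[(state, action)] = next_state

    def is_valid(self, state, action):
        return (state, action) in self.transitions

    def predict(self, state, action):
        return self.transitions.get((state, action))

    def is_success(self, state):
        return len(state.goals) == 0

# Example: one inference step closes the only open goal.
model = TabularWorldModel()
s0 = ProofState(goals=("p(X)",))
s1 = ProofState(goals=())
model.observe(s0, "apply_fact_p", s1)
```

A learned model would replace the lookup table with a neural network over state/action embeddings, but the interface stays the same.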