[1910.11797v1] Deep Reinforcement Learning in HOL4
Abstract. The paper describes an implementation of deep reinforcement learning through self-supervised learning within the proof assistant HOL4. A close interaction between the machine learning modules and the HOL4 library is achieved by the choice of tree neural networks (TNNs) as machine learning models and the internal use of HOL4 terms to represent tree structures of TNNs. Recursive improvement is possible when a given task is expressed as a search problem. In this case, a Monte Carlo Tree Search (MCTS) algorithm guided by a TNN can be used to explore the search space and produce better examples for training the next TNN. As an illustration, tasks over propositional and arithmetical terms, representative of fundamental theorem proving techniques, are specified and learned: truth estimation, end-to-end computation, term rewriting and term synthesis.
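Because TNNs are chosen as the machine learning model, each term is evaluated by a network whose shape mirrors the term's own tree structure: every operator gets its own small layer, and embeddings are computed bottom-up from the leaves. The following is a minimal sketch of this idea, assuming a toy operator set (`0`, `suc`, `add`) and a two-class head; the dimensions, operator names, and head are illustrative choices, not the paper's exact architecture.

```python
import numpy as np

DIM = 4
rng = np.random.default_rng(0)

# One dense layer per operator; its input size depends on the operator's arity.
# The operator set (0, suc, add) and the two-class head are illustrative only.
weights = {
    "0":   rng.normal(size=(DIM, 1)),            # nullary: bias input only
    "suc": rng.normal(size=(DIM, DIM + 1)),      # unary
    "add": rng.normal(size=(DIM, 2 * DIM + 1)),  # binary
}
head = rng.normal(size=(2, DIM + 1))             # e.g. a truth-estimation head

def embed(term):
    """Bottom-up embedding; a term is (operator, [subterms])."""
    op, args = term
    x = np.concatenate([embed(a) for a in args] + [np.ones(1)])  # bias input
    return np.tanh(weights[op] @ x)

def classify(term):
    """Softmax of the head applied to the root embedding."""
    v = head @ np.concatenate([embed(term), np.ones(1)])
    e = np.exp(v - v.max())
    return e / e.sum()

# suc(add(0, suc(0))), i.e. the arithmetic term S(0 + S 0).
term = ("suc", [("add", [("0", []), ("suc", [("0", [])])])])
probs = classify(term)
```

Since the network is rebuilt per term, no fixed-size encoding of the input is needed, which is what makes the internal HOL4 term representation a natural fit.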
Fig. 2: Search tree for the rewriting problem with starting state tag(1 × 1). (Term Rewriting)
Fig. 3: Search tree for the synthesis task from the starting state t0 = X. (Term Synthesis)
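Search trees like those in Figs. 2 and 3 are built by an MCTS whose leaf evaluation comes from the learned model. Below is a minimal sketch of such a value-guided MCTS on a toy numeric synthesis problem, with a hand-written heuristic standing in for the trained TNN; the problem, constants, and function names are illustrative, not taken from the paper.

```python
import math

# Toy synthesis problem: reach TARGET from 1 using "+1" and "*2".
# TARGET, MAX_DEPTH, ACTIONS, and the heuristic are illustrative stand-ins.
TARGET = 12
MAX_DEPTH = 6
ACTIONS = [("inc", lambda x: x + 1), ("dbl", lambda x: 2 * x)]

def value_estimate(state):
    """Stand-in for the TNN value head: closeness of the state to the target."""
    return 1.0 / (1.0 + abs(TARGET - state))

class Node:
    def __init__(self, state, depth):
        self.state, self.depth = state, depth
        self.children = {}            # action name -> child Node
        self.visits, self.total = 0, 0.0

    def ucb(self, parent_visits, c=1.4):
        if self.visits == 0:
            return float("inf")       # explore unvisited children first
        return (self.total / self.visits
                + c * math.sqrt(math.log(parent_visits) / self.visits))

def mcts(root, simulations=200):
    for _ in range(simulations):
        node, path = root, [root]
        # Selection: descend along the best-UCB child until a leaf.
        while node.children:
            node = max(node.children.values(),
                       key=lambda n: n.ucb(node.visits))
            path.append(node)
        # Expansion: add children unless the leaf is terminal or too deep.
        if node.depth < MAX_DEPTH and node.state != TARGET:
            for name, act in ACTIONS:
                node.children[name] = Node(act(node.state), node.depth + 1)
        # Evaluation: the value function replaces a random rollout.
        reward = 1.0 if node.state == TARGET else value_estimate(node.state)
        # Backpropagation along the selected path.
        for n in path:
            n.visits += 1
            n.total += reward
    return root

def best_path(root):
    """Follow the most-visited child at each level."""
    states, node = [root.state], root
    while node.children:
        node = max(node.children.values(), key=lambda n: n.visits)
        states.append(node.state)
    return states

root = mcts(Node(1, 0))
path = best_path(root)
```

In the recursive-improvement loop described in the abstract, the states along the most-visited branches would then serve as training examples for the next TNN, which in turn guides the next round of search.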