
Boyan J.A. Learning Evaluation Functions for Global Optimization

Dissertation, Carnegie Mellon University, 1998, 216 pp.
In complex sequential decision problems such as scheduling factory production, planning medical treatments, and playing backgammon, optimal decision policies are in general unknown, and it is often difficult, even for human domain experts, to manually encode good decision policies in software. The reinforcement-learning methodology of "value function approximation" (VFA) offers an alternative: systems can learn effective decision policies autonomously, simply by simulating the task and keeping statistics on which decisions lead to good ultimate performance and which do not. This thesis advances the state of the art in VFA in two ways.
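The core idea of learning from simulation statistics can be illustrated with a minimal sketch (not taken from the thesis): Monte-Carlo value estimation, which averages the returns observed after each state across many simulated episodes. The `random_walk` task below is a standard toy example chosen here for illustration.

```python
import random

def mc_value_estimates(simulate_episode, n_episodes=1000, gamma=1.0):
    """Estimate V(s) by averaging the returns observed from simulated episodes."""
    totals, counts = {}, {}
    for _ in range(n_episodes):
        episode = simulate_episode()          # list of (state, reward) pairs
        G = 0.0
        for state, reward in reversed(episode):
            G = reward + gamma * G            # return from this state onward
            totals[state] = totals.get(state, 0.0) + G
            counts[state] = counts.get(state, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}

# Toy task: random walk on states 0..4; reward 1 only on reaching state 4.
def random_walk():
    s, episode = 2, []
    while 0 < s < 4:
        s2 = s + random.choice([-1, 1])
        episode.append((s, 1.0 if s2 == 4 else 0.0))
        s = s2
    return episode
```

With enough episodes the estimates approach the true values of this chain (0.25, 0.5, 0.75 for states 1, 2, 3), with no model of the transition dynamics required.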
First, it presents three new VFA algorithms, which apply to three different restricted classes of sequential decision problems: Grow-Support for deterministic problems, ROUT for acyclic stochastic problems, and Least-Squares TD(λ) for fixed-policy prediction problems. Each is designed to gain robustness and efficiency over current approaches by exploiting the restricted problem structure to which it applies.
Second, it introduces STAGE, a new search algorithm for general combinatorial optimization tasks. STAGE learns a problem-specific heuristic evaluation function as it searches. The heuristic is trained by supervised linear regression or Least-Squares TD(λ) to predict, from features of states along the search trajectory, how well a fast local search method such as hillclimbing will perform starting from each state. Search proceeds by alternating between two stages: performing the fast search to gather new training data, and following the learned heuristic to identify a promising new start state.
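The two alternating stages can be sketched roughly as follows. This is a toy rendering on a bitstring task, assuming hand-chosen `features`, `neighbors`, and `objective` functions; it illustrates the control flow (hillclimb on the true objective, regress features onto the run's outcome, hillclimb on the learned predictor to pick the next start), not the thesis's full algorithm.

```python
import random
import numpy as np

def hillclimb(x, evaluate, neighbors, max_steps=50):
    """Greedy descent on `evaluate`; returns the trajectory of visited states."""
    traj = [x]
    for _ in range(max_steps):
        nxt = min(neighbors(x), key=evaluate)
        if evaluate(nxt) >= evaluate(x):
            break                      # local minimum of `evaluate`
        x = nxt
        traj.append(x)
    return traj

def fit_linear(data):
    """Ordinary least squares: feature lists -> predicted outcomes."""
    X = np.array([f + [1.0] for f, _ in data])   # append a bias column
    y = np.array([o for _, o in data])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda f: float(np.dot(w[:-1], f) + w[-1])

def stage(start, objective, neighbors, features, rounds=8):
    data, x, best = [], start, start
    for _ in range(rounds):
        # Stage 1: fast local search on the true objective; label each state
        # on the trajectory with the outcome the run eventually reached.
        traj = hillclimb(x, objective, neighbors)
        outcome = objective(traj[-1])
        data += [(features(s), outcome) for s in traj]
        best = min(best, traj[-1], key=objective)
        # Stage 2: follow the learned predictor of hillclimbing's outcome
        # to identify a promising new start state.
        vhat = fit_linear(data)
        x = hillclimb(traj[-1], lambda s: vhat(features(s)), neighbors)[-1]
    return best
```

In the real algorithm the predictor can also be trained by Least-Squares TD(λ) rather than supervised regression, and the fast search need not be plain hillclimbing.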
STAGE has produced good results (in some cases, the best results known) on a variety of combinatorial optimization domains, including VLSI channel routing, Bayes net structure-finding, bin-packing, Boolean satisfiability, radiotherapy treatment planning, and geographic cartogram design. This thesis describes the results in detail, analyzes the reasons for and conditions of STAGE's success, and places STAGE in the context of four decades of research in local search and evaluation function learning. It provides strong evidence that reinforcement learning methods can be efficient and effective on large-scale decision problems.
Learning Evaluation Functions for Sequential Decision Making
Learning Evaluation Functions for Global Optimization
STAGE: Empirical Results
STAGE: Analysis
STAGE: Extensions
Related Work
Conclusions
A Proofs
B Simulated Annealing
C Implementation Details of Problem Instances