Semantics Club, 30 November 2001

Learning in Finite Models

Martin Grohe

(joint work with György Turán)


The general problem of concept learning is to identify an unknown set from a given class of sets. The given class, referred to as the concept class, represents the possible classifications of the elements of the universe, and the unknown set, also called the target concept, is the actual classification. We consider the setup of predicate logic learning in finite structures, where the universe is the set of elements (or, more generally, tuples of elements) of a finite relational structure, and concepts are classes of sets definable by predicate logic formulas.
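As an illustration of this setup (a sketch not taken from the talk; the structure, relation, and formula below are invented for the example), consider a finite graph as the relational structure and the concept class of sets definable by the parametrized formula phi(x; y) = E(x, y), i.e. the neighborhoods of the elements:

```python
# Illustrative sketch: a finite relational structure (a path graph) and the
# concept class defined by the formula phi(x; y) = E(x, y) as y varies.

# Universe {0..4} with one symmetric binary relation E (a path 0-1-2-3-4).
universe = {0, 1, 2, 3, 4}
E = {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3)}

def neighborhood(y):
    """The set defined by phi(x; y) = E(x, y): all x adjacent to y."""
    return frozenset(x for x in universe if (x, y) in E)

# The concept class: one definable set per choice of the parameter y.
concept_class = {neighborhood(y) for y in universe}

# The target concept is one (unknown to the learner) member of this class:
target = neighborhood(2)
print(sorted(target))  # -> [1, 3]
```

The learner's task is then to identify `target` among the members of `concept_class` from labeled examples or queries.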

The identification of the target concept can be exact, as for example in the model of learning with equivalence queries, or approximate, as in the model of probably approximately correct (PAC) learning. It is an interesting fact that several notions of learning complexity turn out to be closely related to certain combinatorial parameters of the concept class. In particular, the sample size for PAC learning is of the same order of magnitude as the Vapnik-Chervonenkis dimension of the concept class, and the query complexity of learning with equivalence queries is closely related to the strong consistency dimension of the concept class. While it is easy to see that these combinatorial parameters are unbounded for predicate logic learning on arbitrary finite structures, we will show that they can be bounded on certain restricted classes of structures such as trees or planar graphs, for the former even if concepts are defined by formulas of monadic second-order logic. These bounds imply positive results for the PAC and equivalence-query learnability of definable sets over these structures.
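The Vapnik-Chervonenkis dimension mentioned above can be made concrete by a brute-force check (an illustrative, exponential-time sketch, not from the talk): a set S is shattered by a concept class C if every subset of S arises as the intersection of S with some concept, and the VC dimension is the largest size of a shattered set. The interval example below is a standard illustration and is not part of the talk.

```python
from itertools import combinations

def shatters(concepts, subset):
    """True if every sub-subset of `subset` arises as subset ∩ C
    for some concept C, i.e. the concepts shatter `subset`."""
    traces = {frozenset(subset) & c for c in concepts}
    return len(traces) == 2 ** len(subset)

def vc_dimension(universe, concepts):
    """Largest size of a shattered subset (brute force, exponential)."""
    dim = 0
    for k in range(1, len(universe) + 1):
        if any(shatters(concepts, s) for s in combinations(universe, k)):
            dim = k
        else:
            break
    return dim

# Example: intervals {i, ..., j-1} over a 5-element line shatter any two
# points, but no three (an interval containing the two outer points of a
# triple must also contain the middle one).
universe = range(5)
intervals = [frozenset(range(i, j)) for i in range(5) for j in range(i, 6)]
print(vc_dimension(universe, intervals))  # -> 2
```

A bounded VC dimension of the definable concept classes is what yields the polynomial sample-size bounds for PAC learning referred to above.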