On the Intrinsic Complexity of Learning
Rusins Freivalds
Institute of Mathematics and Computer Science, University of Latvia, Raina bulvaris 29, LV-1459, Riga, Latvia
rusins@cc.lu.lv

Efim Kinber
Computer and Information Sciences Department, University of Delaware, Newark, DE 19716, USA
kinber@

Carl H. Smith
Department of Computer Science, University of Maryland, College Park, MD 20912, USA
smith@

April 3, 1995

This work was facilitated by an international agreement under NSF Grants 9119540 and 9421640. A preliminary version of this work appeared at the 1995 European Workshop on Computational Learning Theory. Kinber was supported in part by NSF Grant 9020079. Smith was supported in part by NSF Grants 9020079 and 9301339.

Abstract

A new view of learning is presented. The basis of this view is a natural notion of reduction. We prove completeness and relative difficulty results. An infinite hierarchy of intrinsically more and more difficult to learn concepts is presented. Our results indicate that the complexity notion captured by our new notion of reduction differs dramatically from the traditional studies of the complexity of the algorithms performing learning tasks.
1 Introduction
Traditional studies of inductive inference have focused on illuminating various strata of learnability based on varying the definition of learnability. The research following Valiant's PAC model [Val84] and Angluin's teacher/learner model [Ang88] paid very careful attention to calculating the complexity of the learning algorithm. We present a new view of learning, based on the notion of reduction, that captures a different perspective on learning complexity from all prior studies. Based on our preliminary reports, Jain and Sharma [JS94, JS95] adapted the definitions given below to the case of learning languages. Herein, we present work on the intrinsic difficulty of learning functions.

It is best to proceed by an example. Consider the following algorithm for learning all the linear time computable functions. As a preliminary step, define an enumeration of all and only the linear time computable functions. The i-th program in this enumeration is interpreted as an ordered pair (j, k), and runs the j-th Turing machine (from some standard list) on any input x for kx + k steps. The learning algorithm initially guesses the first program in its enumeration and then starts reading data. When some data is input that disagrees with the output of the current guess, the learning algorithm considers the next program in its list that does not contradict the input it has seen so far. A moment's reflection is all that is necessary to realize that, given input from some linear time computable function, this procedure will, in the limit, converge to a correct program. The technique used in the above procedure is called the enumeration technique [Gol67].

In contrast with the simple learning procedure described above, consider learning the exponential time computable functions. In fact, the same basic algorithm works. The only modification necessary is to change the simulation time bound to something like k^x + k. This trivial modification leaves the essence of the algorithm unchanged. Our belief is that the intrinsic difficulty of the algorithm has not changed. However, due to the cost of exponential simulations compared with linear simulations, the complexity of the algorithm above for learning the exponential time computable functions is much greater than its complexity when learning the linear time functions. From our new perspective, both learning tasks described above have the same complexity. We call this complexity intrinsic because it relates to the essence of the learning problem itself, and not the learning algorithm.
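To make the enumeration technique concrete, here is a minimal Python sketch under illustrative assumptions: the enumeration is given as a finite list of ordinary Python callables standing in for the clocked programs described above, and the learner reads the graph of the target function one point at a time. The names learn_by_enumeration and make_hypothesis, and the toy pairing, are inventions for this sketch, not the paper's construction.

```python
# A minimal sketch of the enumeration technique, assuming hypotheses are
# supplied as Python callables rather than clocked Turing machines.

def learn_by_enumeration(hypotheses, data_stream):
    """After each data point, output the index of the first hypothesis
    consistent with everything seen so far (identification in the limit)."""
    seen = []   # observed graph of the target function
    guess = 0   # start with the first program in the enumeration
    for x, y in data_stream:
        seen.append((x, y))
        # Advance past every hypothesis that contradicts the data so far.
        while any(hypotheses[guess](a) != b for a, b in seen):
            guess += 1
        yield guess

# Toy stand-in for "all linear time computable functions": the i-th
# hypothesis is read as a pair (j, k) and computes j*x + k.
def make_hypothesis(i):
    j, k = divmod(i, 5)
    return lambda x: j * x + k

hypotheses = [make_hypothesis(i) for i in range(100)]

# Target function f(x) = 3*x + 2, i.e. the pair (3, 2), which has index 17.
data = ((x, 3 * x + 2) for x in range(10))
for guess in learn_by_enumeration(hypotheses, data):
    pass
print("converged to hypothesis index", guess)   # prints 17
```

In this sketch, switching from the linear time functions to the exponential time functions amounts to swapping in a different hypothesis list; the learner itself is untouched, which is the sense in which the two tasks have the same intrinsic complexity even though the simulations become far more expensive.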
Another simple example serves to show that our reduction-based approach to the complexity of learning is also orthogonal to the traditional strata of learnability studied in inductive inference. The two learnable classes of this example are the functions of finite support and the self-describing functions. The functions of finite support take the value 0 on all but finitely many arguments. This class was first used in [Gol67]. The self-describing functions return, on argument 0, a program computing themselves. They were first used in [Bar74, BB75]. Both of these classes (and their derivatives) are ubiquitous in the study of inductive inference. The functions of finite support are known to also be learnable under the more restrictive notion of Popperian learning [CS83]. In fact, the self-describing functions can be used as an example of something that is not in the more restrictive class containing the functions of finite support. To learn the self-describing functions, all an algorithm need do is wait for the proper, easily identified, value to appear as input. Learning all the functions of finite support with a single algorithm requires changing conjectures and using the limiting nature of learning. As a consequence of these observations, we believe that the self-describing functions are intrinsically easier to learn than the functions of finite support, despite their relative positions in the traditional hierarchy of learnability.

With every notion of reduction comes a notion of completeness. It turns out that the functions of finite support are complete, while the self-describing functions are not. Of course, completeness is a relative notion. This point will be clarified with the technical definitions in the next section.

The completeness of the class of functions of finite support suggests an interesting interpretation for machine learning. Traditionally, complete problems have been more than typical; they are considered to embody the crux of a class of problems. For example, the completeness of the halting set for the class of recursively enumerable sets indicates that any recursively enumerable set has a witnessing algorithm that proceeds by the simulation of selected computations. For the nonrecursive r.e. sets, the nonrecursiveness emanates from the unsolvability of the halting problem. The completeness of the satisfiability problem for formulae in conjunctive normal form with respect to the NP sets gives another example. Here the suggestion is that any deterministic solution to any of the hard problems in NP will require some "backtracking." The completeness of the set of functions of finite support suggests that any hard learning problem will have the following characteristics.
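Before listing those characteristics, it may help to see the two learners just described side by side. The following Python sketch is only an illustration: representing a conjecture as a Python value, the self-description itself in one case and a finite "patch" table in the other, is an assumption made for the sketch, and the function names are likewise invented.

```python
# Illustrative learners for the two classes discussed above; conjectures
# are modelled as Python values rather than program indices.

def learn_self_describing(data_stream):
    """Once the value at argument 0 appears, output it forever: by
    definition it is a program for the target function, so the learner
    never needs to change its mind."""
    guess = None
    for x, y in data_stream:
        if x == 0:
            guess = y
        yield guess

def learn_finite_support(data_stream):
    """After each data point, conjecture the 'patch' function that returns
    the nonzero values seen so far and 0 everywhere else.  Every new
    nonzero value forces a mind change, so success comes only in the limit."""
    patch = {}
    for x, y in data_stream:
        if y != 0:
            patch[x] = y
        snapshot = dict(patch)
        yield lambda a, table=snapshot: table.get(a, 0)

# Example: a function of finite support that is 7 at argument 3, 0 elsewhere.
stream = [(0, 0), (1, 0), (2, 0), (3, 7), (4, 0)]
final = None
for final in learn_finite_support(stream):
    pass
print(final(3), final(100))   # prints: 7 0
```

The contrast is visible in the code: the first learner makes at most one conjecture, while the second must keep revising its patch table as new nonzero values arrive, which is the behaviour behind our claim that the self-describing functions are intrinsically easier to learn.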